Showing 1–50 of 504 results for author: Zhang, X

  1. arXiv:2406.13197  [pdf, other]

    stat.ME

    Representation Transfer Learning for Semiparametric Regression

    Authors: Baihua He, Huihang Liu, Xinyu Zhang, Jian Huang

    Abstract: We propose a transfer learning method that utilizes data representations in a semiparametric regression model. Our aim is to perform statistical inference on the parameter of primary interest in the target model while accounting for potential nonlinear effects of confounding variables. We leverage knowledge from source domains, assuming that the sample size of the source data is substantially larg…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 42 pages, 11 figures, 5 tables

    MSC Class: 62F99

  2. arXiv:2406.13060  [pdf, other]

    cs.LG cs.AI stat.AP

    Scale-Translation Equivariant Network for Oceanic Internal Solitary Wave Localization

    Authors: Zhang Wan, Shuo Wang, Xudong Zhang

    Abstract: Internal solitary waves (ISWs) are gravity waves that are often observed in the interior ocean rather than the surface. They hold significant importance due to their capacity to carry substantial energy, thus influence pollutant transport, oil platform operations, submarine navigation, etc. Researchers have studied ISWs through optical images, synthetic aperture radar (SAR) images, and altimeter d…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages, 5 figures

  3. arXiv:2406.11043  [pdf, other]

    stat.AP stat.ME

    Statistical Considerations for Evaluating Treatment Effect under Various Non-proportional Hazard Scenarios

    Authors: Xinyu Zhang, Erich J. Greene, Ondrej Blaha, Wei Wei

    Abstract: We conducted a systematic comparison of statistical methods used for the analysis of time-to-event outcomes under various proportional and nonproportional hazard (NPH) scenarios. Our study used data from recently published oncology trials to compare the Log-rank test, still by far the most widely used option, against some available alternatives, including the MaxCombo test, the Restricted Mean Sur…

    Submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2406.09665  [pdf, other]

    math.ST math.OC math.PR stat.ML

    New algorithms for sampling and diffusion models

    Authors: Xicheng Zhang

    Abstract: Drawing from the theory of stochastic differential equations, we introduce a novel sampling method for known distributions and a new algorithm for diffusion generative models with unknown distributions. Our approach is inspired by the concept of the reverse diffusion process, widely adopted in diffusion generative models. Additionally, we derive the explicit convergence rate based on the smooth OD…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 24 pages

    MSC Class: 60H10

  5. arXiv:2406.08180  [pdf, other]

    stat.CO stat.ME

    Stochastic Process-based Method for Degree-Degree Correlation of Evolving Networks

    Authors: Yue Xiao, Xiaojun Zhang

    Abstract: Existing studies on the degree correlation of evolving networks typically rely on differential equations and statistical analysis, resulting in only approximate solutions due to inherent randomness. To address this limitation, we propose an improved Markov chain method for modeling degree correlation in evolving networks. By redesigning the network evolution rules to reflect actual network dynamic…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2405.19649  [pdf, ps, other]

    cs.LG cs.SI stat.ML

    Towards Deeper Understanding of PPR-based Embedding Approaches: A Topological Perspective

    Authors: Xingyi Zhang, Zixuan Weng, Sibo Wang

    Abstract: Node embedding learns low-dimensional vectors for nodes in the graph. Recent state-of-the-art embedding approaches take Personalized PageRank (PPR) as the proximity measure and factorize the PPR matrix or its adaptation to generate embeddings. However, little previous work analyzes what information is encoded by these approaches, and how the information correlates with their superb performance in…

    Submitted 29 May, 2024; originally announced May 2024.

  7. arXiv:2405.16577  [pdf, other]

    stat.ML cs.LG

    Reflected Flow Matching

    Authors: Tianyu Xie, Yu Zhu, Longlin Yu, Tong Yang, Ziheng Cheng, Shiyue Zhang, Xiangyu Zhang, Cheng Zhang

    Abstract: Continuous normalizing flows (CNFs) learn an ordinary differential equation to transform prior samples into data. Flow matching (FM) has recently emerged as a simulation-free approach for training CNFs by regressing a velocity model towards the conditional velocity field. However, on constrained domains, the learned velocity model may lead to undesirable flows that result in highly unnatural sampl…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: ICML 2024 camera-ready

  8. arXiv:2405.16413  [pdf, other]

    cs.AI cs.CL cs.LG stat.AP

    Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

    Authors: Jiankun Wang, Sumyeong Ahn, Taykhoom Dalal, Xiaodan Zhang, Weishen Pan, Qiannan Zhang, Bin Chen, Hiroko H. Dodge, Fei Wang, Jiayu Zhou

    Abstract: Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning bas…

    Submitted 25 May, 2024; originally announced May 2024.

  9. arXiv:2405.14892  [pdf, other]

    cs.DC stat.CO

    Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications

    Authors: Xiran Zhang, Sameh Abdulah, Jian Cao, Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes

    Abstract: Addressing the statistical challenge of computing the multivariate normal (MVN) probability in high dimensions holds significant potential for enhancing various applications. One common way to compute high-dimensional MVN probabilities is the Separation-of-Variables (SOV) algorithm. This algorithm is known for its high computational complexity of O(n^3) and space complexity of O(n^2), mainly due t…

    Submitted 18 May, 2024; originally announced May 2024.

  10. arXiv:2405.11431  [pdf, other]

    cs.LG q-fin.ST stat.ML

    Review of deep learning models for crypto price prediction: implementation and evaluation

    Authors: Jingyang Wu, Xinyi Zhang, Fangyixuan Huang, Haochen Zhou, Rohtiash Chandra

    Abstract: There has been much interest in accurate cryptocurrency price forecast models by investors and researchers. Deep Learning models are prominent machine learning techniques that have transformed various fields and have shown potential for finance and economics. Although various deep learning models have been explored for cryptocurrency price forecasting, it is not clear which models are suitable due…

    Submitted 2 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  11. arXiv:2405.10991  [pdf, other]

    cs.LG cs.AI stat.ME

    Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection

    Authors: Jiarui Zhang, Shaojuan Wu, Xiaowang Zhang, Zhiyong Feng

    Abstract: Stance detection classifies stance relations (namely, Favor, Against, or Neither) between comments and targets. Pretrained language models (PLMs) are widely used to mine the stance relation to improve the performance of stance detection through pretrained knowledge. However, PLMs also embed "bad" pretrained knowledge concerning stance into the extracted stance relation semantics, resulting in pr…

    Submitted 16 May, 2024; originally announced May 2024.

  12. arXiv:2405.08699  [pdf]

    stat.ML cs.LG

    Weakly-supervised causal discovery based on fuzzy knowledge and complex data complementarity

    Authors: Wenrui Li, Wei Zhang, Qinghao Zhang, Xuegong Zhang, Xiaowo Wang

    Abstract: Causal discovery based on observational data is important for deciphering the causal mechanism behind complex systems. However, the effectiveness of existing causal discovery methods is limited due to inferior prior knowledge, domain inconsistencies, and the challenges of high-dimensional datasets with small sample sizes. To address this gap, we propose a novel weakly-supervised fuzzy knowledge an…

    Submitted 14 May, 2024; originally announced May 2024.

  13. arXiv:2405.06613  [pdf, other]

    stat.ME

    Simultaneously detecting spatiotemporal changes with penalized Poisson regression models

    Authors: Zerui Zhang, Xin Wang, Xin Zhang, Jing Zhang

    Abstract: In the realm of large-scale spatiotemporal data, abrupt changes are commonly occurring across both spatial and temporal domains. This study aims to address the concurrent challenges of detecting change points and identifying spatial clusters within spatiotemporal count data. We introduce an innovative method based on the Poisson regression model, employing doubly fused penalization to unveil the u…

    Submitted 10 May, 2024; originally announced May 2024.

  14. arXiv:2404.19118  [pdf, other]

    stat.ME

    Identification and estimation of causal effects using non-concurrent controls in platform trials

    Authors: Michele Santacatterina, Federico Macchiavelli Giron, Xinyi Zhang, Ivan Diaz

    Abstract: Platform trials are multi-arm designs that simultaneously evaluate multiple treatments for a single disease within the same overall trial structure. Unlike traditional randomized controlled trials, they allow treatment arms to enter and exit the trial at distinct times while maintaining a control arm throughout. This control arm comprises both concurrent controls, where participants are randomized…

    Submitted 29 April, 2024; originally announced April 2024.

    MSC Class: 62P10

  15. arXiv:2404.11579  [pdf, other]

    stat.ME

    Spatial Heterogeneous Additive Partial Linear Model: A Joint Approach of Bivariate Spline and Forest Lasso

    Authors: Xin Zhang, Shan Yu, Zhengyuan Zhu, Xin Wang

    Abstract: Identifying spatial heterogeneous patterns has attracted a surge of research interest in recent years, due to its important applications in various scientific and engineering fields. In practice the spatially heterogeneous components are often mixed with components which are spatially smooth, making the task of identifying the heterogeneous regions more challenging. In this paper, we develop an ef…

    Submitted 3 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  16. arXiv:2404.05976  [pdf, other]

    cs.LG eess.SY stat.ME

    A Cyber Manufacturing IoT System for Adaptive Machine Learning Model Deployment by Interactive Causality Enabled Self-Labeling

    Authors: Yutian Ren, Yuqi He, Xuyin Zhang, Aaron Yen, G. P. Li

    Abstract: Machine Learning (ML) has been demonstrated to improve productivity in many manufacturing applications. To host these ML applications, several software and Industrial Internet of Things (IIoT) systems have been proposed for manufacturing applications to deploy ML applications and provide real-time intelligence. Recently, an interactive causality enabled self-labeling method has been proposed to ad…

    Submitted 8 April, 2024; originally announced April 2024.

  17. arXiv:2404.05933  [pdf, other]

    stat.ME stat.CO

    fastcpd: Fast Change Point Detection in R

    Authors: Xingchi Li, Xianyang Zhang

    Abstract: Change point analysis is concerned with detecting and locating structure breaks in the underlying model of a sequence of observations ordered by time, space or other variables. A widely adopted approach for change point analysis is to minimize an objective function with a penalty term on the number of change points. This framework includes several well-established procedures, such as the penalized…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 53 pages, 16 figures

  18. arXiv:2404.05808  [pdf, other]

    stat.ME

    Replicability analysis of high dimensional data accounting for dependence

    Authors: Pengfei Lyu, Xianyang Zhang, Hongyuan Cao

    Abstract: Replicability is the cornerstone of scientific research. We study the replicability of data from high-throughput experiments, where tens of thousands of features are examined simultaneously. Existing replicability analysis methods either ignore the dependence among features or impose strong modelling assumptions, producing overly conservative or overly liberal results. Based on $p$-values from two…

    Submitted 8 April, 2024; originally announced April 2024.

  19. arXiv:2403.18540  [pdf, other]

    stat.ML cs.LG stat.CO

    skscope: Fast Sparsity-Constrained Optimization in Python

    Authors: Zezhi Wang, Jin Zhu, Peng Chen, Huiyang Peng, Xiaoke Zhang, Anran Wang, Yu Zheng, Junxian Zhu, Xueqin Wang

    Abstract: Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 4 pages

  20. arXiv:2403.13260  [pdf, other]

    stat.ME

    A Bayesian Approach for Selecting Relevant External Data (BASE): Application to a study of Long-Term Outcomes in a Hemophilia Gene Therapy Trial

    Authors: Tianyu Pan, Xiang Zhang, Weining Shen, Ting Ye

    Abstract: Gene therapies aim to address the root causes of diseases, particularly those stemming from rare genetic defects that can be life-threatening or severely debilitating. While there has been notable progress in the development of gene therapies in recent years, understanding their long-term effectiveness remains challenging due to a lack of data on long-term outcomes, especially during the early sta…

    Submitted 9 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  21. arXiv:2403.13081  [pdf, other]

    stat.AP math.PR q-bio.PE

    Parameter Estimation from Single Patient, Single Time-Point Sequencing Data of Recurrent Tumors

    Authors: Kevin Leder, Ruping Sun, Zicheng Wang, Xuanming Zhang

    Abstract: In this study, we develop consistent estimators for key parameters that govern the dynamics of tumor cell populations when subjected to pharmacological treatments. While these treatments often lead to an initial reduction in the abundance of drug-sensitive cells, a population of drug-resistant cells frequently emerges over time, resulting in cancer recurrence. Samples from recurrent tumors present…

    Submitted 19 March, 2024; originally announced March 2024.

  22. arXiv:2403.07431  [pdf, other]

    stat.ML cs.LG

    Knowledge Transfer across Multiple Principal Component Analysis Studies

    Authors: Zeyu Li, Kangxiang Qin, Yong He, Wang Zhou, Xinsheng Zhang

    Abstract: Transfer learning has aroused great interest in the statistical community. In this article, we focus on knowledge transfer for unsupervised learning tasks in contrast to the supervised learning tasks in the literature. Given the transferable source populations, we propose a two-step transfer learning algorithm to extract useful information from multiple source principal component analysis (PCA) st…

    Submitted 12 March, 2024; originally announced March 2024.

  23. arXiv:2403.06783  [pdf]

    stat.ME

    A doubly robust estimator for the Mann Whitney Wilcoxon Rank Sum Test when applied for causal inference in observational studies

    Authors: Ruohui Chen, Tuo Lin, Lin Liu, Jinyuan Liu, Ruifeng Chen, Jingjing Zou, Chenyu Liu, Loki Natarajan, Tang Wang, Xinlian Zhang, Xin Tu

    Abstract: The Mann-Whitney-Wilcoxon rank sum test (MWWRST) is a widely used method for comparing two treatment groups in randomized control trials, particularly when dealing with highly skewed data. However, when applied to observational study data, the MWWRST often yields invalid results for causal inference. To address this limitation, Wu et al. (2014) introduced an approach that incorporates inverse prob…

    Submitted 11 March, 2024; originally announced March 2024.

  24. arXiv:2403.05647  [pdf, other]

    stat.ME stat.CO

    Minor Issues Escalated to Critical Levels in Large Samples: A Permutation-Based Fix

    Authors: Xuekui Zhang, Li Xing, Jing Zhang, Soojeong Kim

    Abstract: In the big data era, the need to reevaluate traditional statistical methods is paramount due to the challenges posed by vast datasets. While larger samples theoretically enhance accuracy and hypothesis testing power without increasing false positives, practical concerns about inflated Type-I errors persist. The prevalent belief is that larger samples can uncover subtle effects, necessitating dual…

    Submitted 8 March, 2024; originally announced March 2024.

  25. arXiv:2403.04873  [pdf, other]

    stat.AP

    The SIDO Performance Model for League of Legends

    Authors: Amy X. Zhang, Parth Naidu

    Abstract: League of Legends (LoL) has been a dominant esport for a decade, yet the inherent complexity of the game has stymied the creation of analytical measures of player skill and performance. Current industry standards are limited to easy-to-procure individual player statistics that are incomplete and lacking context as they do not take into account teamplay or game state. We present a unified performan…

    Submitted 6 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  26. arXiv:2403.01717  [pdf, other]

    stat.ML cs.LG math.OC stat.CO

    Soft-constrained Schrodinger Bridge: a Stochastic Control Approach

    Authors: Jhanvi Garg, Xianyang Zhang, Quan Zhou

    Abstract: Schrödinger bridge can be viewed as a continuous-time stochastic control problem where the goal is to find an optimally controlled diffusion process whose terminal distribution coincides with a pre-specified target distribution. We propose to generalize this problem by allowing the terminal distribution to differ from the target but penalizing the Kullback-Leibler divergence between the two distri…

    Submitted 22 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Made minor changes about the references. 38 pages, 7 figures. Accepted by AISTATS 2024

    MSC Class: 60J60; 60J70; 93E20

  27. arXiv:2402.13933  [pdf, other]

    stat.ME

    Powerful Large-scale Inference in High Dimensional Mediation Analysis

    Authors: Asmita Roy, Xianyang Zhang

    Abstract: In genome-wide epigenetic studies, exposures (e.g., Single Nucleotide Polymorphisms) affect outcomes (e.g., gene expression) through intermediate variables such as DNA methylation. Mediation analysis offers a way to study these intermediate variables and identify the presence or absence of causal mediation effects. Testing for mediation effects lead to a composite null hypothesis. Existing methods…

    Submitted 26 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  28. arXiv:2401.17473  [pdf, other]

    stat.ME math.ST

    Adaptive Matrix Change Point Detection: Leveraging Structured Mean Shifts

    Authors: Xinyu Zhang, Kung-Sik Chan

    Abstract: In high-dimensional time series, the component processes are often assembled into a matrix to display their interrelationship. We focus on detecting mean shifts with unknown change point locations in these matrix time series. Series that are activated by a change may cluster along certain rows (columns), which forms mode-specific change point alignment. Leveraging mode-specific change point alignm…

    Submitted 30 January, 2024; originally announced January 2024.

  29. arXiv:2401.16410  [pdf, other]

    stat.ML cs.LG

    ReTaSA: A Nonparametric Functional Estimation Approach for Addressing Continuous Target Shift

    Authors: Hwanwoo Kim, Xin Zhang, Jiwei Zhao, Qinglong Tian

    Abstract: The presence of distribution shifts poses a significant challenge for deploying modern machine learning models in real-world applications. This work focuses on the target shift problem in a regression setting (Zhang et al., 2013; Nguyen et al., 2016). More specifically, the target variable y (also known as the response variable), which is continuous, has different marginal distributions in the tra…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024

  30. arXiv:2401.14655  [pdf, other]

    stat.ME

    Distributionally Robust Optimization and Robust Statistics

    Authors: Jose Blanchet, Jiajin Li, Sirui Lin, Xuhui Zhang

    Abstract: We review distributionally robust optimization (DRO), a principled approach for constructing statistical estimators that hedge against the impact of deviations in the expected loss between the training and deployment environments. Many well-known estimators in statistics and machine learning (e.g. AdaBoost, LASSO, ridge regression, dropout training, etc.) are distributionally robust in a precise s…

    Submitted 26 January, 2024; originally announced January 2024.

  31. arXiv:2401.14343  [pdf, other]

    cs.LG cs.CY stat.ML

    Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

    Authors: Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

    Abstract: Modern classification problems exhibit heterogeneities across individual classes: Each class may have unique attributes, such as sample size, label quality, or predictability (easy vs difficult), and variable importance at test-time. Without care, these heterogeneities impede the learning process, most notably, when optimizing fairness objectives. Confirming this, under a gaussian mixture setting,…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures

  32. arXiv:2401.08173  [pdf, other]

    stat.ME

    Simultaneous Change Point Detection and Identification for High Dimensional Linear Models

    Authors: Bin Liu, Xinsheng Zhang, Yufeng Liu

    Abstract: In this article, we consider change point inference for high dimensional linear models. For change point detection, given any subgroup of variables, we propose a new method for testing the homogeneity of corresponding regression coefficients across the observations. Under some regularity conditions, the proposed new testing procedure controls the type I error asymptotically and is powerful against…

    Submitted 16 January, 2024; originally announced January 2024.

  33. arXiv:2401.03987  [pdf, other]

    stat.AP

    Navigating the Congestion Maze: Geospatial Analysis and Travel Behavior Insights for Dockless Bike-Sharing Systems in Xiamen

    Authors: Xuxilu Zhang, Lingqi Gu, Nan Zhao

    Abstract: Shared bicycles have emerged as a transformative force in urban transportation, effectively addressing the perennial 'last mile' challenge faced by commuters. The limitations of station-based bike-sharing systems, constrained by point-to-point travel, have spurred the popularity of the dockless model, offering flexible rentals and eliminating docking infrastructure constraints. However, the rapid…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: 17 pages, 8 figures

  34. arXiv:2401.01294  [pdf, other]

    stat.ML cs.LG stat.ME

    Efficient Sparse Least Absolute Deviation Regression with Differential Privacy

    Authors: Weidong Liu, Xiaojun Mao, Xiaofei Zhang, Xin Zhang

    Abstract: In recent years, privacy-preserving machine learning algorithms have attracted increasing attention because of their important applications in many scientific fields. However, in the literature, most privacy-preserving algorithms demand learning objectives to be strongly convex and Lipschitz smooth, which thus cannot cover a wide class of robust loss functions (e.g., quantile/least absolute loss).…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: IEEE Transactions on Information Forensics and Security, 2024

    MSC Class: 62J07

  35. arXiv:2312.15566  [pdf, other]

    stat.ML cs.LG

    Deep Copula-Based Survival Analysis for Dependent Censoring with Identifiability Guarantees

    Authors: Weijia Zhang, Chun Kai Ling, Xuanhui Zhang

    Abstract: Censoring is the central problem in survival analysis where either the time-to-event (for instance, death), or the time-to-censoring (such as loss of follow-up) is observed for each sample. The majority of existing machine learning-based survival analysis methods assume that survival is conditionally independent of censoring given a set of covariates; an assumption that cannot be verified since onl…

    Submitted 27 December, 2023; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: To appear in AAAI 2024

  36. arXiv:2312.15373  [pdf, other]

    eess.SY stat.ME

    A Multi-day Needs-based Modeling Approach for Activity and Travel Demand Analysis

    Authors: Kexin Chen, Jinping Guan, Ravi Seshadri, Varun Pattabhiraman, Youssef Medhat Aboutaleb, Ali Shamshiripour, Chen Liang, Xiaochun Zhang, Moshe Ben-Akiva

    Abstract: This paper proposes a multi-day needs-based model for activity and travel demand analysis. The model captures the multi-day dynamics in activity generation, which enables the modeling of activities with increased flexibility in time and space (e.g., e-commerce and remote working). As an enhancement to activity-based models, the proposed model captures the underlying decision-making process of acti…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 38 pages, 11 figures

  37. arXiv:2312.09862  [pdf, other]

    math.ST stat.ME

    Wasserstein-based Minimax Estimation of Dependence in Multivariate Regularly Varying Extremes

    Authors: Xuhui Zhang, Jose Blanchet, Youssef Marzouk, Viet Anh Nguyen, Sven Wang

    Abstract: We study minimax risk bounds for estimators of the spectral measure in multivariate linear factor models, where observations are linear combinations of regularly varying latent factors. Non-asymptotic convergence rates are derived for the multivariate Peak-over-Threshold estimator in terms of the $p$-th order Wasserstein distance, and information-theoretic lower bounds for the minimax risks are es…

    Submitted 15 December, 2023; originally announced December 2023.

  38. arXiv:2312.04026  [pdf, other]

    stat.ME

    Independent-Set Design of Experiments for Estimating Treatment and Spillover Effects under Network Interference

    Authors: Chencheng Cai, Xu Zhang, Edoardo M. Airoldi

    Abstract: Interference is ubiquitous when conducting causal experiments over networks. Except for certain network structures, causal inference on the network in the presence of interference is difficult due to the entanglement between the treatment assignments and the interference levels. In this article, we conduct causal inference under interference on an observed, sparse but connected network, and we pro…

    Submitted 6 December, 2023; originally announced December 2023.

  39. arXiv:2312.02905  [pdf, other]

    stat.ME math.ST

    E-values, Multiple Testing and Beyond

    Authors: Guanxun Li, Xianyang Zhang

    Abstract: We discover a connection between the Benjamini-Hochberg (BH) procedure and the recently proposed e-BH procedure [Wang and Ramdas, 2022] with a suitably defined set of e-values. This insight extends to a generalized version of the BH procedure and the model-free multiple testing procedure in Barber and Candès [2015] (BC) with a general form of rejection rules. The connection provides an effective w…

    Submitted 5 December, 2023; originally announced December 2023.

  40. arXiv:2312.01386  [pdf, ps, other]

    cs.LG stat.ML

    Regret Optimality of GP-UCB

    Authors: Wenjia Wang, Xiaowei Zhang, Lu Zou

    Abstract: Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations, due to its simple structure and superior performance. Its empirical successes lead to a natural, yet unresolved question: Is GP-UCB regret optimal? In this paper, we offer the first generally affirmative answer to this important open question in the Bayesi…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: 23 pages

  41. arXiv:2311.17797  [pdf, other]

    cs.LG stat.ME

    Learning to Simulate: Generative Metamodeling via Quantile Regression

    Authors: L. Jeff Hong, Yanxi Hou, Qingkai Zhang, Xiaowei Zhang

    Abstract: Stochastic simulation models, while effective in capturing the dynamics of complex systems, are often too slow to run for real-time decision-making. Metamodeling techniques are widely used to learn the relationship between a summary statistic of the outputs (e.g., the mean or quantile) and the inputs of the simulator, so that it can be used in real time. However, this methodology requires the know…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Main body: 36 pages, 7 figures; supplemental material: 12 pages

  42. arXiv:2311.17303  [pdf, other]

    cs.LG cs.AI stat.ME

    Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge

    Authors: Xiaoge Zhang, Xiao-Lin Wang, Fenglei Fan, Yiu-Ming Cheung, Indranil Bose

    Abstract: In this paper, we develop a generic methodology to encode hierarchical causality structure among observed variables into a neural network in order to improve its predictive performance. The proposed methodology, called causality-informed neural network (CINN), leverages three coherent steps to systematically map the structural causal knowledge into the layer-to-layer design of neural network while…

    Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  43. arXiv:2311.13958  [pdf, other]

    stat.ML cs.CV cs.LG

    Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework

    Authors: Jingjing Zheng, Wanglong Lu, Wenzhe Wang, Yankai Cao, Xiaoqin Zhang, Xianta Jiang

    Abstract: Recently, numerous tensor singular value decomposition (t-SVD)-based tensor recovery methods have shown promise in processing visual data, such as color images and videos. However, these methods often suffer from severe performance degradation when confronted with tensor data exhibiting non-smooth changes. It has been commonly observed in real-world scenarios but ignored by the traditional t-SVD-b…

    Submitted 31 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  44. arXiv:2311.08504  [pdf, ps, other]

    stat.ML cs.LG

    On semi-supervised estimation using exponential tilt mixture models

    Authors: Ye Tian, Xinwei Zhang, Zhiqiang Tan

    Abstract: Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only the predictors. Logistic regression is equivalent to an exponential tilt model in the labeled population. For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models and maximum nonpar…

    Submitted 14 November, 2023; originally announced November 2023.

  45. arXiv:2311.07876  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

    Authors: Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li

    Abstract: In this work, we study the low-rank MDPs with adversarially changed losses in the full-information feedback setting. In particular, the unknown transition probability kernel admits a low-rank matrix decomposition \citep{REPUCB22}, and the loss functions may change adversarially but are revealed to the learner at the end of each episode. We propose a policy optimization-based algorithm POLO, and we…

    Submitted 13 November, 2023; originally announced November 2023.

  46. arXiv:2311.06968  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Physics-Informed Data Denoising for Real-Life Sensing Systems

    Authors: Xiyuan Zhang, Xiaohan Fu, Diyan Teng, Chengyu Dong, Keerthivasan Vijayakumar, Jiayun Zhang, Ranak Roy Chowdhury, Junsheng Han, Dezhi Hong, Rashmi Kulkarni, Jingbo Shang, Rajesh Gupta

    Abstract: Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically r…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: SenSys 2023

  47. arXiv:2311.03289  [pdf, other]

    stat.ME

    Batch effect correction with sample remeasurement in highly confounded case-control studies

    Authors: Hanxuan Ye, Xianyang Zhang, Chen Wang, Ellen L. Goode, Jun Chen

    Abstract: Batch effects are pervasive in biomedical studies. One approach to address the batch effects is repeatedly measuring a subset of samples in each batch. These remeasured samples are used to estimate and correct the batch effects. However, rigorous statistical methods for batch effect correction with remeasured samples are severely under-developed. In this study, we developed a framework for batch e…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 45 pages

  48. arXiv:2310.17153  [pdf, other]

    cs.LG stat.ME

    Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration

    Authors: Longlin Yu, Tianyu Xie, Yu Zhu, Tong Yang, Xiangyu Zhang, Cheng Zhang

    Abstract: Semi-implicit variational inference (SIVI) has been introduced to expand the analytical variational families by defining expressive semi-implicit distributions in a hierarchical manner. However, the single-layer architecture commonly used in current SIVI methods can be insufficient when the target posterior has complicated structures. In this paper, we propose hierarchical semi-implicit variationa…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 25 pages, 13 figures, NeurIPS 2023

  49. arXiv:2310.07990  [pdf]

    q-bio.GN cs.IR cs.LG stat.AP

    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

    Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

    Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information f…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 figures

  50. arXiv:2310.04457  [pdf, other]

    math.OC cs.LG stat.ML

    ProGO: Probabilistic Global Optimizer

    Authors: Xinyu Zhang, Sujit Ghosh

    Abstract: In the field of global optimization, many existing algorithms face challenges posed by non-convex target functions and high computational complexity or unavailability of gradient information. These limitations, exacerbated by sensitivity to initial conditions, often lead to suboptimal solutions or failed convergence. This is true even for Metaheuristic algorithms designed to amalgamate different o…

    Submitted 12 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.