Showing 1–50 of 585 results for author: wang, X

Searching in archive stat.
  1. arXiv:2407.04248

    stat.ML cs.LG

    Machine Learning for Complex Systems with Abnormal Pattern by Exception Maximization Outlier Detection Method

    Authors: Zhikun Zhang, Yiting Duan, Xiangjun Wang, Mingyuan Zhang

    Abstract: This paper proposes a novel fast online methodology for outlier detection called the exception maximization outlier detection method (EMODM), which employs probabilistic models and statistical algorithms to detect abnormal patterns from the outputs of complex systems. The EMODM is based on a two-state Gaussian mixture model and demonstrates strong performance in probability anomaly detection workin…

    Submitted 5 July, 2024; originally announced July 2024.
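The abstract states that EMODM is built on a two-state Gaussian mixture model. The paper's own (online) algorithm is not reproduced here; as a hedged illustration of the underlying idea only, the batch sketch below fits a generic two-component 1-D Gaussian mixture with textbook expectation-maximization and flags the points that the wider ("abnormal") component explains better. The function names, the initialisation, the variance floor, and the toy data are all assumptions.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_state_gmm(xs, iters=200, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with textbook EM.

    Component 0 starts as a narrow, heavily weighted "normal" state and
    component 1 as a wide, lightly weighted "abnormal" state.
    """
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    w = [0.9, 0.1]
    mu = [sorted(xs)[n // 2], mean]
    sigma = [std, 3.0 * std]
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 1.0)
        # M-step: weighted re-estimation of each component's weight, mean, spread.
        for k, r in enumerate(([1.0 - q for q in resp], resp)):
            nk = max(sum(r), 1e-12)
            w[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigma[k] = math.sqrt(max(var, var_floor))  # floor avoids collapse
    return w, mu, sigma


def flag_outliers(xs, w, mu, sigma):
    """Indices of points the wider ("abnormal") component explains better."""
    abn = 0 if sigma[0] > sigma[1] else 1
    nor = 1 - abn
    return [
        i for i, x in enumerate(xs)
        if w[abn] * normal_pdf(x, mu[abn], sigma[abn])
        > w[nor] * normal_pdf(x, mu[nor], sigma[nor])
    ]


# Toy data (hypothetical): 200 standard-normal readings plus three gross errors.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [8.0, 9.0, -10.0]
w, mu, sigma = fit_two_state_gmm(data)
flagged = flag_outliers(data, w, mu, sigma)
print(flagged)
```

Labelling the abnormal state by its larger spread, rather than by its index, keeps the flagging rule correct even if EM swaps the roles of the two components.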

  2. Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I

    Authors: Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang, Michael Bendersky

    Abstract: The traditional evaluation of information retrieval (IR) systems is generally very costly as it requires manual relevance annotation from human experts. Recent advancements in generative artificial intelligence -- specifically large language models (LLMs) -- can generate relevance annotations at an enormous scale with relatively small computational costs. Potentially, this could alleviate the cost…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: KDD '24

  3. arXiv:2406.15523

    cs.LG stat.ML

    Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

    Authors: Yili Wang, Yixin Liu, Xu Shen, Chenyu Li, Kaize Ding, Rui Miao, Ying Wang, Shirui Pan, Xin Wang

    Abstract: To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating…

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2406.12017

    stat.ML cs.LG stat.CO

    Sparsity-Constraint Optimization via Splicing Iteration

    Authors: Zezhi Wang, ** Zhu, Junxian Zhu, Borui Tang, Hongmei Lin, Xueqin Wang

    Abstract: Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEratio…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 34 pages

  5. arXiv:2406.10473

    stat.ME

    Design-based variance estimation of the Hájek effect estimator in stratified and clustered experiments

    Authors: Xinhe Wang, Ben B. Hansen

    Abstract: Randomized controlled trials (RCTs) are used to evaluate treatment effects. When individuals are grouped together, clustered RCTs are conducted. Stratification is recommended to reduce imbalance of baseline covariates between treatment and control. In practice, this can lead to comparisons between clusters of very different sizes. As a result, direct adjustment estimators that average differences…

    Submitted 19 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  6. arXiv:2406.08097

    cs.LG stat.AP stat.ME

    Inductive Global and Local Manifold Approximation and Projection

    Authors: Jungeum Kim, Xiao Wang

    Abstract: Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analysis. The significant impact of t-SNE and UMAP has catalyzed intense research interest, seeking further innovations toward visualizing not only the local but also the global structure information of the data. Moreover, there have been…

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.06426

    stat.ME

    Biomarker-Guided Adaptive Enrichment Design with Threshold Detection for Clinical Trials with Time-to-Event Outcome

    Authors: Kaiyuan Hua, Hwanhee Hong, Xiaofei Wang

    Abstract: Biomarker-guided designs are increasingly used to evaluate personalized treatments based on patients' biomarker status in Phase II and III clinical trials. With adaptive enrichment, these designs can improve the efficiency of evaluating the treatment effect in biomarker-positive patients by increasing their proportion in the randomized trial. While time-to-event outcomes are often used as the prim…

    Submitted 10 June, 2024; originally announced June 2024.

  8. arXiv:2406.03849

    cs.LG stat.AP stat.ML

    A Noise-robust Multi-head Attention Mechanism for Formation Resistivity Prediction: Frequency Aware LSTM

    Authors: Yongan Zhang, Junfeng Zhao, Jian Li, Xuanran Wang, Youzhuang Sun, Yuntian Chen, Dongxiao Zhang

    Abstract: The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole…

    Submitted 6 June, 2024; originally announced June 2024.

  9. arXiv:2405.20970

    stat.ML cs.LG

    PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

    Authors: Xiaoke Wang, Xiaochen Yang, Rui Zhu, **g-Hao Xue

    Abstract: Positive-unlabeled (PU) learning aims to train a classifier using the data containing only labeled-positive instances and unlabeled instances. However, existing PU learning methods are generally hard to achieve satisfactory performance on trifurcate data, where the positive instances distribute on both sides of the negative instances. To address this issue, firstly we propose a PU classifier with…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 24 pages, 6 figures

  10. arXiv:2405.18932

    stat.ML cs.LG

    A Mallows-like Criterion for Anomaly Detection with Random Forest Implementation

    Authors: Gaoxiang Zhao, Lu Wang, Xiaoqiang Wang

    Abstract: The effectiveness of anomaly signal detection can be significantly undermined by the inherent uncertainty of relying on one specified model. Under the framework of model average methods, this paper proposes a novel criterion to select the weights on aggregation of multiple models, wherein the focal loss function accounts for the classification of extremely imbalanced data. This strategy is further…

    Submitted 29 May, 2024; originally announced May 2024.

  11. arXiv:2405.15991

    cs.LG cs.AI stat.ML

    Rényi Neural Processes

    Authors: Xuesong Wang, He Zhao, Edwin V. Bonilla

    Abstract: Neural Processes (NPs) are variational frameworks that aim to represent stochastic processes with deep neural networks. Despite their obvious benefits in uncertainty estimation for complex distributions via data-driven priors, NPs enforce network parameter sharing between the conditional prior and posterior distributions, thereby risking introducing a misspecified prior. We hereby propose Rényi Ne…

    Submitted 24 May, 2024; originally announced May 2024.

  12. arXiv:2405.12838

    quant-ph stat.CO

    Quantum Non-Identical Mean Estimation: Efficient Algorithms and Fundamental Limits

    Authors: Jiachen Hu, Tongyang Li, Xinzhao Wang, Yecheng Xue, Chenyi Zhang, Han Zhong

    Abstract: We systematically investigate quantum algorithms and lower bounds for mean estimation given query access to non-identically distributed samples. On the one hand, we give quantum mean estimators with quadratic quantum speed-up given samples from different bounded or sub-Gaussian random variables. On the other hand, we prove that, in general, it is impossible for any quantum algorithm to achieve qua…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 31 pages, 0 figure. To appear in the 19th Theory of Quantum Computation, Communication and Cryptography (TQC 2024)

  13. arXiv:2405.08699

    stat.ML cs.LG

    Weakly-supervised causal discovery based on fuzzy knowledge and complex data complementarity

    Authors: Wenrui Li, Wei Zhang, Qinghao Zhang, Xuegong Zhang, Xiaowo Wang

    Abstract: Causal discovery based on observational data is important for deciphering the causal mechanism behind complex systems. However, the effectiveness of existing causal discovery methods is limited due to inferior prior knowledge, domain inconsistencies, and the challenges of high-dimensional datasets with small sample sizes. To address this gap, we propose a novel weakly-supervised fuzzy knowledge an…

    Submitted 14 May, 2024; originally announced May 2024.

  14. arXiv:2405.07138

    stat.ME

    Large-dimensional Robust Factor Analysis with Group Structure

    Authors: Yong He, Xiaoyang Ma, Xingheng Wang, Yalin Wang

    Abstract: In this paper, we focus on exploiting the group structure for large-dimensional factor models, which captures the homogeneous effects of common factors on individuals within the same group. In view of the fact that datasets in macroeconomics and finance are typically heavy-tailed, we propose to identify the unknown group structure using the agglomerative hierarchical clustering algorithm and an in…

    Submitted 11 May, 2024; originally announced May 2024.

  15. arXiv:2405.06613

    stat.ME

    Simultaneously detecting spatiotemporal changes with penalized Poisson regression models

    Authors: Zerui Zhang, Xin Wang, Xin Zhang, **g Zhang

    Abstract: In the realm of large-scale spatiotemporal data, abrupt changes are commonly occurring across both spatial and temporal domains. This study aims to address the concurrent challenges of detecting change points and identifying spatial clusters within spatiotemporal count data. We introduce an innovative method based on the Poisson regression model, employing doubly fused penalization to unveil the u…

    Submitted 10 May, 2024; originally announced May 2024.

  16. arXiv:2405.03734

    cs.HC cs.AI stat.AP

    FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering

    Authors: Silan Hu, Xiaoning Wang

    Abstract: Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduce…

    Submitted 6 May, 2024; originally announced May 2024.

  17. arXiv:2404.19242

    cs.CV eess.IV stat.ME

    A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems

    Authors: Xin Ma, Puchen Zhu, Xiao Li, Xiaoyin Zheng, Jianshu Zhou, Xuchen Wang, Kwok Wai Samuel Au

    Abstract: Depth position highly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial an…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Instrumentation and Measurement

  18. arXiv:2404.18008

    cs.LG stat.AP

    Implicit Generative Prior for Bayesian Neural Networks

    Authors: Yijia Liu, Xiao Wang

    Abstract: Predictive uncertainty quantification is crucial for reliable decision-making in various applied domains. Bayesian neural networks offer a powerful framework for this task. However, defining meaningful priors and ensuring computational efficiency remain significant challenges, especially for complex real-world applications. This paper addresses these challenges by proposing a novel neural adaptive…

    Submitted 27 April, 2024; originally announced April 2024.

  19. arXiv:2404.14786

    cs.AI cs.LG stat.ME

    RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model

    Authors: Peiwen Li, Xin Wang, Zeyang Zhang, Yuan Meng, Fang Shen, Yue Li, Jialong Wang, Yang Li, Wenweu Zhu

    Abstract: In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for operation and maintenance of graph construction, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional…

    Submitted 26 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  20. arXiv:2404.13557

    stat.ML cs.LG

    Preconditioned Neural Posterior Estimation for Likelihood-free Inference

    Authors: Xiaoyu Wang, Ryan P. Kelly, David J. Warne, Christopher Drovandi

    Abstract: Simulation based inference (SBI) methods enable the estimation of posterior distributions when the likelihood function is intractable, but where model simulation is feasible. Popular neural approaches to SBI are the neural posterior estimator (NPE) and its sequential version (SNPE). These methods can outperform statistical SBI approaches such as approximate Bayesian computation (ABC), particularly…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 31 pages, 11 figures
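The abstract contrasts neural posterior estimation with approximate Bayesian computation (ABC). As a hedged reference point only, the simplest ABC variant, rejection ABC, can be sketched as follows: draw θ from the prior, simulate a dataset, and keep θ when a summary statistic of the simulation falls within a tolerance ε of the observed summary. This is not the preconditioned NPE method of the paper; the toy normal-mean model, the uniform prior on [-5, 5], ε = 0.1, and all function names are assumptions.

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_sample, n_draws, eps, rng):
    """Rejection ABC: keep prior draws whose simulated summary lands within eps."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) <= eps:
            accepted.append(theta)
    return accepted


# Hypothetical toy model: 50 draws from N(theta, 1) with theta unknown.
def simulate(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


rng = random.Random(1)
observed = simulate(2.0, rng)  # pretend the true theta is 2.0
posterior = abc_rejection(observed, simulate, statistics.fmean, prior_sample,
                          n_draws=10_000, eps=0.1, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

The sample mean is a sufficient statistic for this toy model, so the accepted draws approximate the true posterior as ε shrinks; most simulations are rejected, which is the inefficiency that neural approaches such as NPE aim to avoid.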

  21. arXiv:2404.11579

    stat.ME

    Spatial Heterogeneous Additive Partial Linear Model: A Joint Approach of Bivariate Spline and Forest Lasso

    Authors: Xin Zhang, Shan Yu, Zhengyuan Zhu, Xin Wang

    Abstract: Identifying spatial heterogeneous patterns has attracted a surge of research interest in recent years, due to its important applications in various scientific and engineering fields. In practice the spatially heterogeneous components are often mixed with components which are spatially smooth, making the task of identifying the heterogeneous regions more challenging. In this paper, we develop an ef…

    Submitted 3 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  22. arXiv:2404.08667

    eess.SY stat.AP

    Traffic State Estimation and Uncertainty Quantification at Signalized Intersections with Low Penetration Rate Vehicle Trajectory Data

    Authors: Xingmin Wang, Zihao Wang, Zachary Jerome, Henry X. Liu

    Abstract: This paper studies the traffic state estimation problem at signalized intersections with low penetration rate vehicle trajectory data. While many existing studies have proposed different methods to estimate unknown traffic states and parameters (e.g., penetration rate, queue length) with this data, most of them only provide a point estimation without knowing the uncertainty of these estimated valu…

    Submitted 1 April, 2024; originally announced April 2024.

  23. arXiv:2404.08128

    stat.ME

    Inference of treatment effect and its regional modifiers using restricted mean survival time in multi-regional clinical trials

    Authors: Kaiyuan Hua, Hwanhee Hong, Xiaofei Wang

    Abstract: Multi-regional clinical trials (MRCTs) play an increasingly crucial role in global pharmaceutical development by expediting data gathering and regulatory approval across diverse patient populations. However, differences in recruitment practices and regional demographics often lead to variations in study participant characteristics, potentially biasing treatment effect estimates and undermining tre…

    Submitted 11 April, 2024; originally announced April 2024.

  24. arXiv:2404.01191

    stat.ME

    A Semiparametric Approach for Robust and Efficient Learning with Biobank Data

    Authors: Molei Liu, Xinyi Wang, Chuan Hong

    Abstract: With the increasing availability of electronic health records (EHR) linked with biobank data for translational research, a critical step in realizing its potential is to accurately classify phenotypes for patients. Existing approaches to achieve this goal are based on error-prone EHR surrogate outcomes, assisted and validated by a small set of labels obtained via medical chart review, which may al…

    Submitted 1 April, 2024; originally announced April 2024.

  25. arXiv:2403.18540

    stat.ML cs.LG stat.CO

    skscope: Fast Sparsity-Constrained Optimization in Python

    Authors: Zezhi Wang, ** Zhu, Peng Chen, Huiyang Peng, Xiaoke Zhang, Anran Wang, Yu Zheng, Junxian Zhu, Xueqin Wang

    Abstract: Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 4 pages

  26. Green's matching: an efficient approach to parameter estimation in complex dynamic systems

    Authors: Jianbin Tan, Guoyu Zhang, Xueqin Wang, Hui Huang, Fang Yao

    Abstract: Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statist…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 40 pages, 4 figures

    Journal ref: Journal of the Royal Statistical Society: Series B, 2024

  27. arXiv:2403.06942

    eess.SY cs.LG stat.ML

    Grid Monitoring and Protection with Continuous Point-on-Wave Measurements and Generative AI

    Authors: Lang Tong, Xinyi Wang, Qing Zhao

    Abstract: Purpose: This article presents a case for a next-generation grid monitoring and control system, leveraging recent advances in generative artificial intelligence (AI), machine learning, and statistical inference. Advancing beyond earlier generations of wide-area monitoring systems built upon supervisory control and data acquisition (SCADA) and synchrophasor technologies, we argue for a monitoring an…

    Submitted 11 March, 2024; originally announced March 2024.

  28. arXiv:2403.04015

    cs.LG cs.AI stat.ML

    Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent

    Authors: Xinyuan Wang, Dongjie Wang, Wangyang Ying, Rui Xie, Haifeng Chen, Yanjie Fu

    Abstract: Feature selection prepares the AI-readiness of data by eliminating redundant features. Prior research falls into two primary categories: i) Supervised Feature Selection, which identifies the optimal feature subset based on their relevance to the target variable; ii) Unsupervised Feature Selection, which reduces the feature space dimensionality by capturing the essential information within the feat…

    Submitted 6 March, 2024; originally announced March 2024.

  29. arXiv:2402.14438

    stat.ME

    Efficiency-improved doubly robust estimation with non-confounding predictive covariates

    Authors: Shanshan Luo, Mengchen Shi, Wei Li, Xueli Wang, Zhi Geng

    Abstract: In observational studies, covariates with substantial missing data are often omitted, despite their strong predictive capabilities. These excluded covariates are generally believed not to simultaneously affect both treatment and outcome, indicating that they are not genuine confounders and do not impact the identification of the average treatment effect (ATE). In this paper, we introduce an altern…

    Submitted 22 February, 2024; originally announced February 2024.

  30. arXiv:2402.13870

    cs.LG eess.SP stat.AP

    Generative Probabilistic Time Series Forecasting and Applications in Grid Operations

    Authors: Xinyi Wang, Lang Tong, Qing Zhao

    Abstract: Generative probabilistic forecasting produces future time series samples according to the conditional probability distribution given past time series observations. Such techniques are essential in risk-based decision-making and planning under uncertainty with broad applications in grid operations, including electricity price forecasting, risk-based economic dispatch, and stochastic optimizations.…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted at CISS 2024. arXiv admin note: text overlap with arXiv:2306.03782

  31. arXiv:2402.11283

    math.NA stat.ML

    Deep adaptive sampling for surrogate modeling without labeled data

    Authors: Xili Wang, Kejun Tang, Jiayu Zhai, Xiaoliang Wan, Chao Yang

    Abstract: Surrogate modeling is of great practical significance for parametric differential equation systems. In contrast to classical numerical methods, using physics-informed deep learning methods to construct simulators for such systems is a promising direction due to its potential to handle high dimensionality, which requires minimizing a loss over a training set of random samples. However, the random s…

    Submitted 17 February, 2024; originally announced February 2024.

  32. arXiv:2402.10797

    cs.MS cs.LG stat.CO stat.ML

    BlackJAX: Composable Bayesian inference in JAX

    Authors: Alberto Cabezas, Adrien Corenflos, Junpeng Lao, Rémi Louf, Antoine Carnec, Kaustubh Chaudhari, Reuben Cohn-Gordon, Jeremie Coullon, Wei Deng, Sam Duffield, Gerardo Durán-Martín, Marcin Elantkowski, Dan Foreman-Mackey, Michele Gregori, Carlos Iguaran, Ravin Kumar, Martin Lysy, Kevin Murphy, Juan Camilo Orduz, Karm Patel, Xi Wang, Rob Zinkov

    Abstract: BlackJAX is a library implementing sampling and variational inference algorithms commonly used in Bayesian computation. It is designed for ease of use, speed, and modularity by taking a functional approach to the algorithms' implementation. BlackJAX is written in Python, using JAX to compile and run NumPy-like samplers and variational methods on CPUs, GPUs, and TPUs. The library integrates well w…

    Submitted 22 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Companion paper for the library https://github.com/blackjax-devs/blackjax Update: minor changes and updated the list of authors to include technical contributors

  33. arXiv:2402.08412

    stat.ML cs.LG math.DS math.ST

    Interacting Particle Systems on Networks: joint inference of the network and the interaction kernel

    Authors: Quanjun Lang, Xiong Wang, Fei Lu, Mauro Maggioni

    Abstract: Modeling multi-agent systems on networks is a fundamental challenge in a wide variety of disciplines. We jointly infer the weight matrix of the network and the interaction kernel, which determine respectively which agents interact with which others and the rules of such interactions from data consisting of multiple trajectories. The estimator we propose leads naturally to a non-convex optimization…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 53 pages, 17 figures

    MSC Class: 62F12; 82C22

  34. arXiv:2402.02687

    cs.LG cs.AI stat.ML

    Poisson Process for Bayesian Optimization

    Authors: Xiaoxing Wang, Jiaxing Li, Chao Xue, Wei Liu, Weifeng Liu, Xiaokang Yang, Junchi Yan, Dacheng Tao

    Abstract: Bayesian optimization (BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidat…

    Submitted 4 February, 2024; originally announced February 2024.

  35. arXiv:2402.01121

    stat.ME

    Non-linear Mendelian randomization with Two-stage prediction estimation and Control function estimation

    Authors: Xinpei Wang, Tao Huang, **zhu Jia

    Abstract: Most of the existing Mendelian randomization (MR) methods are limited by the assumption of linear causality between exposure and outcome, and the development of new non-linear MR methods is highly desirable. We introduce two-stage prediction estimation and control function estimation from econometrics to MR and extend them to non-linear causality. We give conditions for parameter identification an…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 9 pages, 4 figures

  36. arXiv:2401.15680

    stat.ME

    How to achieve model-robust inference in stepped wedge trials with model-based methods?

    Authors: Bingkai Wang, Xueqi Wang, Fan Li

    Abstract: A stepped wedge design is a unidirectional crossover design where clusters are randomized to distinct treatment sequences. While model-based analysis of stepped wedge designs -- via linear mixed models or generalized estimating equations -- is standard practice to evaluate treatment effects accounting for clustering and adjusting for baseline covariates, their properties under misspecification hav…

    Submitted 27 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  37. arXiv:2401.11070

    stat.ME

    Efficient Data Reduction Strategies for Big Data and High-Dimensional LASSO Regressions

    Authors: Xin Wang, Min Yang, William Li

    Abstract: The IBOSS approach proposed by Wang et al. (2019) selects the most informative subset of n points. It assumes that the ordinary least squares method is used and requires that the number of variables, p, is not large. However, in many practical problems, p is very large and penalty-based model fitting methods such as LASSO is used. We study the big data problems, in which both n and p are large. In…

    Submitted 19 January, 2024; originally announced January 2024.
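The abstract takes the IBOSS approach of Wang et al. (2019) as its starting point. For orientation, a minimal sketch of the classical ordinary-least-squares version follows: for each of the p covariates in turn, keep the r = k/(2p) not-yet-selected rows with the smallest values and the r with the largest values of that covariate. This is not the LASSO-oriented strategy the paper develops; the function name and the toy design matrix are hypothetical.

```python
def iboss_select(X, k):
    """Classical IBOSS subdata selection for ordinary least squares.

    X is a list of n rows with p covariates each; 2 * p * (k // (2p)) rows
    are returned. For every covariate j, the r = k // (2p) not-yet-selected
    rows with the smallest values of column j and the r with the largest
    values are kept.
    """
    n, p = len(X), len(X[0])
    r = k // (2 * p)
    remaining = set(range(n))
    selected = []
    for j in range(p):
        # A full sort keeps the sketch short; the published algorithm relies
        # on partial selection so the whole procedure runs in O(np) time.
        order = sorted(remaining, key=lambda i: X[i][j])
        low = order[:r]
        high = order[max(r, len(order) - r):]  # never overlaps with `low`
        for i in low + high:
            selected.append(i)
            remaining.discard(i)
    return selected


# Hypothetical toy design: column 0 is 0..99, column 1 is a permutation of 0..99.
X = [[float(i), float((i * 37) % 100)] for i in range(100)]
subset = iboss_select(X, k=20)
print(sorted(subset))
```

Picking extreme covariate values spreads the selected rows toward the edges of the design, which is what makes the subset informative for estimating OLS slopes; once p is large, r = k/(2p) becomes tiny, one of the limitations the abstract points to.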

  38. arXiv:2311.17303

    cs.LG cs.AI stat.ME

    Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge

    Authors: Xiaoge Zhang, Xiao-Lin Wang, Fenglei Fan, Yiu-Ming Cheung, Indranil Bose

    Abstract: In this paper, we develop a generic methodology to encode hierarchical causality structure among observed variables into a neural network in order to improve its predictive performance. The proposed methodology, called causality-informed neural network (CINN), leverages three coherent steps to systematically map the structural causal knowledge into the layer-to-layer design of neural network while…

    Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  39. arXiv:2311.16852

    math.ST math.PR stat.ML

    Optimal minimax rate of learning interaction kernels

    Authors: Xiong Wang, Inbar Seroussi, Fei Lu

    Abstract: Nonparametric estimation of nonlocal interaction kernels is crucial in various applications involving interacting particle systems. The inference challenge, situated at the nexus of statistical learning and inverse problems, comes from the nonlocal dependency. A central question is whether the optimal minimax rate of convergence for this problem aligns with the rate of $M^{-\frac{2\beta}{2\beta+1}}$ in cl…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 42 pages, 1 figure

    MSC Class: 62G08; 62G20; 60B20

  40. arXiv:2311.14332

    cs.LG stat.ML

    GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation

    Authors: Yakun Chen, Xianzhi Wang, Guandong Xu

    Abstract: The analysis of spatiotemporal data is increasingly utilized across diverse domains, including transportation, healthcare, and meteorology. In real-world settings, such data often contain missing elements due to issues like sensor malfunctions and data transmission errors. The objective of spatiotemporal imputation is to estimate these missing values by understanding the inherent spatial and tempo…

    Submitted 24 November, 2023; originally announced November 2023.

  41. arXiv:2311.02618

    stat.AP

    Regionalization of China's PM2.5 through Robust Spatio temporal Functional Clustering Method

    Authors: Tingyin Wang, Xueqin Wang, Xiaobo Guo, He** Zhang

    Abstract: The patterns of particulate matter with diameters that are generally 2.5 micrometers and smaller (PM2.5) are heterogeneous in China nationwide but can be homogeneous region-wide. To reduce the adverse effects from PM2.5, policymakers need to develop location-specific regulations based on nationwide clustering analysis of PM2.5 concentrations. However, such an analysis is challenging because the da…

    Submitted 5 November, 2023; originally announced November 2023.

  42. arXiv:2311.02574

    stat.ME

    Semi-supervised Estimation of Event Rate with Doubly-censored Survival Data

    Authors: Yang Wang, Qingning Zhou, Tianxi Cai, Xuan Wang

    Abstract: Electronic Health Record (EHR) has emerged as a valuable source of data for translational research. To leverage EHR data for risk prediction and subsequently clinical decision support, clinical endpoints are often time to onset of a clinical condition of interest. Precise information on clinical event times is often not directly available and requires labor-intensive manual chart review to ascerta…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: 44 pages, 9 figures

  43. arXiv:2310.13162

    stat.ME

    Network Meta-Analysis of Time-to-Event Endpoints with Individual Participant Data using Restricted Mean Survival Time Regression

    Authors: Kaiyuan Hua, Xiaofei Wang, Hwanhee Hong

    Abstract: Restricted mean survival time (RMST) models have gained popularity when analyzing time-to-event outcomes because RMST models offer more straightforward interpretations of treatment effects with fewer assumptions than hazard ratios commonly estimated from Cox models. However, few network meta-analysis (NMA) methods have been developed using RMST. In this paper, we propose advanced RMST NMA models w…

    Submitted 19 October, 2023; originally announced October 2023.
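The abstract builds on restricted mean survival time. Under the standard definition, the RMST up to a horizon τ is the area under the survival curve, $\mathrm{RMST}(\tau) = \int_0^\tau S(t)\,dt$, and a common nonparametric estimate integrates the Kaplan–Meier step function. The single-arm sketch below assumes right-censored data coded as (time, event) pairs with event = 1 for an observed event and 0 for censoring; it does not implement the paper's network meta-analysis regression models, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, S(t) just after t)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group tied times: count events and everyone leaving the risk set at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # S is constant on [prev_t, t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Without censoring, RMST(4) equals the mean of min(T, 4): (1 + 2 + 3 + 4) / 4.
print(rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4.0))  # 2.5
```

Because RMST is expressed in time units (for example, expected event-free time within τ), a between-arm difference in RMST reads directly as time gained, which is the interpretability advantage over hazard ratios that the abstract mentions.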

  44. arXiv:2310.09257

    stat.ME

    A SIMPLE Approach to Provably Reconstruct Ising Model with Global Optimality

    Authors: Junxian Zhu, Xuanyu Chen, ** Zhu, Xueqin Wang, He** Zhang

    Abstract: Reconstruction of interaction network between random events is a critical problem arising from statistical physics and politics to sociology, biology, and psychology, and beyond. The Ising model lays the foundation for this reconstruction process, but finding the underlying Ising model from the least amount of observed samples in a computationally efficient manner has been historically challenging…

    Submitted 13 October, 2023; originally announced October 2023.

  45. arXiv:2309.13557  [pdf, other]

    stat.CO math.NA

    Bayesian Parameter Inference for Partially Observed Diffusions using Multilevel Stochastic Runge-Kutta Methods

    Authors: Pierre Del Moral, Shulan Hu, Ajay Jasra, Hamza Ruzayqat, Xinyu Wang

    Abstract: We consider the problem of Bayesian estimation of static parameters associated with a partially and discretely observed diffusion process. We assume that the exact transition dynamics of the diffusion process are unavailable, even up to an unbiased estimator, and that one must time-discretize the diffusion process. In such scenarios it has been shown how one can introduce the multilevel Monte Carlo m…

    Submitted 24 September, 2023; originally announced September 2023.

  46. arXiv:2309.12872  [pdf, other]

    stat.ME

    Deep regression learning with optimal loss function

    Authors: Xuancheng Wang, Ling Zhou, Huazhen Lin

    Abstract: In this paper, we develop a novel, efficient and robust nonparametric regression estimator within a feedforward neural network framework. The proposed estimator has several notable characteristics. First, the loss function is built upon an estimated maximum likelihood function, which integrates the information from observed data as well as the information from data structure. Consequ…

    Submitted 22 September, 2023; originally announced September 2023.

  47. arXiv:2309.06230  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    A Consistent and Scalable Algorithm for Best Subset Selection in Single Index Models

    Authors: Borui Tang, ** Zhu, Junxian Zhu, Xueqin Wang, He** Zhang

    Abstract: Analysis of high-dimensional data has led to increased interest in both single index models (SIMs) and best subset selection. SIMs provide an interpretable and flexible modeling framework for high-dimensional data, while best subset selection aims to find a sparse model from a large set of predictors. However, best subset selection in high-dimensional models is known to be computationally intracta…

    Submitted 12 September, 2023; originally announced September 2023.

  48. arXiv:2309.05145  [pdf, other]

    cs.LG cs.AI stat.ML

    Outlier Robust Adversarial Training

    Authors: Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu

    Abstract: Supervised learning models are challenged both by intrinsic complexities of the training data, such as outliers and minority subpopulations, and by intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are each designed to handle one of these two challenges, to date, no work has been done to develop models that are…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: Accepted by The 15th Asian Conference on Machine Learning (ACML 2023)

  49. arXiv:2309.05092  [pdf, other]

    stat.ME cs.LG math.ST

    Adaptive conformal classification with noisy labels

    Authors: Matteo Sesia, Y. X. Rachel Wang, Xin Tong

    Abstract: This paper develops novel conformal prediction methods for classification tasks that can automatically adapt to random label contamination in the calibration sample, leading to more informative prediction sets with stronger coverage guarantees compared to state-of-the-art approaches. This is made possible by a precise characterization of the effective coverage inflation (or deflation) suffered by…

    Submitted 21 February, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: 28 pages (127 pages including references and appendices)

  50. arXiv:2308.16382  [pdf]

    cs.SI stat.ML

    A stochastic block model for community detection in attributed networks

    Authors: Xiao Wang, Fang Dai, Wenyan Guo, Junfeng Wang

    Abstract: Community detection is an important topic in complex network analysis. Most existing community detection methods for attributed networks use only the network structure, while methods that integrate node attributes mainly target traditional community structures and cannot detect multipartite structures or mixture structures in networks. In addition, the model-based community…

    Submitted 30 August, 2023; originally announced August 2023.