Showing 1–50 of 620 results for author: Wang, J

Searching in archive stat.
  1. arXiv:2406.14772  [pdf, other]

    stat.ME cs.CR cs.SI

    Consistent community detection in multi-layer networks with heterogeneous differential privacy

    Authors: Yaoming Zhen, Shirong Xu, Junhui Wang

    Abstract: As network data has become increasingly prevalent, a substantial amount of attention has been paid to the privacy issue in publishing network data. One of the critical challenges for data publishers is to preserve the topological structures of the original network while protecting sensitive information. In this paper, we propose a personalized edge flipping mechanism that allows data publishers to…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.12212  [pdf, other]

    stat.AP stat.ME

    Identifying Genetic Variants for Obesity Incorporating Prior Insights: Quantile Regression with Insight Fusion for Ultra-high Dimensional Data

    Authors: Jiantong Wang, Heng Lian, Yan Yu, Heping Zhang

    Abstract: Obesity is widely recognized as a critical and pervasive health concern. We strive to identify important genetic risk factors from hundreds of thousands of single nucleotide polymorphisms (SNPs) for obesity. We propose and apply a novel Quantile Regression with Insight Fusion (QRIF) approach that can integrate insights from established studies or domain knowledge to simultaneously select variables…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This article is submitted to Journal of the American Statistical Association

  3. arXiv:2406.11011  [pdf, other]

    cs.LG cs.CL stat.ML

    Data Shapley in One Training Run

    Authors: Jiachen T. Wang, Prateek Mittal, Dawn Song, Ruoxi Jia

    Abstract: Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing their application to large-scale models. Furthermore, they produce the same attribution score for any models produced by running the learning algorithm, m…

    Submitted 29 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  4. arXiv:2406.09557  [pdf, other]

    math.OC eess.SY stat.AP

    Measure This, Not That: Optimizing the Cost and Model-Based Information Content of Measurements

    Authors: Jialu Wang, Zedong Peng, Ryan Hughes, Debangsu Bhattacharyya, David E. Bernal Neira, Alexander W. Dowling

    Abstract: Model-based design of experiments (MBDoE) is a powerful framework for selecting and calibrating science-based mathematical models from data. This work extends popular MBDoE workflows by proposing a convex mixed integer (non)linear programming (MINLP) problem to optimize the selection of measurements. The solver MindtPy is modified to support calculating the D-optimality objective and its gradient…

    Submitted 13 June, 2024; originally announced June 2024.

    MSC Class: 90C25; 90C11; 90C30; 90C90; 62K05

  5. arXiv:2406.06980  [pdf, other]

    stat.ME

    Sensitivity Analysis for the Test-Negative Design

    Authors: Soumyabrata Kundu, Peng Ding, Xinran Li, Jingshu Wang

    Abstract: The test-negative design has become popular for evaluating the effectiveness of post-licensure vaccines using observational data. In addition to its logistical convenience on data collection, the design is also believed to control for the differential health-care-seeking behavior between vaccinated and unvaccinated individuals, which is an important while often unmeasured confounder between the va…

    Submitted 11 June, 2024; originally announced June 2024.

  6. arXiv:2406.01461  [pdf, other]

    cs.LG math.DG stat.ML

    Hardness of Learning Neural Networks under the Manifold Hypothesis

    Authors: Bobak T. Kiani, Jason Wang, Melanie Weber

    Abstract: The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under…

    Submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2405.20763  [pdf, other]

    cs.LG math.OC stat.ML

    Improving Generalization and Convergence by Enhancing Implicit Regularization

    Authors: Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

    Abstract: In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics of flat and sharp directions, which boosts the sharpness reduction along flat directions while maintaining the training stability in sharp directions. We show that I…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 35 pages

  8. arXiv:2405.16413  [pdf, other]

    cs.AI cs.CL cs.LG stat.AP

    Augmented Risk Prediction for the Onset of Alzheimer's Disease from Electronic Health Records with Large Language Models

    Authors: Jiankun Wang, Sumyeong Ahn, Taykhoom Dalal, Xiaodan Zhang, Weishen Pan, Qiannan Zhang, Bin Chen, Hiroko H. Dodge, Fei Wang, Jiayu Zhou

    Abstract: Alzheimer's disease (AD) is the fifth-leading cause of death among Americans aged 65 and older. Screening and early detection of AD and related dementias (ADRD) are critical for timely intervention and for identifying clinical trial participants. The widespread adoption of electronic health records (EHRs) offers an important resource for developing ADRD screening tools such as machine learning bas…

    Submitted 25 May, 2024; originally announced May 2024.

  9. arXiv:2405.15441  [pdf, other]

    stat.ML cs.CC cs.LG

    Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

    Authors: Jie Wang, March Boedihardjo, Yao Xie

    Abstract: Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data int…

    Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 34 pages, 7 figures, 4 tables

  10. arXiv:2405.08759  [pdf, other]

    stat.ME stat.AP

    Optimal Sequential Procedure for Early Detection of Multiple Side Effects

    Authors: Jiayue Wang, Ben Boukai

    Abstract: In this paper, we propose an optimal sequential procedure for the early detection of potential side effects resulting from the administration of some treatment (e.g. a vaccine, say). The results presented here extend previous results obtained in Wang and Boukai (2024) who study the single side effect case to the case of two (or more) side effects. While the sequential procedure we employ, simultan…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: A total of 31 with 6 Tables and 8 Figures

    MSC Class: 62L10; 62L12

  11. arXiv:2405.03875  [pdf, other]

    cs.LG stat.ML

    Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

    Authors: Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia

    Abstract: Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has been shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis te…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  12. arXiv:2404.15760  [pdf, other]

    cs.LG cs.AI stat.ML

    Debiasing Machine Unlearning with Counterfactual Examples

    Authors: Ziheng Chen, Jia Wang, Jun Zhuang, Abbavaram Gowtham Reddy, Fabrizio Silvestri, ** Huang, Kaushiki Nag, Kun Kuang, Xin Ning, Gabriele Tolomei

    Abstract: The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1…

    Submitted 24 April, 2024; originally announced April 2024.

  13. arXiv:2404.14786  [pdf, other]

    cs.AI cs.LG stat.ME

    RealTCD: Temporal Causal Discovery from Interventional Data with Large Language Model

    Authors: Peiwen Li, Xin Wang, Zeyang Zhang, Yuan Meng, Fang Shen, Yue Li, Jialong Wang, Yang Li, Wenweu Zhu

    Abstract: In the field of Artificial Intelligence for Information Technology Operations, causal discovery is pivotal for operation and maintenance of graph construction, facilitating downstream industrial tasks such as root cause analysis. Temporal causal discovery, as an emerging method, aims to identify temporal causal relationships between variables directly from observations by utilizing interventional…

    Submitted 26 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  14. arXiv:2404.13964  [pdf, other]

    cs.LG econ.GN stat.ME

    An Economic Solution to Copyright Challenges of Generative AI

    Authors: Jiachen T. Wang, Zhun Deng, Hiroaki Chiba-Okabe, Boaz Barak, Weijie J. Su

    Abstract: Generative artificial intelligence (AI) systems are trained on large data corpora to generate new pieces of text, images, videos, and other media. There is growing concern that such systems may infringe on the copyright interests of training data contributors. To address the copyright challenges of generative AI, we propose a framework that compensates copyright owners proportionally to their cont…

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.10561  [pdf, other]

    cs.LG q-bio.QM stat.ML

    HiGraphDTI: Hierarchical Graph Representation Learning for Drug-Target Interaction Prediction

    Authors: Bin Liu, Siqi Wu, ** Wang, Xin Deng, Ao Zhou

    Abstract: The discovery of drug-target interactions (DTIs) plays a crucial role in pharmaceutical development. The deep learning model achieves more accurate results in DTI prediction due to its ability to extract robust and expressive features from drug and target chemical structures. However, existing deep learning methods typically generate drug features via aggregating molecular atom representations, ig…

    Submitted 16 April, 2024; originally announced April 2024.

  16. arXiv:2404.10207  [pdf, other]

    stat.ML cs.LG

    HELLINGER-UCB: A novel algorithm for stochastic multi-armed bandit problem and cold start problem in recommender system

    Authors: Ruibo Yang, Jiazhou Wang, Andrew Mullhaupt

    Abstract: In this paper, we study the stochastic multi-armed bandit problem, where the reward is driven by an unknown random variable. We propose a new variant of the Upper Confidence Bound (UCB) algorithm called Hellinger-UCB, which leverages the squared Hellinger distance to build the upper confidence bound. We prove that the Hellinger-UCB reaches the theoretical lower bound. We also show that the Helling…

    Submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.04992  [pdf, other]

    cs.CV stat.AP

    Efficient Surgical Tool Recognition via HMM-Stabilized Deep Learning

    Authors: Haifeng Wang, Hao Xu, Jun Wang, Jian Zhou, Ke Deng

    Abstract: Recognizing various surgical tools, actions and phases from surgery videos is an important problem in computer vision with exciting clinical applications. Existing deep-learning-based methods for this problem either process each surgical video as a series of independent images without considering their dependence, or rely on complicated deep learning models to account for the dependence of video frames.…

    Submitted 7 April, 2024; originally announced April 2024.

  18. arXiv:2404.01466  [pdf, other]

    cs.LG stat.ME

    TS-CausalNN: Learning Temporal Causal Relations from Non-linear Non-stationary Time Series Data

    Authors: Omar Faruque, Sahara Ali, Xue Zheng, Jianwu Wang

    Abstract: The growing availability and importance of time series data across various domains, including environmental science, epidemiology, and economics, has led to an increasing need for time-series causal discovery methods that can identify the intricate relationships in the non-stationary, non-linear, and often noisy real world data. However, the majority of current time series causal discovery methods…

    Submitted 1 April, 2024; originally announced April 2024.

  19. arXiv:2403.14822  [pdf, other]

    stat.ML cs.LG math.OC

    Non-Convex Robust Hypothesis Testing using Sinkhorn Uncertainty Sets

    Authors: Jie Wang, Rui Gao, Yao Xie

    Abstract: We present a new framework to address the non-convex robust hypothesis testing problem, wherein the goal is to seek the optimal detector that minimizes the maximum of worst-case type-I and type-II risk functions. The distributional uncertainty sets are constructed to center around the empirical distribution derived from samples based on Sinkhorn discrepancy. Given that the objective involves non-c…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 26 pages, 2 figures

  20. arXiv:2402.17366  [pdf]

    stat.ME

    The risks of risk assessment: causal blind spots when using prediction models for treatment decisions

    Authors: Nan van Geloven, Ruth H Keogh, Wouter van Amsterdam, Giovanni Cinà, Jesse H. Krijthe, Niels Peek, Kim Luijken, Sara Magliacane, Paweł Morzywołek, Thijs van Ommen, Hein Putter, Matthew Sperrin, Junfeng Wang, Daniala L. Weir, Vanessa Didelez

    Abstract: Prediction models are increasingly proposed for guiding treatment decisions, but most fail to address the special role of treatments, leading to inappropriate use. This paper highlights the limitations of using standard prediction models for treatment decision support. We identify 'causal blind spots' in three common approaches to handling treatments in prediction modelling: including treatment as…

    Submitted 6 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  21. arXiv:2402.11948  [pdf]

    cs.LG cs.AI stat.ML

    Mini-Hes: A Parallelizable Second-order Latent Factor Analysis Model

    Authors: Jialiang Wang, Weiling Li, Yurong Zhong, Xin Luo

    Abstract: Interactions among large number of entities is naturally high-dimensional and incomplete (HDI) in many big data related tasks. Behavioral characteristics of users are hidden in these interactions, hence, effective representation of the HDI data is a fundamental task for understanding user behaviors. Latent factor analysis (LFA) model has proven to be effective in representing HDI data. The perform…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 6 pages

  22. arXiv:2402.02368  [pdf, other]

    cs.LG stat.ML

    Timer: Generative Pre-trained Transformers Are Large Time Series Models

    Authors: Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

    Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous prog…

    Submitted 4 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  23. arXiv:2401.13760  [pdf, other]

    stat.AP math.ST

    Early Detection of Treatments Side Effect: A Sequential Approach

    Authors: Jiayue Wang, Ben Boukai

    Abstract: With the emergence and spread of infectious diseases with pandemic potential, such as COVID-19, the urgency for vaccine development has led to unprecedented compressed and accelerated schedules that shortened the standard development timeline. In a relatively short time, the leading pharmaceutical companies received an Emergency Use Authorization (EUA) for the vaccine's en-masse deployment. To…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: There are 21 pages, 8 pictures and 4 tables

    MSC Class: 62L10; 62L12

  24. arXiv:2401.13335  [pdf, other]

    stat.ML cs.AI cs.LG

    Full Bayesian Significance Testing for Neural Networks

    Authors: Zehua Liu, Zimeng Li, Jingyuan Wang, Yue He

    Abstract: Significance testing aims to determine whether a proposition about the population distribution is the truth or not given observations. However, traditional significance testing often needs to derive the distribution of the testing statistic, failing to deal with complex nonlinear relationships. In this paper, we propose to conduct Full Bayesian Significance Testing for neural networks, called \tex…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Published as a conference paper at AAAI 2024

  25. arXiv:2401.11103  [pdf, other]

    cs.DS cs.LG stat.ML

    Efficient Data Shapley for Weighted Nearest Neighbor Algorithms

    Authors: Jiachen T. Wang, Prateek Mittal, Ruoxi Jia

    Abstract: This work aims to address an open problem in data valuation literature concerning the efficient computation of Data Shapley for weighted $K$ nearest neighbor algorithm (WKNN-Shapley). By considering the accuracy of hard-label KNN with discretized weights as the utility function, we reframe the computation of WKNN-Shapley into a counting problem and introduce a quadratic-time algorithm, presenting…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: AISTATS 2024 Oral

  26. arXiv:2401.09125  [pdf, other]

    cs.LG stat.ML

    Understanding Heterophily for Graph Neural Networks

    Authors: Junfu Wang, Yuanfang Guo, Liang Yang, Yunhong Wang

    Abstract: Graphs with heterophily have been regarded as challenging scenarios for Graph Neural Networks (GNNs), where nodes are connected with dissimilar neighbors through various patterns. In this paper, we present theoretical understandings of the impacts of different heterophily patterns for GNNs by incorporating the graph convolution (GC) operations into fully connected networks via the proposed Heterop…

    Submitted 4 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: ICML 2024

  27. arXiv:2401.04693  [pdf, other]

    stat.ME

    Co-Clustering Multi-View Data Using the Latent Block Model

    Authors: Joshua Tobin, Michaela Black, James Ng, Debbie Rankin, Jonathan Wallace, Catherine Hughes, Leane Hoey, Adrian Moore, Jinling Wang, Geraldine Horigan, Paul Carlin, Helene McNulty, Anne M Molloy, Mimi Zhang

    Abstract: The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block cluster and allowing the use of well-grounded model selection methods. The LBM, while adapted in literature to handle different feature types, cannot be applied to datasets consisting of multiple disjoint sets of features, termed views, for a common set of observations.…

    Submitted 9 January, 2024; originally announced January 2024.

  28. arXiv:2312.17122  [pdf, other]

    cs.CL cs.AI stat.ML

    Large Language Model for Causal Decision Making

    Authors: Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song

    Abstract: Large Language Models (LLMs) have shown their success in language understanding and reasoning on general topics. However, their capability to perform inference based on user-specified structured data and knowledge in corpus-rare concepts, such as causal decision-making is still limited. In this work, we explore the possibility of fine-tuning an open-sourced LLM into LLM4Causal, which can identify…

    Submitted 11 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  29. arXiv:2312.10563  [pdf, other]

    stat.ME math.ST

    Mediation Analysis with Mendelian Randomization and Efficient Multiple GWAS Integration

    Authors: Rita Qiuran Lyu, Chong Wu, Xinwei Ma, Jingshen Wang

    Abstract: Mediation analysis is a powerful tool for studying causal pathways between exposure, mediator, and outcome variables of interest. While classical mediation analysis using observational data often requires strong and sometimes unrealistic assumptions, such as unconfoundedness, Mendelian Randomization (MR) avoids unmeasured confounding bias by employing genetic variations as instrumental variables.…

    Submitted 17 May, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

  30. arXiv:2312.06883  [pdf, other]

    stat.ME

    Adaptive Experiments Toward Learning Treatment Effect Heterogeneity

    Authors: Waverly Wei, Xinwei Ma, Jingshen Wang

    Abstract: Understanding treatment effect heterogeneity has become an increasingly popular task in various fields, as it helps design personalized advertisements in e-commerce or targeted treatment in biomedical studies. However, most of the existing work in this research area focused on either analyzing observational data based on strong causal assumptions or conducting post hoc analyses of randomized contr…

    Submitted 13 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

  31. arXiv:2312.05771  [pdf, other]

    cs.LG stat.ML

    Hacking Task Confounder in Meta-Learning

    Authors: Jingyao Wang, Yi Ren, Zeen Song, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

    Abstract: Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as the training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain thi…

    Submitted 29 May, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Accepted by IJCAI 2024, 9 pages, 5 figures, 4 tables

  32. arXiv:2312.05549  [pdf, other]

    cs.LG stat.ML

    Multi-granularity Causal Structure Learning

    Authors: Jiaxuan Liang, Jun Wang, Guoxian Yu, Shuyin Xia, Guoyin Wang

    Abstract: Unveiling, modeling, and comprehending the causal mechanisms underpinning natural phenomena stand as fundamental endeavors across myriad scientific disciplines. Meanwhile, new knowledge emerges when discovering causal relationships from data. Existing causal learning algorithms predominantly focus on the isolated effects of variables, overlook the intricate interplay of multiple variables and their collect…

    Submitted 12 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: Accepted by the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI2024)

  33. arXiv:2312.03438  [pdf, ps, other]

    math.OC eess.SP stat.ML

    On the Estimation Performance of Generalized Power Method for Heteroscedastic Probabilistic PCA

    Authors: Jinxin Wang, Chonghe Jiang, Huikang Liu, Anthony Man-Cho So

    Abstract: The heteroscedastic probabilistic principal component analysis (PCA) technique, a variant of the classic PCA that considers data heterogeneity, is receiving more and more attention in the data science and signal processing communities. In this paper, to estimate the underlying low-dimensional linear subspace (simply called ground truth) from available heterogeneous data samples, we consider…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 22 pages

  34. arXiv:2311.17547  [pdf, other]

    stat.ME

    Risk-based decision making: estimands for sequential prediction under interventions

    Authors: Kim Luijken, Paweł Morzywołek, Wouter van Amsterdam, Giovanni Cinà, Jeroen Hoogland, Ruth Keogh, Jesse Krijthe, Sara Magliacane, Thijs van Ommen, Niels Peek, Hein Putter, Maarten van Smeden, Matthew Sperrin, Junfeng Wang, Daniala Weir, Vanessa Didelez, Nan van Geloven

    Abstract: Prediction models are used amongst others to inform medical decisions on interventions. Typically, individuals with high risks of adverse outcomes are advised to undergo an intervention while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant to inform such decisions: e.g., an individual may be estimated to be at low risk beca…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 32 pages, 2 figures

  35. arXiv:2311.16856  [pdf, other]

    cs.LG eess.SP stat.ML

    Attentional Graph Neural Networks for Robust Massive Network Localization

    Authors: Wenzhong Yan, Juntao Wang, Feng Yin, Yang Tian, Abdelhak M. Zoubir

    Abstract: In recent years, Graph neural networks (GNNs) have emerged as a prominent tool for classification tasks in machine learning. However, their application in regression tasks remains underexplored. To tap the potential of GNNs in regression, this paper integrates GNNs with attention mechanism, a technique that revolutionized sequential learning tasks with its adaptability and robustness, to tackle a…

    Submitted 14 February, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  36. arXiv:2311.13825  [pdf, other]

    stat.ME stat.CO

    Online Prediction of Extreme Conditional Quantiles via B-Spline Interpolation

    Authors: Zhengpin Li, Jian Wang, Yanxi Hou

    Abstract: Extreme quantiles are critical for understanding the behavior of data in the tail region of a distribution. It is challenging to estimate extreme quantiles, particularly when dealing with limited data in the tail. In such cases, extreme value theory offers a solution by approximating the tail distribution using the Generalized Pareto Distribution (GPD). This allows for the extrapolation beyond the…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 22 pages, 16 figures

  37. arXiv:2311.13196  [pdf, other]

    cs.IT eess.SP stat.ME

    Optimal Time of Arrival Estimation for MIMO Backscatter Channels

    Authors: Chen He, Luyang Han, Z. Jane Wang

    Abstract: In this paper, we propose a novel time of arrival (TOA) estimator for multiple-input-multiple-output (MIMO) backscatter channels in closed form. The proposed estimator refines the estimation precision from the topological structure of the MIMO backscatter channels, and can considerably enhance the estimation accuracy. Particularly, we show that for the general $M \times N$ bistatic topology, the m…

    Submitted 22 November, 2023; originally announced November 2023.

  38. arXiv:2311.12379  [pdf, other]

    cs.LG cs.AI stat.ML

    Infinite forecast combinations based on Dirichlet process

    Authors: Yinuo Ren, Feng Li, Yanfei Kang, Jue Wang

    Abstract: Forecast combination integrates information from various sources by consolidating multiple forecast results from the target time series. Instead of the need to select a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. Initially, the learning rate is sampled with three basis distributions as hyperparameters to convert…

    Submitted 24 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

  39. arXiv:2311.02532  [pdf, other]

    stat.ME

    Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making

    Authors: Ting Li, Chengchun Shi, Jianing Wang, Fan Zhou, Hongtu Zhu

    Abstract: A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately. We propose three optimal allocation strategies in a dynamic setting where treatments are sequentia…

    Submitted 4 November, 2023; originally announced November 2023.

  40. arXiv:2310.20460  [pdf, other]

    stat.ME math.ST stat.AP

    Aggregating Dependent Signals with Heavy-Tailed Combination Tests

    Authors: Lin Gui, Yuchao Jiang, Jingshu Wang

    Abstract: Combining dependent p-values to evaluate the global null hypothesis presents a longstanding challenge in statistical inference, particularly when aggregating results from diverse methods to boost signal detection. P-value combination tests using heavy-tailed distribution based transformations, such as the Cauchy combination test and the harmonic mean p-value, have recently garnered significant int…

    Submitted 31 October, 2023; originally announced October 2023.

  41. arXiv:2310.16290  [pdf, other]

    stat.ME econ.EM

    Fair Adaptive Experiments

    Authors: Waverly Wei, Xinwei Ma, Jingshen Wang

    Abstract: Randomized experiments have been the gold standard for assessing the effectiveness of a treatment or policy. The classical complete randomization approach assigns treatments based on a prespecified probability and may lead to inefficient use of data. Adaptive experiments improve upon complete randomization by sequentially learning and updating treatment assignment probabilities. However, their app…

    Submitted 24 October, 2023; originally announced October 2023.

  42. arXiv:2310.16203  [pdf, other]

    stat.ME

    Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework

    Authors: Lan Luo, Chengchun Shi, Jitao Wang, Zhenke Wu, Lexin Li

    Abstract: Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and when the variables are observed over multiple time points. The problem is challenging, because the effect of a mediator involves not only the path from the treat…

    Submitted 24 October, 2023; originally announced October 2023.

  43. arXiv:2310.10239  [pdf, other]

    stat.ML cs.LG stat.ME

    Structural transfer learning of non-Gaussian DAG

    Authors: Mingyang Ren, Xin He, Junhui Wang

    Abstract: Directed acyclic graph (DAG) has been widely employed to represent directional relationships among a set of collected nodes. Yet, the available data in one single study is often limited for accurate DAG reconstruction, whereas heterogeneous data may be collected from multiple relevant studies. It remains an open question how to pool the heterogeneous data together for better DAG structure reconstr…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 35 pages, 3 figures, 3 tables

  44. arXiv:2310.09583  [pdf, other]

    cs.LG stat.ML

    Two Sides of The Same Coin: Bridging Deep Equilibrium Models and Neural ODEs via Homotopy Continuation

    Authors: Shutong Ding, Tianyu Cui, Jingya Wang, Ye Shi

    Abstract: Deep Equilibrium Models (DEQs) and Neural Ordinary Differential Equations (Neural ODEs) are two branches of implicit models that have achieved remarkable success owing to their superior performance and low memory consumption. While both are implicit models, DEQs and Neural ODEs are derived from different mathematical formulations. Inspired by homotopy continuation, we establish a connection betwee…

    Submitted 21 December, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  45. arXiv:2310.08268  [pdf, other]

    stat.ME

    Change point detection in dynamic heterogeneous networks via subspace tracking

    Authors: Yuzhao Zhang, Jingnan Zhang, Yifan Sun, Junhui Wang

    Abstract: Dynamic networks consist of a sequence of time-varying networks, and it is of great importance to detect the network change points. Most existing methods focus on detecting abrupt change points, necessitating the assumption that the underlying network probability matrix remains constant between adjacent change points. This paper introduces a new model that allows the network probability matrix to…

    Submitted 12 October, 2023; originally announced October 2023.

  46. arXiv:2310.00646  [pdf, other]

    cs.LG cs.AI stat.ML

    WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data

    Authors: Jingtan Wang, Xinyang Lu, Zitong Zhao, Zhongxiang Dai, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: The impressive performances of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to (a) identify the data provider who…

    Submitted 1 October, 2023; originally announced October 2023.

  47. arXiv:2309.08039  [pdf, other]

    stat.ME math.ST

    Flexible Functional Treatment Effect Estimation

    Authors: Jiayi Wang, Raymond K. W. Wong, Xiaoke Zhang, Kwun Chuen Gary Chan

    Abstract: We study treatment effect estimation with functional treatments where the average potential outcome functional is a function of functions, in contrast to continuous treatment effect estimation where the target is a function of real numbers. By considering a flexible scalar-on-function marginal structural model, a weight-modified kernel ridge regression (WMKRR) is adopted for estimation. The weight…

    Submitted 14 September, 2023; originally announced September 2023.

  48. arXiv:2309.06991  [pdf, other]

    cs.LG cs.CL stat.ML

    Unsupervised Contrast-Consistent Ranking with Language Models

    Authors: Niklas Stoehr, Pengxiang Cheng, Jing Wang, Daniel Preotiuc-Pietro, Rajarshi Bhowmik

    Abstract: Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank product reviews by sentiment. We compare pairwise, pointwise and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful cal…

    Submitted 3 February, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Long Paper at EACL 2024

  49. arXiv:2309.04957  [pdf, other]

    stat.ME

    Winner's Curse Free Robust Mendelian Randomization with Summary Data

    Authors: Zhongming Xie, Wanheng Zhang, Jingshen Wang, Chong Wu

    Abstract: In the past decade, the increased availability of genome-wide association studies summary data has popularized Mendelian Randomization (MR) for conducting causal inference. MR analyses, incorporating genetic variants as instrumental variables, are known for their robustness against reverse causation bias and unmeasured confounders. Nevertheless, classical MR analyses utilizing summary data may sti…

    Submitted 10 September, 2023; originally announced September 2023.

  50. arXiv:2309.04626  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning

    Authors: Austin Xu, Andrew D. McRae, Jingyan Wang, Mark A. Davenport, Ashwin Pananjady

    Abstract: We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobi…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 42 pages, 6 figures