Showing 1–50 of 759 results for author: Zhang, Y

Searching in archive stat.
  1. arXiv:2407.03082  [pdf, other]

    cs.LG stat.ML

    Stable Heterogeneous Treatment Effect Estimation across Out-of-Distribution Populations

    Authors: Yuling Zhang, Anpeng Wu, Kun Kuang, Liang Du, Zixun Sun, Zhi Wang

    Abstract: Heterogeneous treatment effect (HTE) estimation is vital for understanding the change of treatment effect across individuals or subgroups. Most existing HTE estimation methods focus on addressing selection bias induced by imbalanced distributions of confounders between treated and control units, but ignore distribution shifts across populations. Thereby, their applicability has been limited to the…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by ICDE'2024

  2. arXiv:2406.18035  [pdf, other]

    cs.LG stat.ML

    Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization

    Authors: Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang, Zhiwei Bai

    Abstract: Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in the theory of deep learning. To advance understanding in this area, we introduce a concept we term "local linear recovery" (LLR), a weaker form of target function recovery that renders the problem more amenable to theoretical analysis. In the sense o…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2211.11623

  3. arXiv:2406.16708  [pdf, other]

    cs.LG stat.ME

    CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

    Authors: Lingbai Kong, Wengen Li, Hanchen Yang, Yichao Zhang, Jihong Guan, Shuigeng Zhou

    Abstract: Temporal causal discovery is a crucial task aimed at uncovering the causal relations within time series data. The latest temporal causal discovery methods usually train deep learning models on prediction tasks to uncover the causality between time series. They capture causal relations by analyzing the parameters of some components of the trained models, e.g., attention weights and convolution weig…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.13478  [pdf, other]

    stat.ME

    Semiparametric Localized Principal Stratification Analysis with Continuous Strata

    Authors: Yichi Zhang, Shu Yang

    Abstract: Principal stratification is essential for revealing causal mechanisms involving post-treatment intermediate variables. Principal stratification analysis with continuous intermediate variables is increasingly common but challenging due to the infinite principal strata and the nonidentifiability and nonregularity of principal causal effects. Inspired by recent research, we resolve these challenges b…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.12764  [pdf, other]

    stat.ML cs.LG

    Quasi-Bayes meets Vines

    Authors: David Huk, Yuanhe Zhang, Mark Steel, Ritabrata Dutta

    Abstract: Recently proposed quasi-Bayesian (QB) methods initiated a new era in Bayesian computation by directly constructing the Bayesian predictive distribution through recursion, removing the need for expensive computations involved in sampling the Bayesian posterior distribution. This has proved to be data-efficient for univariate predictions, but extensions to multiple dimensions rely on a conditional d…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 36 pages, 2 figures

    MSC Class: 62G07

  6. arXiv:2406.03849  [pdf]

    cs.LG stat.AP stat.ML

    A Noise-robust Multi-head Attention Mechanism for Formation Resistivity Prediction: Frequency Aware LSTM

    Authors: Yongan Zhang, Junfeng Zhao, Jian Li, Xuanran Wang, Youzhuang Sun, Yuntian Chen, Dongxiao Zhang

    Abstract: The prediction of formation resistivity plays a crucial role in the evaluation of oil and gas reservoirs, identification and assessment of geothermal energy resources, groundwater detection and monitoring, and carbon capture and storage. However, traditional well logging techniques fail to measure accurate resistivity in cased boreholes, and the transient electromagnetic method for cased borehole…

    Submitted 6 June, 2024; originally announced June 2024.

  7. arXiv:2406.00633  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Improving GFlowNets for Text-to-Image Diffusion Alignment

    Authors: Dinghuai Zhang, Yizhe Zhang, Jiatao Gu, Ruixiang Zhang, Josh Susskind, Navdeep Jaitly, Shuangfei Zhai

    Abstract: Diffusion models have become the de-facto approach for generating visual data, which are trained to match the distribution of the training dataset. In addition, we also want to control generation to fulfill desired properties such as alignment to a text description, which can be specified with a black-box reward function. Prior works fine-tune pretrained diffusion models to achieve this goal throu…

    Submitted 16 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  8. arXiv:2406.00196  [pdf, other]

    stat.ME stat.AP

    A Seamless Phase II/III Design with Dose Optimization for Oncology Drug Development

    Authors: Yuhan Li, Yiding Zhang, Gu Mi, Ji Lin

    Abstract: The US FDA's Project Optimus initiative that emphasizes dose optimization prior to marketing approval represents a pivotal shift in oncology drug development. It has a ripple effect for rethinking what changes may be made to conventional pivotal trial designs to incorporate a dose optimization component. Aligned with this initiative, we propose a novel Seamless Phase II/III Design with Dose Optimi…

    Submitted 31 May, 2024; originally announced June 2024.

  9. arXiv:2405.20677  [pdf, other]

    cs.LG stat.ML

    Provably Efficient Interactive-Grounded Learning with Personalized Reward

    Authors: Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

    Abstract: Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with contex…

    Submitted 31 May, 2024; originally announced May 2024.

  10. arXiv:2405.17828  [pdf, other]

    stat.ME

    On Robust Clustering of Temporal Point Process

    Authors: Yuecheng Zhang, Guanhua Fang, Wen Yu

    Abstract: Clustering of event stream data is of great importance in many application scenarios, including but not limited to, e-commerce, electronic health, online testing, mobile music service, etc. Existing clustering algorithms fail to take outlier data into consideration and are implemented without theoretical guarantees. In this paper, we propose a robust temporal point processes clustering framework w…

    Submitted 28 May, 2024; originally announced May 2024.

  11. arXiv:2405.17591  [pdf, other]

    stat.ME

    Individualized Dynamic Mediation Analysis Using Latent Factor Models

    Authors: Yijiao Zhang, Yubai Yuan, Yuexia Zhang, Zhongyi Zhu, Annie Qu

    Abstract: Mediation analysis plays a crucial role in causal inference as it can investigate the pathways through which treatment influences outcome. Most existing mediation analysis assumes that mediation effects are static and homogeneous within populations. However, mediation effects usually change over time and exhibit significant heterogeneity in many real-world applications. Additionally, the presence…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 25 pages, 3 figures, 3 tables

  12. arXiv:2405.17479  [pdf, other]

    cs.LG cs.NE stat.ML

    A rationale from frequency perspective for grokking in training neural network

    Authors: Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu

    Abstract: Grokking is the phenomenon where neural networks (NNs) initially fit the training data and later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs. The core insight is that the networks initially learn the less salient frequency components present in the test data. We observe this phenomenon a…

    Submitted 24 May, 2024; originally announced May 2024.

  13. arXiv:2405.16732  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    The Collusion of Memory and Nonlinearity in Stochastic Approximation With Constant Stepsize

    Authors: Dongyan Huo, Yixuan Zhang, Yudong Chen, Qiaomin Xie

    Abstract: In this work, we investigate stochastic approximation (SA) with Markovian data and nonlinear updates under constant stepsize $α>0$. Existing work has primarily focused on either i.i.d. data or linear update rules. We take a new perspective and carefully examine the simultaneous presence of Markovian dependency of data and nonlinear update rules, delineating how the interplay between these two stru…

    Submitted 26 May, 2024; originally announced May 2024.

  14. arXiv:2405.16730  [pdf, other]

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of the black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…

    Submitted 26 May, 2024; originally announced May 2024.

  15. arXiv:2405.16387  [pdf, other]

    stat.ML cs.LG

    Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

    Authors: Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang

    Abstract: To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Ga…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 68 pages, 2 figures

  16. arXiv:2405.15050  [pdf, ps, other]

    stat.ML cs.LG

    Provably Efficient Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs

    Authors: Kihyuk Hong, Yufan Zhang, Ambuj Tewari

    Abstract: We resolve the open problem of designing a computationally efficient algorithm for infinite-horizon average-reward linear Markov Decision Processes (MDPs) with $\widetilde{O}(\sqrt{T})$ regret. Previous approaches with $\widetilde{O}(\sqrt{T})$ regret either suffer from computational inefficiency or require strong assumptions on dynamics, such as ergodicity. In this paper, we approximate the avera…

    Submitted 23 May, 2024; originally announced May 2024.

  17. arXiv:2405.14681  [pdf, other]

    cs.LG stat.ML

    Recursive PAC-Bayes: A Frequentist Approach to Sequential Prior Updates with No Information Loss

    Authors: Yi-Shan Wu, Yijie Zhang, Badr-Eddine Chérief-Abdellatif, Yevgeny Seldin

    Abstract: PAC-Bayesian analysis is a frequentist framework for incorporating prior knowledge into learning. It was inspired by Bayesian learning, which allows sequential data processing and naturally turns posteriors from one processing step into priors for the next. However, despite two and a half decades of research, the ability to update priors sequentially without losing confidence information along the…

    Submitted 23 May, 2024; originally announced May 2024.

  18. arXiv:2405.13912  [pdf, other]

    math.ST cs.IT cs.LG math.PR stat.ML

    Matrix Denoising with Doubly Heteroscedastic Noise: Fundamental Limits and Optimal Spectral Methods

    Authors: Yihan Zhang, Marco Mondelli

    Abstract: We study the matrix denoising problem of estimating the singular vectors of a rank-$1$ signal corrupted by noise with both column and row correlations. Existing works are either unable to pinpoint the exact asymptotic estimation error or, when they do so, the resulting approaches (e.g., based on whitening or singular value shrinkage) remain vastly suboptimal. On top of this, most of the literature…

    Submitted 22 May, 2024; originally announced May 2024.

  19. arXiv:2405.11681  [pdf, other]

    stat.ME math.ST

    Distributed Tensor Principal Component Analysis

    Authors: Elynn Chen, Xi Chen, Wenbo **g, Yichen Zhang

    Abstract: As tensors become widespread in modern data analysis, Tucker low-rank Principal Component Analysis (PCA) has become essential for dimensionality reduction and structural discovery in tensor datasets. Motivated by the common scenario where large-scale tensors are distributed across diverse geographic locations, this paper investigates tensor PCA within a distributed framework where direct data pool…

    Submitted 19 May, 2024; originally announced May 2024.

  20. arXiv:2405.09003  [pdf, other]

    stat.ME math.ST stat.AP

    Nonparametric Inference on Dose-Response Curves Without the Positivity Condition

    Authors: Yikun Zhang, Yen-Chi Chen, Alexander Giessing

    Abstract: Existing statistical methods in causal inference often rely on the assumption that every individual has some chance of receiving any treatment level regardless of its associated covariates, which is known as the positivity condition. This assumption could be violated in observational studies with continuous treatments. In this paper, we present a novel integral estimator of the causal effects with…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 74 pages (23 pages for the main paper), 4 figures

    MSC Class: 62G05 (Primary) 62D20; 62G20 (Secondary)

  21. arXiv:2405.07549  [pdf, other]

    q-fin.RM math.PR stat.AP

    On Joint Marginal Expected Shortfall and Associated Contribution Risk Measures

    Authors: Tong Pu, Yifei Zhang, Yiying Zhang

    Abstract: Systemic risk is the risk that a company- or industry-level risk could trigger a huge collapse of another or even the whole institution. Various systemic risk measures have been proposed in the literature to quantify the domino and (relative) spillover effects induced by systemic risks such as the well-known CoVaR, CoES, MES and CoD risk measures, and associated contribution measures. This paper p…

    Submitted 13 May, 2024; originally announced May 2024.

  22. arXiv:2405.07343  [pdf, other]

    eess.SY cs.LG stat.ME

    Graph neural networks for power grid operational risk assessment under evolving grid topology

    Authors: Yadong Zhang, Pranav M Karve, Sankaran Mahadevan

    Abstract: This article investigates the ability of graph neural networks (GNNs) to identify risky conditions in a power grid over the subsequent few hours, without explicit, high-resolution information regarding future generator on/off status (grid topology) or power dispatch decisions. The GNNs are trained using supervised learning, to predict the power grid's aggregated bus-level (either zonal or system-l…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Manuscript submitted to Applied Energy

  23. arXiv:2405.05679  [pdf, other]

    math.ST math.PR stat.CO stat.ML

    Non-asymptotic estimates for accelerated high order Langevin Monte Carlo algorithms

    Authors: Ariel Neufeld, Ying Zhang

    Abstract: In this paper, we propose two new algorithms, namely aHOLA and aHOLLA, to sample from high-dimensional target distributions with possibly super-linearly growing potentials. We establish non-asymptotic convergence bounds for aHOLA in Wasserstein-1 and Wasserstein-2 distances with rates of convergence equal to $1+q/2$ and $1/2+q/4$, respectively, under a local Hölder condition with exponent…

    Submitted 9 May, 2024; originally announced May 2024.

  24. arXiv:2405.03624  [pdf, ps, other]

    cs.LG math.OC q-fin.ST stat.ML

    $ε$-Policy Gradient for Online Pricing

    Authors: Lukasz Szpruch, Tanut Treetanthiploet, Yufei Zhang

    Abstract: Combining model-based and model-free reinforcement learning approaches, this paper proposes and analyzes an $ε$-policy gradient algorithm for the online pricing learning task. The algorithm extends $ε$-greedy algorithm by replacing greedy exploitation with gradient descent step and facilitates learning via model inference. We optimize the regret of the proposed algorithm by quantifying the explora…

    Submitted 6 May, 2024; originally announced May 2024.

    MSC Class: 62J12; 68Q32; 65Y20

  25. arXiv:2404.15060  [pdf, other]

    stat.ME math.ST

    Fast and reliable confidence intervals for a variance component or proportion

    Authors: Yiqiao Zhang, Karl Oskar Ekvall, Aaron J. Molstad

    Abstract: We show that confidence intervals for a variance component or proportion, with asymptotically correct uniform coverage probability, can be obtained by inverting certain test-statistics based on the score for the restricted likelihood. The results apply in settings where the variance or proportion is near or at the boundary of the parameter set. Simulations indicate the proposed test-statistics are…

    Submitted 23 April, 2024; originally announced April 2024.

  26. arXiv:2404.12312  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization

    Authors: Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen

    Abstract: This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. In particular, we consider the minimax optimization problem stemming from estimating linear functional equations defined by conditional expectations, where the objective functions are quadratic in the functional spaces. We address (i) the convergence o…

    Submitted 25 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Submitted

  27. arXiv:2404.11212  [pdf]

    stat.AP

    Deciphering seasonal depression variations and interplays between weather changes, physical activity, and depression severity in real-world settings: Learnings from RADAR-MDD longitudinal mobile health study

    Authors: Yuezhou Zhang, Amos A. Folarin, Yatharth Ranjan, Nicholas Cummins, Zulqarnain Rashid, Pauline Conde, Callum Stewart, Shaoxiong Sun, Srinivasan Vairavan, Faith Matcham, Carolin Oetzmann, Sara Siddi, Femke Lamers, Sara Simblett, Til Wykes, David C. Mohr, Josep Maria Haro, Brenda W. J. H. Penninx, Vaibhav A. Narayan, Matthew Hotopf, Richard J. B. Dobson, Abhishek Pratap, RADAR-CNS consortium

    Abstract: Prior research has shown that changes in seasons and weather can have a significant impact on depression severity. However, findings are inconsistent across populations, and the interplay between weather, behavior, and depression has not been fully quantified. This study analyzed real-world data from 428 participants (a subset; 68.7% of the cohort) in the RADAR-MDD longitudinal mobile health study…

    Submitted 17 April, 2024; originally announced April 2024.

  28. arXiv:2404.10985  [pdf, ps, other]

    cs.CV stat.ML

    Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images

    Authors: Junbiao Pang, Zailin Dong, Jiaxin Deng, Mengyuan Zhu, Yunwei Zhang

    Abstract: Parsing Computer-Aided Design (CAD) drawings is a fundamental step for CAD revision, semantic-based management, and the generation of 3D prototypes in both the architecture and engineering industries. Labeling symbols from a CAD drawing is a challenging yet notorious task from a practical point of view. In this work, we propose to label and spot symbols from CAD images that are converted from CAD…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 10 pages, 10 figures, 6 tables

  29. arXiv:2404.06023  [pdf, other]

    stat.ML cs.LG math.OC math.PR

    Prelimit Coupling and Steady-State Convergence of Constant-stepsize Nonsmooth Contractive SA

    Authors: Yixuan Zhang, Dongyan Huo, Yudong Chen, Qiaomin Xie

    Abstract: Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize. We focus on two important classes of dynamics: 1) nonsmooth contractive SA with additive noise, and 2) synchronous and asynchronous Q-learning, which features both additive and multiplicative noise. For both dynamics, we establish weak convergence of the iterates to a stationary limit dist…

    Submitted 24 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: ACM SIGMETRICS 2024. 71 pages, 3 figures

  30. arXiv:2404.03764  [pdf, other]

    cs.LG stat.ME stat.ML

    CONCERT: Covariate-Elaborated Robust Local Information Transfer with Conditional Spike-and-Slab Prior

    Authors: Ruqian Zhang, Yijiao Zhang, Annie Qu, Zhongyi Zhu, Juan Shen

    Abstract: The popularity of transfer learning stems from the fact that it can borrow information from useful auxiliary datasets. Existing statistical transfer learning methods usually adopt a global similarity measure between the source data and the target data, which may lead to inefficiency when only local information is shared. In this paper, we propose a novel Bayesian transfer learning method named "CO…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 31 pages, 22 figures

  31. Statistical Inference For Noisy Matrix Completion Incorporating Auxiliary Information

    Authors: Shujie Ma, Po-Yao Niu, Yichong Zhang, Yinchu Zhu

    Abstract: This paper investigates statistical inference for noisy matrix completion in a semi-supervised model when auxiliary covariates are available. The model consists of two parts. One part is a low-rank matrix induced by unobserved latent factors; the other part models the effects of the observed covariates through a coefficient matrix which is composed of high-dimensional column vectors. We model the…

    Submitted 21 March, 2024; originally announced March 2024.

  32. arXiv:2403.14593  [pdf, other]

    cs.LG stat.ML

    Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium Proof

    Authors: Yangchun Zhang, Qiang Liu, Weiming Li, Yirui Zhou

    Abstract: Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 lies in Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (requires multi-iterations) signific…

    Submitted 14 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  33. arXiv:2403.14573  [pdf, other]

    stat.ME stat.AP stat.ML

    A Transfer Learning Causal Approach to Evaluate Racial/Ethnic and Geographic Variation in Outcomes Following Congenital Heart Surgery

    Authors: Larry Han, Yi Zhang, Meena Nathan, John E. Mayer, Jr., Sara K. Pasquali, Katya Zelevinsky, Rui Duan, Sharon-Lise T. Normand

    Abstract: Congenital heart defects (CHD) are the most prevalent birth defects in the United States and surgical outcomes vary considerably across the country. The outcomes of treatment for CHD differ for specific patient subgroups, with non-Hispanic Black and Hispanic populations experiencing higher rates of mortality and morbidity. A valid comparison of outcomes within racial/ethnic subgroups is difficult…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 26 pages

  34. arXiv:2403.12166  [pdf, other]

    cs.LG stat.ML

    The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection

    Authors: Mohammad Jafari, Yimeng Zhang, Yihua Zhang, Sijia Liu

    Abstract: As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable levels. Addressing this, our work aims to strike a delicate balance between computational efficiency and model accuracy, a persisting challenge in the field. We int…

    Submitted 30 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP 2024

  35. arXiv:2403.12143  [pdf, other]

    cs.LG cs.AI stat.ML

    Graph Neural Networks for Learning Equivariant Representations of Neural Networks

    Authors: Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang

    Abstract: Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance,…

    Submitted 20 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: In ICLR 2024. Source code: https://github.com/mkofinas/neural-graphs

  36. arXiv:2403.09869  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

    Authors: Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

    Abstract: Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Published in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

  37. arXiv:2403.07310  [pdf, other]

    stat.ML cs.LG

    How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance

    Authors: Hongkang Li, Shuai Zhang, Yihua Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen

    Abstract: Group imbalance has been a known problem in empirical risk minimization (ERM), where the achieved high average accuracy is accompanied by low accuracy in a minority group. Despite algorithmic efforts to improve the minority group accuracy, a theoretical generalization analysis of ERM on individual groups remains elusive. By formulating the group imbalance problem with the Gaussian Mixture Model, t…

    Submitted 19 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  38. arXiv:2403.00258  [pdf, ps, other]

    stat.ML cs.LG

    "Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

    Authors: Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, Shiliang Pu, Robert C. Qiu, Zhenyu Liao

    Abstract: Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, s…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 32 pages, 4 figures, and 2 tables. Fixing typos in Theorems 1 and 2 from NeurIPS 2022 proceeding (https://proceedings.neurips.cc/paper_files/paper/2022/hash/185087ea328b4f03ea8fd0c8aa96f747-Abstract-Conference.html)

  39. arXiv:2402.15301  [pdf, other]

    cs.CL cs.LG stat.ME

    Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models

    Authors: Yuzhe Zhang, Yipeng Zhang, Yidong Gan, Lina Yao, Chen Wang

    Abstract: Causal graph recovery is traditionally done using statistical estimation-based methods or based on individuals' knowledge about variables of interest. They often suffer from data collection biases and limitations of individuals' knowledge. The advance of large language models (LLMs) provides opportunities to address these problems. We propose a novel method that leverages LLMs to deduce causal re…

    Submitted 18 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  40. arXiv:2402.14703  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

    Authors: Yuheng Zhang, Nan Jiang

    Abstract: We study off-policy evaluation (OPE) in partially observable environments with complex observations, with the goal of developing estimators whose guarantee avoids exponential dependence on the horizon. While such estimators exist for MDPs and POMDPs can be converted to history-based MDPs, their estimation errors depend on the state-density ratio for MDPs which becomes history ratios after conversi…

    Submitted 22 February, 2024; originally announced February 2024.

  41. arXiv:2402.11228  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Adaptive Split Balancing for Optimal Random Forest

    Authors: Yuqian Zhang, Weijie Ji, Jelena Bradic

    Abstract: While random forests are commonly used for regression problems, existing methods often lack adaptability in complex situations or lose optimality under simple, smooth scenarios. In this study, we introduce the adaptive split balancing forest (ASBF), capable of learning tree representations from data while simultaneously achieving minimax optimality under the Lipschitz class. To exploit higher-orde…

    Submitted 17 February, 2024; originally announced February 2024.

  42. arXiv:2402.07314  [pdf, other]

    cs.LG stat.ML

    Online Iterative Reinforcement Learning from Human Feedback with General Preference Model

    Authors: Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang

    Abstract: We study Reinforcement Learning from Human Feedback (RLHF) under a general preference oracle. In particular, we do not assume that there exists a reward function and the preference signal is drawn from the Bradley-Terry model as most of the prior works do. We consider a standard mathematical formulation, the reverse-KL regularized minimax game between two LLMs for RLHF under general preference ora…

    Submitted 25 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: RLHF, Preference Learning, Alignment for LLMs

  43. arXiv:2402.02697  [pdf, ps, other]

    cs.LG stat.ML

    Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

    Authors: Zenan Ling, Longbo Li, Zhanbo Feng, Yixuan Zhang, Feng Zhou, Robert C. Qiu, Zhenyu Liao

    Abstract: Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of…

    Submitted 19 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML 2024

  44. arXiv:2401.15309  [pdf, other]

    stat.ME

    Zero-inflated Smoothing Spline (ZISS) Models for Individual-level Single-cell Temporal Data

    Authors: Yifu Tang, Yi Zhang, Yue Wang, **gyi Zhang, Xiaoxiao Sun

    Abstract: Recent advancements in single-cell RNA-sequencing (scRNA-seq) have enhanced our understanding of cell heterogeneity at a high resolution. With the ability to sequence over 10,000 cells per hour, researchers can collect large scRNA-seq datasets for different participants, offering an opportunity to study the temporal progression of individual-level single-cell data. However, the presence of excessi…

    Submitted 27 January, 2024; originally announced January 2024.

  45. arXiv:2401.13884  [pdf, other]

    stat.ML cs.LG math.OC

    Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

    Authors: Yixuan Zhang, Qiaomin Xie

    Abstract: Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this paper, we study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence. By connecting the constant stepsize Q-…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 41 pages, 3 figures

  46. arXiv:2401.03072  [pdf, other]

    stat.ME math.ST

    Optimal Nonparametric Inference on Network Effects with Dependent Edges

    Authors: Wenqin Du, Yuan Zhang, Wen Zhou

    Abstract: Testing network effects in weighted directed networks is a foundational problem in econometrics, sociology, and psychology. Yet, the prevalent edge dependency poses a significant methodological challenge. Most existing methods are model-based and come with stringent assumptions, limiting their applicability. In response, we introduce a novel, fully nonparametric framework that requires only minima…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 29 pages, 3 figures

    MSC Class: 62E17; 62G10; 91D30

  47. arXiv:2401.00255  [pdf, other]

    stat.ME

    Adaptive Rank-based Tests for High Dimensional Mean Problems

    Authors: Yu Zhang, Long Feng

    Abstract: The Wilcoxon signed-rank test and the Wilcoxon-Mann-Whitney test are commonly employed in one sample and two sample mean tests for one-dimensional hypothesis problems. For high-dimensional mean test problems, we calculate the asymptotic distribution of the maximum of rank statistics for each variable and suggest a max-type test. This max-type test is then merged with a sum-type test, based on thei…

    Submitted 30 December, 2023; originally announced January 2024.

  48. arXiv:2312.17111  [pdf, other]

    stat.ML cs.LG stat.ME

    Online Tensor Inference

    Authors: Xin Wen, Will Wei Sun, Yichen Zhang

    Abstract: Recent technological advances have led to contemporary applications that demand real-time processing and analysis of sequentially arriving tensor data. Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data due to its voluminous size. Furthermore, existing low-rank tensor methods lack the…

    Submitted 28 December, 2023; originally announced December 2023.

  49. arXiv:2312.10607  [pdf, other]

    stat.ME math.ST stat.CO stat.ML

    Bayesian Model Selection via Mean-Field Variational Approximation

    Authors: Yangfan Zhang, Yun Yang

    Abstract: This article considers Bayesian model selection via mean-field (MF) variational approximation. Towards this goal, we study the non-asymptotic properties of MF inference under the Bayesian framework that allows latent variables and model mis-specification. Concretely, we show a Bernstein-von Mises (BvM) theorem for the variational distribution from MF under possible model mis-specification, which i…

    Submitted 16 December, 2023; originally announced December 2023.

  50. arXiv:2312.05404  [pdf, other]

    cs.LG cs.AI stat.ME

    Disentangled Latent Representation Learning for Tackling the Confounding M-Bias Problem in Causal Inference

    Authors: Debo Cheng, Yang Xie, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Yinghao Zhang, Zaiwen Feng

    Abstract: In causal inference, it is a fundamental task to estimate the causal effect from observational data. However, latent confounders pose major challenges in causal inference in observational data, for example, confounding bias and M-bias. Recent data-driven causal effect estimators tackle the confounding bias problem via balanced representation learning, but assume no M-bias in the system, thus they…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 10 pages, 3 figures and 5 tables. Accepted by ICDM2023