Showing 1–50 of 326 results for author: Li, C

Searching in archive stat.
  1. arXiv:2407.00397  [pdf, other]

    cs.LG stat.ML

    Markovian Gaussian Process: A Universal State-Space Representation for Stationary Temporal Gaussian Process

    Authors: Weihan Li, Yule Wang, Chengrui Li, Anqi Wu

    Abstract: Gaussian Processes (GPs) and Linear Dynamical Systems (LDSs) are essential time series and dynamic system modeling tools. GPs can handle complex, nonlinear dynamics but are computationally demanding, while LDSs offer efficient computation but lack the expressive power of GPs. To combine their benefits, we introduce a universal method that allows an LDS to mirror stationary temporal GPs. This state…

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2406.15523  [pdf, other]

    cs.LG stat.ML

    Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

    Authors: Yili Wang, Yixin Liu, Xu Shen, Chenyu Li, Kaize Ding, Rui Miao, Ying Wang, Shirui Pan, Xin Wang

    Abstract: To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating…

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.13036  [pdf, other]

    stat.ML cs.LG math.PR math.ST stat.CO

    Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

    Authors: Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

    Abstract: Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $π$ as a perturbation of a given reference measure $μ$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Ga…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2405.20782  [pdf, other]

    cs.CR cs.IT stat.ML

    Universal Exact Compression of Differentially Private Mechanisms

    Authors: Yanxiao Liu, Wei-Ning Chen, Ayfer Özgür, Cheuk Ting Li

    Abstract: To reduce the communication cost of differential privacy mechanisms, we introduce a novel construction, called Poisson private representation (PPR), designed to compress and simulate any local randomizer while ensuring local differential privacy. Unlike previous simulation-based local differential privacy mechanisms, PPR exactly preserves the joint distribution of the data and the output of the or…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 30 pages, 3 figures

  5. arXiv:2405.16845  [pdf, other]

    cs.LG cs.CL stat.ML

    On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

    Authors: Chenyu Zheng, Wei Huang, Rongzhen Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li

    Abstract: Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a mesa-optimizer during autoregressive (AR) pretraining to implement ICL. Namely, the forward pass of the trained transformer is equivalent to optimizing an inner objecti…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 37 pages

  6. arXiv:2405.09485  [pdf, other]

    stat.ME

    Predicting Future Change-points in Time Series

    Authors: Chak Fung Choi, Chunxue Li, Chun Yip Yau, Zifeng Zhao

    Abstract: Change-point detection and estimation procedures have been widely developed in the literature. However, commonly used approaches in change-point analysis have mainly been focusing on detecting change-points within an entire time series (off-line methods), or quickest detection of change-points in sequentially observed data (on-line methods). Both classes of methods are concerned with change-points…

    Submitted 23 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: 37 pages, 4 figures

    MSC Class: 62M10

  7. arXiv:2405.04566  [pdf, other]

    cs.LG cs.DC stat.ML

    Fast Decentralized Gradient Tracking for Federated Minimax Optimization with Local Updates

    Authors: Chris Junchi Li

    Abstract: Federated learning (FL) for minimax optimization has emerged as a powerful paradigm for training models across distributed nodes/clients while preserving data privacy and model robustness on data heterogeneity. In this work, we delve into the decentralized implementation of federated minimax optimization by proposing \texttt{K-GT-Minimax}, a novel decentralized minimax optimization algorithm that…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.01275  [pdf, other]

    stat.ME

    Variable Selection in Ultra-high Dimensional Feature Space for the Cox Model with Interval-Censored Data

    Authors: Daewoo Pak, Jianrui Zhang, Di Wu, Haolei Weng, Chenxi Li

    Abstract: We develop a set of variable selection methods for the Cox model under interval censoring, in the ultra-high dimensional setting where the dimensionality can grow exponentially with the sample size. The methods select covariates via a penalized nonparametric maximum likelihood estimation with some popular penalty functions, including lasso, adaptive lasso, SCAD, and MCP. We prove that our penalize…

    Submitted 2 May, 2024; originally announced May 2024.

  9. arXiv:2405.00914  [pdf, other]

    math.OC cs.LG stat.ML

    Accelerated Fully First-Order Methods for Bilevel and Minimax Optimization

    Authors: Chris Junchi Li

    Abstract: This paper presents a new algorithm member for accelerating first-order methods for bilevel optimization, namely the \emph{(Perturbed) Restarted Accelerated Fully First-order methods for Bilevel Approximation}, abbreviated as \texttt{(P)RAF${}^2$BA}. The algorithm leverages \emph{fully} first-order oracles and seeks approximate stationary points in nonconvex-strongly-convex bilevel optimization, e…

    Submitted 3 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: Minor typographical updates. arXiv admin note: text overlap with arXiv:2307.00126

  10. arXiv:2405.00742  [pdf, other]

    cs.CR cs.LG stat.ML

    Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

    Authors: Yi Li, Renyou Xie, Chaojie Li, Yi Wang, Zhaoyang Dong

    Abstract: Mitigating cybersecurity risk in electric vehicle (EV) charging demand forecasting plays a crucial role in the safe operation of collective EV chargings, the stability of the power grid, and the cost-effective infrastructure expansion. However, existing methods either suffer from the data privacy issue and the susceptibility to cyberattacks or fail to consider the spatial correlation among differe…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures

  11. arXiv:2402.17287  [pdf, other]

    cs.LG cs.CV stat.ML

    An Interpretable Evaluation of Entropy-based Novelty of Generative Models

    Authors: Jingwei Zhang, Cheuk Ting Li, Farzan Farnia

    Abstract: The massive developments of generative model frameworks require principled methods for the evaluation of a model's novelty compared to a reference dataset. While the literature has extensively studied the evaluation of the quality, diversity, and generalizability of generative models, the assessment of a model's novelty compared to a reference model has not been adequately explored in the machine…

    Submitted 13 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  12. arXiv:2402.14402  [pdf, other]

    cs.LG stat.ML

    Global Safe Sequential Learning via Efficient Knowledge Transfer

    Authors: Cen-You Li, Olaf Duennbier, Marc Toussaint, Barbara Rakitsch, Christoph Zimmer

    Abstract: Sequential learning methods such as active learning and Bayesian optimization select the most informative data to learn about a task. In many medical or engineering applications, the data selection is constrained by a priori unknown safety conditions. A promising line of safe learning methods utilize Gaussian processes (GPs) to model the safety probability and perform data selection in areas with…

    Submitted 15 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  13. arXiv:2402.11341  [pdf, other]

    stat.ME

    Between- and Within-Cluster Spearman Rank Correlations

    Authors: Shengxin Tu, Chun Li, Bryan E. Shepherd

    Abstract: Clustered data are common in practice. Clustering arises when subjects are measured repeatedly, or subjects are nested in groups (e.g., households, schools). It is often of interest to evaluate the correlation between two variables with clustered data. There are three commonly used Pearson correlation coefficients (total, between-, and within-cluster), which together provide an enriched perspectiv…

    Submitted 17 February, 2024; originally announced February 2024.

  14. arXiv:2402.09469  [pdf, other]

    cs.LG stat.ML

    Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

    Authors: Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou

    Abstract: In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the internal representations harnessed by neural networks and Transformers. Building on recent progress toward comprehending how networks execute distinct target functions, our study embarks on an exploration of the underlying reasons behind networks adopting specific computational strategies. We direct our focu…

    Submitted 24 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Update Section 5.3; clean up problem setup

  15. arXiv:2402.02459  [pdf, other]

    stat.ML cs.LG stat.ME

    On Minimum Trace Factor Analysis -- An Old Song Sung to a New Tune

    Authors: C. Li, A. Shkolnik

    Abstract: Dimensionality reduction methods, such as principal component analysis (PCA) and factor analysis, are central to many problems in data science. There are, however, serious and well-understood challenges to finding robust low dimensional approximations for data with significant heteroskedastic noise. This paper introduces a relaxed version of Minimum Trace Factor Analysis (MTFA), a convex optimizat…

    Submitted 4 February, 2024; originally announced February 2024.

  16. arXiv:2402.02368  [pdf, other]

    cs.LG stat.ML

    Timer: Generative Pre-trained Transformers Are Large Time Series Models

    Authors: Yong Liu, Haoran Zhang, Chenyu Li, Xiangdong Huang, Jianmin Wang, Mingsheng Long

    Abstract: Deep learning has contributed remarkably to the advancement of time series analysis. Still, deep models can encounter performance bottlenecks in real-world data-scarce scenarios, which can be concealed due to the performance saturation with small models on current benchmarks. Meanwhile, large models have demonstrated great powers in these scenarios through large-scale pre-training. Continuous prog…

    Submitted 4 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  17. arXiv:2401.09719  [pdf, ps, other]

    stat.ME

    Kernel-based multi-marker tests of association based on the accelerated failure time model

    Authors: Chenxi Li, Di Wu, Qing Lu

    Abstract: Kernel-based multi-marker tests for survival outcomes use primarily the Cox model to adjust for covariates. The proportional hazards assumption made by the Cox model could be unrealistic, especially in the long-term follow-up. We develop a suite of novel multi-marker survival tests for genetic association based on the accelerated failure time model, which is a popular alternative to the Cox model…

    Submitted 17 January, 2024; originally announced January 2024.

  18. arXiv:2312.16607  [pdf, other]

    eess.IV cs.CV stat.ML

    A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

    Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

    Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mu…

    Submitted 27 December, 2023; originally announced December 2023.

  19. arXiv:2312.14165  [pdf, other]

    stat.AP physics.soc-ph

    Spatiotemporal risk prediction for infectious disease spread and mortality

    Authors: Catherine Li, Daniel Lazarev

    Abstract: With the outbreak of the COVID-19 pandemic, various studies have focused on predicting the trajectory and risk factors of the virus and its variants. Building on previous work that addressed this problem using genetic and epidemiological data, we introduce a method, Geo Score, that also incorporates geographic, socioeconomic, and demographic data to estimate infection and mortality risk by region…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 16 pages, 19 figures, completed as a part of MIT PRIMES program, to be presented in Joint Mathematics Meetings 2024

  20. arXiv:2312.10958  [pdf, ps, other]

    stat.ME

    Large-sample properties of multiple imputation estimators for parameters of logistic regression with covariates missing at random separately or simultaneously

    Authors: Phuoc-Loc Tran, Shen-Ming Lee, Truong-Nhat Le, Chin-Shang Li

    Abstract: We consider logistic regression including two sets of discrete or categorical covariates that are missing at random (MAR) separately or simultaneously. We examine the asymptotic properties of two multiple imputation (MI) estimators, given in the study of Lee et al. (2023), for the parameters of the logistic regression model with both sets of discrete or categorical covariates that are MAR separate…

    Submitted 18 December, 2023; originally announced December 2023.

  21. arXiv:2312.05802  [pdf, other]

    stat.ME stat.CO

    Enhancing Scalability in Bayesian Nonparametric Factor Analysis of Spatiotemporal Data

    Authors: Yifan Cheng, Cheng Li

    Abstract: This manuscript puts forward novel practicable spatiotemporal Bayesian factor analysis frameworks computationally feasible for moderate to large data. Our models exhibit significantly enhanced computational scalability and storage efficiency, deliver high overall modeling performances, and possess powerful inferential capabilities for adequately predicting outcomes at future time points or new spa…

    Submitted 2 June, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Added summaries of computational complexity and memory improvements brought about by our three novelties (the new Appendix H)

  22. Exploration of Superposition Theorem in Spectrum Space for Composite Event Analysis in an ADN

    Authors: Xing He, Qian Ai, Yuezhong Tang, Robert Qiu, Canbing Li

    Abstract: This study presents a formulation of the Superposition Theorem (ST) in the spectrum space, tailored for the analysis of composite events in an active distribution network (ADN). Our formulated ST enables a quantitative analysis on a composite event, uncovering the property of additivity among independent atom events in the spectrum space. This contribution is a significant addition to the existing…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 12 pages. Accepted by IEEE TPWRS

  23. arXiv:2311.17326  [pdf, other]

    cs.LG stat.AP

    Mostly Beneficial Clustering: Aggregating Data for Operational Decision Making

    Authors: Chengzhang Li, Zhenkang Peng, Ying Rong

    Abstract: With increasingly volatile market conditions and rapid product innovations, operational decision-making for large-scale systems entails solving thousands of problems with limited data. Data aggregation is proposed to combine the data across problems to improve the decisions obtained by solving those problems individually. We propose a novel cluster-based Shrunken-SAA approach that can exploit the…

    Submitted 17 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  24. arXiv:2311.14652  [pdf, other]

    cs.LG cs.CL stat.ML

    One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

    Authors: Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang

    Abstract: Attention computation takes both the time complexity of $O(n^2)$ and the space complexity of $O(n^2)$ simultaneously, which makes deploying Large Language Models (LLMs) in streaming applications that involve long contexts requiring substantial computational resources. In recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model that is able to support a 128K-long document, in our paper, we f…

    Submitted 5 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  25. arXiv:2311.02516  [pdf, other]

    cs.LG stat.CO stat.ML

    Forward $χ^2$ Divergence Based Variational Importance Sampling

    Authors: Chengrui Li, Yule Wang, Weihan Li, Anqi Wu

    Abstract: Maximizing the log-likelihood is a crucial aspect of learning latent variable models, and variational inference (VI) stands as the commonly adopted method. However, VI can encounter challenges in achieving a high log-likelihood when dealing with complicated posterior distributions. In response to this limitation, we introduce a novel variational importance sampling (VIS) approach that directly est…

    Submitted 2 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

  26. arXiv:2311.00923  [pdf, other]

    cs.LG stat.ME

    A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations

    Authors: Hang Chen, Keqing Du, Chenguang Li, Xinyu Yang

    Abstract: The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: under review

  27. arXiv:2310.00729  [pdf, other]

    cs.LG math.AP math.NA stat.ML

    Spectral Neural Networks: Approximation Theory and Optimization Landscape

    Authors: Chenghui Li, Rishi Sonthalia, Nicolas Garcia Trillos

    Abstract: There is a large variety of machine learning methodologies that are based on the extraction of spectral geometric information from data. However, the implementations of many of these methods often depend on traditional eigensolvers, which present limitations when applied in practical online big data scenarios. To address some of these challenges, researchers have proposed different strategies for…

    Submitted 1 October, 2023; originally announced October 2023.

  28. arXiv:2309.15699  [pdf, other]

    stat.ME math.ST

    STRAW: Structure-Adaptive Weighting Procedure for Large-Scale Spatial Multiple Testing

    Authors: Pengfei Wang, Pengyu Yan, Canhui Li

    Abstract: The problem of large-scale spatial multiple testing is often encountered in various scientific research fields, where the signals are usually enriched on some regions while sparse on others. To integrate spatial structure information from nearby locations, we propose a novel approach, called {\bf STR}ucture-{\bf A}daptive {\bf W}eighting (STRAW) procedure, for large-scale spatial multiple testing.…

    Submitted 27 September, 2023; originally announced September 2023.

  29. Discovery and inference of a causal network with hidden confounding

    Authors: Li Chen, Chunlin Li, Xiaotong Shen, Wei Pan

    Abstract: This article proposes a novel causal discovery and inference method called GrIVET for a Gaussian directed acyclic graph with unmeasured confounders. GrIVET consists of an order-based causal discovery method and a likelihood-based inferential procedure. For causal discovery, we generalize the existing peeling algorithm to estimate the ancestral relations and candidate instruments in the presence of…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 27 pages, 4 figures, 3 tables. The manuscript is accepted by Journal of the American Statistical Association

  30. arXiv:2307.08496  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Can We Trust Race Prediction?

    Authors: Cangyuan Li

    Abstract: In the absence of sensitive race and ethnicity data, researchers, regulators, and firms alike turn to proxies. In this paper, I train a Bidirectional Long Short-Term Memory (BiLSTM) model on a novel dataset of voter registration data from all 50 US states and create an ensemble that achieves up to 36.8% higher out of sample (OOS) F1 scores than the best performing machine learning models in the li…

    Submitted 7 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  31. arXiv:2307.00467  [pdf, other]

    cs.LG stat.ML

    MissDiff: Training Diffusion Models on Tabular Data with Missing Values

    Authors: Yidong Ouyang, Liyan Xie, Chongxuan Li, Guang Cheng

    Abstract: The diffusion model has shown remarkable performance in modeling data distributions and synthesizing data. However, the vanilla diffusion model requires complete or fully observed data for training. Incomplete data is a common issue in various real-world applications, including healthcare and finance, particularly when dealing with tabular datasets. This work presents a unified and principled diff…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: 22 pages, short version is accepted by ICML workshop on Structured Probabilistic Inference & Generative Modeling 2023

    Report number: 22

  32. arXiv:2307.00126  [pdf, other]

    math.OC cs.LG stat.ML

    Accelerating Inexact HyperGradient Descent for Bilevel Optimization

    Authors: Haikuo Yang, Luo Luo, Chris Junchi Li, Michael I. Jordan

    Abstract: We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method -- the \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) method -- finds an $ε$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(κ^{3.25}ε^{-1.75})$ oracle complexity, where $κ$ is the condition number of the lower-level objective and $ε$ is the desir…

    Submitted 30 June, 2023; originally announced July 2023.

  33. arXiv:2306.17759  [pdf, other]

    stat.ML cs.LG

    The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

    Authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

    Abstract: In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a…

    Submitted 9 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

  34. arXiv:2306.13870  [pdf, ps, other]

    stat.ME

    Post-Selection Inference for the Cox Model with Interval-Censored Data

    Authors: Jianrui Zhang, Chenxi Li, Haolei Weng

    Abstract: We develop a post-selection inference method for the Cox proportional hazards model with interval-censored data, which provides asymptotically valid p-values and confidence intervals conditional on the model selected by lasso. The method is based on a pivotal quantity that is shown to converge to a uniform distribution under local alternatives. The proof can be adapted to many other regression mod…

    Submitted 30 December, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

    Comments: 46 pages, 14 figures

  35. arXiv:2305.17476  [pdf, other]

    cs.LG stat.ML

    Toward Understanding Generative Data Augmentation

    Authors: Chenyu Zheng, Guoqiang Wu, Chongxuan Li

    Abstract: Generative data augmentation, which scales datasets by obtaining fake labeled examples from a trained conditional generative model, boosts classification performance in various learning tasks including (semi-)supervised learning, few-shot learning, and adversarially robust learning. However, little work has theoretically investigated the effect of generative data augmentation. To fill this gap, we…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: 39 pages

  36. arXiv:2305.06172  [pdf, other]

    stat.CO math.PR math.ST

    Principal Feature Detection via $Φ$-Sobolev Inequalities

    Authors: Matthew T. C. Li, Youssef Marzouk, Olivier Zahm

    Abstract: We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the refe…

    Submitted 16 January, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: To appear in Bernoulli, but this version contains both the main file and the supplementary material

  37. arXiv:2305.05248  [pdf, ps, other]

    cs.LG stat.ML

    Towards Understanding Generalization of Macro-AUC in Multi-label Learning

    Authors: Guoqiang Wu, Chongxuan Li, Yilong Yin

    Abstract: Macro-AUC is the arithmetic mean of the class-wise AUCs in multi-label learning and is commonly used in practice. However, its theoretical understanding is far lacking. Toward solving it, we characterize the generalization properties of various learning algorithms based on the corresponding surrogate losses w.r.t. Macro-AUC. We theoretically identify a critical factor of the dataset affecting the…

    Submitted 2 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  38. arXiv:2305.01143  [pdf, other]

    stat.ML cs.LG

    Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Renyi's Entropy Perspective

    Authors: Yuxin Dong, Tieliang Gong, Hong Chen, Chen Li

    Abstract: Recently, information theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis for stochastic gradient/Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far fr…

    Submitted 1 May, 2023; originally announced May 2023.

  39. arXiv:2303.09519  [pdf]

    stat.ML cs.LG stat.CO stat.ME

    PyVBMC: Efficient Bayesian inference in Python

    Authors: Bobby Huggins, Chengkun Li, Marlon Tobaben, Mikko J. Aarnos, Luigi Acerbi

    Abstract: PyVBMC is a Python implementation of the Variational Bayesian Monte Carlo (VBMC) algorithm for posterior and model inference for black-box computational models (Acerbi, 2018, 2020). VBMC is an approximate inference method designed for efficient parameter estimation and model assessment when model evaluations are mildly-to-very expensive (e.g., a second or more) and/or noisy. Specifically, VBMC com…

    Submitted 27 June, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: 6 pages, 1 figure. Published in The Journal of Open Source Software. Documentation is available at https://acerbilab.github.io/pyvbmc and source code is available at https://github.com/acerbilab/pyvbmc

    Journal ref: The Journal of Open Source Software, 8(86), 2023, 5428

  40. arXiv:2303.06815  [pdf, other]

    cs.LG stat.ML

    On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee

    Authors: Chenyang Li, Jihoon Chung, Biao Cai, Haimin Wang, Xianlian Zhou, Bo Shen

    Abstract: Model compression is a crucial part of deploying neural networks (NNs), especially when the memory and storage of computing devices are limited in many applications. This paper focuses on two model compression techniques: low-rank approximation and weight pruning in neural networks, which are very popular nowadays. However, training NN with low-rank approximation and weight pruning always suffers…

    Submitted 4 January, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: 43 pages

  41. arXiv:2303.05263  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature

    Authors: Chengkun Li, Grégoire Clarté, Martin Jørgensen, Luigi Acerbi

    Abstract: In applied Bayesian inference scenarios, users may have access to a large number of pre-existing model evaluations, for example from maximum-a-posteriori (MAP) optimization runs. However, traditional approximate inference techniques make little to no use of this available information. We propose the framework of post-process Bayesian inference as a means to obtain a quick posterior approximation f…

    Submitted 18 June, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: 52 pages, 14 figures

  42. Rank Intraclass Correlation for Clustered Data

    Authors: Shengxin Tu, Chun Li, Donglin Zeng, Bryan E. Shepherd

    Abstract: Clustered data are common in biomedical research. Observations in the same cluster are often more similar to each other than to observations from other clusters. The intraclass correlation coefficient (ICC), first introduced by R. A. Fisher, is frequently used to measure this degree of similarity. However, the ICC is sensitive to extreme values and skewed distributions, and depends on the scale of…

    Submitted 28 July, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Journal ref: Statistics in Medicine. 2023; 42(24): 4333-4348

  43. arXiv:2302.08097  [pdf, ps, other]

    math.ST econ.EM stat.ME stat.ML

    New $\sqrt{n}$-consistent, numerically stable higher-order influence function estimators

    Authors: Lin Liu, Chang Li

    Abstract: Higher-Order Influence Functions (HOIFs) provide a unified theory for constructing rate-optimal estimators for a large class of low-dimensional (smooth) statistical functionals/parameters (and sometimes even infinite-dimensional functions) that arise in substantive fields including epidemiology, economics, and the social sciences. Since the introduction of HOIFs by Robins et al. (2008), they have…

    Submitted 16 February, 2023; originally announced February 2023.

  44. Nonlinear Causal Discovery with Confounders

    Authors: Chunlin Li, Xiaotong Shen, Wei Pan

    Abstract: This article introduces a causal discovery method to learn nonlinear relationships in a directed acyclic graph with correlated Gaussian errors due to confounding. First, we derive model identifiability under the sublinear growth assumption. Then, we propose a novel method, named the Deconfounded Functional Structure Estimation (DeFuSE), consisting of a deconfounding adjustment to remove the confou…

    Submitted 13 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: 28 pages, 4 figures, 3 tables

    Journal ref: Journal of the American Statistical Association, 2023

  45. arXiv:2302.02334  [pdf, other]

    cs.LG cs.AI stat.ML

    Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

    Authors: Chenyu Zheng, Guoqiang Wu, Fan Bao, Yue Cao, Chongxuan Li, Jun Zhu

    Abstract: A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes parameters in the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the classifier in linear evaluation except for the default logistic regression. Inspired by the sta…

    Submitted 29 May, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Accepted by ICML 2023, 58 pages

  46. arXiv:2301.04857  [pdf, other]

    cs.AI stat.ME

    Neural Spline Search for Quantile Probabilistic Modeling

    Authors: Ruoxi Sun, Chun-Liang Li, Sercan O. Arik, Michael W. Dusenberry, Chen-Yu Lee, Tomas Pfister

    Abstract: Accurate estimation of output quantiles is crucial in many use cases, where it is desired to model the range of possibility. Modeling target distribution at arbitrary quantile levels and at arbitrary input attribute levels are important to offer a comprehensive picture of the data, and requires the quantile function to be expressive enough. The quantile function describing the target distribution…

    Submitted 12 January, 2023; originally announced January 2023.

  47. arXiv:2212.09126  [pdf, other]

    stat.CO

    Pigeonhole Stochastic Gradient Langevin Dynamics for Large Crossed Mixed Effects Models

    Authors: Xinyu Zhang, Cheng Li

    Abstract: Large crossed mixed effects models with imbalanced structures and missing data pose major computational challenges for standard Bayesian posterior sampling algorithms, as the computational complexity is usually superlinear in the number of observations. We propose two efficient subset-based stochastic gradient MCMC algorithms for such crossed mixed effects model, which facilitate scalable inferenc…

    Submitted 18 December, 2022; originally announced December 2022.

  48. arXiv:2212.03122  [pdf, ps, other]

    stat.ME stat.CO stat.ML

    Robust convex biclustering with a tuning-free method

    Authors: Yifan Chen, Chunyin Lei, Chuanquan Li, Haiqiang Ma, Ningyuan Hu

    Abstract: Biclustering is widely used in different kinds of fields including gene information analysis, text mining, and recommendation system by effectively discovering the local correlation between samples and features. However, many biclustering algorithms will collapse when facing heavy-tailed data. In this paper, we propose a robust version of convex biclustering algorithm with Huber loss. Yet, the new…

    Submitted 6 October, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: 17 pages, 4 figures

  49. arXiv:2211.14752  [pdf, other]

    cs.LG cs.NE stat.ML

    Differentiable Meta Multigraph Search with Partial Message Propagation on Heterogeneous Information Networks

    Authors: Chao Li, Hao Xu, Kun He

    Abstract: Heterogeneous information networks (HINs) are widely employed for describing real-world data with intricate entities and relationships. To automatically utilize their semantic information, graph neural architecture search has recently been developed on various tasks of HINs. Existing works, on the other hand, show weaknesses in instability and inflexibility. To address these issues, we propose a n…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: 12 pages, 7 figures, 8 tables, accepted by AAAI 2023 conference

  50. arXiv:2211.14692  [pdf, other]

    math.ST stat.ME

    Radial Neighbors for Provably Accurate Scalable Approximations of Gaussian Processes

    Authors: Yichen Zhu, Michele Peruzzi, Cheng Li, David B. Dunson

    Abstract: In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable $O(n)$ computational complexity. In these models, data at each location are typically assumed conditionally dependent on a small set of parents which usually include a subset of the nearest neighbors. These methodologies often exhibit excellent empiri…

    Submitted 20 June, 2024; v1 submitted 26 November, 2022; originally announced November 2022.