Showing 1–50 of 222 results for author: Huang, J

Searching in archive stat.
  1. arXiv:2407.01015  [pdf, other]

    stat.ML cs.LG

    Bayesian Entropy Neural Networks for Physics-Aware Prediction

    Authors: Rahul Rathnakumar, Jiayu Huang, Hao Yan, Yongming Liu

    Abstract: This paper addresses the need for deep learning models to integrate well-defined constraints into their outputs, driven by their application in surrogate models, learning with limited data and partial information, and scenarios requiring flexible model behavior to incorporate non-data sample information. We introduce Bayesian Entropy Neural Networks (BENN), a framework grounded in Maximum Entropy…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 15 pages

    ACM Class: I.5.1

  2. arXiv:2406.13197  [pdf, other]

    stat.ME

    Representation Transfer Learning for Semiparametric Regression

    Authors: Baihua He, Huihang Liu, Xinyu Zhang, Jian Huang

    Abstract: We propose a transfer learning method that utilizes data representations in a semiparametric regression model. Our aim is to perform statistical inference on the parameter of primary interest in the target model while accounting for potential nonlinear effects of confounding variables. We leverage knowledge from source domains, assuming that the sample size of the source data is substantially larg…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 42 pages, 11 figures, 5 tables

    MSC Class: 62F99

  3. arXiv:2406.07525  [pdf]

    econ.GN stat.AP

    Will Southeast Asia be the next global manufacturing hub? A multiway cointegration, causality, and dynamic connectedness analyses on factors influencing offshore decisions

    Authors: Haibo Wang, Lutfu S. Sua, Jun Huang, Jaime Ortiz, Bahram Alidaee

    Abstract: The COVID-19 pandemic has compelled multinational corporations to diversify their global supply chain risk and to relocate their factories to Southeast Asian countries beyond China. Such recent phenomena provide a good opportunity to understand the factors that influenced offshore decisions in the last two decades. We propose a new conceptual framework based on econometric approaches to examine th…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 30 pages

  4. arXiv:2406.03683  [pdf, other]

    cs.LG stat.ML

    Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

    Authors: Ding Huang, Ting Li, Jian Huang

    Abstract: We propose a Bayesian framework for fine-tuning large diffusion models with a novel network structure called Bayesian Power Steering (BPS). We clarify the meaning behind adaptation from a \textit{large probability space} to a \textit{small probability space} and explore the task of fine-tuning pre-trained models using learnable modules from a Bayesian perspective. BPS extracts task-specific knowle…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 25 pages, 26 figures, and 4 tables

    MSC Class: 62G05; 68T07

  5. arXiv:2405.18284  [pdf, other]

    stat.ML cs.LG

    Adaptive debiased SGD in high-dimensional GLMs with streaming data

    Authors: Ruijian Han, Lan Luo, Yuanhang Luo, Yuanyuan Lin, Jian Huang

    Abstract: Online statistical inference facilitates real-time analysis of sequentially collected data, making it different from traditional methods that rely on static datasets. This paper introduces a novel approach to online inference in high-dimensional generalized linear models, where we update regression coefficient estimates and their standard errors upon each new data arrival. In contrast to existing…

    Submitted 1 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 37 pages, 4 figures

  6. arXiv:2404.15760  [pdf, other]

    cs.LG cs.AI stat.ML

    Debiasing Machine Unlearning with Counterfactual Examples

    Authors: Ziheng Chen, Jia Wang, Jun Zhuang, Abbavaram Gowtham Reddy, Fabrizio Silvestri, Jin Huang, Kaushiki Nag, Kun Kuang, Xin Ning, Gabriele Tolomei

    Abstract: The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1…

    Submitted 24 April, 2024; originally announced April 2024.

  7. arXiv:2404.00551  [pdf, other]

    stat.ML cs.LG

    Convergence of Continuous Normalizing Flows for Learning Probability Distributions

    Authors: Yuan Gao, Jian Huang, Yuling Jiao, Shurong Zheng

    Abstract: Continuous normalizing flows (CNFs) are a generative method for learning probability distributions, which is based on ordinary differential equations. This method has shown remarkable empirical success across various applications, including large-scale image synthesis, protein structure prediction, and molecule generation. In this work, we study the theoretical properties of CNFs with linear inter…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 60 pages, 3 tables, and 3 figures

    MSC Class: 62G05; 68T07

  8. arXiv:2403.16283  [pdf, other]

    stat.ME

    Sample Empirical Likelihood Methods for Causal Inference

    Authors: Jingyue Huang, Changbao Wu, Leilei Zeng

    Abstract: Causal inference is crucial for understanding the true impact of interventions, policies, or actions, enabling informed decision-making and providing insights into the underlying mechanisms that shape our world. In this paper, we establish a framework for the estimation and inference of average treatment effects using a two-sample empirical likelihood function. Two different approaches to incorpor…

    Submitted 24 March, 2024; originally announced March 2024.

  9. arXiv:2403.12367  [pdf, other]

    stat.ML cs.LG stat.ME

    Semisupervised score based matching algorithm to evaluate the effect of public health interventions

    Authors: Hongzhe Zhang, Jiasheng Shi, Jing Huang

    Abstract: Multivariate matching algorithms "pair" similar study units in an observational study to remove potential bias and confounding effects caused by the absence of randomizations. In one-to-one multivariate matching algorithms, a large number of "pairs" to be matched could mean both the information from a large sample and a large number of tasks, and therefore, to best match the pairs, such a matching…

    Submitted 18 March, 2024; originally announced March 2024.

  10. arXiv:2403.12243  [pdf, other]

    stat.ME

    Time-Since-Infection Model for Hospitalization and Incidence Data

    Authors: Jiasheng Shi, Yizhao Zhou, Jing Huang

    Abstract: The Time Since Infection (TSI) models, which use disease surveillance data to model infectious diseases, have become increasingly popular recently due to their flexibility and capacity to address complex disease control questions. However, a notable limitation of TSI models is their primary reliance on incidence data. Even when hospitalization data are available, existing TSI models have not been…

    Submitted 18 March, 2024; originally announced March 2024.

  11. arXiv:2402.16661  [pdf, other]

    stat.ML cs.LG stat.ME

    Penalized Generative Variable Selection

    Authors: Tong Wang, Jian Huang, Shuangge Ma

    Abstract: Deep networks are increasingly applied to a wide variety of data, including data with high-dimensional predictors. In such analysis, variable selection can be needed along with estimation/model building. Many of the existing deep network studies that incorporate variable selection have been limited to methodological and numerical developments. In this study, we consider modeling/estimation using t…

    Submitted 26 February, 2024; originally announced February 2024.

  12. arXiv:2402.16158  [pdf, other]

    stat.ML cs.CY cs.LG

    Distribution-Free Fair Federated Learning with Small Samples

    Authors: Qichuan Yin, Junzhou Huang, Huaxiu Yao, Linjun Zhang

    Abstract: As federated learning gains increasing importance in real-world applications due to its capacity for decentralized data training, addressing fairness concerns across demographic groups becomes critically important. However, most existing machine learning algorithms for ensuring fairness are designed for centralized data environments and generally require large-sample and distributional assumptions…

    Submitted 25 February, 2024; originally announced February 2024.

  13. arXiv:2402.05724  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL

    Authors: Jiawei Huang, Niao He, Andreas Krause

    Abstract: We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy. We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity. Notably, P-MBED measures the complexity of the single-agent model cl…

    Submitted 3 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: ICML 2024; 55 Pages

  14. arXiv:2402.05438  [pdf, other]

    math.ST stat.ME

    Penalized spline estimation of principal components for sparse functional data: rates of convergence

    Authors: Shiyuan He, Jianhua Z. Huang, Kejun He

    Abstract: This paper gives a comprehensive treatment of the convergence rates of penalized spline estimators for simultaneously estimating several leading principal component functions, when the functional data is sparsely observed. The penalized spline estimators are defined as the solution of a penalized empirical risk minimization problem, where the loss function belongs to a general class of loss functi…

    Submitted 8 February, 2024; originally announced February 2024.

  15. arXiv:2401.06919  [pdf, other]

    stat.ME

    Pseudo-Empirical Likelihood Methods for Causal Inference

    Authors: Jingyue Huang, Changbao Wu, Leilei Zeng

    Abstract: Causal inference problems have remained an important research topic over the past several decades due to their general applicability in assessing a treatment effect in many different real-world settings. In this paper, we propose two inferential procedures on the average treatment effect (ATE) through a two-sample pseudo-empirical likelihood (PEL) approach. The first procedure uses the estimated p…

    Submitted 12 January, 2024; originally announced January 2024.

  16. arXiv:2312.05579  [pdf, other]

    stat.ML cs.LG

    Conditional Stochastic Interpolation for Generative Learning

    Authors: Ding Huang, Jian Huang, Ting Li, Guohao Shen

    Abstract: We propose a conditional stochastic interpolation (CSI) approach to learning conditional distributions. CSI learns probability flow equations or stochastic differential equations that transport a reference distribution to the target conditional distribution. This is achieved by first learning the drift function and the conditional score function based on conditional stochastic interpolation, which…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 44 pages, 4 figures

  17. arXiv:2312.04464  [pdf, other]

    cs.LG stat.ML

    Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation

    Authors: Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

    Abstract: To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed as UCRL-WVTR, that achieves both \emph{horizon-free} and \emph{instance-dependent}, since it eliminates the polynomial dependency on the planning horizon. The derived regret bound is deemed \emph{sharp}, as it matches the minimax lower bound when specialize…

    Submitted 7 December, 2023; originally announced December 2023.

  18. arXiv:2312.00963  [pdf, other]

    cs.LG stat.ME

    Spatiotemporal Transformer for Imputing Sparse Data: A Deep Learning Approach

    Authors: Kehui Yao, Jingyi Huang, Jun Zhu

    Abstract: Effective management of environmental resources and agricultural sustainability heavily depends on accurate soil moisture data. However, datasets like the SMAP/Sentinel-1 soil moisture product often contain missing values across their spatiotemporal grid, which poses a significant challenge. This paper introduces a novel Spatiotemporal Transformer model (ST-Transformer) specifically designed to ad…

    Submitted 1 December, 2023; originally announced December 2023.

  19. arXiv:2311.11475  [pdf, other]

    stat.ML cs.LG

    Gaussian Interpolation Flows

    Authors: Yuan Gao, Jian Huang, Yuling Jiao

    Abstract: Gaussian denoising has emerged as a powerful principle for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free conti…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 49 pages, 4 figures

  20. arXiv:2311.06945  [pdf, other]

    stat.ME

    An Efficient Approach for Identifying Important Biomarkers for Biomedical Diagnosis

    Authors: Jing-Wen Huang, Yan-Hong Chen, Frederick Kin Hing Phoa, Yan-Han Lin, Shau-Ping Lin

    Abstract: In this paper, we explore the challenges associated with biomarker identification for diagnosis purpose in biomedical experiments, and propose a novel approach to handle the above challenging scenario via the generalization of the Dantzig selector. To improve the efficiency of the regularization method, we introduce a transformation from an inherent nonlinear programming due to its nonlinear link…

    Submitted 12 November, 2023; originally announced November 2023.

  21. arXiv:2310.15026  [pdf, other]

    stat.ML cs.LG hep-ex nucl-ex

    Fast 2D Bicephalous Convolutional Autoencoder for Compressing 3D Time Projection Chamber Data

    Authors: Yi Huang, Yihui Ren, Shinjae Yoo, Jin Huang

    Abstract: High-energy large-scale particle colliders produce data at high speed in the order of 1 terabytes per second in nuclear physics and petabytes per second in high-energy physics. Developing real-time data compression algorithms to reduce such data at high throughput to fit permanent storage has drawn increasing attention. Specifically, at the newly constructed sPHENIX experiment at the Relativistic…

    Submitted 23 October, 2023; originally announced October 2023.

  22. arXiv:2310.03597  [pdf, other]

    stat.ML cs.LG math.DS math.NA

    Sampling via Gradient Flows in the Space of Probability Measures

    Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart

    Abstract: Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design com…

    Submitted 9 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Related and text overlap with arXiv:2302.11024

  23. arXiv:2310.03010  [pdf, other]

    cs.LG math.PR stat.ML

    High-dimensional SGD aligns with emerging outlier eigenspaces

    Authors: Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath

    Abstract: We rigorously study the joint evolution of training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, the SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient m…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 52 pages, 12 figures

  24. arXiv:2309.00771  [pdf, ps, other]

    stat.ML cs.LG

    Non-Asymptotic Bounds for Adversarial Excess Risk under Misspecified Models

    Authors: Changyu Liu, Yuling Jiao, Junhui Wang, Jian Huang

    Abstract: We propose a general approach to evaluating the performance of robust estimators based on adversarial losses under misspecified models. We first show that adversarial risk is equivalent to the risk induced by a distributional adversarial attack under certain smoothness conditions. This ensures that the adversarial training procedure is well-defined. To evaluate the generalization performance of th…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: 27 pages, 3 tables

    MSC Class: 62G05; 62G08; 68T07

  25. arXiv:2308.04246  [pdf, other]

    stat.AP

    Spectrally-Corrected and Regularized Global Minimum Variance Portfolio for Spiked Model

    Authors: Hua Li, Jiafu Huang

    Abstract: Considering the shortcomings of the traditional sample covariance matrix estimation, this paper proposes an improved global minimum variance portfolio model and named spectral corrected and regularized global minimum variance portfolio (SCRGMVP), which is better than the traditional risk model. The key of this method is that under the assumption that the population covariance matrix follows the sp…

    Submitted 29 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

  26. arXiv:2306.15163  [pdf, other]

    stat.ML cs.LG

    Wasserstein Generative Regression

    Authors: Shanshan Song, Tong Wang, Guohao Shen, Yuanyuan Lin, Jian Huang

    Abstract: In this paper, we propose a new and unified approach for nonparametric regression and conditional distribution learning. Our approach simultaneously estimates a regression function and a conditional generator using a generative learning framework, where a conditional generator is a function that can generate samples from a conditional distribution. The main idea is to estimate a conditional genera…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: 50 pages, including appendix. 5 figures and 6 tables in the main text. 1 figure and 7 tables in the appendix

    MSC Class: 62G08; 68T07

  27. arXiv:2306.06836  [pdf, other]

    cs.LG cs.AI stat.ML

    Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

    Authors: Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

    Abstract: While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample or time-efficient algorithms for RL with large state-action space exist when the rewards are \emph{heavy-tailed}, i.e., with only finite $(1+ε)$-th moments for some $ε\in(0,1]$. In this work, we address the challenge of such r…

    Submitted 7 March, 2024; v1 submitted 11 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  28. arXiv:2305.11283  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Statistical Efficiency of Mean Field Reinforcement Learning with General Function Approximation

    Authors: Jiawei Huang, Batuhan Yardim, Niao He

    Abstract: In this paper, we study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general model-based function approximation. We introduce a new concept called Mean-Field Model-Based Eluder Dimension (MF-MBED), which characterizes the inherent complexity of mean-field model classes. We show that low MF-MBED subsumes a rich family of…

    Submitted 13 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 38 Pages

  29. arXiv:2305.00608  [pdf, other]

    stat.ML cs.LG

    Differentiable Neural Networks with RePU Activation: with Applications to Score Estimation and Isotonic Regression

    Authors: Guohao Shen, Yuling Jiao, Yuanyuan Lin, Jian Huang

    Abstract: We study the properties of differentiable neural networks activated by rectified power unit (RePU) functions. We show that the partial derivatives of RePU neural networks can be represented by RePUs mixed-activated networks and derive upper bounds for the complexity of the function class of derivatives of RePUs networks. We establish error bounds for simultaneously approximating $C^s$ smooth funct…

    Submitted 21 April, 2024; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: 78 pages, 20 figures, and 6 tables. arXiv admin note: text overlap with arXiv:2207.10442

    MSC Class: 68G05; 68G08; 68T07

  30. arXiv:2303.02840  [pdf, other]

    stat.ME

    The conditionally studentized test for high-dimensional parametric regressions

    Authors: Feng Liang, Chuhan Wang, Jiaqi Huang, Lixing Zhu

    Abstract: This paper studies model checking for general parametric regression models having no dimension reduction structures on the predictor vector. Using any U-statistic type test as an initial test, this paper combines the sample-splitting and conditional studentization approaches to construct a COnditionally Studentized Test (COST). Whether the initial test is global or local smoothing-based; the dimen…

    Submitted 17 August, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: 35 pages, 2 figures

  31. arXiv:2302.12078  [pdf, ps, other]

    stat.ME

    Estimating the Instantaneous Reproduction Number With Imperfect Data: A Method to Account for Case-Reporting Variation and Serial Interval Uncertainty

    Authors: Gary Hettinger, David Rubin, Jing Huang

    Abstract: During an infectious disease outbreak, public health decision-makers require real-time monitoring of disease transmission to respond quickly and intelligently. In these settings, a key measure of transmission is the instantaneous time-varying reproduction number, $R_t$. Estimation of this number using a Time-Since-Infection model relies on case-notification data and the distribution of the serial…

    Submitted 23 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

  32. arXiv:2302.11024  [pdf, other]

    stat.ML math.NA

    Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance

    Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart

    Abstract: Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of…

    Submitted 2 November, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: 82 pages, 8 figures (Welcome any feedback!)

  33. arXiv:2302.05534  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Knowledge Transfer in Tiered Reinforcement Learning

    Authors: Jiawei Huang, Niao He

    Abstract: In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework, where the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task to reduce the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume the low-tier and high-tier tasks share the same dynamics or rewar…

    Submitted 13 June, 2024; v1 submitted 10 February, 2023; originally announced February 2023.

    Comments: 47 Pages; 1 Figure; NeurIPS 2023

  34. arXiv:2302.00973  [pdf, other]

    stat.ML cs.LG

    A Light-weight CNN Model for Efficient Parkinson's Disease Diagnostics

    Authors: Xuechao Wang, Junqing Huang, Marianna Chatzakou, Kadri Medijainen, Pille Taba, Aaro Toomela, Sven Nomm, Michael Ruzhansky

    Abstract: In recent years, deep learning methods have achieved great success in various fields due to their strong performance in practical applications. In this paper, we present a light-weight neural network for Parkinson's disease diagnostics, in which a series of hand-drawn data are collected to distinguish Parkinson's disease patients from healthy control subjects. The proposed model consists of a conv…

    Submitted 2 February, 2023; originally announced February 2023.

  35. arXiv:2302.00848  [pdf, other]

    cs.LG stat.ME stat.ML

    Causal Effect Estimation: Recent Advances, Challenges, and Opportunities

    Authors: Zhixuan Chu, Jianmin Huang, Ruopeng Li, Wei Chu, Sheng Li

    Abstract: Causal inference has numerous real-world applications in many domains, such as health care, marketing, political science, and online advertising. Treatment effect estimation, a fundamental problem in causal inference, has been extensively studied in statistics for decades. However, traditional treatment effect estimation methods may not well handle large-scale and high-dimensional heterogeneous da…

    Submitted 1 February, 2023; originally announced February 2023.

  36. arXiv:2212.08282  [pdf, other]

    stat.ME stat.AP

    Early-Phase Local-Area Model for Pandemics Using Limited Data: A SARS-CoV-2 Application

    Authors: Jiasheng Shi, Jeffrey S. Morris, David M. Rubin, Jing Huang

    Abstract: The emergence of novel infectious agents presents challenges to statistical models of disease transmission. These challenges arise from limited, poor-quality data and an incomplete understanding of the agent. Moreover, outbreaks manifest differently across regions due to various factors, making it imperative for models to factor in regional specifics. In this work, we offer a model that effectivel…

    Submitted 18 March, 2024; v1 submitted 16 December, 2022; originally announced December 2022.

  37. arXiv:2212.06332  [pdf, other]

    stat.AP

    Evaluating Airline Service Quality Through the Comprehensive Text-mining and TOPSIS-VIKOR-AISM Analysis

    Authors: Haotian Xie, Yi Li, Yang Pu, Chen Zhang, Junlin Huang

    Abstract: Service quality rankings are pivotal for maintaining sustainability in the fiercely competitive airline industry. However, prior research in this domain has often fallen short in aspects of sample size, efficiency, and dependability. This study introduces refined insights into this area and establishes a comprehensive, yet highly elucidative, ranking framework. Initially, we employ Latent Semantic…

    Submitted 11 February, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

  38. arXiv:2211.14691  [pdf, other]

    stat.ME physics.soc-ph stat.CO

    Detecting Changes in the Transmission Rate of a Stochastic Epidemic Model

    Authors: Jenny Huang, Raphaël Morsomme, David Dunson, Jason Xu

    Abstract: Throughout the course of an epidemic, the rate at which disease spreads varies with behavioral changes, the emergence of new disease variants, and the introduction of mitigation policies. Estimating such changes in transmission rates can help us better model and predict the dynamics of an epidemic, and provide insight into the efficacy of control and intervention strategies. We present a method fo…

    Submitted 26 November, 2022; originally announced November 2022.

  39. arXiv:2211.14578  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Estimation and inference for transfer learning with high-dimensional quantile regression

    Authors: Jiayu Huang, Mingqiu Wang, Yuanshan Wu

    Abstract: Transfer learning has become an essential technique to exploit information from the source domain to boost performance of the target task. Despite the prevalence in high-dimensional data, heterogeneity and heavy tails are insufficiently accounted for by current transfer learning approaches and thus may undermine the resulting performance. We propose a transfer learning procedure in the framework o…

    Submitted 5 November, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: 124 pages

  40. arXiv:2210.10161  [pdf, other]

    stat.ML cs.LG math.ST

    Nonparametric Quantile Regression: Non-Crossing Constraints and Conformal Prediction

    Authors: Wenlu Tang, Guohao Shen, Yuanyuan Lin, Jian Huang

    Abstract: We propose a nonparametric quantile regression method using deep neural networks with a rectified linear unit penalty function to avoid quantile crossing. This penalty function is computationally feasible for enforcing non-crossing constraints in multi-dimensional nonparametric quantile regression. We establish non-asymptotic upper bounds for the excess risk of the proposed nonparametric quantile…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: 8 figures, 3 tables

    MSC Class: 62G08; 62G20

  41. arXiv:2210.00937  [pdf, ps, other]

    stat.ME

    Inference on High-dimensional Single-index Models with Streaming Data

    Authors: Dongxiao Han, Jinhan Xie, Jin Liu, Liuquan Sun, Jian Huang, Bei Jian, Linglong Kong

    Abstract: Traditional statistical methods are faced with new challenges due to streaming data. The major challenge is the rapidly growing volume and velocity of data, which makes storing such huge datasets in memory impossible. The paper presents an online inference framework for regression parameters in high-dimensional semiparametric single-index models with unknown link functions. The proposed online pro…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: 38 pages, 2 figures

  42. arXiv:2209.10642  [pdf]

    physics.soc-ph cs.DL stat.AP

    Caught in the Crossfire: Fears of Chinese-American Scientists

    Authors: Yu Xie, Xihong Lin, Ju Li, Qian He, Junming Huang

    Abstract: The US leadership in science and technology has greatly benefitted from immigrants from other countries, most notably from China in the recent decades. However, feeling the pressure of potential federal investigation since the 2018 launch of the China Initiative under the Trump administration, Chinese-origin scientists in the US now face higher incentives to leave the US and lower incentives to ap…

    Submitted 23 September, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 16 pages, 2 figures

    ACM Class: J.4

  43. arXiv:2207.10772  [pdf, other]

    stat.ML cs.LG

    Deep Sufficient Representation Learning via Mutual Information

    Authors: Siming Zheng, Yuanyuan Lin, Jian Huang

    Abstract: We propose a mutual information-based sufficient representation learning (MSRL) approach, which uses the variational formulation of the mutual information and leverages the approximation power of deep neural networks. MSRL learns a sufficient representation with the maximum mutual information with the response and a user-selected distribution. It can easily handle multi-dimensional continuous or c…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: 43 pages, 6 figures and 5 tables

    MSC Class: 62G05; 68T07

  44. arXiv:2207.10442  [pdf, other]

    stat.ML cs.LG

    Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks

    Authors: Guohao Shen, Yuling Jiao, Yuanyuan Lin, Joel L. Horowitz, Jian Huang

    Abstract: We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using rectifier quadratic unit (ReQU) activated deep neural networks and introduce a novel penalty function to enforce non-crossing of quantile regression curves. We establish the non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared err…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: 44 pages, 10 figures, 6 tables

    MSC Class: 62G05; 62G08; 68T07

  45. arXiv:2206.13497  [pdf, other]

    cs.LG cs.AI cs.CV math.PR stat.ML

    Robustness Implies Generalization via Data-Dependent Generalization Bounds

    Authors: Kenji Kawaguchi, Zhun Deng, Kyle Luh, Jiaoyang Huang

    Abstract: This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be connected closely in a data-dependent manner. Our bounds improve previous bounds in two directions, to solve an open problem that has seen little development since 2010. The first is to reduce the dependence on the covering number. The second…

    Submitted 3 August, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted by ICML 2022, and selected for ICML long presentation (top 2% of submissions)

  46. arXiv:2206.13004  [pdf, other]

    stat.ME

    Multiple change point detection in tensors

    Authors: Jiaqi Huang, Junhui Wang, Xuehu Zhu, Lixing Zhu

    Abstract: This paper proposes a criterion for detecting change structures in tensor data. To accommodate tensor structure with structural mode that is not suitable to be equally treated and summarized in a distance to measure the difference between any two adjacent tensors, we define a mode-based signal-screening Frobenius distance for the moving sums of slices of tensor data to handle both dense and sparse…

    Submitted 18 March, 2023; v1 submitted 26 June, 2022; originally announced June 2022.

  47. arXiv:2205.12418  [pdf, other]

    cs.LG cs.AI stat.ML

    Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

    Authors: Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu

    Abstract: We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately. In this setting, we simultaneously maintain two policies $π^{\text{O}}$ and $π^{\text{E}}$: $π^{\text{O}}$ ("O" for "online") interacts with m…

    Submitted 26 February, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: 38 pages; NeurIPS 2022

  48. arXiv:2204.07742  [pdf, other]

    cs.LG cs.DC stat.ML

    DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup

    Authors: Bingzhe Wu, Zhipeng Liang, Yuxuan Han, Yatao Bian, Peilin Zhao, Junzhou Huang

    Abstract: Recently, federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data. Nevertheless, directly applying federated learning to real-world tasks faces two challenges: (1) heterogeneity in the data among different organizations; and (2) data noises inside individual organizations. In this paper, we propose a…

    Submitted 16 April, 2022; originally announced April 2022.

  49. arXiv:2203.14860  [pdf, other]

    cs.LG stat.ML

    Time-inhomogeneous diffusion geometry and topology

    Authors: Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy

    Abstract: Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator t…

    Submitted 5 January, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

  50. arXiv:2203.00229  [pdf, other]

    stat.ME stat.AP

    Fitting a stochastic model of intensive care occupancy to noisy hospitalization time series during the COVID-19 pandemic

    Authors: Achal Awasthi, Volodymyr M. Minin, Jenny Huang, Daniel Chow, Jason Xu

    Abstract: Intensive care occupancy is an important indicator of health care stress that has been used to guide policy decisions during the COVID-19 pandemic. Toward reliable decision-making as a pandemic progresses, estimating the rates at which patients are admitted to and discharged from hospitals and intensive care units (ICUs) is crucial. Since individual-level hospital data are rarely available to mode…

    Submitted 17 July, 2023; v1 submitted 28 February, 2022; originally announced March 2022.

    Comments: 26 pages, 8 Figures and 5 Tables; data and code to reproduce the simulation study are made available at the authors' webpages