Showing 1–50 of 113 results for author: Gao, Y

Searching in archive stat.
  1. arXiv:2406.10576  [pdf, other]

    cs.LG cs.CL stat.ML

    Optimization-based Structural Pruning for Large Language Models without Back-Propagation

    Authors: Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia

    Abstract: Compared to the moderate size of neural network models, structural weight pruning on the Large-Language Models (LLMs) imposes a novel challenge on the efficiency of the pruning algorithms, due to the heavy computation/memory demands of the LLMs. Recent efficient LLM pruning methods typically operate at the post-training phase without the expensive weight finetuning, however, their pruning criteria…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages

  2. arXiv:2406.04107  [pdf]

    stat.AP

    A Practical Analysis Procedure on Generalizing Comparative Effectiveness in the Randomized Clinical Trial to the Real-world Trial-eligible Population

    Authors: Kuan Jiang, Xin-xing Lai, Shu Yang, Ying Gao, Xiao-Hua Zhou

    Abstract: When evaluating the effectiveness of a drug, a Randomized Controlled Trial (RCT) is often considered the gold standard due to its perfect randomization. While RCT assures strong internal validity, its restricted external validity poses challenges in extending treatment effects to the broader real-world population due to possible heterogeneity in covariates. In this paper, we introduce a procedure…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 21 pages, 3 figures, 3 tables

  3. arXiv:2405.20114  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Near Optimal Decentralized Optimization with Compression and Momentum Tracking

    Authors: Rustem Islamov, Yuan Gao, Sebastian U. Stich

    Abstract: Communication efficiency has garnered significant attention as it is considered the main bottleneck for large-scale decentralized Machine Learning applications in distributed and federated settings. In this regime, clients are restricted to transmitting small amounts of quantized information to their neighbors over a communication graph. Numerous endeavors have been made to address this challengin…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.06003  [pdf, ps, other]

    stat.ML cs.LG

    Binary Hypothesis Testing for Softmax Models and Leverage Score Models

    Authors: Yeqi Gao, Yuzhou Gu, Zhao Song

    Abstract: Softmax distributions are widely used in machine learning, including Large Language Models (LLMs) where the attention unit uses softmax distributions. We abstract the attention unit as the softmax model, where given a vector input, the model produces an output drawn from the softmax distribution (which depends on the vector input). We consider the fundamental problem of binary hypothesis testing i…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.05695  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

    Authors: Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma

    Abstract: We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure fo…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024

    Journal ref: International Conference on Learning Representations (ICLR), 2024

  6. arXiv:2404.08457  [pdf, other]

    stat.ME

    A Latent Factor Model for High-Dimensional Binary Data

    Authors: Jiaxin Shi, Yuan Gao, Rui Pan, Hansheng Wang

    Abstract: In this study, we develop a latent factor model for analysing high-dimensional binary data. Specifically, a standard probit model is used to describe the regression relationship between the observed binary data and the continuous latent variables. Our method assumes that the dependency structure of the observed binary data can be fully captured by the continuous latent factors. To estimate the mod…

    Submitted 12 April, 2024; originally announced April 2024.

  7. arXiv:2404.00551  [pdf, other]

    stat.ML cs.LG

    Convergence of Continuous Normalizing Flows for Learning Probability Distributions

    Authors: Yuan Gao, Jian Huang, Yuling Jiao, Shurong Zheng

    Abstract: Continuous normalizing flows (CNFs) are a generative method for learning probability distributions, which is based on ordinary differential equations. This method has shown remarkable empirical success across various applications, including large-scale image synthesis, protein structure prediction, and molecule generation. In this work, we study the theoretical properties of CNFs with linear inter…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 60 pages, 3 tables, and 3 figures

    MSC Class: 62G05; 68T07

  8. arXiv:2403.11163  [pdf, ps, other]

    stat.ME cs.LG math.ST stat.CO

    A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

    Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, Jing Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

    Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A huge amount of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first clas…

    Submitted 17 March, 2024; originally announced March 2024.

  9. arXiv:2402.08871  [pdf, other]

    cs.LG stat.ML

    Position: Topological Deep Learning is the New Frontier for Relational Learning

    Authors: Theodore Papamarkou, Tolga Birdal, Michael Bronstein, Gunnar Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Liò, Paolo Di Lorenzo, Vasileios Maroulas, Nina Miolane, Farzana Nasrin, Karthikeyan Natesan Ramamurthy, Bastian Rieck, Simone Scardapane, Michael T. Schaub, Petar Veličković, Bei Wang, Yusu Wang, Guo-Wei Wei, Ghada Zamzmi

    Abstract: Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning setting…

    Submitted 30 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  10. arXiv:2312.17122  [pdf, other]

    cs.CL cs.AI stat.ML

    Large Language Model for Causal Decision Making

    Authors: Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song

    Abstract: Large Language Models (LLMs) have shown their success in language understanding and reasoning on general topics. However, their capability to perform inference based on user-specified structured data and knowledge in corpus-rare concepts, such as causal decision-making is still limited. In this work, we explore the possibility of fine-tuning an open-sourced LLM into LLM4Causal, which can identify…

    Submitted 11 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  11. arXiv:2311.11475  [pdf, other]

    stat.ML cs.LG

    Gaussian Interpolation Flows

    Authors: Yuan Gao, Jian Huang, Yuling Jiao

    Abstract: Gaussian denoising has emerged as a powerful principle for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free conti…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 49 pages, 4 figures

  12. arXiv:2311.07906  [pdf, other]

    stat.ME

    Mixture Conditional Regression with Ultrahigh Dimensional Text Data for Estimating Extralegal Factor Effects

    Authors: Jiaxin Shi, Fang Wang, Yuan Gao, Xiaojun Song, Hansheng Wang

    Abstract: Testing judicial impartiality is a problem of fundamental importance in empirical legal studies, for which standard regression methods have been popularly used to estimate the extralegal factor effects. However, those methods cannot handle control variables with ultrahigh dimensionality, such as found in judgment documents recorded in text format. To solve this problem, we develop a novel mixture…

    Submitted 13 November, 2023; originally announced November 2023.

  13. arXiv:2311.05645  [pdf, other]

    math.OC cs.LG stat.ML

    EControl: Fast Distributed Optimization with Compression and Error Control

    Authors: Yuan Gao, Rustem Islamov, Sebastian Stich

    Abstract: Modern distributed training relies heavily on communication compression to reduce the communication overhead. In this work, we study algorithms employing a popular class of contractive compressors in order to reduce communication overhead. However, the naive implementation often leads to unstable convergence or even exponential divergence due to the compression bias. Error Compensation (EC) is an…

    Submitted 6 November, 2023; originally announced November 2023.

  14. arXiv:2309.07418  [pdf, other]

    cs.DS cs.LG stat.ML

    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

    Authors: Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin

    Abstract: Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function…

    Submitted 14 September, 2023; originally announced September 2023.

  15. arXiv:2308.10502  [pdf, other]

    cs.LG cs.CL stat.ML

    GradientCoin: A Peer-to-Peer Decentralized Large Language Models

    Authors: Yeqi Gao, Zhao Song, Junze Yin

    Abstract: Since 2008, after the proposal of a Bitcoin electronic cash system, Bitcoin has fundamentally changed the economic system over the last decade. Since 2022, large language models (LLMs) such as GPT have outperformed humans in many real-life tasks. However, these large language models have several practical issues. For example, the model is centralized and controlled by a specific unit. One weakness…

    Submitted 21 August, 2023; originally announced August 2023.

  16. arXiv:2306.06582  [pdf, other]

    stat.ML cs.LG

    Fast, Distribution-free Predictive Inference for Neural Networks with Coverage Guarantees

    Authors: Yue Gao, Garvesh Raskutti, Rebecca Willet

    Abstract: This paper introduces a novel, computationally-efficient algorithm for predictive inference (PI) that requires no distributional assumptions on the data and can be computed faster than existing bootstrap-type methods for neural networks. Specifically, if there are $n$ training samples, bootstrap methods require training a model on each of the $n$ subsamples of size $n-1$; for large models like neu…

    Submitted 11 June, 2023; originally announced June 2023.

  17. arXiv:2305.00660  [pdf, ps, other]

    cs.LG stat.ML

    An Iterative Algorithm for Rescaled Hyperbolic Functions Regression

    Authors: Yeqi Gao, Zhao Song, Junze Yin

    Abstract: Large language models (LLMs) have numerous real-life applications across various domains, such as natural language translation, sentiment analysis, language modeling, chatbots and conversational agents, creative writing, text classification, summarization, and generation. LLMs have shown great promise in improving the accuracy and efficiency of these tasks, and have the potential to revolutionize…

    Submitted 1 May, 2023; originally announced May 2023.

  18. arXiv:2304.06278  [pdf, ps, other]

    math.ST stat.CO

    On the asymptotic properties of a bagging estimator with a massive dataset

    Authors: Yuan Gao, Riquan Zhang, Hansheng Wang

    Abstract: Bagging is a useful method for large-scale statistical analysis, especially when the computing resources are very limited. We study here the asymptotic properties of bagging estimators for $M$-estimation problems but with massive datasets. We theoretically prove that the resulting estimator is consistent and asymptotically normal under appropriate conditions. The results show that the bagging esti…

    Submitted 13 April, 2023; originally announced April 2023.

    Journal ref: Stat, 11(1), e485 (2022)

  19. A review of distributed statistical inference

    Authors: Yuan Gao, Weidong Liu, Hansheng Wang, Xiaozhou Wang, Yibo Yan, Riquan Zhang

    Abstract: The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods. Meanwhile, it provides opportunities for researchers to develop novel algorithms. Inspired by the idea of divide-and-conquer, various distributed frameworks for statistical estimation and inference have been proposed. They were developed to deal with large-scale statistical optim…

    Submitted 12 April, 2023; originally announced April 2023.

    Journal ref: Statistical Theory and Related Fields, 6(2), 89-99 (2022)

  20. arXiv:2304.05636  [pdf, other]

    stat.ME

    Testing Sufficiency for Transfer Learning

    Authors: Ziqian Lin, Yuan Gao, Feifei Wang, Hansheng Wang

    Abstract: Modern statistical analysis often encounters high dimensional models but with limited sample sizes. This makes the target data based statistical estimation very difficult. Then how to borrow information from another large sized source data for more accurate target model estimation becomes an interesting problem. This leads to the useful idea of transfer learning. Various estimation methods in this…

    Submitted 12 April, 2023; originally announced April 2023.

  21. arXiv:2303.16504  [pdf, ps, other]

    cs.LG stat.ML

    An Over-parameterized Exponential Regression

    Authors: Yeqi Gao, Sridhar Mahadevan, Zhao Song

    Abstract: Over the past few years, there has been a significant amount of research focused on studying the ReLU activation function, with the aim of achieving neural network convergence through over-parametrization. However, recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions, specifically in the attention mechanism. Mathema…

    Submitted 29 March, 2023; originally announced March 2023.

  22. arXiv:2301.00927  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Deep Spectral Q-learning with Application to Mobile Health

    Authors: Yuhe Gao, Chengchun Shi, Rui Song

    Abstract: Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with…

    Submitted 2 January, 2023; originally announced January 2023.

  23. arXiv:2212.06353  [pdf, ps, other]

    stat.ME math.NA

    Bayesian Arc Length Survival Analysis Model (BALSAM): Theory and Application to an HIV/AIDS Clinical Trial

    Authors: Yan Gao, Rodney A. Sparapani, Sanjib Basu

    Abstract: Stochastic volatility often implies increasing risks that are difficult to capture given the dynamic nature of real-world applications. We propose using arc length, a mathematical concept, to quantify cumulative variations (the total variability over time) to more fully characterize stochastic volatility. The hazard rate, as defined by the Cox proportional hazards model in survival analysis, is as…

    Submitted 12 December, 2022; originally announced December 2022.

  24. arXiv:2212.05814  [pdf, other]

    cs.LG stat.ML

    GWRBoost: A geographically weighted gradient boosting method for explainable quantification of spatially-varying relationships

    Authors: Han Wang, Zhou Huang, Ganmin Yin, Yi Bao, Xiao Zhou, Yong Gao

    Abstract: The geographically weighted regression (GWR) is an essential tool for estimating the spatial variation of relationships between dependent and independent variables in geographical contexts. However, GWR suffers from the problem that classical linear regressions, which compose the GWR model, are more prone to be underfitting, especially for significant volume and complex nonlinear data, causing inf…

    Submitted 15 December, 2022; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: 13 pages, 8 figures, 4 tables

  25. arXiv:2210.16634  [pdf, other]

    stat.CO

    Distributed Estimation and Inference for Spatial Autoregression Model with Large Scale Networks

    Authors: Yimeng Ren, Zhe Li, Xuening Zhu, Yuan Gao, Hansheng Wang

    Abstract: The rapid growth of online network platforms generates large-scale network data and it poses great challenges for statistical analysis using the spatial autoregression (SAR) model. In this work, we develop a novel distributed estimation and statistical inference framework for the SAR model on a distributed system. We first propose a distributed network least squares approximation (DNLSA) method. T…

    Submitted 27 November, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

  26. arXiv:2210.09026  [pdf, other]

    cs.LG stat.ML

    WILD-SCAV: Benchmarking FPS Gaming AI on Unity3D-based Environments

    Authors: Xi Chen, Tianyu Shi, Qingpeng Zhao, Yuchen Sun, Yunfei Gao, Xiangjun Wang

    Abstract: Recent advances in deep reinforcement learning (RL) have demonstrated complex decision-making capabilities in simulation environments such as Arcade Learning Environment, MuJoCo, and ViZDoom. However, they are hardly extensible to more complicated problems, mainly due to the lack of complexity and variations in the environments they are trained and tested on. Furthermore, they are not extensible t…

    Submitted 14 October, 2022; originally announced October 2022.

  27. arXiv:2210.00173  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Predictive Inference with Feature Conformal Prediction

    Authors: Jiaye Teng, Chuan Wen, Dinghuai Zhang, Yoshua Bengio, Yang Gao, Yang Yuan

    Abstract: Conformal prediction is a distribution-free technique for establishing valid prediction intervals. Although conventionally people conduct conformal prediction in the output space, this is not the only possibility. In this paper, we propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces by leveraging the inductive bias of deep representation…

    Submitted 8 April, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: Published as a conference paper at ICLR 2023

  28. arXiv:2207.09097  [pdf, other]

    stat.ML cs.LG

    Lazy Estimation of Variable Importance for Large Neural Networks

    Authors: Yue Gao, Abby Stevens, Rebecca Willet, Garvesh Raskutti

    Abstract: As opaque predictive models increasingly impact many areas of modern life, interest in quantifying the importance of a given input variable for making a specific prediction has grown. Recently, there has been a proliferation of model-agnostic methods to measure variable importance (VI) that analyze the difference in predictive power between a full model trained on all variables and a reduced model…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML'22

  29. arXiv:2204.11979  [pdf, other]

    stat.ME

    Semi-Parametric Sensitivity Analysis for Trials with Irregular and Informative Assessment Times

    Authors: Bonnie B. Smith, Yujing Gao, Shu Yang, Ravi Varadhan, Andrea J. Apter, Daniel O. Scharfstein

    Abstract: Many trials are designed to collect outcomes at or around pre-specified times after randomization. In practice, there can be substantial variability in the times when participants are actually assessed. Such irregular assessment times pose a challenge to learning the effect of treatment since not all participants have outcome assessments at the times of interest. Furthermore, observed outcome valu…

    Submitted 5 November, 2023; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Revised to include more implementation details, including a tutorial implementing our estimator, and using a more flexible outcome modeling approach in the data analysis

  30. arXiv:2203.04602  [pdf, other]

    stat.ME math.ST stat.AP stat.CO

    Factor-augmented model for functional data

    Authors: Yuan Gao, Han Lin Shang, Yanrong Yang

    Abstract: We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, the smoothing model is inadequate to recover the smooth curve or capture the data variation in some situations. These include cases where there is a large am…

    Submitted 9 March, 2022; originally announced March 2022.

    Comments: 88 pages, 11 figures. arXiv admin note: substantial text overlap with arXiv:2102.02580

    MSC Class: 62H25; 62R10

  31. arXiv:2203.01171  [pdf, other]

    cs.RO stat.ML

    Imitation of Manipulation Skills Using Multiple Geometries

    Authors: Boyang Ti, Yongsheng Gao, Jie Zhao, Sylvain Calinon

    Abstract: Daily manipulation tasks are characterized by geometric primitives related to actions and object shapes. Such geometric descriptors are poorly represented by only using Cartesian coordinate systems. In this paper, we propose a learning approach to extract the optimal representation from a dictionary of coordinate systems to encode an observed movement/behavior. This is achieved by using an extensi…

    Submitted 21 July, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  32. arXiv:2203.00784  [pdf, other]

    stat.ME stat.AP stat.CO stat.ML

    Bayesian adaptive and interpretable functional regression for exposure profiles

    Authors: Yunan Gao, Daniel R. Kowal

    Abstract: Pollutant exposure during gestation is a known and adverse factor for birth and health outcomes. However, the links between prenatal air pollution exposures and educational outcomes are less clear, in particular the critical windows of susceptibility during pregnancy. Using a large cohort of students in North Carolina, we study the link between prenatal daily $\mbox{PM}_{2.5}$ exposure and 4th end…

    Submitted 10 October, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: Main paper: 32 pages, 11 figures. Supplementary materials: 10 pages, 5 figures

  33. arXiv:2202.12472  [pdf, ps, other]

    cs.GT cs.AI cs.IR cs.LG stat.ML

    Bidding Agent Design in the LinkedIn Ad Marketplace

    Authors: Yuan Gao, Kaiyu Yang, Yuanlong Chen, Min Liu, Noureddine El Karoui

    Abstract: We establish a general optimization framework for the design of automated bidding agent in dynamic online marketplaces. It optimizes solely for the buyer's interest and is agnostic to the auction mechanism imposed by the seller. As a result, the framework allows, for instance, the joint optimization of a group of ads across multiple platforms each running its own auction format. Bidding strategy d…

    Submitted 24 February, 2022; originally announced February 2022.

  34. arXiv:2202.00187  [pdf, other]

    stat.ML cs.LG

    Deep Reference Priors: What is the best way to pretrain a model?

    Authors: Yansong Gao, Rahul Ramesh, Pratik Chaudhari

    Abstract: What is the best way to exploit extra data -- be it unlabeled data from the same task, or labeled data from a related task -- to learn a given task? This paper formalizes the question using the theory of reference priors. Reference priors are objective, uninformative Bayesian priors that maximize the mutual information between the task and the weights of the model. Such priors enable the task to m…

    Submitted 15 June, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: 24 pages

  35. arXiv:2111.01692  [pdf, other]

    stat.ML cs.AI cs.LG eess.SP stat.AP

    Efficient Hierarchical Bayesian Inference for Spatio-temporal Regression Models in Neuroimaging

    Authors: Ali Hashemi, Yijing Gao, Chang Cai, Sanjay Ghosh, Klaus-Robert Müller, Srikantan S. Nagarajan, Stefan Haufe

    Abstract: Several problems in neuroimaging and beyond require inference on the parameters of multi-task sparse hierarchical regression models. Examples include M/EEG inverse problems, neural encoding models for task-based fMRI analyses, and climate science. In these domains, both the model parameters to be inferred and the measurement noise may exhibit a complex spatio-temporal structure. Existing work eith…

    Submitted 23 November, 2021; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: Accepted to the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  36. arXiv:2111.01507  [pdf, other]

    stat.ME math.ST stat.CO

    An Asymptotic Analysis of Minibatch-Based Momentum Methods for Linear Regression Models

    Authors: Yuan Gao, Xuening Zhu, Haobo Qi, Guodong Li, Riquan Zhang, Hansheng Wang

    Abstract: Momentum methods have been shown to accelerate the convergence of the standard gradient descent algorithm in practice and theory. In particular, the minibatch-based gradient descent methods with momentum (MGDM) are widely used to solve large-scale optimization problems with massive datasets. Despite the success of the MGDM methods in practice, their theoretical properties are still underexplored.…

    Submitted 2 November, 2021; originally announced November 2021.

    Comments: 45 pages, 5 figures

  37. arXiv:2110.02787  [pdf, other]

    stat.ML cs.LG math.ST

    Relative Entropy Gradient Sampler for Unnormalized Distributions

    Authors: Xingdong Feng, Yuan Gao, Jian Huang, Yuling Jiao, Xu Liu

    Abstract: We propose a relative entropy gradient sampler (REGS) for sampling from unnormalized distributions. REGS is a particle method that seeks a sequence of simple nonlinear transforms iteratively pushing the initial samples from a reference distribution into the samples from an unnormalized target distribution. To determine the nonlinear transforms at each iteration, we consider the Wasserstein gradien…

    Submitted 6 October, 2021; originally announced October 2021.

  38. arXiv:2107.03375  [pdf, other]

    cs.LG cs.CV stat.ML

    Differentiable Architecture Pruning for Transfer Learning

    Authors: Nicolo Colombo, Yang Gao

    Abstract: We propose a new gradient-based approach for extracting sub-architectures from a given large model. Contrarily to existing pruning methods, which are unable to disentangle the network architecture and the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup wher…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: 19 pages (main + appendix), 7 figures and 1 table, Workshop @ ICML 2021, 24th July 2021

  39. arXiv:2106.14630  [pdf, other]

    stat.ML cs.LG stat.ME

    Improved Prediction and Network Estimation Using the Monotone Single Index Multi-variate Autoregressive Model

    Authors: Yue Gao, Garvesh Raskutti

    Abstract: Network estimation from multi-variate point process or time series data is a problem of fundamental importance. Prior work has focused on parametric approaches that require a known parametric model, which makes estimation procedures less robust to model mis-specification, non-linearities and heterogeneities. In this paper, we develop a semi-parametric approach based on the monotone single-index mu…

    Submitted 28 June, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

  40. arXiv:2104.10507  [pdf, ps, other]

    cs.CL cs.SD eess.AS stat.ML

    On Sampling-Based Training Criteria for Neural Language Modeling

    Authors: Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney

    Abstract: As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria are proposed and investigated. The essence of these sampling methods is that the softmax-related traversal over the entire vocabulary can be simplified, giving speedups compared to the baseline. A problem we notice about the current landscape of such sampling methods is the lack o…

    Submitted 17 June, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Accepted at INTERSPEECH 2021

  41. arXiv:2103.10060  [pdf, other]

    cs.LG stat.ML

    Approximating Probability Distributions by using Wasserstein Generative Adversarial Networks

    Authors: Yihang Gao, Michael K. Ng, Mingjie Zhou

    Abstract: Studied here are Wasserstein generative adversarial networks (WGANs) with GroupSort neural networks as their discriminators. It is shown that the error bound of the approximation for the target distribution depends on the width and depth (capacity) of the generators and discriminators and the number of samples in training. A quantified generalization bound is established for the Wasserstein distan…

    Submitted 29 June, 2023; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted by SIAM Journal on Mathematics of Data Science (SIMODS)

    MSC Class: 68Q32; 68T15; 68W40

  42. arXiv:2102.10003  [pdf, other]

    stat.ME

    Treatment effect estimation with Multilevel Regression and Poststratification

    Authors: Yuxiang Gao, Lauren Kennedy, Daniel Simpson

    Abstract: Multilevel regression and poststratification (MRP) is a flexible modeling technique that has been used in a broad range of small-area estimation problems. Traditionally, MRP studies have been focused on non-causal settings, where estimating a single population value using a nonrepresentative sample was of primary interest. In this manuscript, MRP-style estimators will be evaluated in an experiment…

    Submitted 20 January, 2022; v1 submitted 19 February, 2021; originally announced February 2021.

    Comments: Updated DAG to reflect true d.g. process for population in simulation study. Also fixed minor typos

  43. arXiv:2102.03474  [pdf, ps, other]

    stat.AP

    Multichannel adaptive signal detection: Basic theory and literature review

    Authors: Weijian Liu, Jun Liu, Chengpeng Hao, Yongchan Gao, Yong-Liang Wang

    Abstract: Multichannel adaptive signal detection jointly uses the test and training data to form an adaptive detector, and then make a decision on whether a target exists or not. Remarkably, the resulting adaptive detectors usually possess the constant false alarm rate (CFAR) properties, and hence no additional CFAR processing is needed. Filtering is not needed as a processing procedure either, since the fu…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: 10 pages, 5 figures. This manuscript is accepted in Science China: Information Sciences

    Report number: Manuscript No. SCIS-2020-1112.R1

  44. arXiv:2102.02580  [pdf, other

    stat.ME math.ST

    Factor-augmented Smoothing Model for Functional Data

    Authors: Yuan Gao, Han Lin Shang, Yanrong Yang

    Abstract: We propose modeling raw functional data as a mixture of a smooth function and a high-dimensional factor component. The conventional approach to retrieving the smooth function from the raw data is through various smoothing techniques. However, the smoothing model is not adequate to recover the smooth curve or capture the data variation in some situations. These include cases where there is a large a…

    Submitted 4 February, 2021; originally announced February 2021.

  45. arXiv:2011.00613  [pdf, other

    cs.LG stat.ML

    An Information-Geometric Distance on the Space of Tasks

    Authors: Yansong Gao, Pratik Chaudhari

    Abstract: This paper prescribes a distance between learning tasks modeled as joint distributions on data and labels. Using tools in information geometry, the distance is defined to be the length of the shortest weight trajectory on a Riemannian manifold as a classifier is fitted on an interpolated task. The interpolated task evolves from the source to the target task using an optimal transport formulation.…

    Submitted 24 February, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Report number: Proc. of the International Conference of Machine Learning (ICML) 2021

  46. arXiv:2009.05346  [pdf, other

    cs.LG cs.CV stat.ML

    Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification

    Authors: Nicolo Colombo, Yang Gao

    Abstract: The history of deep learning has shown that human-designed problem-specific networks can greatly improve the classification performance of general neural models. In most practical cases, however, choosing the optimal architecture for a given task remains a challenging problem. Recent architecture-search methods are able to automatically build neural models with strong performance but fail to fully…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: 22 pages and 10 figures

  47. arXiv:2008.03961  [pdf, other

    cs.LG stat.ML

    Automatic Remaining Useful Life Estimation Framework with Embedded Convolutional LSTM as the Backbone

    Authors: Yexu Zhou, Yuting Gao, Yiran Huang, Michael Hefenbrock, Till Riedel, Michael Beigl

    Abstract: An essential task in predictive maintenance is the prediction of the Remaining Useful Life (RUL) through the analysis of multivariate time series. Using the sliding window method, Convolutional Neural Network (CNN) and conventional Recurrent Neural Network (RNN) approaches have produced impressive results on this matter, due to their ability to learn optimized features. However, sequence informati…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: 16 pages, 5 figures

  48. arXiv:2007.16103  [pdf, other

    cs.LG cs.CV stat.ML

    Learning-based Computer-aided Prescription Model for Parkinson's Disease: A Data-driven Perspective

    Authors: Yinghuan Shi, Wanqi Yang, Kim-Han Thung, Hao Wang, Yang Gao, Yang Pan, Li Zhang, Dinggang Shen

    Abstract: In this paper, we study a novel problem: "automatic prescription recommendation for PD patients." To realize this goal, we first build a dataset by collecting 1) symptoms of PD patients, and 2) their prescription drug provided by neurologists. Then, we build a novel computer-aided prescription model by learning the relation between observed symptoms and prescription drug. Finally, for the new comi…

    Submitted 31 July, 2020; originally announced July 2020.

    Comments: IEEE JBHI 2020

  49. arXiv:2007.14390  [pdf, other

    cs.LG cs.CV stat.ML

    Flower: A Friendly Federated Learning Research Framework

    Authors: Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, Nicholas D. Lane

    Abstract: Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared prediction model, while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store the data in the cloud. However, FL is difficult to implement realistically, both in terms of scale and systems heterogeneity. Although there are…

    Submitted 5 March, 2022; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: Open-Source, mobile-friendly Federated Learning framework

  50. arXiv:2006.06136  [pdf, ps, other

    stat.ML cs.LG

    Weighted Lasso Estimates for Sparse Logistic Regression: Non-asymptotic Properties with Measurement Error

    Authors: Huamei Huang, Yujing Gao, Huiming Zhang, Bo Li

    Abstract: When we are interested in a high-dimensional system and focus on classification performance, the $\ell_{1}$-penalized logistic regression is becoming important and popular. However, the Lasso estimates could be problematic when penalties of different coefficients are all the same and not related to the data. We proposed two types of weighted Lasso estimates depending on covariates by the McDiarmid i…

    Submitted 10 June, 2020; originally announced June 2020.

    Comments: 24 pages, 6 tables. Accepted by Acta Mathematica Scientia

    MSC Class: 62J12; 62H12; 62H30