-
ULV: A robust statistical method for clustered data, with applications to multisubject, single-cell omics data
Authors:
Mingyu Du,
Kevin Johnston,
Veronica Berrocal,
Wei Li,
Xiangmin Xu,
Zhaoxia Yu
Abstract:
Molecular and genomic technological advancements have greatly enhanced our understanding of biological processes by allowing us to quantify key biological variables such as gene expression, protein levels, and microbiome compositions. These breakthroughs have enabled us to achieve increasingly higher levels of resolution in our measurements, exemplified by our ability to comprehensively profile biological information at the single-cell level. However, the analysis of such data faces several critical challenges: limited number of individuals, non-normality, potential dropouts, outliers, and repeated measurements from the same individual. In this article, we propose a novel method, which we call U-statistic based latent variable (ULV). Our proposed method takes advantage of the robustness of rank-based statistics and exploits the statistical efficiency of parametric methods for small sample sizes. It is a computationally feasible framework that addresses all the issues mentioned above simultaneously. An additional advantage of ULV is its flexibility in modeling various types of single-cell data, including both RNA and protein abundance. The usefulness of our method is demonstrated in two studies: a single-cell proteomics study of acute myelogenous leukemia (AML) and a single-cell RNA study of COVID-19 symptoms. In the AML study, ULV successfully identified differentially expressed proteins that would have been missed by the pseudobulk version of the Wilcoxon rank-sum test. In the COVID-19 study, ULV identified genes associated with covariates such as age and gender, and genes that would be missed without adjusting for covariates. The differentially expressed genes identified by our method are less biased toward genes with high expression levels. Furthermore, ULV identified additional gene pathways likely contributing to the mechanisms of COVID-19 severity.
Submitted 10 June, 2024;
originally announced June 2024.
-
Semi-supervised Fréchet Regression
Authors:
Rui Qiu,
Zhou Yu,
Zhenhua Lin
Abstract:
This paper explores the field of semi-supervised Fréchet regression, driven by the significant costs associated with obtaining non-Euclidean labels. Methodologically, we propose two novel methods: semi-supervised NW Fréchet regression and semi-supervised kNN Fréchet regression, both based on graph distance acquired from all feature instances. These methods extend the scope of existing semi-supervised Euclidean regression methods. We establish their convergence rates with limited labeled data and large amounts of unlabeled data, taking into account the low-dimensional manifold structure of the feature space. Through comprehensive simulations across diverse settings and applications to real data, we demonstrate the superior performance of our methods over their supervised counterparts. This study addresses existing research gaps and paves the way for further exploration and advancements in the field of semi-supervised Fréchet regression.
Submitted 16 April, 2024;
originally announced April 2024.
-
From Poisson Observations to Fitted Negative Binomial Distribution
Authors:
Yingying Yang,
Niloufar Dousti Mousavi,
Zhou Yu,
Jie Yang
Abstract:
The Kolmogorov-Smirnov (KS) test has been widely used for testing whether a random sample comes from a specific distribution, possibly with estimated parameters. If the data come from a Poisson distribution, however, one can hardly tell that they do not come from a negative binomial distribution by running a KS test, even with a large sample size. In this paper, we rigorously justify that the KS test statistic converges to zero almost surely, as the sample size goes to infinity. To prove this result, we demonstrate a notable finding that in this case the maximum likelihood estimates (MLE) for the parameters of the negative binomial distribution converge to infinity and one, respectively and almost surely. Our result highlights a potential limitation of the KS test, as well as other tests based on empirical distribution functions (EDF), in efficiently identifying the true underlying distribution. Our findings and justifications also underscore the importance of careful interpretation and further investigation when identifying the most appropriate distributions in practice.
Submitted 10 April, 2024;
originally announced April 2024.
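The limiting behavior described in the abstract above (fitted negative binomial parameters drifting to infinity and one) can be illustrated with a small numerical sketch. This is not the authors' code or proof. It assumes the common (r, p) parameterization in which r is the size parameter and p = r / (r + mean) is the success probability, and it uses the hypothetical helper names `poisson_cdf`, `negbin_cdf`, and `sup_cdf_gap`. With the mean held fixed, the largest gap between the negative binomial CDF and the Poisson CDF shrinks as r grows, which is why a CDF-based test has little to separate the two.

```python
import math


def poisson_cdf(lam, kmax):
    """Return [P(X <= k) for k = 0..kmax] for X ~ Poisson(lam)."""
    cdf, total = [], 0.0
    for k in range(kmax + 1):
        total += math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        cdf.append(total)
    return cdf


def negbin_cdf(r, lam, kmax):
    """CDF of a negative binomial with size r and mean lam.

    Assumed parameterization: success probability p = r / (r + lam),
    so p -> 1 as r -> infinity while the mean stays at lam.
    """
    log_p = -math.log1p(lam / r)          # log(r / (r + lam))
    log_q = math.log(lam / (r + lam))     # log(1 - p)
    cdf, total = [], 0.0
    for k in range(kmax + 1):
        log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                   + r * log_p + k * log_q)
        total += math.exp(log_pmf)
        cdf.append(total)
    return cdf


def sup_cdf_gap(r, lam=3.0, kmax=60):
    """Largest absolute difference between the two CDFs on 0..kmax."""
    pois = poisson_cdf(lam, kmax)
    nb = negbin_cdf(r, lam, kmax)
    return max(abs(a - b) for a, b in zip(pois, nb))


# Gaps for increasing size parameter r at a fixed mean of 3.
gaps = [sup_cdf_gap(r) for r in (1, 10, 100, 1000, 10000)]
for r, gap in zip((1, 10, 100, 1000, 10000), gaps):
    print(f"r = {r:>5}: sup |F_NB - F_Pois| = {gap:.2e}")
```

The mean of 3 and the truncation point of 60 are arbitrary illustration choices; the sketch compares exact CDFs and does not perform maximum likelihood fitting or a KS test on sampled data.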
-
Quasi-Bayesian Estimation and Inference with Control Functions
Authors:
Ruixuan Liu,
Zhengfei Yu
Abstract:
We consider a quasi-Bayesian method that combines a frequentist estimation in the first stage and a Bayesian estimation/inference approach in the second stage. The study is motivated by structural discrete choice models that use the control function methodology to correct for endogeneity bias. In this scenario, the first stage estimates the control function using some frequentist parametric or nonparametric approach. The structural equation in the second stage, associated with certain complicated likelihood functions, can be more conveniently dealt with using a Bayesian approach. This paper studies the asymptotic properties of the quasi-posterior distributions obtained from the second stage. We prove that the corresponding quasi-Bayesian credible set does not have the desired coverage in large samples. Nonetheless, the quasi-Bayesian point estimator remains consistent and is asymptotically equivalent to a frequentist two-stage estimator. We show that one can obtain valid inference by bootstrapping the quasi-posterior that takes into account the first-stage estimation uncertainty.
Submitted 27 February, 2024;
originally announced February 2024.
-
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Authors:
Myeongseob Ko,
Feiyang Kang,
Weiyan Shi,
Ming Jin,
Zhou Yu,
Ruoxi Jia
Abstract:
Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models.
In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches.
We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
Submitted 19 June, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Computing Gerber-Shiu function in the classical risk model with interest using collocation method
Authors:
Zan Yu,
Lianzeng Zhang
Abstract:
The Gerber-Shiu function is a classical research topic in actuarial science. However, exact solutions are only available in the literature for very specific cases where the claim amounts follow distributions such as the exponential distribution. This presents a longstanding challenge, particularly from a computational perspective. For the classical risk process in continuous time, the Gerber-Shiu discounted penalty function satisfies a class of Volterra integral equations. In this paper, we use the collocation method to compute the Gerber-Shiu function for the risk model with interest. Our methodology demonstrates that the function can be expressed as a linear algebraic system, which is straightforward to implement. One major advantage of our approach is that it does not require any specific distributional assumptions on the claim amounts, except for mild differentiability and continuity conditions that can be easily verified. We also examine the convergence orders of the collocation method. Finally, we present several numerical examples to illustrate the desirable performance of our proposed method.
Submitted 26 December, 2023;
originally announced December 2023.
-
SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space
Authors:
Yunchen Li,
Zhou Yu,
Gaoqi He,
Yunhang Shen,
Ke Li,
Xing Sun,
Shaohui Lin
Abstract:
Symmetric positive definite (SPD) matrices have shown important value and applications in statistics and machine learning, such as fMRI analysis and traffic prediction. Previous works on SPD matrices mostly focus on discriminative models, where predictions are made directly on $E(X|y)$, where $y$ is a vector and $X$ is an SPD matrix. However, these methods struggle with large-scale data, as they need to access and process the whole data. In this paper, inspired by the denoising diffusion probabilistic model (DDPM), we propose a novel generative model, termed SPD-DDPM, by introducing a Gaussian distribution in the SPD space to estimate $E(X|y)$. Moreover, our model is able to estimate $p(X)$ unconditionally and flexibly without giving $y$. On the one hand, the model conditionally learns $p(X|y)$ and utilizes the mean of samples to obtain $E(X|y)$ as a prediction. On the other hand, the model unconditionally learns the probability distribution of the data $p(X)$ and generates samples that conform to this distribution. Furthermore, we propose a new SPD net which is much deeper than the previous networks and allows for the inclusion of conditional factors. Experimental results on toy data and real taxi data demonstrate that our models effectively fit the data distribution both unconditionally and conditionally and provide accurate predictions.
Submitted 13 December, 2023;
originally announced December 2023.
-
Characteristic Circuits
Authors:
Zhongjie Yu,
Martin Trapp,
Kristian Kersting
Abstract:
In many real-world scenarios, it is crucial to be able to reliably and efficiently reason under uncertainty while capturing complex relationships in data. Probabilistic circuits (PCs), a prominent family of tractable probabilistic models, offer a remedy to this challenge by composing simple, tractable distributions into a high-dimensional probability distribution. However, learning PCs on heterogeneous data is challenging and densities of some parametric distributions are not available in closed form, limiting their potential use. We introduce characteristic circuits (CCs), a family of tractable probabilistic models providing a unified formalization of distributions over heterogeneous data in the spectral domain. The one-to-one relationship between characteristic functions and probability measures enables us to learn high-dimensional distributions on heterogeneous data domains and facilitates efficient probabilistic inference even when no closed-form density function is available. We show that the structure and parameters of CCs can be learned efficiently from the data and find that CCs outperform state-of-the-art density estimators for heterogeneous data domains on common benchmark data sets.
Submitted 12 December, 2023;
originally announced December 2023.
-
Time-varying effect in the competing risks based on restricted mean time lost
Authors:
Zhiyin Yu,
Zhaojin Li,
Chengfeng Zhang,
Yawen Hou,
Derun Zhou,
Zheng Chen
Abstract:
Patients with breast cancer tend to die from other diseases, so for studies that focus on breast cancer, a competing risks model is more appropriate. Because the commonly used subdistribution hazard ratio is limited by its model assumptions and clinical interpretation, we aimed to quantify the effects of prognostic factors by an absolute indicator, the difference in restricted mean time lost (RMTL), which is more intuitive. Additionally, prognostic factors may have dynamic effects (time-varying effects) in long-term follow-up. However, existing competing risks regression models only provide a static view of covariate effects, leading to a distorted assessment of the prognostic factor. To address this issue, we proposed a dynamic effect RMTL regression that can explore the between-group cumulative difference in mean life lost over a period of time and obtain the real-time effect by the speed of accumulation, as well as personalized predictions on a time scale. Through Monte Carlo simulation, we validated that the dynamic effects estimated by the proposed regression have low bias and a coverage rate of around 95%. Applying this model to an elderly early-stage breast cancer cohort, we found that most factors had different patterns of dynamic effects, revealing meaningful physiological mechanisms underlying diseases. Moreover, from the perspective of prediction, the mean C-index in external validation reached 0.78. Dynamic effect RMTL regression can analyze both dynamic cumulative effects and real-time effects of covariates, providing a more comprehensive prognosis and better prediction when competing risks exist.
Submitted 20 November, 2023;
originally announced November 2023.
-
Sparse Fréchet Sufficient Dimension Reduction with Graphical Structure Among Predictors
Authors:
Jiaying Weng,
Kai Tan,
Cheng Wang,
Zhou Yu
Abstract:
Fréchet regression has received considerable attention for modeling metric-space valued responses that are complex, non-Euclidean data, such as probability distributions and vectors on the unit sphere. However, existing Fréchet regression literature focuses on the classical setting where the predictor dimension is fixed and the sample size goes to infinity. This paper proposes sparse Fréchet sufficient dimension reduction with graphical structure among high-dimensional Euclidean predictors. In particular, we propose a convex optimization problem that leverages the graphical information among predictors and avoids inverting the high-dimensional covariance matrix. We also provide the Alternating Direction Method of Multipliers (ADMM) algorithm to solve the optimization problem. Theoretically, the proposed method achieves subspace estimation and variable selection consistency under suitable conditions. Extensive simulations and a real data analysis are carried out to illustrate the finite-sample performance of the proposed method.
Submitted 29 October, 2023;
originally announced October 2023.
-
Learning Theory of Distribution Regression with Neural Networks
Authors:
Zhongjie Shi,
Zhan Yu,
Ding-Xuan Zhou
Abstract:
In this paper, we aim to establish an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN). In contrast to classical regression methods, the input variables of distribution regression are probability measures, so we often need to perform a second-stage sampling process to approximate the actual information of the distribution. On the other hand, the classical neural network structure requires the input variable to be a vector. When the input samples are probability distributions, the traditional deep neural network method cannot be directly used, and this is where the difficulty arises for distribution regression. A well-defined neural network structure for distribution inputs is highly desirable, yet there is no mathematical model or theoretical analysis of neural network realization of distribution regression. To overcome technical difficulties and address this issue, we establish a novel fully connected neural network framework to realize an approximation theory of functionals defined on the space of Borel probability measures. Furthermore, based on the established functional approximation results, in the hypothesis space induced by the novel FNN structure with distribution inputs, almost optimal learning rates for the proposed distribution regression model up to logarithmic terms are derived via a novel two-stage error decomposition technique.
Submitted 7 July, 2023;
originally announced July 2023.
-
Partial Linear Cox Model with Deep ReLU Networks for Interval-Censored Failure Time Data
Authors:
Jie Zhou,
Yue Zhang,
Zhangsheng Yu
Abstract:
The partial linear Cox model for interval-censoring is well-studied under the additive assumption but is still under-investigated without this assumption. In this paper, we propose to use a deep ReLU neural network to estimate the nonparametric components of a partial linear Cox model for interval-censored data. This model not only retains the nice interpretability of the parametric component but also improves the predictive power compared to the partial linear additive Cox model. We derive the convergence rate of the proposed estimator and show that it can break the curse of dimensionality under certain smoothness assumptions. Based on this rate, the asymptotic normality and the semiparametric efficiency are also established. Intensive simulation studies are carried out to demonstrate the finite sample performance on both estimation and prediction. The proposed estimation procedure is illustrated on a real dataset.
Submitted 30 June, 2023;
originally announced July 2023.
-
A Trigamma-free Approach for Computing Information Matrices Related to Trigamma Function
Authors:
Zhou Yu,
Niloufar Dousti Mousavi,
Jie Yang
Abstract:
Negative binomial related distributions have been widely used in practice. The calculation of the corresponding Fisher information matrices involves the expectation of trigamma function values which can only be calculated numerically and approximately. In this paper, we propose a trigamma-free approach to approximate the expectations involving the trigamma function, along with theoretical upper bounds for approximation errors. We show by numerical studies that our approach is highly efficient and much more accurate than previous methods. We also apply our approach to compute the Fisher information matrices of zero-inflated negative binomial (ZINB) and beta negative binomial (ZIBNB) probabilistic models, as well as ZIBNB regression models.
Submitted 18 January, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Generalization Ability of Wide Residual Networks
Authors:
Jianfa Lai,
Zixiong Yu,
Songtao Tian,
Qian Lin
Abstract:
In this paper, we study the generalization ability of the wide residual network on $\mathbb{S}^{d-1}$ with the ReLU activation function. We first show that as the width $m\rightarrow\infty$, the residual network kernel (RNK) uniformly converges to the residual neural tangent kernel (RNTK). This uniform convergence further guarantees that the generalization error of the residual network converges to that of the kernel regression with respect to the RNTK. As direct corollaries, we then show $i)$ the wide residual network with the early stopping strategy can achieve the minimax rate provided that the target regression function falls in the reproducing kernel Hilbert space (RKHS) associated with the RNTK; $ii)$ the wide residual network cannot generalize well if it is trained until it overfits the data. We finally present some experiments to reconcile the contradiction between our theoretical result and the widely observed "benign overfitting" phenomenon.
Submitted 29 May, 2023;
originally announced May 2023.
-
Distributed Gradient Descent for Functional Learning
Authors:
Zhan Yu,
Jun Fan,
Ding-Xuan Zhou
Abstract:
In recent years, different types of distributed learning schemes have received increasing attention for their strong advantages in handling large-scale data. To face the big data challenges that have recently arisen in functional data analysis, we propose a novel distributed gradient descent functional learning (DGDFL) algorithm to tackle functional data across numerous local machines (processors) in the framework of reproducing kernel Hilbert space. Based on integral operator approaches, we provide the first theoretical understanding of the DGDFL algorithm in many different aspects in the literature. As a first step toward understanding DGDFL, a data-based gradient descent functional learning (GDFL) algorithm associated with a single-machine model is proposed and comprehensively studied. Under mild conditions, confidence-based optimal learning rates of DGDFL are obtained without the saturation boundary on the regularity index suffered in previous works in functional regression. We further provide a semi-supervised DGDFL approach to weaken the restriction on the maximal number of local machines to ensure optimal rates. To the best of our knowledge, DGDFL provides the first distributed iterative training approach to functional learning and enriches the field of functional data analysis.
Submitted 12 May, 2023;
originally announced May 2023.
-
On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains
Authors:
Yicheng Li,
Zixiong Yu,
Guhan Chen,
Qian Lin
Abstract:
In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb S^{d}$. This class of kernel functions includes but is not limited to the neural tangent kernel associated with neural networks with different depths and various activation functions. After proving that the dynamics of training wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we can further illustrate the minimax optimality of the wide neural network provided that the underlying ground-truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of NTK. We also show that the overfitted neural network cannot generalize well. We believe our approach for determining the EDR of kernels may also be of independent interest.
Submitted 8 January, 2024; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Prediction method of cigarette draw resistance based on correlation analysis
Authors:
Linsheng Chen,
Zhonghua Yu,
Bo Zhang,
Qiang Zhu,
Hu Fan,
Yucan Qiu
Abstract:
The existing cigarette draw resistance monitoring method is incomplete and one-dimensional, and it lacks correlation analysis and preventive modeling, resulting in substandard cigarettes in the market. To address this problem without increasing the hardware cost, in this paper, multi-indicator correlation analysis is used to predict cigarette draw resistance. First, the monitoring process of draw resistance is analyzed based on the existing quality control framework, and optimization ideas are proposed. In addition, for the three production units, the cut tobacco supply (VE), the tobacco rolling (SE), and the cigarette-forming (MAX), direct and potential factors associated with draw resistance are explored, based on linear and non-linear correlation analysis. Then, the correlates of draw resistance are used as inputs for the machine learning model, and the predicted values of draw resistance are used as outputs. Finally, this research also verifies the practical application value of draw resistance prediction: the distribution characteristics of substandard cigarettes are analyzed based on the prediction results, the time interval in which substandard cigarettes are produced is determined, the probability model of substandard cigarettes being sampled is derived, and the reliability of the prediction result is further verified by an example. The results show that the prediction model based on correlation analysis performs well in three months of actual production.
Submitted 12 April, 2023;
originally announced April 2023.
-
Intrinsic minimum average variance estimation for sufficient dimension reduction with symmetric positive definite matrices and beyond
Authors:
B. Chen,
S. Dai,
Z. Yu
Abstract:
In this paper, we target the problem of sufficient dimension reduction with symmetric positive definite matrix-valued responses. We propose the intrinsic minimum average variance estimation method and the intrinsic outer product gradient method, which fully exploit the geometric structure of the Riemannian manifold where responses lie. We present the algorithms for our newly developed methods under the log-Euclidean metric and the log-Cholesky metric. Each of the two metrics is linked to an abelian Lie group structure that transforms our model defined on a manifold into a Euclidean one. The proposed methods are then further extended to general Riemannian manifolds. We establish rigorous asymptotic results for the proposed estimators, including the rate of convergence and the asymptotic normality. We also develop a cross validation algorithm for the estimation of the structural dimension with theoretical guarantee. Comprehensive simulation studies and an application to the New York taxi network data are performed to show the superiority of the proposed methods.
Submitted 25 February, 2023;
originally announced February 2023.
-
Elliptically symmetric distributions for directional data of arbitrary dimension
Authors:
Zehao Yu,
Xianzheng Huang
Abstract:
We formulate a class of angular Gaussian distributions that allows different degrees of isotropy for directional random variables of arbitrary dimension. Through a series of novel reparameterizations, this distribution family is indexed by parameters with meaningful statistical interpretations that can range over the entire real space of an adequate dimension. The new parameterization greatly simplifies maximum likelihood estimation of all model parameters, which in turn leads to theoretically sound and numerically stable inference procedures to infer key features of the distribution. Byproducts from the likelihood-based inference are used to develop graphical and numerical diagnostic tools for assessing goodness of fit of this distribution in a data application. A simulation study and an application to data from a hydrogeology study are used to demonstrate implementation and performance of the inference procedures and diagnostic methods.
Submitted 11 December, 2022;
originally announced December 2022.
-
Double Robust Bayesian Inference on Average Treatment Effects
Authors:
Christoph Breunig,
Ruixuan Liu,
Zhengfei Yu
Abstract:
We propose a double robust Bayesian inference procedure on the average treatment effect (ATE) under unconfoundedness. Our robust Bayesian approach involves two important modifications: first, we adjust the prior distributions of the conditional mean function; second, we correct the posterior distribution of the resulting ATE. Both adjustments make use of pilot estimators motivated by the semiparametric influence function for ATE estimation. We prove asymptotic equivalence of our Bayesian procedure and efficient frequentist ATE estimators by establishing a new semiparametric Bernstein-von Mises theorem under double robustness; i.e., the lack of smoothness of conditional mean functions can be compensated by high regularity of the propensity score and vice versa. Consequently, the resulting Bayesian credible sets form confidence intervals with asymptotically exact coverage probability. In simulations, our double robust Bayesian procedure leads to significant bias reduction of point estimation over conventional Bayesian methods and more accurate coverage of confidence intervals compared to existing frequentist methods. We illustrate our method in an application to the National Supported Work Demonstration.
Submitted 21 February, 2024; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Orthogonal Gromov-Wasserstein Discrepancy with Efficient Lower Bound
Authors:
Hongwei **,
Zishun Yu,
Xinhua Zhang
Abstract:
Comparing structured data from possibly different metric-measure spaces is a fundamental task in machine learning, with applications in, e.g., graph classification. The Gromov-Wasserstein (GW) discrepancy formulates a coupling between the structured data based on optimal transportation, tackling the incomparability between different structures by aligning the intra-relational geometries. Although efficient \emph{local} solvers such as conditional gradient and Sinkhorn are available, the inherent non-convexity still prevents a tractable evaluation, and the existing lower bounds are not tight enough for practical use. To address this issue, we take inspiration from the connection with the quadratic assignment problem, and propose the orthogonal Gromov-Wasserstein (OGW) discrepancy as a surrogate of GW. It admits an efficient and \emph{closed-form} lower bound with $\mathcal{O}(n^3)$ complexity, and directly extends to the fused Gromov-Wasserstein (FGW) distance, incorporating node features into the coupling. Extensive experiments on both the synthetic and real-world datasets show the tightness of our lower bounds, and both OGW and its lower bounds efficiently deliver accurate predictions and satisfactory barycenters for graph sets.
Submitted 10 July, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Random Forest Weighted Local Fréchet Regression with Random Objects
Authors:
Rui Qiu,
Zhou Yu,
Ruoqing Zhu
Abstract:
Statistical analysis is increasingly confronted with complex data from metric spaces. Petersen and Müller (2019) established a general paradigm of Fréchet regression with complex metric space valued responses and Euclidean predictors. However, the local approach therein involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, we in this paper propose a novel random forest weighted local Fréchet regression paradigm. The main mechanism of our approach relies on a locally adaptive kernel generated by random forests. Our first method uses these weights as the local average to solve the conditional Fréchet mean, while the second method performs local linear Fréchet regression, both significantly improving existing Fréchet regression methods. Based on the theory of infinite order U-processes and infinite order $M_{m_n}$-estimator, we establish the consistency, rate of convergence, and asymptotic normality for our local constant estimator, which covers the current large sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our methods with several commonly encountered types of responses such as distribution functions, symmetric positive-definite matrices, and sphere data. The practical merits of our proposals are also demonstrated through the application to New York taxi data and human mortality data.
Submitted 16 March, 2024; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces
Authors:
Chaoxia Yuan,
Chao Ying,
Zhou Yu,
Fang Fang
Abstract:
Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method was considered for SVM. This work aims to fill the gap and to propose a frequentist model averaging procedure for SVM which selects the optimal weight by cross validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate which provides more insights to model averaging. Compared to model selection methods of SVM which require a tedious but critical task of tuning parameter selection, the model averaging method avoids the task and shows promising performances in the empirical studies.
Submitted 22 July, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
A Global Two-stage Algorithm for Non-convex Penalized High-dimensional Linear Regression Problems
Authors:
Peili Li,
Min Liu,
Zhou Yu
Abstract:
Owing to their asymptotic oracle property, non-convex penalties such as the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) have attracted much attention in high-dimensional data analysis, and have been widely used in signal processing, image restoration, matrix estimation, etc. However, in view of their non-convex and non-smooth characteristics, they are computationally challenging. Almost all existing algorithms converge locally, and the proper selection of initial values is crucial. Therefore, in actual operation, they often combine a warm-starting technique to meet the rigid requirement that the initial value must be sufficiently close to the optimal solution of the corresponding problem. In this paper, based on the DC (difference of convex functions) property of MCP and SCAD penalties, we aim to design a global two-stage algorithm for the high-dimensional least squares linear regression problems. A key idea for making the proposed algorithm efficient is to use the primal dual active set with continuation (PDASC) method, which is equivalent to the semi-smooth Newton (SSN) method, to solve the corresponding sub-problems. Theoretically, we not only prove the global convergence of the proposed algorithm, but also verify that the generated iterative sequence converges to a d-stationary point. In terms of computational performance, extensive simulation and real data studies show that the algorithm in this paper is superior to the latest SSN method and the classic coordinate descent (CD) algorithm for solving non-convex penalized high-dimensional linear regression problems.
Submitted 23 November, 2021;
originally announced November 2021.
-
Fast Sketching of Polynomial Kernels of Polynomial Degree
Authors:
Zhao Song,
David P. Woodruff,
Zheng Yu,
Lichen Zhang
Abstract:
Kernel methods are fundamental in machine learning, and faster algorithms for kernel approximation provide direct speedups for many core tasks in machine learning. The polynomial kernel is especially important as other kernels can often be approximated by the polynomial kernel via a Taylor series expansion. Recent techniques in oblivious sketching reduce the dependence in the running time on the degree $q$ of the polynomial kernel from exponential to polynomial, which is useful for the Gaussian kernel, for which $q$ can be chosen to be polylogarithmic. However, for more slowly growing kernels, such as the neural tangent and arc-cosine kernels, $q$ needs to be polynomial, and previous work incurs a polynomial factor slowdown in the running time. We give a new oblivious sketch which greatly improves upon this running time, by removing the dependence on $q$ in the leading order term. Combined with a novel sampling scheme, we give the fastest algorithms for approximating a large family of slow-growing kernels.
Submitted 20 August, 2021;
originally announced August 2021.
-
Kernel regression for cause-specific hazard models with time-dependent coefficients
Authors:
Xiaomeng Qi,
Zhangsheng Yu
Abstract:
Competing risk data appear widely in modern biomedical research. Cause-specific hazard models have often been used to deal with competing risk data over the past two decades. There is currently no study on the kernel likelihood method for the cause-specific hazard model with time-varying coefficients. We propose to use the local partial log-likelihood approach for nonparametric time-varying coefficient estimation. Simulation studies demonstrate that our proposed nonparametric kernel estimator performs well under the assumed finite sample settings. Finally, we apply the proposed method to analyze a diabetes dialysis study with competing death causes.
Submitted 11 September, 2021; v1 submitted 23 July, 2021;
originally announced July 2021.
-
Inference on Individual Treatment Effects in Nonseparable Triangular Models
Authors:
Jun Ma,
Vadim Marmer,
Zhengfei Yu
Abstract:
In nonseparable triangular models with a binary endogenous treatment and a binary instrumental variable, Vuong and Xu (2017) established identification results for individual treatment effects (ITEs) under the rank invariance assumption. Using their approach, Feng, Vuong, and Xu (2019) proposed a uniformly consistent kernel estimator for the density of the ITE that utilizes estimated ITEs. In this paper, we establish the asymptotic normality of the density estimator of Feng, Vuong, and Xu (2019) and show that the ITE estimation errors have a non-negligible effect on the asymptotic distribution of the estimator. We propose asymptotically valid standard errors that account for ITEs estimation, as well as a bias correction. Furthermore, we develop uniform confidence bands for the density of the ITE using the jackknife multiplier or nonparametric bootstrap critical values.
Submitted 15 February, 2023; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Leveraging Probabilistic Circuits for Nonparametric Multi-Output Regression
Authors:
Zhongjie Yu,
Mingye Zhu,
Martin Trapp,
Arseny Skryagin,
Kristian Kersting
Abstract:
Inspired by recent advances in the field of expert-based approximations of Gaussian processes (GPs), we present an expert-based approach to large-scale multi-output regression using single-output GP experts. Employing a deeply structured mixture of single-output GPs encoded via a probabilistic circuit allows us to capture correlations between multiple output dimensions accurately. By recursively partitioning the covariate space and the output space, posterior inference in our model reduces to inference on single-output GP experts, which only need to be conditioned on a small subset of the observations. We show that inference can be performed exactly and efficiently in our model, that it can capture correlations between output dimensions and, hence, often outperforms approaches that do not incorporate inter-output correlations, as demonstrated on several data sets in terms of the negative log predictive density.
Submitted 1 August, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Robust Kernel-based Distribution Regression
Authors:
Zhan Yu,
Daniel W. C. Ho,
Ding-Xuan Zhou
Abstract:
Regularization schemes for regression have been widely studied in learning theory and inverse problems. In this paper, we study distribution regression (DR) which involves two stages of sampling, and aims at regressing from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS). Recently, theoretical analysis on DR has been carried out via kernel ridge regression and several learning behaviors have been observed. However, the topic has not been explored and understood beyond the least squares based DR. By introducing a robust loss function $l_σ$ for two-stage sampling problems, we present a novel robust distribution regression (RDR) scheme. With a windowing function $V$ and a scaling parameter $σ$ which can be appropriately chosen, $l_σ$ can include a wide range of commonly used loss functions that enrich the theme of DR. Moreover, the loss $l_σ$ is not necessarily convex, hence largely improving the former regression class (least squares) in the literature of DR. The learning rates under different regularity ranges of the regression function $f_ρ$ are comprehensively studied and derived via integral operator techniques. The scaling parameter $σ$ is shown to be crucial in providing robustness and satisfactory learning rates of RDR.
Submitted 21 April, 2021;
originally announced April 2021.
-
A Measurement of In-Betweenness and Inference Based on Shape Theories
Authors:
Dustin Pluta,
Xiangmin Xu,
Daniel L. Gillen,
Zhaoxia Yu
Abstract:
We propose a statistical framework to investigate whether a given subpopulation lies between two other subpopulations in a multivariate feature space. This methodology is motivated by a biological question from a collaborator: Is a newly discovered cell type between two known types in several given features? We propose two in-betweenness indices (IBI) to quantify the in-betweenness exhibited by a random triangle formed by the summary statistics of the three subpopulations. Statistical inference methods are provided for triangle shape and IBI metrics. The application of our methods is demonstrated in three examples: the classic Iris data set, a study of risk of relapse across three breast cancer subtypes, and the motivating neuronal cell data with measured electrophysiological features.
Submitted 17 March, 2021;
originally announced March 2021.
-
Time-varying $\ell_0$ optimization for Spike Inference from Multi-Trial Calcium Recordings
Authors:
Tong Shen,
Kevin Johnston,
Gyorgy Lur,
Michele Guindani,
Hernando Ombao,
Zhaoxia Yu
Abstract:
Optical imaging of genetically encoded calcium indicators is a powerful tool to record the activity of a large number of neurons simultaneously over a long period of time from freely behaving animals. However, determining the exact time at which a neuron spikes and estimating the underlying firing rate from calcium fluorescence data remains challenging, especially for calcium imaging data obtained from a longitudinal study. We propose a multi-trial time-varying $\ell_0$ penalized method to jointly detect spikes and estimate firing rates by robustly integrating evolving neural dynamics across trials. Our simulation study shows that the proposed method performs well in both spike detection and firing rate estimation. We demonstrate the usefulness of our method on calcium fluorescence trace data from two studies, with the first study showing differential firing rate functions between two behaviors and the second study showing evolving firing rate function across trials due to learning.
Submitted 1 March, 2021;
originally announced March 2021.
-
To Deconvolve, or Not to Deconvolve: Inferences of Neuronal Activities using Calcium Imaging Data
Authors:
Tong Shen,
Gyorgy Lur,
Xiangmin Xu,
Zhaoxia Yu
Abstract:
With the increasing popularity of calcium imaging data in neuroscience research, methods for analyzing calcium trace data are critical to address various questions. The observed calcium traces are either analyzed directly or deconvolved to spike trains to infer neuronal activities. When both approaches are applicable, it is unclear whether deconvolving calcium traces is a necessary step. In this article, we compare the performance of using calcium traces or their deconvolved spike trains for three common analyses: clustering, principal component analysis (PCA), and population decoding. Our simulations and applications to real data suggest that the estimated spike data outperform calcium trace data for both clustering and PCA. Although calcium trace data show higher predictability than spike data at each time point, spike history or cumulative spike counts are comparable to or better than calcium traces in population decoding.
Submitted 2 March, 2021;
originally announced March 2021.
-
Ridge-penalized adaptive Mantel test and its application in imaging genetics
Authors:
Dustin Pluta,
Tong Shen,
Gui Xue,
Chuansheng Chen,
Hernando Ombao,
Zhaoxia Yu
Abstract:
We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluating the association of two high-dimensional sets of features. By introducing a ridge penalty, AdaMant tests the association across many metrics simultaneously. We demonstrate how ridge penalization bridges Euclidean and Mahalanobis distances and their corresponding linear models from the perspective of association measurement and testing. This result is not only theoretically interesting but also has important implications in penalized hypothesis testing, especially in high dimensional settings such as imaging genetics. Applying the proposed method to an imaging genetic study of visual working memory in healthy adults, we identified interesting associations of brain connectivity (measured by EEG coherence) with selected genetic features.
Submitted 20 March, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
CogDL: A Comprehensive Library for Graph Deep Learning
Authors:
Yukuo Cen,
Zhenyu Hou,
Yan Wang,
Qibin Chen,
Yizhen Luo,
Zhongming Yu,
Hengrui Zhang,
Xingcheng Yao,
Aohan Zeng,
Shiguang Guo,
Yuxiao Dong,
Yang Yang,
Peng Zhang,
Guohao Dai,
Yu Wang,
Chang Zhou,
Hongxia Yang,
Jie Tang
Abstract:
Graph neural networks (GNNs) have attracted tremendous attention from the graph learning community in recent years. They have been widely adopted in various real-world applications from diverse domains, such as social networks and biological graphs. The research and applications of graph deep learning present new challenges, including the sparse nature of graph data, complicated training of GNNs, and non-standard evaluation of graph tasks. To tackle these issues, we present CogDL, a comprehensive library for graph deep learning that allows researchers and practitioners to conduct experiments, compare methods, and build applications with ease and efficiency. In CogDL, we propose a unified design for the training and evaluation of GNN models for various graph tasks, making it unique among existing graph learning libraries. By utilizing this unified trainer, CogDL can optimize the GNN training loop with several training techniques, such as mixed precision training. Moreover, we develop efficient sparse operators for CogDL, enabling it to become the most competitive graph library for efficiency. Another important CogDL feature is its focus on ease of use with the aim of facilitating open and reproducible research of graph learning. We leverage CogDL to report and maintain benchmark results on fundamental graph tasks, which can be reproduced and directly used by the community.
Submitted 17 April, 2023; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Bayesian nonparametric analysis for the detection of spikes in noisy calcium imaging data
Authors:
Laura D'Angelo,
Antonio Canale,
Zhaoxia Yu,
Michele Guindani
Abstract:
Recent advancements in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake behaving animals through the analysis of intra-cellular calcium signals. An on-going challenge is deconvolving the temporal signals to extract the spike trains from the noisy calcium signals' time-series. In this manuscript, we propose a nested Bayesian finite mixture specification that allows the estimation of spiking activity and, simultaneously, reconstructing the distributions of the calcium transient spikes' amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spikes' intensity values are also clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a data set from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activities.
Submitted 27 January, 2022; v1 submitted 18 February, 2021;
originally announced February 2021.
-
On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Authors:
Junyu Zhang,
Chengzhuo Ni,
Zheng Yu,
Csaba Szepesvari,
Mengdi Wang
Abstract:
Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate the existing PG methods such as REINFORCE by \emph{variance reduction} techniques. However, all existing variance-reduced PG methods heavily rely on an uncheckable importance weight assumption made for every single iteration of the algorithms. In this paper, a simple gradient truncation mechanism is proposed to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function over a policy's long-term visiting distribution. We show an $\tilde{\mathcal{O}}(ε^{-3})$ sample complexity for TSIVR-PG to find an $ε$-stationary policy. By assuming the overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $ε$-optimal policy with $\tilde{\mathcal{O}}(ε^{-2})$ samples.
Submitted 27 May, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Change-point detection using spectral PCA for multivariate time series
Authors:
Shuhao Jiao,
Tong Shen,
Zhaoxia Yu,
Hernando Ombao
Abstract:
We propose a two-stage approach, Spec PC-CP, to identify change points in multivariate time series. In the first stage, we obtain a low-dimensional summary of the high-dimensional time series by Spectral Principal Component Analysis (Spec-PCA). In the second stage, we apply a cumulative sum-type test to the Spectral PCA component using a binary segmentation algorithm. Compared with existing approaches, the proposed method is able to capture the lead-lag relationship in time series. Our simulations demonstrate that the Spec PC-CP method performs significantly better than competing methods for detecting change points in high-dimensional time series. The results on epileptic seizure EEG data and stock data also indicate that our new method can efficiently detect change points corresponding to the onset of the underlying events.
Submitted 12 January, 2021;
originally announced January 2021.
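The second stage described in the abstract above (a cumulative sum-type test combined with binary segmentation) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a univariate series standing in for a single spectral PCA component, tests only for shifts in the mean, and uses hypothetical names (`cusum_change_point`, `binary_segmentation`) and a hand-picked `threshold` rather than a calibrated critical value.

```python
import math


def cusum_change_point(x):
    """Return (split index, statistic) for the strongest single mean shift in x.

    At each split k the statistic is sqrt(k * (n - k) / n) * |mean(x[:k]) - mean(x[k:])|.
    Returns (None, 0.0) when no split gives a positive statistic.
    """
    n = len(x)
    total = sum(x)
    best_k, best_stat = None, 0.0
    left = 0.0
    for k in range(1, n):
        left += x[k - 1]
        mean_left = left / k
        mean_right = (total - left) / (n - k)
        stat = math.sqrt(k * (n - k) / n) * abs(mean_left - mean_right)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


def binary_segmentation(x, threshold, min_size=2, offset=0):
    """Recursively split x wherever the CUSUM statistic exceeds threshold.

    threshold and min_size are illustrative tuning parameters (assumptions).
    Returns the sorted list of change-point indices in the original series.
    """
    if len(x) < 2 * min_size:
        return []
    k, stat = cusum_change_point(x)
    if k is None or stat <= threshold or k < min_size or len(x) - k < min_size:
        return []
    left = binary_segmentation(x[:k], threshold, min_size, offset)
    right = binary_segmentation(x[k:], threshold, min_size, offset + k)
    return left + [offset + k] + right


# Toy series with mean shifts at indices 20 and 40.
series = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
change_points = binary_segmentation(series, threshold=3.0)  # -> [20, 40]
```

In practice the threshold would be derived from the null distribution of the test statistic, and the input would be the component produced by the first stage rather than a raw series.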
-
A dynamic programming approach for generalized nearly isotonic optimization
Authors:
Zhensheng Yu,
Xuyu Chen,
Xudong Li
Abstract:
Shape restricted statistical estimation problems have been extensively studied, with many important practical applications in signal processing, bioinformatics, and machine learning. In this paper, we propose and study a generalized nearly isotonic optimization (GNIO) model, which recovers, as special cases, many classic problems in shape constrained statistical regression, such as isotonic regression, nearly isotonic regression and unimodal regression problems. We develop an efficient and easy-to-implement dynamic programming algorithm for solving the proposed model, whose recursive nature is carefully uncovered and exploited. For special $\ell_2$-GNIO problems, implementation details and the optimal ${\cal O}(n)$ running time analysis of our algorithm are discussed. Numerical experiments, including comparisons among our approach, the powerful commercial solver Gurobi, and existing fast algorithms for solving $\ell_1$-GNIO and $\ell_2$-GNIO problems, on both simulated and real data sets, are presented to demonstrate the high efficiency and robustness of our proposed algorithm in solving large scale GNIO problems.
Submitted 10 October, 2022; v1 submitted 6 November, 2020;
originally announced November 2020.
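For readers unfamiliar with the simplest special case named in the abstract above, $\ell_2$ isotonic regression, the following sketch fits a nondecreasing sequence by least squares. It uses the classical pool adjacent violators approach, which is a swapped-in technique and not the dynamic programming algorithm proposed in the paper; the function name `isotonic_regression_l2` is hypothetical.

```python
def isotonic_regression_l2(y, weights=None):
    """Weighted least-squares fit of a nondecreasing sequence to y.

    Classical pool adjacent violators: scan left to right and merge
    neighbouring blocks whenever their means are out of order.
    """
    n = len(y)
    if weights is None:
        weights = [1.0] * n
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total_w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total_w, total_w, c1 + c2])
    # Expand each block back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


fit = isotonic_regression_l2([1, 3, 2, 4])  # -> [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3, 2) is pooled into its average 2.5, while the already ordered points are left unchanged.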
-
Generalized Leverage Score Sampling for Neural Networks
Authors:
Jason D. Lee,
Ruoqi Shen,
Zhao Song,
Mengdi Wang,
Zheng Yu
Abstract:
Leverage score sampling is a powerful technique that originates from theoretical computer science and can be used to speed up the solution of a large number of fundamental problems, e.g. linear regression, linear programming, semi-definite programming, cutting plane method, graph sparsification, maximum matching and max-flow. Recently, it has been shown that leverage score sampling helps to accelerate kernel methods [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17].
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels. We further bring the leverage score sampling into the field of deep learning theory.
$\bullet$ We show the connection between the initialization for neural network training and approximating the neural tangent kernel with random features.
$\bullet$ We prove the equivalence between regularized neural network and neural tangent kernel ridge regression under the initialization of both classical random Gaussian and leverage score sampling.
Submitted 21 September, 2020;
originally announced September 2020.
-
A general kernel boosting framework integrating pathways for predictive modeling based on genomic data
Authors:
Li Zeng,
Zhaolong Yu,
Yiliang Zhang,
Hongyu Zhao
Abstract:
Predictive modeling based on genomic data has gained popularity in biomedical research and clinical practice by allowing researchers and clinicians to identify biomarkers and tailor treatment decisions more efficiently. Analysis incorporating pathway information can boost discovery power and better connect new findings with biological mechanisms. In this article, we propose a general framework, Pathway-based Kernel Boosting (PKB), which incorporates clinical information and prior knowledge about pathways for prediction of binary, continuous and survival outcomes. We introduce appropriate loss functions and optimization procedures for different outcome types. Our prediction algorithm incorporates pathway knowledge by constructing kernel function spaces from the pathways and using them as base learners in the boosting procedure. Through extensive simulations and case studies in drug response and cancer survival datasets, we demonstrate that PKB can substantially outperform other competing methods, better identify biological pathways related to drug response and patient survival, and provide novel insights into cancer pathogenesis and treatment response.
Submitted 31 January, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Out-of-distribution Generalization via Partial Feature Decorrelation
Authors:
Xin Guo,
Zhengxu Yu,
Chao Xiang,
Zhongming **,
Jianqiang Huang,
Deng Cai,
Xiaofei He,
Xian-Sheng Hua
Abstract:
Most deep-learning-based image classification methods assume that all samples are generated under an independent and identically distributed (IID) setting. However, out-of-distribution (OOD) generalization is more common in practice, which means an agnostic context distribution shift between training and testing environments. To address this problem, we present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model. The feature decomposition network decomposes feature embeddings into the independent and the correlated parts such that the correlations between features will be highlighted. Then, the correlated features help learn a stable feature representation by decorrelating the highlighted correlations while optimizing the image classification model. We verify the correlation modeling ability of the feature decomposition network on a synthetic dataset. The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
Submitted 23 February, 2022; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Unsupervised Controllable Generation with Self-Training
Authors:
Grigorios G Chrysos,
Jean Kossaifi,
Zhiding Yu,
Anima Anandkumar
Abstract:
Recent generative adversarial networks (GANs) are able to generate impressive photo-realistic images. However, controllable generation with GANs remains a challenging research problem. Achieving controllable generation requires semantically interpretable and disentangled factors of variation. It is challenging to achieve this goal using simple fixed distributions such as Gaussian distribution. Instead, we propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training. Self-training provides an iterative feedback in the GAN training, from the discriminator to the generator, and progressively improves the proposal of the latent codes as training proceeds. The latent codes are sampled from a latent variable model that is learned in the feature space of the discriminator. We consider a normalized independent component analysis model and learn its parameters through tensor factorization of the higher-order moments. Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder, and is able to discover semantically meaningful latent codes without any supervision. We demonstrate empirically on both cars and faces datasets that each group of elements in the learned code controls a mode of variation with a semantic meaning, e.g. pose or background change. We also demonstrate with quantitative metrics that our method generates better results compared to other approaches.
Submitted 2 May, 2021; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Neural Networks with Recurrent Generative Feedback
Authors:
Yujia Huang,
James Gornet,
Sihui Dai,
Zhiding Yu,
Tan Nguyen,
Doris Y. Tsao,
Anima Anandkumar
Abstract:
Neural networks are vulnerable to input perturbations such as additive noise and adversarial attacks. In contrast, human perception is much more robust to such perturbations. The Bayesian brain hypothesis states that human brains use an internal generative model to update the posterior beliefs of the sensory input. This mechanism can be interpreted as a form of self-consistency between the maximum a posteriori (MAP) estimation of an internal generative model and the external environment. Inspired by this hypothesis, we enforce self-consistency in neural networks by incorporating generative recurrent feedback. We instantiate this design on convolutional neural networks (CNNs). The proposed framework, termed Convolutional Neural Networks with Feedback (CNN-F), introduces generative feedback with latent variables to existing CNN architectures, where consistent predictions are made through alternating MAP inference under a Bayesian framework. In the experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.
Submitted 10 November, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
CovidCare: Transferring Knowledge from Existing EMR to Emerging Epidemic for Interpretable Prognosis
Authors:
Liantao Ma,
Xinyu Ma,
Junyi Gao,
Chaohe Zhang,
Zhihao Yu,
Xianfeng Jiao,
Wenjie Ruan,
Yasha Wang,
Wen Tang,
Jiangtao Wang
Abstract:
Due to the characteristics of COVID-19, the epidemic develops rapidly and overwhelms health service systems worldwide. Many patients suffer from systemic life-threatening problems and need to be carefully monitored in ICUs. Thus, intelligent prognosis is urgently needed to help physicians intervene early, prevent adverse outcomes, and optimize the allocation of medical resources. However, in the early stage of the epidemic outbreak, the data available for analysis is limited due to the lack of effective diagnostic mechanisms, rarity of the cases, and privacy concerns. In this paper, we propose a deep-learning-based approach, CovidCare, which leverages existing electronic medical records (EMR) to enhance the prognosis for inpatients with emerging infectious diseases. It learns to embed the COVID-19-related medical features based on massive existing EMR data via transfer learning. The transferred parameters are further trained to imitate the teacher model's representation behavior based on knowledge distillation, which embeds the health status more comprehensively in the source dataset. We conduct length-of-stay prediction experiments for patients on a real-world COVID-19 dataset. The experimental results indicate that our proposed model consistently outperforms the comparative baseline methods. CovidCare also reveals that 1) hs-cTnI, hs-CRP and Platelet Counts are the most fatal biomarkers, whose abnormal values usually indicate emergency adverse outcomes, and 2) normal values of gamma-GT, AP and eGFR indicate an overall improvement of health. The medical findings extracted by CovidCare are empirically confirmed by human experts and the medical literature.
Submitted 17 July, 2020;
originally announced July 2020.
-
Automated Synthetic-to-Real Generalization
Authors:
Wuyang Chen,
Zhiding Yu,
Zhangyang Wang,
Anima Anandkumar
Abstract:
Models trained on synthetic images often face degraded generalization to real data. As a convention, these models are often initialized with ImageNet pre-trained representation. Yet the role of ImageNet knowledge is seldom discussed despite common practices that leverage this knowledge to maintain the generalization ability. An example is the careful hand-tuning of early stopping and layer-wise learning rates, which is shown to improve synthetic-to-real generalization but is also laborious and heuristic. In this work, we explicitly encourage the synthetically trained model to maintain similar representations with the ImageNet pre-trained model, and propose a \textit{learning-to-optimize (L2O)} strategy to automate the selection of layer-wise learning rates. We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data, while also benefiting downstream tasks such as domain adaptation. Code is available at: https://github.com/NVlabs/ASG.
Submitted 14 July, 2020;
originally announced July 2020.
-
Estimates on Learning Rates for Multi-Penalty Distribution Regression
Authors:
Zhan Yu,
Daniel W. C. Ho
Abstract:
This paper is concerned with functional learning by utilizing two-stage sampled distribution regression. We study a multi-penalty regularization algorithm for distribution regression under the framework of learning theory. The algorithm aims at regressing to real-valued outputs from probability measures. The theoretical analysis of distribution regression is far from mature and quite challenging, since only second-stage samples are observable in practical settings. In the algorithm, to transform information from samples, we embed the distributions into a reproducing kernel Hilbert space $\mathcal{H}_K$ associated with a Mercer kernel $K$ via the mean embedding technique. The main contribution of the paper is to present a novel multi-penalty regularization algorithm to capture more features of distribution regression and derive optimal learning rates for the algorithm. The work also derives learning rates for distribution regression in the nonstandard setting $f_ρ\notin\mathcal{H}_K$, which is not explored in the existing literature. Moreover, we propose a distribution regression-based distributed learning algorithm to handle large-scale data or information challenges. The optimal learning rates are derived for the distributed learning algorithm. By providing new algorithms and showing their learning rates, we improve on existing work in the literature in several respects.
Submitted 28 November, 2023; v1 submitted 16 June, 2020;
originally announced June 2020.
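The mean embedding step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes one-dimensional samples, a Gaussian kernel with an arbitrary `bandwidth`, and the simple plug-in (biased) estimator, and all function names are hypothetical. Each distribution is represented only through its observed second-stage sample, and inner products between embeddings reduce to averages of kernel evaluations.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line; bandwidth is an assumed parameter."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def embedding_inner_product(sample_a, sample_b, bandwidth=1.0):
    """Plug-in estimate of <mu_a, mu_b> in the RKHS: mean of k(a, b) over all pairs."""
    total = sum(gaussian_kernel(a, b, bandwidth) for a in sample_a for b in sample_b)
    return total / (len(sample_a) * len(sample_b))


def squared_embedding_distance(sample_a, sample_b, bandwidth=1.0):
    """Squared RKHS distance between the two empirical mean embeddings."""
    return (
        embedding_inner_product(sample_a, sample_a, bandwidth)
        - 2.0 * embedding_inner_product(sample_a, sample_b, bandwidth)
        + embedding_inner_product(sample_b, sample_b, bandwidth)
    )


# Samples drawn from two well-separated distributions give a clearly positive distance.
near_zero = [0.0, 0.1, -0.1, 0.05]
near_five = [5.0, 5.1, 4.9, 5.05]
distance = squared_embedding_distance(near_zero, near_five)
```

A regression method on distributions can then operate on these embeddings, for example through a second kernel defined on the inner products above.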
-
Reciprocal Adversarial Learning via Characteristic Functions
Authors:
Shengxi Li,
Zeyang Yu,
Min Xiang,
Danilo Mandic
Abstract:
Generative adversarial nets (GANs) have become a preferred tool for tasks involving complicated distributions. To stabilise the training and reduce the mode collapse of GANs, one of their main variants employs the integral probability metric (IPM) as the loss function. This provides extensive IPM-GANs with theoretical support for basically comparing moments in an embedded domain of the \textit{critic}. We generalise this by comparing the distributions rather than their moments via a powerful tool, i.e., the characteristic function (CF), which uniquely and universally comprises all the information about a distribution. For rigour, we first establish the physical meaning of the phase and amplitude in CF, and show that this provides a feasible way of balancing the accuracy and diversity of generation. We then develop an efficient sampling strategy to calculate the CFs. Within this framework, we further prove an equivalence between the embedded and data domains when a reciprocal exists, where we naturally develop the GAN in an auto-encoder structure, in a way of comparing everything in the embedded space (a semantically meaningful manifold). This efficient structure uses only two modules, together with a simple training strategy, to achieve bi-directional generation of clear images, and is referred to as the reciprocal CF GAN (RCF-GAN). Experimental results demonstrate the superior performance of the proposed RCF-GAN in terms of both generation and reconstruction.
Submitted 23 October, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Deep Dimension Reduction for Supervised Representation Learning
Authors:
Jian Huang,
Yuling Jiao,
Xu Liao,
** Liu,
Zhou Yu
Abstract:
The goal of supervised representation learning is to construct effective data representations for prediction. Among all the characteristics of an ideal nonparametric representation of high-dimensional complex data, sufficiency, low dimensionality and disentanglement are some of the most essential ones. We propose a deep dimension reduction approach to learning representations with these characteristics. The proposed approach is a nonparametric generalization of the sufficient dimension reduction method. We formulate the ideal representation learning task as that of finding a nonparametric representation that minimizes an objective function characterizing conditional independence and promoting disentanglement at the population level. We then estimate the target representation at the sample level nonparametrically using deep neural networks. We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero. Our extensive numerical experiments using simulated and real benchmark data demonstrate that the proposed methods have better performance than several existing dimension reduction methods and the standard deep learning models in the context of classification and regression.
Submitted 1 September, 2022; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Looking back to lower-level information in few-shot learning
Authors:
Zhongjie Yu,
Sebastian Raschka
Abstract:
Humans are capable of learning new concepts from small numbers of examples. In contrast, supervised deep learning models usually lack the ability to extract reliable predictive rules from limited data scenarios when attempting to classify new examples. This challenging scenario is commonly known as few-shot learning. Few-shot learning has garnered increased attention in recent years due to its significance for many real-world problems. Recently, new methods relying on meta-learning paradigms combined with graph-based structures, which model the relationship between examples, have shown promising results on a variety of few-shot classification tasks. However, existing work on few-shot learning is only focused on the feature embeddings produced by the last layer of the neural network. In this work, we propose the utilization of lower-level, supporting information, namely the feature embeddings of the hidden neural network layers, to improve classifier accuracy. Based on a graph-based meta-learning framework, we develop a method called Looking-Back, where such lower-level information is used to construct additional graphs for label propagation in limited data settings. Our experiments on two popular few-shot learning datasets, miniImageNet and tieredImageNet, show that our method can utilize the lower-level information in the network to improve state-of-the-art classification performance.
Submitted 15 July, 2020; v1 submitted 27 May, 2020;
originally announced May 2020.
-
Adaptive-Step Graph Meta-Learner for Few-Shot Graph Classification
Authors:
Ning Ma,
Jiajun Bu,
Jieyu Yang,
Zhen Zhang,
Chengwei Yao,
Zhi Yu,
Sheng Zhou,
Xifeng Yan
Abstract:
Graph classification aims to extract accurate information from graph-structured data for classification and is becoming more and more important in the graph learning community. Although Graph Neural Networks (GNNs) have been successfully applied to graph classification tasks, most of them overlook the scarcity of labeled graph data in many applications. For example, in bioinformatics, obtaining protein graph labels usually needs laborious experiments. Recently, few-shot learning has been explored to alleviate this problem given only a few labeled graph samples of test classes. The shared sub-structures between training classes and test classes are essential in few-shot graph classification. Existing methods assume that the test classes belong to the same set of super-classes clustered from training classes. However, according to our observations, the label spaces of training classes and test classes usually do not overlap in real-world scenarios. As a result, the existing methods do not capture the local structures of unseen test classes well. To overcome this limitation, in this paper, we propose a direct method to capture the sub-structures with a well-initialized meta-learner within a few adaptation steps. More specifically, (1) we propose a novel framework consisting of a graph meta-learner, which uses GNN-based modules for fast adaptation on graph data, and a step controller for the robustness and generalization of the meta-learner; (2) we provide quantitative analysis for the framework and give a graph-dependent upper bound of the generalization error based on our framework; (3) extensive experiments on real-world datasets demonstrate that our framework achieves state-of-the-art results on several few-shot graph classification tasks compared to baselines.
Submitted 23 June, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.