-
Objective Bayesian analysis for the generalized exponential distribution
Authors:
Aojun Li,
Keying Ye,
Min Wang
Abstract:
In this paper, we consider objective Bayesian inference of the generalized exponential distribution using the independence Jeffreys prior and validate the propriety of the posterior distribution under a family of structured priors. We propose an efficient sampling algorithm via the generalized ratio-of-uniforms method to draw samples for making posterior inference. We carry out simulation studies to assess the finite-sample performance of the proposed Bayesian approach. Finally, a real-data application is provided for illustrative purposes.
Submitted 23 September, 2023;
originally announced September 2023.
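As a rough illustration of the sampling idea named in the abstract above, the sketch below implements the basic ratio-of-uniforms method, which is the special case of the generalized method with power parameter r = 1. It is not the authors' algorithm: the standard normal target, the function names `unnormalized_normal` and `rou_sample`, and the bounding box are assumptions chosen only to keep the example small.

```python
import math
import random


def unnormalized_normal(x):
    """Unnormalized standard normal density (illustrative target only)."""
    return math.exp(-0.5 * x * x)


def rou_sample(n, seed=0):
    """Draw n samples with the basic ratio-of-uniforms method (r = 1).

    If (u, v) is uniform on {0 < u <= sqrt(f(v / u))}, then x = v / u has
    density proportional to f. We propose from a bounding box and reject.
    """
    rng = random.Random(seed)
    u_max = 1.0                      # sup of sqrt(f(x)), attained at x = 0
    v_max = math.sqrt(2.0 / math.e)  # sup of |x| * sqrt(f(x)), at |x| = sqrt(2)
    samples = []
    while len(samples) < n:
        u = rng.uniform(0.0, u_max)
        v = rng.uniform(-v_max, v_max)
        if u == 0.0:
            continue  # avoid division by zero on a boundary draw
        x = v / u
        if u * u <= unnormalized_normal(x):
            samples.append(x)
    return samples


draws = rou_sample(20000, seed=1)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

For a posterior density, `unnormalized_normal` would be swapped for the unnormalized posterior and the box bounds recomputed, which is where most of the problem-specific work lies.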
-
Heterogeneous Domain Adaptation for IoT Intrusion Detection: A Geometric Graph Alignment Approach
Authors:
Jiashu Wu,
Hao Dai,
Yang Wang,
Kejiang Ye,
Chengzhong Xu
Abstract:
Data scarcity hinders the usability of data-dependent algorithms when tackling IoT intrusion detection (IID). To address this, we utilise the data-rich network intrusion detection (NID) domain to facilitate more accurate intrusion detection for IID domains. In this paper, a Geometric Graph Alignment (GGA) approach is leveraged to mask the geometric heterogeneities between domains for better intrusion knowledge transfer. Specifically, each intrusion domain is formulated as a graph where vertices and edges represent intrusion categories and category-wise interrelationships, respectively. The overall shape is preserved via a confused discriminator incapable of identifying adjacency matrices between different intrusion domain graphs. A rotation avoidance mechanism and a centre point matching mechanism are used to avoid graph misalignment due to rotation and symmetry, respectively. Besides, category-wise semantic knowledge is transferred to act as vertex-level alignment. To exploit the target data, a pseudo-label election mechanism that jointly considers network prediction, geometric property and neighbourhood information is used to produce fine-grained pseudo-label assignment. Upon aligning the intrusion graphs geometrically from different granularities, the transferred intrusion knowledge can boost IID performance. Comprehensive experiments on several intrusion datasets demonstrate state-of-the-art performance of the GGA approach and validate the usefulness of GGA's constituent components.
Submitted 23 January, 2023;
originally announced January 2023.
-
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Authors:
Zonglin Li,
Chong You,
Srinadh Bhojanapalli,
Daliang Li,
Ankit Singh Rawat,
Sashank J. Reddi,
Ke Ye,
Felix Chern,
Felix Yu,
Ruiqi Guo,
Sanjiv Kumar
Abstract:
This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse. By activation map we refer to the intermediate output of the multi-layer perceptrons (MLPs) after a ReLU activation function, and by sparse we mean that on average very few entries (e.g., 3.0% for T5-Base and 6.3% for ViT-B16) are nonzero for each input to MLP. Moreover, larger Transformers with more layers and wider MLP hidden dimensions are sparser as measured by the percentage of nonzero entries. Through extensive experiments we demonstrate that the emergence of sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks, on both training and evaluation data, for Transformers of various configurations, at layers of all depth levels, as well as for other architectures including MLP-mixers and 2-layer MLPs. We show that sparsity also emerges using training datasets with random labels, or with random inputs, or with an infinite amount of data, demonstrating that sparsity is not a result of a specific family of datasets. We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers. Moreover, we demonstrate, perhaps surprisingly, that enforcing an even sparser activation via Top-k thresholding with a small value of k brings a collection of desired but missing properties for Transformers, namely less sensitivity to noisy training data, more robustness to input corruptions, and better calibration for their prediction confidence.
Submitted 9 June, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
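The two quantities the abstract above relies on, the fraction of nonzero entries after a ReLU and Top-k thresholding, can be sketched on a toy vector. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `relu`, `nonzero_fraction`, and `top_k_threshold` are hypothetical, and the input is a hand-picked list rather than a real Transformer activation.

```python
def relu(xs):
    """Elementwise ReLU on a list of floats."""
    return [x if x > 0.0 else 0.0 for x in xs]


def nonzero_fraction(xs):
    """Fraction of entries that are nonzero (the sparsity measure)."""
    return sum(1 for x in xs if x != 0.0) / len(xs)


def top_k_threshold(xs, k):
    """Keep the k largest entries of xs and set all others to zero."""
    if k <= 0:
        return [0.0] * len(xs)
    order = sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)
    keep = set(order[:k])  # indices of the k largest values
    return [x if i in keep else 0.0 for i, x in enumerate(xs)]


pre_activation = [-1.0, 0.5, 2.0, -0.3, 1.2, 0.1]  # toy MLP pre-activations
activation = relu(pre_activation)                  # [0.0, 0.5, 2.0, 0.0, 1.2, 0.1]
sparser = top_k_threshold(activation, 2)           # [0.0, 0.0, 2.0, 0.0, 1.2, 0.0]
```

Here four of the six post-ReLU entries are nonzero, and thresholding with k = 2 leaves only the two largest.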
-
A Study on the Power Parameter in Power Prior Bayesian Analysis
Authors:
Zifei Han,
Keying Ye,
Min Wang
Abstract:
The power prior and its variations have been proven to be a useful class of informative priors in Bayesian inference due to their flexibility in incorporating the historical information by raising the likelihood of the historical data to a fractional power δ. The derivation of the marginal likelihood based on the original power prior, and its variation, the normalized power prior, introduces a scaling factor C(δ) in the form of a prior predictive distribution with powered likelihood. In this paper, we show that the scaling factor might be infinite for some positive δ with conventionally used initial priors, which would change the admissible set of the power parameter. This result seems to have been almost completely ignored in the literature. We then illustrate that such a phenomenon may jeopardize the posterior inference under the power priors when the initial prior of the model parameters is improper. The main findings of this paper suggest that special attention should be paid when the suggested level of borrowing is close to 0, while the actual optimum might be below the suggested value. We use a normal linear model as an example for illustrative purposes.
Submitted 13 April, 2022;
originally announced April 2022.
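To make the role of the power parameter δ concrete, the sketch below works through the simplest conjugate case: a normal mean with known variance, a flat initial prior, and a fixed δ. These modelling choices and the function name `power_prior_posterior` are assumptions made for illustration; the example does not address the scaling factor C(δ) or a random δ, which are the subject of the paper. Raising the historical likelihood to the power δ makes the n0 historical observations count as δ·n0 observations.

```python
def power_prior_posterior(y, y0, delta, sigma2=1.0):
    """Posterior mean and variance of a normal mean under a power prior.

    Assumes a known variance sigma2, a flat initial prior on the mean,
    and a fixed power parameter delta in [0, 1].
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n, n0 = len(y), len(y0)
    ybar = sum(y) / n
    y0bar = sum(y0) / n0
    effective_n = n + delta * n0  # historical data discounted by delta
    mean = (n * ybar + delta * n0 * y0bar) / effective_n
    variance = sigma2 / effective_n
    return mean, variance


current = [1.0, 2.0, 3.0]   # current data, sample mean 2
historical = [4.0, 6.0]     # historical data, sample mean 5
no_borrowing = power_prior_posterior(current, historical, 0.0)
half_borrowing = power_prior_posterior(current, historical, 0.5)
full_borrowing = power_prior_posterior(current, historical, 1.0)
```

With δ = 0 the historical data are ignored, and with δ = 1 the two data sets are pooled; intermediate values move the posterior mean between those two extremes.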
-
Normalized Power Prior Bayesian Analysis
Authors:
Keying Ye,
Zifei Han,
Yuyan Duan,
Tianyu Bai
Abstract:
The elicitation of power priors, based on the availability of historical data, is realized by raising the likelihood function of the historical data to a fractional power δ, which quantifies the degree of discounting of the historical information in making inference with the current data. When δ is not pre-specified and is treated as random, it can be estimated from the data using the Bayesian updating paradigm. However, in the original form of the joint power prior Bayesian approach, certain positive constants before the likelihood of the historical data could be multiplied when different settings of sufficient statistics are employed. This would change the power priors with different constants, and hence the likelihood principle is violated. In this article, we investigate a normalized power prior approach which obeys the likelihood principle and is a modified form of the joint power prior. The optimality properties of the normalized power prior in the sense of minimizing the weighted Kullback-Leibler divergence are investigated. By examining the posteriors of several commonly used distributions, we show that the discrepancy between the historical and the current data can be well quantified by the power parameter under the normalized power prior setting. Efficient algorithms to compute the scale factor are also proposed. In addition, we illustrate the use of the normalized power prior Bayesian analysis with three data examples, and provide an implementation with an R package NPP.
Submitted 12 April, 2022;
originally announced April 2022.
-
A simple consistent Bayes factor for testing the Kendall rank correlation coefficient
Authors:
Shen Zhang,
Keying Ye,
Min Wang
Abstract:
In this paper, we propose a simple and easy-to-implement Bayesian hypothesis test for the presence of an association, described by Kendall's τ coefficient, between two variables measured on at least an ordinal scale. Owing to the absence of the likelihood functions for the data, we employ the asymptotic sampling distributions of the test statistic as the working likelihoods and then specify a truncated normal prior distribution on the noncentrality parameter of the alternative hypothesis, which results in the Bayes factor available in closed form in terms of the cumulative distribution function of the standard normal distribution. Investigating the asymptotic behavior of the Bayes factor, we find the conditions on the priors under which it is consistent under whichever hypothesis is true. Simulation studies and a real-data application are used to illustrate the effectiveness of the proposed Bayes factor. It deserves mentioning that the proposed method can be easily covered in undergraduate and graduate courses in nonparametric statistics with an emphasis on students' Bayesian thinking for data analysis.
Submitted 8 September, 2022; v1 submitted 1 May, 2021;
originally announced May 2021.
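The test statistic underlying the abstract above can be sketched as follows. The code computes Kendall's τ (the tau-a form, assuming no ties) and standardizes it with the classical large-sample null variance 2(2n + 5) / (9n(n − 1)). It stops at the standardized statistic and does not reproduce the paper's Bayes factor; the function names `kendall_tau` and `standardized_tau` are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes x and y have equal length and contain no ties.
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def standardized_tau(x, y):
    """Tau divided by its asymptotic standard deviation under independence."""
    n = len(x)
    null_variance = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    return kendall_tau(x, y) / math.sqrt(null_variance)


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
tau = kendall_tau(x, y)    # 8 concordant and 2 discordant pairs out of 10
z = standardized_tau(x, y)
```

A working likelihood of the kind described in the abstract would treat a standardized statistic such as `z` as approximately normal; the prior specification and the resulting closed form are given in the paper itself.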
-
Intrinsic Gaussian Processes on Manifolds and Their Accelerations by Symmetry
Authors:
Ke Ye,
Mu Niu,
Pokman Cheung,
Zhenwen Dai,
Yuan Liu
Abstract:
Amidst the growing interest in nonparametric regression, we address a significant challenge in Gaussian processes (GP) applied to manifold-based predictors. Existing methods primarily focus on low-dimensional constrained domains for heat kernel estimation, limiting their effectiveness in higher-dimensional manifolds. Our research proposes an intrinsic approach for constructing GP on general manifolds such as orthogonal groups, unitary groups, Stiefel manifolds and Grassmannian manifolds. Our methodology estimates the heat kernel by simulating Brownian motion sample paths using the exponential map, ensuring independence from the manifold's embedding. The introduction of our strip algorithm, tailored for manifolds with extra symmetries, and the ball algorithm, designed for arbitrary manifolds, constitutes our significant contribution. Both algorithms are rigorously substantiated through theoretical proofs and numerical testing, with the strip algorithm showcasing remarkable efficiency gains over traditional methods. This intrinsic approach delivers several key advantages, including applicability to high-dimensional manifolds, eliminating the requirement for global parametrization or embedding. We demonstrate its practicality through regression case studies (torus knots and eight-dimensional projective spaces) and by developing binary classifiers for real-world datasets (gorilla skulls planar images and diffusion tensor images). These classifiers outperform traditional methods, particularly in limited data scenarios.
Submitted 31 January, 2024; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Numerical algorithms on the affine Grassmannian
Authors:
Lek-Heng Lim,
Ken Sze-Wai Wong,
Ke Ye
Abstract:
The affine Grassmannian is a noncompact smooth manifold that parameterizes all affine subspaces of a fixed dimension. It is a natural generalization of Euclidean space, points being zero-dimensional affine subspaces. We will realize the affine Grassmannian as a matrix manifold and extend Riemannian optimization algorithms including steepest descent, Newton method, and conjugate gradient, to real-valued functions on the affine Grassmannian. Like their counterparts for the Grassmannian, these algorithms are in the style of Edelman–Arias–Smith: they rely only on standard numerical linear algebra and are readily computable.
Submitted 25 June, 2018; v1 submitted 6 July, 2016;
originally announced July 2016.
-
Bayesian detection of embryonic gene expression onset in C. elegans
Authors:
Jie Hu,
Zhongying Zhao,
Hari Krishna Yalamanchili,
Junwen Wang,
Kenny Ye,
Xiaodan Fan
Abstract:
To study how a zygote develops into an embryo with different tissues, large-scale 4D confocal movies of C. elegans embryos have been produced recently by experimental biologists. However, the lack of principled statistical methods for the highly noisy data has hindered the comprehensive analysis of these data sets. We introduce a probabilistic change point model on the cell lineage tree to estimate the embryonic gene expression onset time. A Bayesian approach is used to fit the 4D confocal movie data to the model. Subsequent classification methods are used to decide a model selection threshold and further refine the expression onset time from the branch level to the specific cell time level. Extensive simulations have shown the high accuracy of our method. Its application to real data yields both previously known results and new findings.
Submitted 16 September, 2015;
originally announced September 2015.