Polarization Holes as an Indicator of Magnetic Field-Angular Momentum Alignment I. Initial Tests

Authors: Lijun Wang, Zhuo Cao, Xiaodan Fan, Hua-bai Li

Abstract: The formation of protostellar disks is still a mystery, largely due to the difficulties in observations that can constrain theories. For example, the 3D alignment between the rotation of the disk and the magnetic fields (B-fields) in the formation environment is critical in some models, but so far impossible to observe. Here, we study the possibility of probing the alignment between B-field and disk rotation using "polarization holes" (PHs). PHs are widely observed and are caused by unresolved B-field structures. With ideal magnetohydrodynamic (MHD) simulations, we demonstrate that different initial alignments between B-field and angular momentum (AM) can result in B-field structures that are distinct enough to produce distinguishable PHs. Thus PHs can potentially serve as probes for alignments between B-field and AM in disk formation.

Submitted 22 March, 2024; originally announced March 2024.

Comments: accepted by The Astrophysical Journal

arXiv:2403.00600 [pdf, other]

Random Interval Distillation for Detecting Multiple Changes in General Dependent Data

Authors: Xinyuan Fan, Weichi Wu

Abstract: We propose a new and generic approach for detecting multiple change-points in general dependent data, termed random interval distillation (RID). By collecting random intervals with sufficient strength of signals and reassembling them into a sequence of informative short intervals, our new approach captures the shifts in signal characteristics across diverse dependent data forms including locally stationary high-dimensional time series and dynamic networks with Markov formation. We further propose a range of secondary refinements tailored to various data types to enhance the localization precision. Notably, for univariate time series and low-rank autoregressive networks, our methods achieve the minimax optimality as their independent counterparts. For practical applications, we introduce a clustering-based and data-driven procedure to determine the optimal threshold for signal strength, which is adaptable to a wide array of dependent data scenarios utilizing the connection between RID and clustering. Additionally, our method has been extended to identify kinks and changes in signals characterized by piecewise polynomial trends. We examine the effectiveness and usefulness of our methodology via extensive simulation studies and a real data example, implementing it in the R-package rid.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 59 pages, 5 figures

arXiv:2401.06383 [pdf, other]

Decomposition with Monotone B-splines: Fitting and Testing

Authors: Lijun Wang, Xiaodan Fan, Hongyu Zhao, Jun S. Liu

Abstract: A univariate continuous function can always be decomposed as the sum of a non-increasing function and a non-decreasing one. Based on this property, we propose a non-parametric regression method that combines two spline-fitted monotone curves. We demonstrate by extensive simulations that, compared to standard spline-fitting methods, the proposed approach is particularly advantageous in high-noise scenarios. Several theoretical guarantees are established for the proposed approach. Additionally, we present statistics to test the monotonicity of a function based on monotone decomposition, which can better control Type I error and achieve comparable (if not always higher) power compared to existing methods. Finally, we apply the proposed fitting and testing approaches to analyze the single-cell pseudotime trajectory datasets, identifying significant biological insights for non-monotonically expressed genes through Gene Ontology enrichment analysis. The source code implementing the methodology and producing all results is accessible at https://github.com/szcf-weiya/MonotoneDecomposition.jl.

Submitted 9 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2312.08324 [pdf, other]

Bayesian Nonparametric Clustering with Feature Selection for Spatially Resolved Transcriptomics Data

Authors: Bencong Zhu, Guanyu Hu, Yang Xie, Lin Xu, Xiaodan Fan, Qiwei Li

Abstract: The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Nevertheless, there are inherent challenges associated with these new high-dimensional spatial data, such as zero-inflation, over-dispersion, and heterogeneity. These challenges pose obstacles to effective clustering, which is a fundamental problem in SRT data analysis. Current computational approaches often rely on heuristic data preprocessing and arbitrary cluster number prespecification, leading to considerable information loss and consequently, suboptimal downstream analysis. In response to these challenges, we introduce BNPSpace, a novel Bayesian nonparametric spatial clustering framework that directly models SRT count data. BNPSpace facilitates the partitioning of the whole spatial domain, which is characterized by substantial heterogeneity, into homogeneous spatial domains with similar molecular characteristics while identifying a parsimonious set of discriminating genes among different spatial domains. Moreover, BNPSpace incorporates spatial information through a Markov random field prior model, encouraging a smooth and biologically meaningful partition pattern.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2309.07136 [pdf, other]

Masked Transformer for Electrocardiogram Classification

Authors: Ya Zhou, Xiaolin Diao, Yanni Huo, Yang Liu, Xiaohan Fan, Wei Zhao

Abstract: Electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformer for ECG data has not been fully realized, despite their widespread success in computer vision and natural language processing. In this work, we present Masked Transformer for ECG classification (MTECG), a simple yet effective method which significantly outperforms recent state-of-the-art algorithms in ECG classification. Our approach adapts the image-based masked autoencoders to self-supervised representation learning from ECG time series. We utilize a lightweight Transformer for the encoder and a 1-layer Transformer for the decoder. The ECG signal is split into a sequence of non-overlapping segments along the time dimension, and learnable positional embeddings are added to preserve the sequential information. We construct the Fuwai dataset comprising 220,251 ECG recordings with a broad range of diagnoses, annotated by medical experts, to explore the potential of Transformer. A strong pre-training and fine-tuning recipe is proposed from the empirical study. The experiments demonstrate that the proposed method increases the macro F1 scores by 3.4%-27.5% on the Fuwai dataset, 9.9%-32.0% on the PTB-XL dataset, and 9.4%-39.1% on a multicenter dataset, compared to the alternative methods. We hope that this study could direct future research on the application of Transformer to more ECG tasks.

Submitted 22 April, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

Comments: more experimental results; more implementation details; different abstracts

arXiv:2308.13630 [pdf, other]

Degrees of Freedom: Search Cost and Self-consistency

Authors: Lijun Wang, Hongyu Zhao, Xiaodan Fan

Abstract: Model degrees of freedom (df) is a fundamental concept in statistics because it quantifies the flexibility of a fitting procedure and is indispensable in model selection. The df is often intuitively equated with the number of independent variables in the fitting procedure. But for adaptive regressions that perform variable selection (e.g., the best subset regressions), the model df is larger than the number of selected variables. The excess part has been defined as the search degrees of freedom (sdf) to account for model selection. However, this definition is limited since it does not consider fitting procedures in augmented space, such as splines and regression trees; and it does not use the same fitting procedure for sdf and df. For example, the lasso's sdf is defined through the relaxed lasso's df instead of the lasso's df. Here we propose a modified search degrees of freedom (msdf) to directly account for the cost of searching in the original or augmented space. Since many fitting procedures can be characterized by a linear operator, we define the search cost as the effort to determine such a linear operator. When we construct a linear operator for the lasso via the iterative ridge regression, msdf offers a new perspective for its search cost. For some complex procedures such as the multivariate adaptive regression splines (MARS), the search cost needs to be pre-determined to serve as a tuning parameter for the procedure itself, but it might be inaccurate. To investigate the inaccurate pre-determined search cost, we develop two concepts, nominal df and actual df, and formulate a property named self-consistency when there is no gap between the nominal df and the actual df.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2307.01748 [pdf, other]

Monotone Cubic B-Splines with a Neural-Network Generator

Authors: Lijun Wang, Xiaodan Fan, Huabai Li, Jun S. Liu

Abstract: We present a method for fitting monotone curves using cubic B-splines, which is equivalent to putting a monotonicity constraint on the coefficients. We explore different ways of enforcing this constraint and analyze their theoretical and empirical properties. We propose two algorithms for solving the spline fitting problem: one that uses standard optimization techniques and one that trains a Multi-Layer Perceptron (MLP) generator to approximate the solutions under various settings and perturbations. The generator approach can speed up the fitting process when we need to solve the problem repeatedly, such as when constructing confidence bands using bootstrap. We evaluate our method against several existing methods, some of which do not use the monotonicity constraint, on some monotone curves with varying noise levels. We demonstrate that our method outperforms the other methods, especially in high-noise scenarios. We also apply our method to analyze the polarization-hole phenomenon during star formation in astrophysics. The source code is accessible at https://github.com/szcf-weiya/MonotoneSplines.jl.

Submitted 17 November, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

arXiv:2303.06377 [pdf, other]

A Geometric Statistic for Quantifying Correlation Between Tree-Shaped Datasets

Authors: Shanjun Mao, Xiaodan Fan, Jie Hu

Abstract: The magnitude of Pearson correlation between two scalar random variables can be visually judged from the two-dimensional scatter plot of an independent and identically distributed sample drawn from the joint distribution of the two variables: the closer the points lie to a straight slanting line, the greater the correlation. To the best of our knowledge, similar graphical representation or geometric quantification of tree correlation does not exist in the literature although tree-shaped datasets are frequently encountered in various fields, such as academic genealogy tree and embryonic development tree. In this paper, we introduce a geometric statistic to both represent tree correlation intuitively and quantify its magnitude precisely. The theoretical properties of the geometric statistic are provided. Large-scale simulations based on various data distributions demonstrate that the geometric statistic is precise in measuring the tree correlation. Its real application on mathematical genealogy trees also demonstrated its usefulness.

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2303.06368 [pdf, other]

Bayesian Inference of Gene Expression Dynamics in Alzheimer Brains

Authors: Shanjun Mao, Xiaodan Fan

Abstract: Alzheimer's disease (AD) is a serious neurodegenerative disease consisting of four stages where the illness gets progressively worse. It is of great significance to detect the gene regulatory mechanism as AD progresses and, thus, to help us better understand the causes of AD and find ways to treat or control AD. There are numerous researches to conduct this kind of study. However, the majority of methods are processing region by region of brain, stage by stage of AD, and then compare the results to detect changes. It is unclear how to combine these three dimensions, i.e., gene, region and stage, simultaneously to study gene expression dynamics of AD. This is the motivation of our research. In our study, we propose a statistical model of increments to clarify the relationship between gene expression in adjacent stages, so that we could better estimate the missing data we want and obtain a complete reasonable dynamic regulatory network model. Simulations are conducted to validate the statistical power of our algorithm. Moreover, a real data analysis shows that our method can capture the dynamic gene regulatory relationships among this complex brain data.

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2303.01626 [pdf, other]

Vine dependence graphs with latent variables as summaries for gene expression data

Authors: Xinyao Fan, Harry Joe, Yongjin Park

Abstract: The advent of high-throughput sequencing technologies has led to vast comparative genome sequences. The construction of gene-gene interaction networks or dependence graphs on the genome scale is vital for understanding the regulation of biological processes. Different dependence graphs can provide different information. Some existing methods for dependence graphs based on high-order partial correlations are sparse and not informative when there are latent variables that can explain much of the dependence in groups of genes. Other methods of dependence graphs based on correlations and first-order partial correlations might have dense graphs. When genes can be divided into groups with stronger within group dependence in gene expression than between group dependence, we present a dependence graph based on truncated vines with latent variables that makes use of group information and low-order partial correlations. The graphs are not dense, and the genes that might be more central have more neighbors in the vine dependency graph. We demonstrate the use of our dependence graph construction on two RNA-seq data sets: yeast and prostate cancer. There is some biological evidence to support the relationship between genes in the resulting dependence graphs. A flexible framework is provided for building dependence graphs via low-order partial correlations and formation of groups, leading to graphs that are not too sparse or dense. We anticipate that this approach will help to identify groups that might be central to different biological functions.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 24 pages, 7 figures

arXiv:2302.09921 [pdf, other]

Free-Form Variational Inference for Gaussian Process State-Space Models

Authors: Xuhui Fan, Edwin V. Bonilla, Terence J. O'Kane, Scott A. Sisson

Abstract: Gaussian process state-space models (GPSSMs) provide a principled and flexible approach to modeling the dynamics of a latent state, which is observed at discrete-time points via a likelihood model. However, inference in GPSSMs is computationally and statistically challenging due to the large number of latent variables in the model and the strong temporal dependencies between them. In this paper, we propose a new method for inference in Bayesian GPSSMs, which overcomes the drawbacks of previous approaches, namely over-simplified assumptions, and high computational requirements. Our method is based on free-form variational inference via stochastic gradient Hamiltonian Monte Carlo within the inducing-variable formalism. Furthermore, by exploiting our proposed variational distribution, we provide a collapsed extension of our method where the inducing variables are marginalized analytically. We also showcase results when combining our framework with particle MCMC methods. We show that, on six real-world datasets, our approach can learn transition dynamics and latent states more accurately than competing methods.

Submitted 16 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Updating to final version to appear in the proceedings

arXiv:2302.06807 [pdf, other]

Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space

Authors: Xiran Fan, Chun-Hao Yang, Baba C. Vemuri

Abstract: Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics for decision boundaries in a large margin classifiers setting leading to a non-convex optimization problem. In this paper, we propose a novel large margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem that can be optimized using any Riemannian gradient descent technique guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.

Submitted 28 September, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: To appear at Neural Information Processing Systems (NeurIPS) 2023

arXiv:2211.16475 [pdf, other]

doi 10.1002/sim.9414

Robust structured heterogeneity analysis approach for high-dimensional data

Authors: Yifan Sun, Ziye Luo, Xinyan Fan

Abstract: Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies. This problem has been challenged by the heterogeneity of diseases. Patients of a perceived same disease may form multiple subgroups, and different subgroups have distinct sets of important genes. It is hence imperative to discover the latent subgroups and reveal the subgroup-specific important genes. Some heterogeneity analysis methods have been proposed in recent literature. Despite considerable successes, most of the existing studies are still limited as they cannot accommodate data contamination and ignore the interconnections among genes. Aiming at these shortages, we develop a robust structured heterogeneity analysis approach to identify subgroups, select important genes as well as estimate their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed to conduct regularization estimation and gene identification, while taking into account the possibly overlapping cluster structure of genes. This approach takes an iterative strategy in the similar spirit of K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup. The analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 35 pages, 6 figures

Journal ref: Statistics in Medicine, 41: 3229-3259, 2022

arXiv:2211.15152 [pdf, other]

doi 10.1002/bimj.202100119

Regression-based heterogeneity analysis to identify overlapping subgroup structure in high-dimensional data

Authors: Ziye Luo, Xinyue Yao, Yifan Sun, Xinyan Fan

Abstract: Heterogeneity is a hallmark of complex diseases. Regression-based heterogeneity analysis, which is directly concerned with outcome-feature relationships, has led to a deeper understanding of disease biology. Such an analysis identifies the underlying subgroup structure and estimates the subgroup-specific regression coefficients. However, most of the existing regression-based heterogeneity analyses can only address disjoint subgroups; that is, each sample is assigned to only one subgroup. In reality, some samples have multiple labels, for example, many genes have several biological functions, and some cells of pure cell types transition into other types over time, which suggest that their outcome-feature relationships (regression coefficients) can be a mixture of relationships in more than one subgroups, and as a result, the disjoint subgrouping results can be unsatisfactory. To this end, we develop a novel approach to regression-based heterogeneity analysis, which takes into account possible overlaps between subgroups and high data dimensions. A subgroup membership vector is introduced for each sample, which is combined with a loss function. Considering the lack of information arising from small sample sizes, an $l_2$ norm penalty is developed for each membership vector to encourage similarity in its elements. A sparse penalization is also applied for regularized estimation and feature selection. Extensive simulations demonstrate its superiority over direct competitors. The analysis of Cancer Cell Line Encyclopedia data and lung cancer data from The Cancer Genome Atlas shows that the proposed approach can identify an overlapping subgroup structure with favorable performance in prediction and stability.

Submitted 28 November, 2022; originally announced November 2022.

Comments: 33 pages, 16 figures

Journal ref: Biometrical Journal, 2022

arXiv:2205.14487 [pdf, other]

High-dimensional factor copula models with estimation of latent variables

Authors: Xinyao Fan, Harry Joe

Abstract: Factor models are a parsimonious way to explain the dependence of variables using several latent variables. In Gaussian 1-factor and structural factor models (such as bi-factor, oblique factor) and their factor copula counterparts, factor scores or proxies are defined as conditional expectations of latent variables given the observed variables. With mild assumptions, the proxies are consistent for corresponding latent variables as the sample size and the number of observed variables linked to each latent variable go to infinity. When the bivariate copulas linking observed variables to latent variables are not assumed in advance, sequential procedures are used for latent variables estimation, copula family selection and parameter estimation. The use of proxy variables for factor copulas means that approximate log-likelihoods can be used to estimate copula parameters with less computational effort for numerical integration.

Submitted 28 May, 2022; originally announced May 2022.

Comments: 29 pages, 1 figure

arXiv:2205.07294 [pdf, ps, other]

Mutual Influence Regression Model

Authors: Xinyan Fan, Wei Lan, Tao Zou, Chih-Ling Tsai

Abstract: In this article, we propose the mutual influence regression model (MIR) to establish the relationship between the mutual influence matrix of actors and a set of similarity matrices induced by their associated attributes. This model is able to explain the heterogeneous structure of the mutual influence matrix by extending the commonly used spatial autoregressive model while allowing it to change with time. To facilitate making inferences with MIR, we establish parameter estimation, weight matrices selection and model testing. Specifically, we employ the quasi-maximum likelihood estimation method to estimate unknown regression coefficients, and demonstrate that the resulting estimator is asymptotically normal without imposing the normality assumption and while allowing the number of similarity matrices to diverge. In addition, an extended BIC-type criterion is introduced for selecting relevant matrices from the divergent number of similarity matrices. To assess the adequacy of the proposed model, we further propose an influence matrix test and develop a novel approach in order to obtain the limiting distribution of the test. Finally, we extend the model to accommodate endogenous weight matrices, exogenous covariates, and both individual and time fixed effects, to broaden the usefulness of MIR. The simulation studies support our theoretical findings, and a real example is presented to illustrate the usefulness of the proposed MIR model.

Submitted 15 May, 2022; originally announced May 2022.

arXiv:2205.07174 [pdf, ps, other]

Covariance Model with General Linear Structure and Divergent Parameters

Authors: Xinyan Fan, Wei Lan, Tao Zou, Chih-Ling Tsai

Abstract: For estimating the large covariance matrix with a limited sample size, we propose the covariance model with general linear structure (CMGL) by employing the general link function to connect the covariance of the continuous response vector to a linear combination of weight matrices. Without assuming the distribution of responses, and allowing the number of parameters associated with weight matrices to diverge, we obtain the quasi-maximum likelihood estimators (QMLE) of parameters and show their asymptotic properties. In addition, an extended Bayesian information criteria (EBIC) is proposed to select relevant weight matrices, and the consistency of EBIC is demonstrated. Under the identity link function, we introduce the ordinary least squares estimator (OLS) that has the closed form. Hence, its computational burden is reduced compared to QMLE, and the theoretical properties of OLS are also investigated. To assess the adequacy of the link function, we further propose the quasi-likelihood ratio test and obtain its limiting distribution. Simulation studies are presented to assess the performance of the proposed methods, and the usefulness of generalized covariance models is illustrated by an analysis of the US stock market.

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2205.07106 [pdf, ps, other]

Robust Regularized Low-Rank Matrix Models for Regression and Classification

Authors: Hsin-Hsiung Huang, Feng Yu, Xing Fan, Teng Zhang

Abstract: While matrix variate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional and noisy matrix-valued predictors. To address these issues, this paper proposes a framework of matrix variate regression models based on a rank constraint, vector regularization (e.g., sparsity), and a general loss function with three special cases considered: ordinary matrix regression, robust matrix regression, and matrix logistic regression. We also propose an alternating projected gradient descent algorithm. Based on analyzing our objective functions on manifolds with bounded curvature, we show that the algorithm is guaranteed to converge and that all accumulation points of the iterates have estimation errors of order $O(1/\sqrt{n})$ asymptotically, substantially attaining the minimax rate. Our theoretical analysis can be applied to general optimization problems on manifolds with bounded curvature and can be considered an important technical contribution to this work. We validate the proposed method through simulation studies and real image data examples.

Submitted 14 May, 2022; originally announced May 2022.

Comments: 26 pages, 7 figures

MSC Class: 62J12

arXiv:2112.03402 [pdf, other]

Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design

Authors: Xiran Fan, Chun-Hao Yang, Baba C. Vemuri

Abstract: Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space, namely the hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection which is designed to project data on to a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this ensuing nested hyperbolic space representation with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.

Submitted 2 December, 2021; originally announced December 2021.

Comments: 19 pages, 6 figures

arXiv:2110.12567 [pdf, other]

Alignment Attention by Matching Key and Query Distributions

Authors: Shujian Zhang, Xinjie Fan, Huangjie Zheng, Korawat Tanwisuth, Mingyuan Zhou

Abstract: The neural attention mechanism has been incorporated into deep neural networks to achieve state-of-the-art performance in various domains. Most such models use multi-head self-attention which is appealing for the ability to attend to information from different perspectives. This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head. The resulting alignment attention networks can be optimized as an unsupervised regularization in the existing attention framework. It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention. On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our approach on graph attention and visual question answering, showing the great potential of incorporating our alignment method into various attention-related tasks.

Submitted 24 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021; Our code is publicly available at https://github.com/szhang42/alignment_attention

arXiv:2110.12024 [pdf, other]

A Prototype-Oriented Framework for Unsupervised Domain Adaptation

Authors: Korawat Tanwisuth, Xinjie Fan, Huangjie Zheng, Shujian Zhang, Hao Zhang, Bo Chen, Mingyuan Zhou

Abstract: Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space. To avoid the sampling variability, class imbalance, and data-privacy concerns that often plague these methods, we instead provide a memory and computation-efficient probabilistic framework to extract class prototypes and align the target features with them. We demonstrate the general applicability of our method on a wide range of scenarios, including single-source, multi-source, class-imbalance, and source-private domain adaptation. Requiring no additional model parameters and having a moderate increase in computation over the source model alone, the proposed method achieves competitive performance with state-of-the-art methods.

Submitted 22 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021

arXiv:2107.07965 [pdf, ps, other]

Self-normalized Cramer moderate deviations for a supercritical Galton-Watson process

Authors: Xiequan Fan, Qi-Man Shao

Abstract: Let $(Z_n)_{n\geq0}$ be a supercritical Galton-Watson process. Consider the Lotka-Nagaev estimator for the offspring mean. In this paper, we establish self-normalized Cramér type moderate deviations and Berry-Esseen's bounds for the Lotka-Nagaev estimator. The results are believed to be optimal or near optimal.

Submitted 16 July, 2021; originally announced July 2021.

MSC Class: primary 60J80; 60F10; secondary 62F03; 62F12

Journal ref: Journal of Applied Probability 2023
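
For readers unfamiliar with the Lotka-Nagaev estimator named in the abstract above, the following is a minimal sketch, not the authors' code. It assumes the standard definition of the estimator as the ratio $Z_{n+1}/Z_n$ of consecutive generation sizes. The offspring law (uniform on {1, 2, 3}, so the true mean is 2 and extinction is impossible), the function names, and the seed are all illustrative assumptions.

```python
import random


def simulate_galton_watson(generations, offspring_values, rng):
    """Return generation sizes [Z_0, ..., Z_generations], starting from Z_0 = 1.

    offspring_values is a hypothetical finite offspring law, sampled uniformly.
    """
    sizes = [1]
    for _ in range(generations):
        # Each individual reproduces independently with the same offspring law.
        children = rng.choices(offspring_values, k=sizes[-1])
        sizes.append(sum(children))
    return sizes


def lotka_nagaev(sizes):
    """Estimate the offspring mean as the ratio of the last two generation sizes."""
    return sizes[-1] / sizes[-2]


rng = random.Random(0)  # fixed seed so the run is reproducible
sizes = simulate_galton_watson(12, [1, 2, 3], rng)
m_hat = lotka_nagaev(sizes)
print(f"Z_n = {sizes[-2]}, Z_(n+1) = {sizes[-1]}, estimate = {m_hat:.3f}")
```

Because every individual has at least one child in this toy offspring law, the denominator is never zero; a general supercritical process would need to condition on non-extinction.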

arXiv:2106.05251 [pdf, other]

Bayesian Attention Belief Networks

Authors: Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou

Abstract: Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention while stochastic attention is less explored due to the optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions with a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. It is simple to convert any models with deterministic attention, including pretrained ones, to the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our method on neural machine translation and visual question answering, showing great potential of incorporating our method into various attention-related tasks.

Submitted 9 June, 2021; originally announced June 2021.

Comments: ICML 2021

arXiv:2105.06715 [pdf, other]

Maximizing Mutual Information Across Feature and Topology Views for Learning Graph Representations

Authors: Xiaolong Fan, Maoguo Gong, Yue Wu, Hao Li

Abstract: Recently, maximizing mutual information has emerged as a powerful method for unsupervised graph representation learning. The existing methods are typically effective at capturing information from the topology view but ignore the feature view. To circumvent this issue, we propose a novel approach by exploiting mutual information maximization across feature and topology views. Specifically, we first utilize a multi-view representation learning module to better capture both local and global information content across feature and topology views on graphs. To model the information shared by the feature and topology spaces, we then develop a common representation learning module using mutual information maximization and reconstruction loss minimization. To explicitly encourage diversity between graph representations from the same view, we also introduce a disagreement regularization to enlarge the distance between representations from the same view. Experiments on synthetic and real-world datasets demonstrate the effectiveness of integrating feature and topology views. In particular, compared with the previous supervised methods, our proposed method can achieve comparable or even better performance under the unsupervised representation and linear evaluation protocol.

Submitted 11 October, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

arXiv:2102.10226 [pdf, ps, other]

ALMA: Alternating Minimization Algorithm for Clustering Mixture Multilayer Network

Authors: Xing Fan, Marianna Pensky, Feng Yu, Teng Zhang

Abstract: The paper considers a Mixture Multilayer Stochastic Block Model (MMLSBM), where layers can be partitioned into groups of similar networks, and networks in each group are equipped with a distinct Stochastic Block Model. The goal is to partition the multilayer network into clusters of similar layers, and to identify communities in those layers. Jing et al. (2020) introduced the MMLSBM and developed a clustering methodology, TWIST, based on regularized tensor decomposition. The present paper proposes a different technique, an alternating minimization algorithm (ALMA), that aims at simultaneous recovery of the layer partition, together with estimation of the matrices of connection probabilities of the distinct layers. Compared to TWIST, ALMA achieves higher accuracy both theoretically and numerically.

Submitted 12 October, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

arXiv:2102.08685 [pdf, ps, other]

doi 10.1016/j.spa.2022.07.002

Deviation inequalities for stochastic approximation by averaging

Authors: Xiequan Fan, Pierre Alquier, Paul Doukhan

Abstract: We introduce a class of Markov chains that contains the model of stochastic approximation by averaging and non-averaging. Using the martingale approximation method, we establish various deviation inequalities for separately Lipschitz functions of such a chain, with different moment conditions on some dominating random variables of martingale differences. Finally, we apply these inequalities to the stochastic approximation by averaging and empirical risk minimisation.

Submitted 18 February, 2022; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: 35 pages

MSC Class: 60G42; 60J05; 60F10; 60E15

Journal ref: Stochastic Processes and their Applications 152 (2022)

arXiv:2010.10604 [pdf, other]

Bayesian Attention Modules

Authors: Xinjie Fan, Shujian Zhang, Bo Chen, Mingyuan Zhou

Abstract: Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2009.14308 [pdf, other]

Attention that does not Explain Away

Authors: Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

Abstract: Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks. A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances. Following a probabilistic view of the attention via the Gaussian mixture model, we find empirical evidence that the Transformer attention tends to "explain away" certain input neurons. To compensate for this, we propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect without introducing significant computational or memory cost. Empirically, we show that the new attention schemes result in improved performance on several well-known benchmarks.

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2009.08868 [pdf]

doi 10.1371/journal.pcbi.1009291

Review of Machine-Learning Methods for RNA Secondary Structure Prediction

Authors: Qi Zhao, Zheng Zhao, Xiaoya Fan, Zhengwei Yuan, Qian Mao, Yudong Yao

Abstract: Secondary structure plays an important role in determining the function of non-coding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine-learning technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on machine-learning technologies and a tabularized summary of the most important methods in this field. The current pending issues in the field of RNA secondary structure prediction and future trends are also discussed.

Submitted 31 August, 2020; originally announced September 2020.

Comments: 25 pages, 5 figures, 1 table

MSC Class: I.2.0 General

arXiv:2008.03628 [pdf, other]

Appearance-free Tripartite Matching for Multiple Object Tracking

Authors: Lijun Wang, Yanting Zhu, Jue Shi, Xiaodan Fan

Abstract: Multiple Object Tracking (MOT) detects the trajectories of multiple objects given an input video. It has become more and more important for various research and industry areas, such as cell tracking for biomedical research and human tracking in video surveillance. Most existing algorithms depend on the uniqueness of the object's appearance, and the dominating bipartite matching scheme ignores the speed smoothness. Although several methods have incorporated the velocity smoothness for tracking, they either fail to pursue global smooth velocity or are often trapped in local optimums. We focus on the general MOT problem regardless of the appearance and propose an appearance-free tripartite matching to avoid the irregular velocity problem of the bipartite matching. The tripartite matching is formulated as maximizing the likelihood of the state vectors constituted of the position and velocity of objects, which results in a chain-dependent structure. We resort to the dynamic programming algorithm to find such a maximum likelihood estimate. To overcome the high computational cost induced by the vast search space of dynamic programming when many objects are to be tracked, we decompose the space by the number of disappearing objects and propose a reduced-space approach by truncating the decomposition. Extensive simulations have shown the superiority and efficiency of our proposed method, and the comparisons with top methods on Cell Tracking Challenge also demonstrate our competence. We also applied our method to track the motion of natural killer cells around tumor cells in a cancer study. The source code is available at https://github.com/szcf-weiya/TriMatchMOT

Submitted 7 October, 2021; v1 submitted 8 August, 2020; originally announced August 2020.

Comments: 36 pages, 14 figures

arXiv:2007.07203 [pdf, other]

Deep Retrieval: Learning A Retrievable Structure for Large-Scale Recommendations

Authors: Weihao Gao, Xiangjun Fan, Chong Wang, Jiankai Sun, Kai Jia, Wenzhi Xiao, Ruofan Ding, Xingyan Bin, Hui Yang, Xiaobing Liu

Abstract: One of the core problems in large-scale recommendations is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model, and then use some approximate nearest neighbor (ANN) search algorithm to find top candidates. In this paper, we present Deep Retrieval (DR), to learn a retrievable structure directly with user-item interaction data (e.g. clicks) without resorting to the Euclidean space assumption in ANN algorithms. DR's structure encodes all candidate items into a discrete latent space. Those latent codes for the candidates are model parameters and learnt together with other neural network parameters to maximize the same objective function. With the model learnt, a beam search over the structure is performed to retrieve the top candidates for reranking. Empirically, we first demonstrate that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline on two public datasets. Moreover, we show that, in a live production recommendation system, a deployed DR approach significantly outperforms a well-tuned ANN baseline in terms of engagement metrics. To the best of our knowledge, DR is among the first non-ANN algorithms successfully deployed at the scale of hundreds of millions of items for industrial recommendation systems.

Submitted 18 May, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

Comments: 9 pages, 6 figures

arXiv:2003.00269 [pdf, other]

Online Binary Space Partitioning Forests

Authors: Xuhui Fan, Bin Li, Scott A. Sisson

Abstract: The Binary Space Partitioning-Tree (BSP-Tree) process was recently proposed as an efficient strategy for space partitioning tasks. Because it uses more than one dimension to partition the space, the BSP-Tree process is more efficient and flexible than conventional axis-aligned cutting strategies. However, due to its batch learning setting, it is not well suited to large-scale classification and regression problems. In this paper, we develop an online BSP-Forest framework to address this limitation. With the arrival of new data, the resulting online algorithm can simultaneously expand the space coverage and refine the partition structure, with guaranteed universal consistency for both classification and regression problems. The effectiveness and competitive performance of the online BSP-Forest is verified via simulations on real-world datasets.

Submitted 29 February, 2020; originally announced March 2020.

arXiv:2002.11394 [pdf, other]

Bayesian Nonparametric Space Partitions: A Survey

Authors: Xuhui Fan, Bin Li, Ling Luo, Scott A. Sisson

Abstract: Bayesian nonparametric space partition (BNSP) models provide a variety of strategies for partitioning a $D$-dimensional space into a set of blocks, such that the data points lying in the same block share certain kinds of homogeneity. BNSP models can be applied to various areas, such as regression/classification trees, random feature construction, relational modeling, etc. In this survey, we investigate the current progress of BNSP research through the following three perspectives: models, which review various strategies for generating the partitions in the space and discuss their theoretical foundation, `self-consistency'; applications, which cover the current mainstream usages of BNSP models and their potential future practices; and challenges, which identify the current unsolved problems and valuable future research topics. As there has been no comprehensive review of the BNSP literature so far, we hope that this survey can induce further exploration and exploitation on this topic.

Submitted 28 February, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

arXiv:2002.11246 [pdf, other]

Supervised Categorical Metric Learning with Schatten p-Norms

Authors: Xuhui Fan, Eric Gaussier

Abstract: Metric learning has been successful in learning new metrics adapted to numerical datasets. However, its development on categorical data still needs further exploration. In this paper, we propose a method, called CPML for categorical projected metric learning, that tries to efficiently (i.e. less computational time and better prediction accuracy) address the problem of metric learning in categorical data. We make use of the Value Distance Metric to represent our data and propose new distances based on this representation. We then show how to efficiently learn new metrics. We also generalize several previous regularizers through the Schatten $p$-norm and provide a generalization bound for it that complements the standard generalization bound for metric learning. Experimental results show that our method provides

Submitted 25 February, 2020; originally announced February 2020.

arXiv:2002.11159 [pdf, other]

Smoothing Graphons for Modelling Exchangeable Relational Data

Authors: Xuhui Fan, Yaqiong Li, Ling Chen, Bin Li, Scott A. Sisson

Abstract: Modelling exchangeable relational data can be described by graphon theory. Most Bayesian methods for modelling exchangeable relational data can be attributed to this framework by exploiting different forms of graphons. However, the graphons adopted by existing Bayesian methods are either piecewise-constant functions, which are insufficiently flexible for accurate modelling of the relational data, or are complicated continuous functions, which incur heavy computational costs for inference. In this work, we introduce a smoothing procedure to piecewise-constant graphons to form smoothing graphons, which permit continuous intensity values for describing relations, but without impractically increasing computational costs. In particular, we focus on the Bayesian Stochastic Block Model (SBM) and demonstrate how to adapt the piecewise-constant SBM graphon to the smoothed version. We initially propose the Integrated Smoothing Graphon (ISG) which introduces one smoothing parameter to the SBM graphon to generate continuous relational intensity values. We then develop the Latent Feature Smoothing Graphon (LFSG), which improves on the ISG by introducing auxiliary hidden labels to decompose the calculation of the ISG intensity and enable efficient inference. Experimental results on real-world data sets validate the advantages of applying smoothing strategies to the Stochastic Block Model, demonstrating that smoothing graphons can greatly improve AUC and precision for link prediction without increasing computational complexity.

Submitted 25 February, 2020; originally announced February 2020.

arXiv:2002.10235 [pdf, other]

Recurrent Dirichlet Belief Networks for Interpretable Dynamic Relational Data Modelling

Authors: Yaqiong Li, Xuhui Fan, Ling Chen, Bin Li, Zheng Yu, Scott A. Sisson

Abstract: The Dirichlet Belief Network (DirBN) has been recently proposed as a promising approach in learning interpretable deep latent representations for objects. In this work, we leverage its interpretable modelling architecture and propose a deep dynamic probabilistic framework -- the Recurrent Dirichlet Belief Network (Recurrent-DBN) -- to study interpretable hidden structures from dynamic relational data. The proposed Recurrent-DBN has the following merits: (1) it infers interpretable and organised hierarchical latent structures for objects within and across time steps; (2) it enables recurrent long-term temporal dependence modelling, which outperforms the one-order Markov descriptions in most of the dynamic probabilistic frameworks. In addition, we develop a new inference strategy, which first upward-and-backward propagates latent counts and then downward-and-forward samples variables, to enable efficient Gibbs sampling for the Recurrent-DBN. We apply the Recurrent-DBN to dynamic relational data problems. The extensive experimental results on real-world data validate the advantages of the Recurrent-DBN over the state-of-the-art models in interpretable latent structure discovery and improved link prediction performance.

Submitted 29 April, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: 7 pages, 3 figures

arXiv:2002.00901 [pdf, other]

Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel

Authors: Zheng Yu, Xuhui Fan, Marcin Pietrasik, Marek Reformat

Abstract: The Mixed-Membership Stochastic Blockmodel (MMSB) is proposed as one of the state-of-the-art Bayesian relational methods suitable for learning the complex hidden structure underlying the network data. However, the current formulation of MMSB suffers from the following two issues: (1) the prior information (e.g. entities' community structural information) cannot be well embedded in the modelling; (2) community evolution cannot be well described in the literature. Therefore, we propose a non-parametric fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fcMMSB). Our model performs entity-based clustering to capture the community information for entities and linkage-based clustering to derive the group information for links simultaneously. Besides, the proposed model infers the network structure and models community evolution, manifested by appearances and disappearances of communities, using the discrete fragmentation coagulation process (DFCP). By integrating the community structure with the group compatibility matrix we derive a generalized version of MMSB. An efficient Gibbs sampling scheme with Polya Gamma (PG) approach is implemented for posterior inference. We validate our model on synthetic and real world data.

Submitted 17 January, 2020; originally announced February 2020.

Comments: AAAI 2020

arXiv:1912.13151 [pdf, other]

Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

Authors: Xinjie Fan, Yizhe Zhang, Zhendong Wang, Mingyuan Zhou

Abstract: Sequence generation models are commonly refined with reinforcement learning over user-defined metrics. However, high gradient variance hinders the practical use of this method. To stabilize this method, we adapt to contextual generation of categorical sequences a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control. Due to the correlation, the number of unique rollouts is random and adaptive to model uncertainty; those rollouts naturally become baselines for each other, and hence are combined to effectively reduce gradient variance. We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios by decomposing each categorical action into a sequence of binary actions. We evaluate our methods on both neural program synthesis and image captioning. The proposed methods yield lower gradient variance and consistent improvement over related baselines.

Submitted 17 June, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

Comments: ICLR 2020 (updated to fix a typo in Algorithm 1)

arXiv:1912.03668 [pdf, other]

Short-term Load Forecasting with Dense Average Network

Authors: Zhifang Liao, Haihui Pan, Qi Zeng, ** Fan, Yan Zhang, Song Yu

Abstract: As an important part of the power system, power load forecasting directly affects the national economy. The data shows that improving the load forecasting accuracy by 0.01% can save millions of dollars for the power industry. Therefore, improving the accuracy of power load forecasting has always been a goal pursued for power systems. Based on this goal, this paper proposes a novel connection, the dense average connection, in which the outputs of all preceding layers are averaged as the input of the next layer in a feed-forward fashion. Based on the dense average connection, we construct the dense average network for power load forecasting. The predictions of the proposed model for two public datasets are better than those of existing methods. On this basis, we use the ensemble method to further improve the accuracy of the model. To verify the reliability of the model predictions, the robustness is analyzed and verified by adding input disturbances. The experimental results show that the proposed model is effective and robust for power load forecasting.

Submitted 28 May, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

Comments: 8 pages, 5 figures
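
Note on the dense average connection: the abstract describes it only as averaging the outputs of all preceding layers to form the next layer's input. The following is a minimal sketch of that one sentence, not the authors' implementation. It assumes that the network input counts as a "preceding output", that every layer preserves the vector width so that averaging is well defined, and that layers are plain Python callables; the names `average` and `dense_average_forward` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dense_average_forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Feed-forward pass in which each layer receives the mean of all
    earlier outputs (assumption: the raw input x is included in that mean)."""
    outputs = [x]
    for layer in layers:
        outputs.append(layer(average(outputs)))
    return outputs[-1]


# Two toy layers standing in for learned transformations (hypothetical).
toy_layers = [
    lambda v: [2.0 * a for a in v],
    lambda v: [a + 1.0 for a in v],
]
result = dense_average_forward([1.0, 2.0], toy_layers)
```

With these toy layers the second layer sees the mean of the input and the first layer's output, rather than the first layer's output alone, which is the only behaviour the abstract specifies.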

arXiv:1911.05443 [pdf, other]

Dynamic Connected Neural Decision Classifier and Regressor with Dynamic Softing Pruning

Authors: Xinyu Fan

Abstract: To deal with datasets of differing complexity, this paper presents a self-adaptive learning model that combines the proposed Dynamic Connected Neural Decision Networks (DNDN) and a new pruning method -- Dynamic Soft Pruning (DSP). DNDN is a combination of random forests and deep neural networks that enjoys both the advantages of strong classification capability of tree-like structure and representation learning capability of network structure. Based on Deep Neural Decision Forests (DNDF), this paper adopts an end-to-end training approach by representing the classification distribution with multiple randomly initialized softmax layers, which further allows an ensemble of multiple random forests attached to layers of the neural network at different depths. We also propose a soft pruning method, DSP, to adaptively reduce the redundant connections of the network and so avoid over-fitting on simple datasets. The model demonstrates no performance loss compared with unpruned models and even higher robustness over different data and feature distributions. Extensive experiments on different datasets demonstrate the superiority of the proposed model over other popular algorithms in solving classification tasks.

Submitted 22 February, 2021; v1 submitted 13 November, 2019; originally announced November 2019.

arXiv:1911.05441 [pdf, other]

Regression via Arbitrary Quantile Modeling

Authors: Faen Zhang, Xinyu Fan, Hui Xu, Pengcheng Zhou, Yujian He, Junlong Liu

Abstract: In the regression problem, L1 and L2 are the most commonly used loss functions, which produce mean predictions with different biases. However, the predictions are neither robust nor adequate enough since they only capture a few conditional distributions instead of the whole distribution, especially for small datasets. To address this problem, we proposed arbitrary quantile modeling to regulate the prediction, which achieved better performance compared to traditional loss functions. More specifically, a new distribution regression method, Deep Distribution Regression (DDR), is proposed to estimate arbitrary quantiles of the response variable. Our DDR method consists of two models: a Q model, which predicts the corresponding value for arbitrary quantile, and an F model, which predicts the corresponding quantile for arbitrary value. Furthermore, the duality between Q and F models enables us to design a novel loss function for joint training and perform a dual inference mechanism. Our experiments demonstrate that our DDR-joint and DDR-disjoint methods outperform previous methods such as AdaBoost, random forest, LightGBM, and neural networks both in terms of mean and quantile prediction.

Submitted 13 November, 2019; originally announced November 2019.

arXiv:1911.01535 [pdf, other]

Scalable Deep Generative Relational Models with High-Order Node Dependence

Authors: Xuhui Fan, Bin Li, Scott Anthony Sisson, Caoyuan Li, Ling Chen

Abstract: We propose a probabilistic framework for modelling and exploring the latent structure of relational data. Given feature information for the nodes in a network, the scalable deep generative relational model (SDREM) builds a deep network architecture that can approximate potential nonlinear mappings between nodes' feature information and the nodes' latent representations. Our contribution is two-fold: (1) We incorporate high-order neighbourhood structure information to generate the latent representations at each node, which vary smoothly over the network. (2) Due to the Dirichlet random variable structure of the latent representations, we introduce a novel data augmentation trick which permits efficient Gibbs sampling. The SDREM can be used for large sparse networks as its computational cost scales with the number of positive links. We demonstrate its competitive performance through improved link prediction performance on a range of real-world datasets.

Submitted 4 November, 2019; originally announced November 2019.

arXiv:1910.13052 [pdf, other]

Scalable Inference for Nonparametric Hawkes Process Using Pólya-Gamma Augmentation

Authors: Feng Zhou, Zhidong Li, Xuhui Fan, Yang Wang, Arcot Sowmya, Fang Chen

Abstract: In this paper, we consider the sigmoid Gaussian Hawkes process model: the baseline intensity and triggering kernel of Hawkes process are both modeled as the sigmoid transformation of random trajectories drawn from Gaussian processes (GP). By introducing auxiliary latent random variables (branching structure, Pólya-Gamma random variables and latent marked Poisson processes), the likelihood is converted to two decoupled components with a Gaussian form which allows for an efficient conjugate analytical inference. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum a posteriori (MAP) estimate. Furthermore, we extend the EM algorithm to an efficient approximate Bayesian inference algorithm: mean-field variational inference. We demonstrate the performance of two algorithms on simulated fictitious data. Experiments on real data show that our proposed inference algorithms can recover well the underlying prompting characteristics efficiently.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1910.08018 [pdf, other]

A Unified Framework for Tuning Hyperparameters in Clustering Problems

Authors: Xinjie Fan, Yuguang Yue, Purnamrita Sarkar, Y. X. Rachel Wang

Abstract: Selecting hyperparameters for unsupervised learning problems is challenging in general due to the lack of ground truth for validation. Despite the prevalence of this issue in statistics and machine learning, especially in clustering problems, there are not many methods for tuning these hyperparameters with theoretical guarantees. In this paper, we provide a framework with provable guarantees for selecting hyperparameters in a number of distinct models. We consider both the subgaussian mixture model and network models to serve as examples of i.i.d. and non-i.i.d. data. We demonstrate that the same framework can be used to choose the Lagrange multipliers of penalty terms in semi-definite programming (SDP) relaxations for community detection, and the bandwidth parameter for constructing kernel similarity matrices for spectral clustering. By incorporating a cross-validation procedure, we show the framework can also do consistent model selection for network models. Using a variety of simulated and real data examples, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.

Submitted 1 February, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

arXiv:1906.02438 [pdf, other]

Fast Multi-resolution Segmentation for Nonstationary Hawkes Process Using Cumulants

Authors: Feng Zhou, Zhidong Li, Xuhui Fan, Yang Wang, Arcot Sowmya, Fang Chen

Abstract: Stationarity is assumed in the vanilla Hawkes process, which reduces the model complexity but introduces a strong assumption. In this paper, we propose a fast multi-resolution segmentation algorithm to capture the time-varying characteristics of the nonstationary Hawkes process. The proposed algorithm is based on the first and second order cumulants. Besides its computational efficiency, the algorithm can provide a hierarchical view of the segmentation at different resolutions. We extensively investigate the impact of hyperparameters on the performance of this algorithm. To ease the choice of one hyperparameter, a refined Gaussian process based segmentation algorithm is also proposed which proves to be robust. The proposed algorithm is applied to a real vehicle collision dataset and the outcome shows some interesting hierarchical dynamic time-varying characteristics.

Submitted 6 June, 2019; originally announced June 2019.

arXiv:1905.12251 [pdf, other]

Efficient EM-Variational Inference for Hawkes Process

Authors: Feng Zhou, Zhidong Li, Xuhui Fan, Yang Wang, Arcot Sowmya, Fang Chen

Abstract: In the classical Hawkes process, the baseline intensity and triggering kernel are assumed to be a constant and a parametric function respectively, which limits the model flexibility. To generalize it, we present a fully Bayesian nonparametric model, namely the Gaussian process modulated Hawkes process, and propose an EM-variational inference scheme. In this model, a transformation of a Gaussian process is used as a prior on the baseline intensity and triggering kernel. By introducing a latent branching structure, the inference of baseline intensity and triggering kernel is decoupled and the variational inference scheme is embedded into an EM framework naturally. We also provide a series of schemes to accelerate the inference. Results of synthetic and real data experiments show that the underlying baseline intensity and triggering kernel can be recovered without parametric restriction and that our Bayesian nonparametric estimation is superior to other state-of-the-art methods.

Submitted 28 October, 2019; v1 submitted 29 May, 2019; originally announced May 2019.
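
Note on the classical Hawkes process: the abstract contrasts its model with the classical setting of a constant baseline and a parametric triggering kernel. As a rough illustration of that classical setting only (not of the paper's nonparametric model), the conditional intensity can be sketched as below. The exponential kernel is an assumption made here because it is a common parametric choice; the abstract does not name a kernel, and the function and parameter names (`hawkes_intensity`, `mu`, `alpha`, `beta`) are hypothetical.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, events: Sequence[float],
                     mu: float, alpha: float, beta: float) -> float:
    """Conditional intensity at time t of a Hawkes process with constant
    baseline mu and an (assumed) exponential triggering kernel
    alpha * exp(-beta * dt), summed over events strictly before t."""
    triggered = sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)
    return mu + triggered


# One past event at time 1.0 raises the intensity at time 2.0 above the baseline;
# the event at time 3.0 lies in the future and contributes nothing.
lam = hawkes_intensity(2.0, events=[1.0, 3.0], mu=0.5, alpha=1.0, beta=1.0)
```

The nonparametric approach described in the abstract replaces both the constant `mu` and the fixed-form kernel with functions drawn from a transformed Gaussian process prior.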

arXiv:1903.09348 [pdf, other]

Binary Space Partitioning Forests

Authors: Xuhui Fan, Bin Li, Scott Anthony Sisson

Abstract: The Binary Space Partitioning (BSP)-Tree process is proposed to produce flexible 2-D partition structures which are originally used as a Bayesian nonparametric prior for relational modelling. It can hardly be applied to other learning tasks such as regression trees because extending the BSP-Tree process to a higher dimensional space is nontrivial. This paper is the first attempt to extend the BSP-Tree process to a d-dimensional (d>2) space. We propose to generate a cutting hyperplane, which is assumed to be parallel to d-2 dimensions, to cut each node in the d-dimensional BSP-tree. By designing a subtle strategy to sample two free dimensions from d dimensions, the extended BSP-Tree process can inherit the essential self-consistency property from the original version. Based on the extended BSP-Tree process, an ensemble model, which is named the BSP-Forest, is further developed for regression tasks. Thanks to the retained self-consistency property, we can thus significantly reduce the geometric calculations in the inference stage. Compared to its counterpart, the Mondrian Forest, the BSP-Forest can achieve similar performance with fewer cuts due to its flexibility. The BSP-Forest also outperforms other (Bayesian) regression forests on a number of real-world data sets.

Submitted 21 March, 2019; originally announced March 2019.

arXiv:1903.09343 [pdf, other]

The Binary Space Partitioning-Tree Process

Authors: Xuhui Fan, Bin Li, Scott Anthony Sisson

Abstract: The Mondrian process represents an elegant and powerful approach for space partition modelling. However, as it restricts the partitions to be axis-aligned, its modelling flexibility is limited. In this work, we propose a self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the Mondrian process. The BSP-Tree process is an almost surely right continuous Markov jump process that allows uniformly distributed oblique cuts in a two-dimensional convex polygon. The BSP-Tree process can also be extended using a non-uniform probability measure to generate direction differentiated cuts. The process is also self-consistent, maintaining distributional invariance under a restricted subdomain. We use Conditional-Sequential Monte Carlo for inference using the tree structure as the high-dimensional variable. The BSP-Tree process's performance on synthetic data partitioning and relational modelling demonstrates clear inferential improvements over the standard Mondrian process and other related methods.

Submitted 21 March, 2019; originally announced March 2019.

arXiv:1903.03906 [pdf, other]

Rectangular Bounding Process

Authors: Xuhui Fan, Bin Li, Scott Anthony Sisson

Abstract: Stochastic partition models divide a multi-dimensional space into a number of rectangular regions, such that the data within each region exhibit certain types of homogeneity. Due to the nature of their partition strategy, existing partition models may create many unnecessary divisions in sparse regions when trying to describe data in dense regions. To avoid this problem we introduce a new parsimonious partition model -- the Rectangular Bounding Process (RBP) -- to efficiently partition multi-dimensional spaces, by employing a bounding strategy to enclose data points within rectangular bounding boxes. Unlike existing approaches, the RBP possesses several attractive theoretical properties that make it a powerful nonparametric partition prior on a hypercube. In particular, the RBP is self-consistent and as such can be directly extended from a finite hypercube to infinite (unbounded) space. We apply the RBP to regression trees and relational models as a flexible partition prior. The experimental results validate the merit of the RBP in rich yet parsimonious expressiveness compared to the state-of-the-art methods.

Submitted 9 March, 2019; originally announced March 2019.

arXiv:1808.08811 [pdf, other]

doi 10.1515/demo-2019-0007

Exponential inequalities for nonstationary Markov Chains

Authors: Pierre Alquier, Paul Doukhan, Xiequan Fan

Abstract: Exponential inequalities are main tools in machine learning theory. Proving exponential inequalities for non-i.i.d. random variables allows many learning techniques to be extended to these variables. Indeed, much work has been done both on inequalities and learning theory for time series in the past 15 years. However, for the non-independent case, almost all the results concern stationary time series. This excludes many important applications: for example, any series with a periodic behavior is non-stationary. In this paper, we extend the basic tools of Dedecker and Fan (2015) to nonstationary Markov chains. As an application, we provide a Bernstein-type inequality, and we deduce risk bounds for the prediction of periodic autoregressive processes with an unknown period.

Submitted 4 May, 2019; v1 submitted 27 August, 2018; originally announced August 2018.

Journal ref: Dependence Modeling, 2019, vol. 7, pp. 150-168

Showing 1–50 of 62 results for author: Fan, X