Search | arXiv e-print repository

Deep Hierarchical Graph Alignment Kernels

Authors: Shuhao Tang, Hao Tian, Xiaofeng Cao, Wei Ye

Abstract: Typical R-convolution graph kernels invoke the kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking implicit similarities and topological position information between those substructures limits their performance. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relational substructures are hierarchically aligned to cluster distributions in their deep embedding space. The substructures belonging to the same cluster are assigned the same feature map in the Reproducing Kernel Hilbert Space (RKHS), where graph feature maps are derived by kernel mean embedding. Theoretical analysis guarantees that DHGAK is positive semi-definite and has linear separability in the RKHS. Comparison with state-of-the-art graph kernels on various benchmark datasets demonstrates the effectiveness and efficiency of DHGAK. The code is available on GitHub (https://github.com/EWesternRa/DHGAK).

Submitted 9 May, 2024; originally announced May 2024.
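The DHGAK abstract above describes assigning substructures in the same cluster the same feature map and deriving graph feature maps by kernel mean embedding. The following is a minimal sketch of that one idea only, assuming the cluster assignments are already given (the hierarchical alignment and deep embedding steps are not reproduced). The function names `graph_feature_map` and `kernel_matrix` and the toy cluster ids are hypothetical and do not come from the authors' repository.

```python
import numpy as np

def graph_feature_map(cluster_ids, n_clusters):
    """Mean embedding of a graph whose substructures map to one-hot cluster features."""
    counts = np.bincount(np.asarray(cluster_ids), minlength=n_clusters)
    # Averaging one-hot vectors gives the normalized cluster histogram.
    return counts / counts.sum()

def kernel_matrix(graphs, n_clusters):
    """Gram matrix of inner products between graph mean embeddings."""
    features = np.stack([graph_feature_map(g, n_clusters) for g in graphs])
    return features @ features.T

# Hypothetical input: each graph is a list of cluster ids, one per substructure.
graphs = [[0, 0, 1], [0, 1, 1], [2, 2, 2]]
K = kernel_matrix(graphs, n_clusters=3)
print(K)
```

Because the Gram matrix is built from explicit feature vectors, it is positive semi-definite by construction, which is the property the abstract states for DHGAK itself.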

arXiv:2404.10004 [pdf]

A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in this method: (1) The similarity evaluation indicators are determined from three dimensions, i.e., the Basis of National Epidemic Prevention & Control, Social Resilience, and Infection Situation. (2) The data related to the indicators are collected and preprocessed. (3) The first round of screening on the preprocessed dataset is conducted through an improved collaborative filtering algorithm to calculate the preliminary similarity result from the perspective of the infection situation. (4) Finally, the K-Means model is used for the second round of screening to obtain the final similarity values. The approach will be applied to decision-making support in the context of COVID-19. Our results demonstrate that the recommendations generated by the STDSA model are more accurate and aligned better with the actual situation than those produced by pure K-means models. This study will provide new insights into preventing and controlling epidemics in regions that lack experience.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 20 pages, 9 figures

arXiv:2404.08898 [pdf, other]

Using early rejection Markov chain Monte Carlo and Gaussian processes to accelerate ABC methods

Authors: Xuefei Cao, Shijia Wang, Yongdao Zhou

Abstract: Approximate Bayesian computation (ABC) is a class of Bayesian inference algorithms that targets problems with an intractable or unavailable likelihood function. It uses synthetic data drawn from the simulation model to approximate the posterior distribution. However, ABC is computationally intensive for complex models in which simulating synthetic data is very expensive. In this article, we propose an early rejection Markov chain Monte Carlo (ejMCMC) sampler based on Gaussian processes to accelerate inference speed. We early reject samples in the first stage of the kernel using a discrepancy model, in which the discrepancy between the simulated and observed data is modeled by a Gaussian process (GP). Hence, the synthetic data is generated only if the parameter space is worth exploring. We demonstrate from theory, simulation experiments, and real data analysis that the new algorithm significantly improves inference efficiency compared to existing early-rejection MCMC algorithms. In addition, we employ our proposed method within an ABC sequential Monte Carlo (SMC) sampler. In our numerical experiments, we use examples of ordinary differential equations, stochastic differential equations, and delay differential equations to demonstrate the effectiveness of the proposed algorithm. We develop an R package that is available at https://github.com/caofff/ejMCMC.

Submitted 13 April, 2024; originally announced April 2024.
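For readers unfamiliar with ABC, the basic idea in the abstract above (approximating a posterior with synthetic data instead of a likelihood) can be sketched with plain rejection ABC. This is not the ejMCMC sampler and involves no Gaussian process. The toy model (a normal mean with a uniform prior), the sample-mean summary statistic, the tolerance `eps`, and the function name `abc_rejection` are all assumptions made for illustration.

```python
import numpy as np

def abc_rejection(observed, n_draws, eps, rng):
    """Plain rejection ABC for the mean of a unit-variance normal model."""
    obs_summary = observed.mean()
    # Draw candidate parameters from an assumed Uniform(-5, 5) prior.
    theta = rng.uniform(-5.0, 5.0, size=n_draws)
    # Simulate one synthetic data set per candidate (this is the costly step in real models).
    simulated = rng.normal(loc=theta[:, None], scale=1.0,
                           size=(n_draws, observed.size))
    # Keep candidates whose simulated summary lies within eps of the observed one.
    distance = np.abs(simulated.mean(axis=1) - obs_summary)
    return theta[distance < eps]

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=1.0, size=50)
posterior = abc_rejection(observed, n_draws=20_000, eps=0.1, rng=rng)
print(posterior.size, posterior.mean())
```

Every candidate here pays for a full simulation before it can be rejected. The abstract describes avoiding that cost by predicting the discrepancy first and simulating only when the parameter region looks worth exploring.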

arXiv:2311.01435 [pdf, other]

Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time

Authors: Xinyuan Cao, Santosh S. Vempala

Abstract: We give a polynomial-time algorithm for learning high-dimensional halfspaces with margins in $d$-dimensional space to within desired TV distance when the ambient distribution is an unknown affine transformation of the $d$-fold product of an (unknown) symmetric one-dimensional logconcave distribution, and the halfspace is introduced by deleting at least an $ε$ fraction of the data in one of the component distributions. Notably, our algorithm does not need labels and establishes the unique (and efficient) identifiability of the hidden halfspace under this distributional assumption. The sample and time complexity of the algorithm are polynomial in the dimension and $1/ε$. The algorithm uses only the first two moments of suitable re-weightings of the empirical distribution, which we call contrastive moments; its analysis uses classical facts about generalized Dirichlet polynomials and relies crucially on a new monotonicity property of the moment ratio of truncations of logconcave distributions. Such algorithms, based only on first and second moments, were suggested in earlier work, but hitherto eluded rigorous guarantees. Prior work addressed the special case when the underlying distribution is Gaussian via Non-Gaussian Component Analysis. We improve on this by providing polytime guarantees based on Total Variation (TV) distance, in place of existing moment-bound guarantees that can be super-polynomial. Our work is also the first to go beyond Gaussians in this setting.

Submitted 2 November, 2023; originally announced November 2023.

Comments: Preliminary version in NeurIPS 2023

arXiv:2310.07990 [pdf]

Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed methods achieved R^2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.

Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: 19 pages, 3 figures

arXiv:2310.00864 [pdf, other]

Multi-Label Residual Weighted Learning for Individualized Combination Treatment Rule

Authors: Qi Xu, Xiaoke Cao, Geping Chen, Hanqi Zeng, Haoda Fu, Annie Qu

Abstract: Individualized treatment rules (ITRs) have been widely applied in many fields such as precision medicine and personalized marketing. Beyond the extensive studies on ITR for binary or multiple treatments, there is considerable interest in applying combination treatments. This paper introduces a novel ITR estimation method for combination treatments incorporating interaction effects among treatments. Specifically, we propose the generalized $ψ$-loss as a non-convex surrogate in the residual weighted learning framework, offering desirable statistical and computational properties. Statistically, the minimizer of the proposed surrogate loss is Fisher-consistent with the optimal decision rules, incorporating interaction effects at any intensity level - a significant improvement over existing methods. Computationally, the proposed method applies the difference-of-convex algorithm for efficient computation. Through simulation studies and real-world data applications, we demonstrate the superior performance of the proposed method in recommending combination treatments.

Submitted 7 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2303.10388 [pdf]

doi 10.1093/bioinformatics/btad470

ggpicrust2: an R package for PICRUSt2 predicted functional profile analysis and visualization

Authors: Chen Yang, Jiahao Mai, Xuan Cao, Aaron Burberry, Fabio Cominelli, Liangliang Zhang

Abstract: Microbiome research is now moving beyond the compositional analysis of microbial taxa in a sample. Increasing evidence from large human microbiome studies suggests that functional consequences of changes in the intestinal microbiome may provide more power for studying their impact on inflammation and immune responses. Although 16S rRNA analysis is one of the most popular and cost-effective methods to profile microbial compositions, marker-gene sequencing cannot provide direct information about the functional genes that are present in the genomes of community members. Bioinformatic tools have been developed to predict microbiome function with 16S rRNA gene data. Among them, PICRUSt2 has become one of the most popular functional profile prediction tools, which generates community-wide pathway abundances. However, no state-of-the-art inference tools are available to test the differences in pathway abundances between comparison groups. We have developed ggpicrust2, an R package, to do extensive differential abundance (DA) analyses and provide publishable visualization to highlight the signals.

Submitted 9 April, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

Comments: 4 pages, 1 figure

arXiv:2210.00415 [pdf, ps, other]

Metric Distribution to Vector: Constructing Data Representation via Broad-Scale Discrepancies

Authors: Xue Liu, Dan Sun, Xiaobo Cao, Hao Ye, Wei Wei

Abstract: Graph embedding provides a feasible methodology to conduct pattern classification for graph-structured data by mapping each data into the vectorial space. Various pioneering works are essentially coding methods that concentrate on a vectorial representation about the inner properties of a graph in terms of the topological constitution, node attributions, link relations, etc. However, the classification for each targeted data is a qualitative issue based on understanding the overall discrepancies within the dataset scale. From the statistical point of view, these discrepancies manifest a metric distribution over the dataset scale if the distance metric is adopted to measure the pairwise similarity or dissimilarity. Therefore, we present a novel embedding strategy named $\mathbf{MetricDistribution2vec}$ to extract such distribution characteristics into the vectorial representation for each data. We demonstrate the application and effectiveness of our representation method in the supervised prediction tasks on extensive real-world structural graph datasets. The results have gained some unexpected increases compared with a surge of baselines on all the datasets, even if we take the lightweight models as classifiers. Moreover, the proposed methods also conducted experiments in Few-Shot classification scenarios, and the results still show attractive discrimination in rare training samples based inference.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2209.12611 [pdf, other]

doi 10.1109/TPAMI.2022.3208419

MaxMatch: Semi-Supervised Learning with Worst-Case Consistency

Authors: Yangbangyan Jiang, Xiaodan Li, Yuefeng Chen, Yuan He, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang

Abstract: In recent years, great progress has been made to incorporate unlabeled data to overcome the inefficiently supervised problem via semi-supervised learning (SSL). Most state-of-the-art models are based on the idea of pursuing consistent model predictions over unlabeled data toward the input noise, which is called consistency regularization. Nonetheless, there is a lack of theoretical insights into the reason behind its success. To bridge the gap between theoretical and practical results, we propose a worst-case consistency regularization technique for SSL in this paper. Specifically, we first present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately. Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants. We then provide a simple but effective algorithm to solve the proposed minimax problem, and theoretically prove that it converges to a stationary point. Experiments on five popular benchmark datasets validate the effectiveness of our proposed method.

Submitted 26 September, 2022; originally announced September 2022.

Comments: Accepted to IEEE TPAMI
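The MaxMatch abstract above describes an objective built on the largest inconsistency between an original unlabeled sample and its augmented variants. A minimal sketch of that worst-case term for a single sample follows. The choice of KL divergence as the inconsistency measure, the function names, and the toy probabilities are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) along the last axis, with clipping to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

def worst_case_consistency(p_orig, p_augs):
    """Largest divergence between the original prediction and any augmented one.

    p_orig: class probabilities for the original sample, shape (n_classes,)
    p_augs: class probabilities for each augmented variant, shape (n_variants, n_classes)
    """
    divergences = kl_divergence(p_orig[None, :], p_augs)
    return float(divergences.max())

# Hypothetical model outputs for one unlabeled sample and two augmentations.
p_orig = np.array([0.7, 0.3])
p_augs = np.array([[0.7, 0.3], [0.5, 0.5]])
loss = worst_case_consistency(p_orig, p_augs)
print(loss)
```

Taking the maximum rather than the mean over variants is what makes the term worst-case: only the most inconsistent augmentation contributes to the loss for a given sample.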

arXiv:2209.05742 [pdf, other]

doi 10.1109/TPAMI.2022.3190939

A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game

Authors: Ke Ma, Qianqian Xu, Jinshan Zeng, Guorong Li, Xiaochun Cao, Qingming Huang

Abstract: Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security issue of such algorithms, in contrast to numerous research work on the computational and statistical characteristics. Driven by huge profits, the potential adversary has strong motivation and incentives to manipulate the ranking list. Meanwhile, the intrinsic vulnerability of the rank aggregation methods is not well studied in the literature. To fully understand the possible risks, we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data in this paper. From the perspective of the dynamical system, the attack behavior with a target ranking list is a fixed point belonging to the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators while Nash equilibrium is established. Then two procedures against HodgeRank and RankCentrality are constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary masters the complete information. It is noteworthy that the proposed methods allow the adversary only to hold incomplete information or imperfect feedback and perform the purposeful attack. The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experimental results show that the proposed methods could achieve the attacker's goal in the sense that the leading candidate of the perturbed ranking list is the designated one by the adversary.

Submitted 13 September, 2022; originally announced September 2022.

Comments: 33 pages, https://github.com/alphaprime/Target_Attack_Rank_Aggregation

Journal ref: Early Access by TPAMI 2022 (https://ieeexplore.ieee.org/document/9830042)

arXiv:2206.11655 [pdf, other]

doi 10.1109/TPAMI.2022.3185311

Optimizing Two-way Partial AUC with an End-to-end Framework

Authors: Zhiyong Yang, Qianqian Xu, Shilong Bao, Yuan He, Xiaochun Cao, Qingming Huang

Abstract: The Area Under the ROC Curve (AUC) is a crucial metric for machine learning, which evaluates the average performance over all possible True Positive Rates (TPRs) and False Positive Rates (FPRs). Based on the knowledge that a skillful classifier should simultaneously embrace a high TPR and a low FPR, we turn to study a more general variant called Two-way Partial AUC (TPAUC), where only the region with $\mathsf{TPR} \ge α, \mathsf{FPR} \le β$ is included in the area. Moreover, recent work shows that the TPAUC is essentially inconsistent with the existing Partial AUC metrics where only the FPR range is restricted, opening a new problem to seek solutions to leverage high TPAUC. Motivated by this, we present the first trial in this paper to optimize this new metric. The critical challenge along this course lies in the difficulty of performing gradient-based optimization with end-to-end stochastic training, even with a proper choice of surrogate loss. To address this issue, we propose a generic framework to construct surrogate optimization problems, which supports efficient end-to-end training with deep learning. Moreover, our theoretical analyses show that: 1) the objective function of the surrogate problems will achieve an upper bound of the original problem under mild conditions, and 2) optimizing the surrogate problems leads to good generalization performance in terms of TPAUC with a high probability. Finally, empirical studies over several benchmark datasets speak to the efficacy of our framework.

Submitted 23 June, 2022; originally announced June 2022.
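The TPAUC abstract above defines the metric as the part of the ROC area where TPR ≥ α and FPR ≤ β. The sketch below evaluates that region directly from scores and binary labels on the empirical step-shaped ROC curve. It is an evaluation helper only and does not implement the paper's surrogate optimization framework. It assumes no tied scores and returns the unnormalized area (at most (1 - α)·β); the function name `two_way_partial_auc` is hypothetical.

```python
import numpy as np

def two_way_partial_auc(scores, labels, alpha, beta):
    """Unnormalized ROC area restricted to TPR >= alpha and FPR <= beta."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")   # rank from highest to lowest score
    sorted_labels = labels[order]
    n_pos = int(sorted_labels.sum())
    n_neg = sorted_labels.size - n_pos
    # TPR reached just before each negative: positives ranked above it / n_pos.
    tpr = np.cumsum(sorted_labels)[sorted_labels == 0] / n_pos
    # The j-th ranked negative spans the FPR interval [(j - 1) / n_neg, j / n_neg].
    j = np.arange(1, n_neg + 1)
    left = (j - 1) / n_neg
    right = np.minimum(j / n_neg, beta)
    width = np.clip(right - left, 0.0, None)     # part of the interval with FPR <= beta
    height = np.clip(tpr - alpha, 0.0, None)     # part of the column with TPR >= alpha
    return float(np.sum(width * height))

# A perfectly separating scorer fills the whole (1 - alpha) * beta rectangle.
perfect = two_way_partial_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], alpha=0.5, beta=0.5)
print(perfect)
```

Setting `alpha=0` and `beta=1` recovers the ordinary AUC, which gives a quick sanity check on the restriction logic.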

arXiv:2203.07110 [pdf, other]

Bayesian inference on hierarchical nonlocal priors in generalized linear models

Authors: Xuan Cao, Kyoungjae Lee

Abstract: Variable selection methods with nonlocal priors have been widely studied in linear regression models, and their theoretical and empirical performances have been reported. However, the crucial model selection properties for hierarchical nonlocal priors in high-dimensional generalized linear regression have rarely been investigated. In this paper, we consider a hierarchical nonlocal prior for high-dimensional logistic regression models and investigate theoretical properties of the posterior distribution. Specifically, a product moment (pMOM) nonlocal prior is imposed over the regression coefficients with an Inverse-Gamma prior on the tuning parameter. Under standard regularity assumptions, we establish strong model selection consistency in a high-dimensional setting, where the number of covariates is allowed to increase at a sub-exponential rate with the sample size. We implement the Laplace approximation for computing the posterior probabilities, and a modified shotgun stochastic search procedure is suggested for efficiently exploring the model space. We demonstrate the validity of the proposed method through simulation studies and an RNA-sequencing dataset for stratifying disease risk.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.07108 [pdf, other]

Consistent and scalable Bayesian joint variable and graph selection for disease diagnosis leveraging functional brain network

Authors: Xuan Cao, Kyoungjae Lee

Abstract: We consider the joint inference of regression coefficients and the inverse covariance matrix for covariates in high-dimensional probit regression, where the predictors are both relevant to the binary response and functionally related to one another. A hierarchical model with spike and slab priors over regression coefficients and the elements in the inverse covariance matrix is employed to simultaneously perform variable and graph selection. We establish joint selection consistency for both the variable and the underlying graph when the dimension of predictors is allowed to grow much larger than the sample size, which is the first theoretical result in the Bayesian literature. A scalable Gibbs sampler is derived that performs better in high-dimensional simulation studies compared with other state-of-the-art methods. We illustrate the practical impact and utilities of the proposed method via a functional MRI dataset, where both the regions of interest with altered functional activities and the underlying functional brain network are inferred and integrated together for stratifying disease risk.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2111.11676 [pdf, other]

RIO: Rotation-equivariance supervised learning of robust inertial odometry

Authors: Caifa Zhou, Xiya Cao, Dandan Zeng, Yongliang Wang

Abstract: This paper introduces rotation-equivariance as a self-supervisor to train inertial odometry models. We demonstrate that the self-supervised scheme provides a powerful supervisory signal at the training phase as well as at the inference stage. It reduces the reliance on massive amounts of labeled data for training a robust model and makes it possible to update the model using various unlabeled data. Further, we propose adaptive Test-Time Training (TTT) based on uncertainty estimations in order to enhance the generalizability of the inertial odometry to various unseen data. We show in experiments that the Rotation-equivariance-supervised Inertial Odometry (RIO) trained with 30% data achieves on-par performance with a model trained with the whole database. Adaptive TTT improves model performance in all cases and makes more than 25% improvements under several scenarios.

Submitted 23 November, 2021; originally announced November 2021.

Comments: 12 pages, 17 figures, 2 tables

arXiv:2110.14098 [pdf, other]

Provable Lifelong Learning of Representations

Authors: Xinyuan Cao, Weiyang Liu, Santosh S. Vempala

Abstract: In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension $d$, $m$ tasks with $k$ features up to error $ε$ is $\tilde{O}(dk^{1.5}/ε+km/ε)$. We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. We complement our analysis with an empirical study, including a heuristic lifelong learning algorithm for deep neural networks. Our method performs favorably on challenging realistic image datasets compared to state-of-the-art continual learning methods.

Submitted 1 March, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted to AISTATS 2022

arXiv:2110.00681 [pdf, other]

doi 10.3389/fgene.2022.836798

A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data

Authors: Xiaowen Cao, Li Xing, Elham Majd, Hua He, Junhua Gu, Xuekui Zhang

Abstract: Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights about gene expression and gives critical information about complex tissue cellular composition. In the analysis of single-cell RNA sequencing, the annotations of cell subtypes are often done manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation software based on the elastic net method. Besides cell-type annotation, supervised machine learning methods can also be applied to predict other cell phenotypes from genomic data. Despite the popularity of such applications, there is no existing study to systematically investigate the performance of those supervised algorithms in various sizes of scRNA-seq data sets. Methods and Results: This study evaluates 13 popular supervised machine learning algorithms to classify cell phenotypes, using published real and simulated data sets with diverse cell sizes. The benchmark contained two parts. In the first part, we used real data sets to assess the popular supervised algorithms' computing speed and cell phenotype classification performance. The classification performances were evaluated using AUC statistics, F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene selection performance using published simulated data sets with a known list of real genes. Conclusion: The study outcomes showed that ElasticNet with interactions performed best in small and medium data sets. NB was another appropriate method for medium data sets. In large data sets, XGB performed excellently. Ensemble algorithms were not significantly superior to individual machine learning methods. Adding interactions to ElasticNet can help, and the improvement was significant in small data sets.

Submitted 1 October, 2021; originally announced October 2021.

Comments: 21 pages, 4 figures, 1 table

Journal ref: Front. Genet. 13:836798 (2022)

arXiv:2109.14142 [pdf, ps, other]

On the Provable Generalization of Recurrent Neural Networks

Authors: Lifu Wang, Bo Shen, Bo Hu, Xing Cao

Abstract: Recurrent Neural Network (RNN) is a fundamental structure in deep learning. Recently, some works study the training process of over-parameterized neural networks, and show that over-parameterized networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze the training and generalization for RNNs with random initialization, and provide the following improvements over recent works: 1) For a RNN with input sequence $x=(X_1,X_2,...,X_L)$, previous works study to learn functions that are summation of $f(β^T_lX_l)$ and require normalized conditions that $||X_l||\leq ε$ with some very small $ε$ depending on the complexity of $f$. In this paper, using detailed analysis about the neural tangent kernel matrix, we prove a generalization error bound to learn such functions without normalized conditions and show that some notable concept classes are learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$. 2) Moreover, we prove a novel result to learn N-variables functions of input sequence with the form $f(β^T[X_{l_1},...,X_{l_N}])$, which do not belong to the "additive" concept class, i.e., the summation of function $f(X_l)$. And we show that when either $N$ or $l_0=\max(l_1,..,l_N)-\min(l_1,..,l_N)$ is small, $f(β^T[X_{l_1},...,X_{l_N}])$ will be learnable with the number of iterations and samples scaling almost-polynomially in the input length $L$.

Submitted 26 January, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: Accepted to NeurIPS 2021

arXiv:2008.06190 [pdf, other]

Bayesian joint inference for multiple directed acyclic graphs

Authors: Kyoungjae Lee, Xuan Cao

Abstract: In many applications, data often arise from multiple groups that may share similar characteristics. A joint estimation method that models several groups simultaneously can be more efficient than estimating parameters in each group separately. We focus on unraveling the dependence structures of data based on directed acyclic graphs and propose a Bayesian joint inference method for multiple graphs. To encourage similar dependence structures across all groups, a Markov random field prior is adopted. We establish the joint selection consistency of the fractional posterior in high dimensions, and benefits of the joint inference are shown under the common support assumption. This is the first Bayesian method for joint estimation of multiple directed acyclic graphs. The performance of the proposed method is demonstrated using simulation studies, and it is shown that our joint inference outperforms other competitors. We apply our method to fMRI data for simultaneously inferring multiple brain functional networks.

Submitted 14 August, 2020; originally announced August 2020.

arXiv:2007.08053 [pdf, other]

doi 10.24963/ijcai.2020/168

Inductive Link Prediction for Nodes Having Only Attribute Information

Authors: Yu Hao, Xin Cao, Yixiang Fang, Xike Xie, Sibo Wang

Abstract: Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductive prediction for new nodes having only attribute information. It is more challenging since the new nodes do not have structure information and cannot be seen during the model training. To solve this problem, we propose a model called DEAL, which consists of three components: two node embedding encoders and one alignment mechanism. The two encoders aim to output the attribute-oriented node embedding and the structure-oriented node embedding, and the alignment mechanism aligns the two types of embeddings to build the connections between the attributes and links. Our model DEAL is versatile in the sense that it works for both inductive and transductive link prediction. Extensive experiments on several benchmark datasets show that our proposed model significantly outperforms existing inductive link prediction methods, and also outperforms the state-of-the-art methods on transductive link prediction.

Submitted 15 July, 2020; originally announced July 2020.

Comments: IJCAI2020

arXiv:2006.09104 [pdf, other]

New Interpretations of Normalization Methods in Deep Learning

Authors: Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li

Abstract: In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However, mathematical tools to analyze all these normalization methods are lacking. In this paper, we first propose a lemma to define some necessary tools. Then, we use these tools to make a deep analysis on popular normalization methods and obtain the following conclusions: 1) Most of the normalization methods can be interpreted in a unified framework, namely normalizing pre-activations or weights onto a sphere; 2) Since most of the existing normalization methods are scaling invariant, we can conduct optimization on a sphere with scaling symmetry removed, which can help stabilize the training of network; 3) We prove that training with these normalization methods can make the norm of weights increase, which could cause adversarial vulnerability as it amplifies the attack. Finally, a series of experiments are conducted to verify these claims.

Submitted 16 June, 2020; originally announced June 2020.

Comments: Accepted by AAAI 2020
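
The scaling invariance mentioned in the abstract above can be checked numerically. The following is a minimal sketch, not the paper's analysis: it assumes a single linear unit followed by a plain batch standardization (no learned scale or shift, no epsilon term), and the helper names `linear_unit` and `batch_normalize` are made up for this illustration.

```python
import math

def linear_unit(weights, batch):
    """Pre-activations w . x for every row x of the batch."""
    return [sum(w * x for w, x in zip(weights, row)) for row in batch]

def batch_normalize(values):
    """Standardize values across the batch (zero mean, unit variance)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]

# Toy data: four samples with two features each (illustrative values only).
batch = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]
weights = [0.7, -0.3]
scaled_weights = [10.0 * w for w in weights]  # same direction, larger norm

out = batch_normalize(linear_unit(weights, batch))
out_scaled = batch_normalize(linear_unit(scaled_weights, batch))

# Multiplying the weights by a positive constant leaves the normalized
# output unchanged, so only the direction of the weight vector matters.
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(out, out_scaled)))
```

Only positive rescaling is covered here; a negative factor would flip the sign of the normalized output.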

arXiv:2005.03694 [pdf, other]

High-Dimensional Inference Based on the Leave-One-Covariate-Out LASSO Path

Authors: Xiangyang Cao, Karl Gregory, Dewei Wang

Abstract: We propose a new measure of variable importance in high-dimensional regression based on the change in the LASSO solution path when one covariate is left out. The proposed procedure provides a novel way to calculate variable importance and conduct variable screening. In addition, our procedure allows for the construction of P-values for testing whether each coefficient is equal to zero as well as for testing hypotheses involving multiple regression coefficients simultaneously; bootstrap techniques are used to construct the null distribution. For low-dimensional linear models, our method can achieve higher power than the $t$-test. Extensive simulations are provided to show the effectiveness of our method. In the high-dimensional setting, our proposed solution path based test achieves greater power than some other recently developed high-dimensional inference methods.

Submitted 7 May, 2020; originally announced May 2020.

Comments: 24 pages, 8 figures

MSC Class: 62J07 (Primary) 62F40 (Secondary)

arXiv:2004.13930 [pdf, other]

Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction

Authors: Zhiyong Yang, Qianqian Xu, Xiaochun Cao, Qingming Huang

Abstract: As an effective learning paradigm against insufficient training samples, Multi-Task Learning (MTL) encourages knowledge sharing across multiple related tasks so as to improve the overall performance. In MTL, a major challenge springs from the phenomenon that sharing the knowledge with dissimilar and hard tasks, known as negative transfer, often results in a worsened performance. Though a substantial amount of studies have been carried out against the negative transfer, most of the existing methods only model the transfer relationship as task correlations, with the transfer across features and tasks left unconsidered. Different from the existing methods, our goal is to alleviate negative transfer collaboratively across features and tasks. To this end, we propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL). Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks and suppress inter-group knowledge sharing. We then propose an optimization method for the model. Extensive theoretical analysis shows that our proposed method has the following benefits: (a) it enjoys the global convergence property and (b) it provides a block-diagonal structure recovery guarantee. As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks. We further apply it to the personalized attribute prediction problem with fine-grained modeling of user behaviors. Finally, experimental results on both simulated dataset and real-world datasets demonstrate the effectiveness of our proposed method.

Submitted 28 April, 2020; originally announced April 2020.

Comments: To appear in T-PAMI

arXiv:2004.09306 [pdf, ps, other]

doi 10.5705/ss.202019.0202

Joint Bayesian Variable and DAG Selection Consistency for High-dimensional Regression Models with Network-structured Covariates

Authors: Xuan Cao, Kyoungjae Lee

Abstract: We consider the joint sparse estimation of regression coefficients and the covariance matrix for covariates in a high-dimensional regression model, where the predictors are both relevant to a response variable of interest and functionally related to one another via a Gaussian directed acyclic graph (DAG) model. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance matrix, and the sparsity pattern in turn corresponds to specific conditional independence assumptions on the underlying predictors. A variety of methods have been developed in recent years for Bayesian inference in identifying such network-structured predictors in regression setting, yet crucial sparsity selection properties for these models have not been thoroughly investigated. In this paper, we consider a hierarchical model with spike and slab priors on the regression coefficients and a flexible and general class of DAG-Wishart distributions with multiple shape parameters on the Cholesky factors of the inverse covariance matrix. Under mild regularity assumptions, we establish the joint selection consistency for both the variable and the underlying DAG of the covariates when the dimension of predictors is allowed to grow much larger than the sample size. We demonstrate that our method outperforms existing methods in selecting network-structured predictors in several simulation settings.

Submitted 20 April, 2020; originally announced April 2020.

arXiv:2004.08237 [pdf, other]

CAggNet: Crossing Aggregation Network for Medical Image Segmentation

Authors: Xu Cao, Yanghao Lin

Abstract: In this paper, we present Crossing Aggregation Network (CAggNet), a novel densely connected semantic segmentation approach for medical image analysis. The crossing aggregation network improves the idea from deep layer aggregation and makes significant innovations in semantic and spatial information fusion. In CAggNet, the simple skip connection structure of general U-Net is replaced by aggregations of multi-level down-sampling and up-sampling layers, which is a new form of nested skip connection. This aggregation architecture enables the network to fuse both coarse and fine features interactively in semantic segmentation. It also introduces weighted aggregation module to up-sample multi-scale output at the end of the network. We have evaluated and compared our CAggNet with several advanced U-Net based methods in two public medical image datasets, including the 2018 Data Science Bowl nuclei detection dataset and the 2015 MICCAI gland segmentation competition dataset. Experimental results indicate that CAggNet improves medical object recognition and achieves a more accurate and efficient segmentation compared to existing improved U-Net and UNet++ structure.

Submitted 7 November, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

Comments: Accepted by ICPR 2020

arXiv:2002.06442 [pdf, other]

doi 10.1145/3318464.3380570

Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach

Authors: Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, Makoto Onizuka

Abstract: Due to the outstanding capability of capturing underlying data distributions, deep learning techniques have been recently utilized for a series of traditional database problems. In this paper, we investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. Answering this problem accurately and efficiently is essential to many data management applications, especially for query optimization. Moreover, in some applications the estimated cardinality is supposed to be consistent and interpretable. Hence a monotonic estimation w.r.t. the query threshold is preferred. We propose a novel and generic method that can be applied to any data type and distance function. Our method consists of a feature extraction model and a regression model. The feature extraction model transforms original data and threshold to a Hamming space, in which a deep learning-based regression model is utilized to exploit the incremental property of cardinality w.r.t. the threshold for both accuracy and monotonicity. We develop a training strategy tailored to our model as well as techniques for fast estimation. We also discuss how to handle updates. We demonstrate the accuracy and the efficiency of our method through experiments, and show how it improves the performance of a query optimizer.

Submitted 24 September, 2021; v1 submitted 15 February, 2020; originally announced February 2020.

ACM Class: H.2.4; I.5.1

arXiv:1912.09899 [pdf, other]

Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing

Authors: Jinyuan Jia, Xiaoyu Cao, Binghui Wang, Neil Zhenqiang Gong

Abstract: It is well-known that classifiers are vulnerable to adversarial perturbations. To defend against adversarial perturbations, various certified robustness results have been derived. However, existing certified robustnesses are limited to top-1 predictions. In many real-world applications, top-$k$ predictions are more relevant. In this work, we aim to derive certified robustness for top-$k$ predictions. In particular, our certified robustness is based on randomized smoothing, which turns any classifier to a new classifier via adding noise to an input example. We adopt randomized smoothing because it is scalable to large-scale neural networks and applicable to any classifier. We derive a tight robustness in $\ell_2$ norm for top-$k$ predictions when using randomized smoothing with Gaussian noise. We find that generalizing the certified robustness from top-1 to top-$k$ predictions faces significant technical challenges. We also empirically evaluate our method on CIFAR10 and ImageNet. For example, our method can obtain an ImageNet classifier with a certified top-5 accuracy of 62.8% when the $\ell_2$-norms of the adversarial perturbations are less than 0.5 (=127/255). Our code is publicly available at: https://github.com/jjy1994/Certify_Topk.

Submitted 20 December, 2019; originally announced December 2019.

Comments: ICLR 2020, code is available at https://github.com/jjy1994/Certify_Topk
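
The smoothing step described in the abstract above (adding Gaussian noise to an input and aggregating the base classifier's predictions) can be sketched as a Monte Carlo vote. This is only an illustration under simplifying assumptions: the one-dimensional `base_classifier`, the function name `smoothed_top_k`, and the parameter values are all hypothetical, and the sketch returns empirical top-k labels only. It does not compute the certified radius derived in the paper.

```python
import random
from collections import Counter

def base_classifier(x):
    """Toy 3-class classifier on a scalar input (stand-in for a real model)."""
    if x < -1.0:
        return 0
    if x < 1.0:
        return 1
    return 2

def smoothed_top_k(classifier, x, k, sigma=0.5, n_samples=2000, seed=0):
    """Return the k labels predicted most often under Gaussian input noise."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    votes = Counter(
        classifier(x + rng.gauss(0.0, sigma)) for _ in range(n_samples)
    )
    return [label for label, _ in votes.most_common(k)]

# Near the decision boundary at x = 1, the noisy votes are split mainly
# between the neighbouring classes 1 and 2, with class 1 in the majority.
top2 = smoothed_top_k(base_classifier, 0.8, k=2)
print(top2)
```

A real certification procedure would additionally bound the label probabilities with confidence intervals before making any robustness claim; that part is omitted here.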

arXiv:1912.01833 [pdf, other]

Bayesian Group Selection in Logistic Regression with Application to MRI Data Analysis

Authors: Kyoungjae Lee, Xuan Cao

Abstract: We consider Bayesian logistic regression models with group-structured covariates. In high-dimensional settings, it is often assumed that only a small portion of groups are significant, thus consistent group selection is of significant importance. While consistent frequentist group selection methods have been proposed, theoretical properties of Bayesian group selection methods for logistic regression models have not been investigated yet. In this paper, we consider a hierarchical group spike and slab prior for logistic regression models in high-dimensional settings. Under mild conditions, we establish strong group selection consistency of the induced posterior, which is the first theoretical result in the Bayesian literature. Through simulation studies, we demonstrate that the performance of the proposed method outperforms existing state-of-the-art methods in various settings. We further apply our method to an MRI data set for predicting Parkinson's disease and show its benefits over other contenders.

Submitted 4 December, 2019; originally announced December 2019.

arXiv:1912.00362 [pdf, other]

doi 10.1109/TKDE.2019.2956700

Fast Stochastic Ordinal Embedding with Variance Reduction and Adaptive Step Size

Authors: Ke Ma, Jinshan Zeng, Qianqian Xu, Xiaochun Cao, Wei Liu, Yuan Yao

Abstract: Learning representation from relative similarity comparisons, often called ordinal embedding, gains rising attention in recent years. Most of the existing methods are based on semi-definite programming (SDP), which is generally time-consuming and degrades the scalability, especially confronting large-scale data. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: i) achieving good scalability via dropping positive semi-definite (PSD) constraints as serving a fast algorithm, i.e., stochastic variance reduced gradient (SVRG) method, and ii) adaptive learning via introducing a new, adaptive step size called the stabilized Barzilai-Borwein (SBB) step size. Theoretically, under some natural assumptions, we show the $\boldsymbol{O}(\frac{1}{T})$ rate of convergence to a stationary point of the proposed algorithm, where $T$ is the number of total iterations. Under the further Polyak-Łojasiewicz assumption, we can show the global linear convergence (i.e., exponentially fast converging to a global optimum) of the proposed algorithm. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm by comparing with the state-of-the-art methods, notably, much lower computational cost with good prediction performance.

Submitted 1 December, 2019; originally announced December 2019.

Comments: 19 pages, 5 figures, accepted by IEEE Transactions on Knowledge and Data Engineering, Conference Version: arXiv:1711.06446
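
For readers unfamiliar with Barzilai-Borwein step sizes, the idea of an adaptive step mentioned in the abstract above can be illustrated with the classical BB rule on a small quadratic. This is a sketch under stated assumptions, not the paper's method: it uses full (non-stochastic) gradients, omits variance reduction, and applies the classical step $\eta = \|s\|^2 / (s^\top y)$ rather than the stabilized SBB variant. The function name `bb_gradient_descent` and the test problem are made up for this illustration.

```python
def bb_gradient_descent(grad, x0, step0=0.01, iters=100):
    """Gradient descent whose step size follows the classical BB rule."""
    x = list(x0)
    g = grad(x)
    step = step0
    for _ in range(iters):
        x_new = [xi - step * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]   # change in iterate
        y = [a - b for a, b in zip(g_new, g)]   # change in gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        ss = sum(si * si for si in s)
        x, g = x_new, g_new
        if sy <= 1e-30:
            # Converged (or no positive curvature information): stop.
            break
        step = ss / sy  # classical Barzilai-Borwein step size
    return x

# Hypothetical test problem: f(x) = 0.5 * (x0^2 + 10 * x1^2) - x0 - 10 * x1,
# whose gradient is (x0 - 1, 10 * x1 - 10) and whose minimizer is (1, 1).
def quad_grad(x):
    return [x[0] - 1.0, 10.0 * x[1] - 10.0]

solution = bb_gradient_descent(quad_grad, [0.0, 0.0])
print(solution)
```

The step size is recomputed from the two most recent iterates, so no hand-tuned learning-rate schedule is needed beyond the initial `step0`.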

arXiv:1910.05905 [pdf, other]

iSplit LBI: Individualized Partial Ranking with Ties via Split LBI

Authors: Qianqian Xu, Xinwei Sun, Zhiyong Yang, Xiaochun Cao, Qingming Huang, Yuan Yao

Abstract: Due to the inherent uncertainty of data, the problem of predicting partial ranking from pairwise comparison data with ties has attracted increasing interest in recent years. However, in real-world scenarios, different individuals often hold distinct preferences. It might be misleading to merely look at a global partial ranking while ignoring personal diversity. In this paper, instead of learning a global ranking which is agreed with the consensus, we pursue the tie-aware partial ranking from an individualized perspective. Particularly, we formulate a unified framework which not only can be used for individualized partial ranking prediction, but also be helpful for abnormal user selection. This is realized by a variable splitting-based algorithm called iSplit LBI. Specifically, our algorithm generates a sequence of estimations with a regularization path, where both the hyperparameters and model parameters are updated. At each step of the path, the parameters can be decomposed into three orthogonal parts, namely, abnormal signals, personalized signals and random noise. The abnormal signals can serve the purpose of abnormal user selection, while the abnormal signals and personalized signals together are mainly responsible for individual partial ranking prediction. Extensive experiments on simulated and real-world datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives. The code is now available at https://github.com/qianqianxu010/NeurIPS2019-iSplitLBI.

Submitted 13 October, 2019; originally announced October 2019.

Comments: Accepted by NeurIPS 2019

arXiv:1909.04837 [pdf, other]

Identifying and Resisting Adversarial Videos Using Temporal Consistency

Authors: Xiaojun Jia, Xingxing Wei, Xiaochun Cao

Abstract: Video classification is a challenging task in computer vision. Although Deep Neural Networks (DNNs) have achieved excellent performance in video classification, recent research shows adding imperceptible perturbations to clean videos can make the well-trained models output wrong labels with high confidence. In this paper, we propose an effective defense framework to characterize and defend adversarial videos. The proposed method contains two phases: (1) adversarial video detection using temporal consistency between adjacent frames, and (2) adversarial perturbation reduction via denoisers in the spatial and temporal domains respectively. Specifically, because of the linear nature of DNNs, the imperceptible perturbations will enlarge with the increasing of DNNs depth, which leads to the inconsistency of DNNs output between adjacent frames. However, the benign video frames often have the same outputs with their neighbor frames owing to the slight changes. Based on this observation, we can distinguish between adversarial videos and benign videos. After that, we utilize different defense strategies against different attacks. We propose the temporal defense, which reconstructs the polluted frames with their temporally neighbor clean frames, to deal with the adversarial videos with sparse polluted frames. For the videos with dense polluted frames, we use an efficient adversarial denoiser to process each frame in the spatial domain, and thus purify the perturbations (we call this the spatial defense). A series of experiments conducted on the UCF-101 dataset demonstrate that the proposed method significantly improves the robustness of video classifiers against adversarial attacks.

Submitted 10 September, 2019; originally announced September 2019.

arXiv:1906.07341 [pdf, other]

Learning Personalized Attribute Preference via Multi-task AUC Optimization

Authors: Zhiyong Yang, Qianqian Xu, Xiaochun Cao, Qingming Huang

Abstract: Traditionally, most of the existing attribute learning methods are trained based on the consensus of annotations aggregated from a limited number of annotators. However, the consensus might fail in settings, especially when a wide spectrum of annotators with different interests and comprehension about the attribute words are involved. In this paper, we develop a novel multi-task method to understand and predict personalized attribute annotations. Regarding the attribute preference learning for each annotator as a specific task, we first propose a multi-level task parameter decomposition to capture the evolution from a highly popular opinion of the mass to highly personalized choices that are special for each person. Meanwhile, for personalized learning methods, ranking prediction is much more important than accurate classification. This motivates us to employ an Area Under ROC Curve (AUC) based loss function to improve our model. On top of the AUC-based loss, we propose an efficient method to evaluate the loss and gradients. Theoretically, we propose a novel closed-form solution for one of our non-convex subproblems, which leads to provable convergence behaviors. Furthermore, we also provide a generalization bound to guarantee a reasonable performance. Finally, empirical analysis consistently speaks to the efficacy of our proposed method.

Submitted 17 June, 2019; originally announced June 2019.

Comments: AAAI2019 oral

Journal ref: AAAI2019 oral

arXiv:1903.03531 [pdf, other]

Consistent Bayesian Sparsity Selection for High-dimensional Gaussian DAG Models with Multiplicative and Beta-mixture Priors

Authors: Xuan Cao, Kshitij Khare, Malay Ghosh

Abstract: Estimation of the covariance matrix for high-dimensional multivariate datasets is a challenging and important problem in modern statistics. In this paper, we focus on high-dimensional Gaussian DAG models where sparsity is induced on the Cholesky factor L of the inverse covariance matrix. In recent work, ([Cao, Khare, and Ghosh, 2019]), we established high-dimensional sparsity selection consistency for a hierarchical Bayesian DAG model, where an Erdos-Renyi prior is placed on the sparsity pattern in the Cholesky factor L, and a DAG-Wishart prior is placed on the resulting non-zero Cholesky entries. In this paper we significantly improve and extend this work, by (a) considering more diverse and effective priors on the sparsity pattern in L, namely the beta-mixture prior and the multiplicative prior, and (b) establishing sparsity selection consistency under significantly relaxed conditions on p, and the sparsity pattern of the true model. We demonstrate the validity of our theoretical results via numerical simulations, and also use further simulations to demonstrate that our sparsity selection approach is competitive with existing state-of-the-art methods including both frequentist and Bayesian approaches in various settings.

Submitted 8 March, 2019; originally announced March 2019.

arXiv:1902.09353 [pdf, other]

A permutation-based Bayesian approach for inverse covariance estimation

Authors: Xuan Cao, Shaojun Zhang

Abstract: Covariance estimation and selection for multivariate datasets in a high-dimensional regime is a fundamental problem in modern statistics. Gaussian graphical models are a popular class of models used for this purpose. Current Bayesian methods for inverse covariance matrix estimation under Gaussian graphical models require the underlying graph and hence the ordering of variables to be known. However, in practice, such information on the true underlying model is often unavailable. We therefore propose a novel permutation-based Bayesian approach to tackle the unknown variable ordering issue. In particular, we utilize multiple maximum a posteriori estimates under the DAG-Wishart prior for each permutation, and subsequently construct the final estimate of the inverse covariance matrix. The proposed estimator has smaller variability and is order-invariant. We establish posterior convergence rates under mild assumptions and illustrate that our method outperforms existing approaches in estimating the inverse covariance matrices via simulation studies.

Submitted 25 February, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

Comments: The proof for posterior convergence rate for DAG-Wishart priors in this work can be found in arXiv:1611.01205

arXiv:1812.01945 [pdf, other]

Robust Ordinal Embedding from Contaminated Relative Comparisons

Authors: Ke Ma, Qianqian Xu, Xiaochun Cao

Abstract: Existing ordinal embedding methods usually follow a two-stage routine: outlier detection is first employed to pick out the inconsistent comparisons; then an embedding is learned from the clean data. However, learning in a multi-stage manner is well-known to suffer from sub-optimal solutions. In this paper, we propose a unified framework to jointly identify the contaminated comparisons and derive reliable embeddings. The merits of our method are three-fold: (1) By virtue of the proposed unified framework, the sub-optimality of traditional methods is largely alleviated; (2) The proposed method is aware of global inconsistency by minimizing a corresponding cost, while traditional methods only involve local inconsistency; (3) Instead of considering the nuclear norm heuristics, we adopt an exact solution for the rank equality constraint. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides a promising tool for robust ordinal embedding from contaminated comparisons.

Submitted 5 December, 2018; originally announced December 2018.

Comments: Accepted by AAAI 2019

arXiv:1812.01939 [pdf, other]

Less but Better: Generalization Enhancement of Ordinal Embedding via Distributional Margin

Authors: Ke Ma, Qianqian Xu, Zhiyong Yang, Xiaochun Cao

Abstract: In the absence of prior knowledge, ordinal embedding methods obtain a new representation for items in a low-dimensional Euclidean space via a set of quadruple-wise comparisons. These ordinal comparisons often come from human annotators, and sufficient comparisons induce the success of classical approaches. However, collecting a large amount of labeled data is known to be a hard task, and most of the existing work pays little attention to the generalization ability with insufficient samples. Meanwhile, recent progress in large margin theory discloses that rather than just maximizing the minimum margin, both the margin mean and variance, which characterize the margin distribution, are more crucial to the overall generalization performance. To address the issue of insufficient training samples, we propose a margin distribution learning paradigm for ordinal embedding, entitled Distributional Margin based Ordinal Embedding (DMOE). Precisely, we first define the margin for the ordinal embedding problem. Secondly, we formulate a concise objective function which avoids maximizing the margin mean and minimizing the margin variance directly but exhibits a similar effect. Moreover, an Augmented Lagrange Multiplier based algorithm is customized to seek the optimal solution of DMOE effectively. Experimental studies on both simulated and real-world datasets are provided to show the effectiveness of the proposed algorithm.

Submitted 5 December, 2018; originally announced December 2018.

Comments: Accepted by AAAI 2019

arXiv:1809.10962

Target-Independent Active Learning via Distribution-Splitting

Authors: Xiaofeng Cao, Ivor W. Tsang, Xiaofeng Xu, Guandong Xu

Abstract: To reduce the label complexity in Agnostic Active Learning (A^2 algorithm), volume-splitting splits the hypothesis edges to reduce the Vapnik-Chervonenkis (VC) dimension in version space. However, the effectiveness of volume-splitting critically depends on the initial hypothesis and this problem is also known as target-dependent label complexity gap. This paper attempts to minimize this gap by introducing a novel notion of number density which provides a more natural and direct way to describe the hypothesis distribution than volume. By discovering the connections between hypothesis and input distribution, we map the volume of version space into the number density and propose a target-independent distribution-splitting strategy with the following advantages: 1) provide theoretical guarantees on reducing label complexity and error rate as volume-splitting; 2) break the curse of initial hypothesis; 3) provide model guidance for a target-independent AL algorithm in real AL tasks. With these guarantees, for AL application, we then split the input distribution into more near-optimal spheres and develop an application algorithm called Distribution-based A^2 (DA^2). Experiments further verify the effectiveness of the halving and querying abilities of DA^2. Contributions of this paper are as follows.

Submitted 25 September, 2020; v1 submitted 28 September, 2018; originally announced September 2018.

Comments: This paper has been withdrawn. The first author quit the PhD study at AAI, University of Technology Sydney. The manuscript is no longer being updated

arXiv:1809.07694 [pdf]

Improved Online Wilson Score Interval Method for Community Answer Quality Ranking

Authors: Xin Cao

Abstract: In this paper, a fast and easy-to-deploy method with strong interpretability for community answer quality ranking is proposed. This method is improved based on the Wilson score interval method [Wilson, 1927]; it retains that method's advantages and simultaneously improves the degree of satisfaction with the ranking of the high-quality answers. The improved answer quality score considers both the Wilson score interval and the spotlight index, the latter of which will be introduced in the article. This method could significantly improve the ranking of the best answers with high attention in diverse scenarios.

Submitted 4 September, 2018; originally announced September 2018.

Comments: 8 pages, 13 figures
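
The Wilson score interval named in the abstract above is a standard confidence interval for a binomial proportion, and a common ranking heuristic sorts items by its lower bound. The following is a minimal sketch of that baseline only. It does not implement the paper's improved score or its spotlight index, which the abstract does not define. The function names `wilson_interval` and `rank_answers`, the vote-tuple layout, and the default `z=1.96` (roughly a 95% interval) are assumptions made for illustration.

```python
import math


def wilson_interval(positive, total, z=1.96):
    """Return (lower, upper) Wilson score bounds for a binomial proportion.

    positive: number of positive votes; total: number of all votes.
    z is the normal quantile (1.96 is roughly a 95% interval; an assumption).
    """
    if total == 0:
        # No votes: the proportion is unconstrained.
        return (0.0, 1.0)
    p_hat = positive / total
    denom = 1.0 + z * z / total
    center = (p_hat + z * z / (2.0 * total)) / denom
    margin = z * math.sqrt(
        p_hat * (1.0 - p_hat) / total + z * z / (4.0 * total * total)
    ) / denom
    return (center - margin, center + margin)


def rank_answers(votes):
    """Sort (answer_id, upvotes, total_votes) tuples by Wilson lower bound, best first."""
    return sorted(votes, key=lambda v: wilson_interval(v[1], v[2])[0], reverse=True)
```

Sorting by the lower bound rather than the raw ratio means that an answer with 90 of 100 positive votes ranks above one with 9 of 10, and both rank above a single unanimous vote, because fewer votes give a wider interval.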

arXiv:1809.01804 [pdf, other]

Discovering Influential Factors in Variational Autoencoder

Authors: Shiqi Liu, Jingxin Liu, Qian Zhao, Xiangyong Cao, Huibin Li, Hongying Meng, Sheng Liu, Deyu Meng

Abstract: In the field of machine learning, it is still a critical issue to identify and supervise the learned representation without manual intervention or intuition assistance to extract useful knowledge or serve the downstream tasks. In this work, we focus on supervising the influential factors extracted by the variational autoencoder (VAE). The VAE is proposed to learn independent low-dimensional representations while facing the problem that sometimes pre-set factors are ignored. We argue that the mutual information of the input and each learned factor of the representation serves as a necessary indicator for discovering the influential factors. We find that the VAE objective tends to induce mutual information sparsity in the factor dimension over the data intrinsic dimension and results in some non-influential factors whose function in data reconstruction could be ignored. We show that mutual information also influences the lower bound of the VAE's reconstruction error and the downstream classification task. To make such an indicator applicable, we design an algorithm for calculating the mutual information for the VAE and prove its consistency. Experimental results on the MNIST, CelebA and DEAP datasets show that mutual information can help determine influential factors, of which some are interpretable and can be used for further generation and classification tasks, and help discover the variant that connects with emotion on the DEAP dataset.

Submitted 5 April, 2019; v1 submitted 5 September, 2018; originally announced September 2018.

Comments: 15 pages, 8 figures

arXiv:1807.11014 [pdf, other]

doi 10.1145/3240508.3240597

A Margin-based MLE for Crowdsourced Partial Ranking

Authors: Qianqian Xu, Jiechao Xiong, Xinwei Sun, Zhiyong Yang, Xiaochun Cao, Qingming Huang, Yuan Yao

Abstract: A preference order or ranking aggregated from pairwise comparison data is commonly understood as a strict total order. However, in real-world scenarios, some items are intrinsically ambiguous in comparisons, which may very well be an inherent uncertainty of the data. In this case, the conventional total order ranking cannot capture such uncertainty with mere global ranking or utility scores. In this paper, we are specifically interested in the recent surge in crowdsourcing applications to predict partial but more accurate (i.e., making fewer incorrect statements) orders rather than complete ones. To do so, we propose a novel framework to learn some probabilistic models of partial orders as a margin-based Maximum Likelihood Estimate (MLE) method. We prove that the induced MLE is a joint convex optimization problem with respect to all the parameters, including the global ranking scores and margin parameter. Moreover, three kinds of generalized linear models are studied, including the basic uniform model, Bradley-Terry model, and Thurstone-Mosteller model, equipped with some theoretical analysis on FDR and Power control for the proposed methods. The validity of these models is supported by experiments with both simulated and real-world datasets, which show that the proposed models exhibit improvements compared with traditional state-of-the-art algorithms.

Submitted 29 July, 2018; originally announced July 2018.

Comments: 9 pages, Accepted by ACM Multimedia 2018 as a full paper

arXiv:1807.08904

A Structured Perspective of Volumes on Active Learning

Authors: Xiaofeng Cao, Ivor W. Tsang, Guandong Xu

Abstract: Active Learning (AL) is a learning task that requires learners to interactively query the labels of the sampled unlabeled instances to minimize the training outputs with human supervision. In theoretical study, learners approximate the version space, which covers all possible classification hypotheses, into a bounded convex body and try to shrink its volume into a half-space by a given cut size. However, only the hypersphere with finite VC dimensions has obtained formal approximation guarantees that hold when the classes of Euclidean space are separable with a margin. In this paper, we approximate the version space to a structured hypersphere that covers most of the hypotheses, and then divide the available AL sampling approaches into two kinds of strategies: Outer Volume Sampling and Inner Volume Sampling. After providing provable guarantees for the performance of AL in version space, we aggregate the two kinds of volumes to eliminate their sampling biases via finding the optimal inscribed hyperspheres in the enclosing space of outer volume. To touch the version space from Euclidean space, we propose a theoretical bridge called Volume-based Model that increases the 'sampling target-independent'. In non-linear feature space, spanned by kernel, we use sequential optimization to globally optimize the original space to a sparse space by halving the size of the kernel space. Then, the EM (Expectation Maximization) model which returns the local center helps us to find a local representation. To describe this process, we propose an easy-to-implement algorithm called Volume-based AL (VAL).

Submitted 25 September, 2020; v1 submitted 24 July, 2018; originally announced July 2018.

Comments: This paper has been withdrawn. The first author quit the PhD study at AAI, University of Technology Sydney. The manuscript is no longer being updated

arXiv:1805.12321

A Divide-and-Conquer Approach to Geometric Sampling for Active Learning

Authors: Xiaofeng Cao

Abstract: Active learning (AL) repeatedly trains the classifier with the minimum labeling budget to improve the current classification model. The training process is usually supervised by an uncertainty evaluation strategy. However, the uncertainty evaluation always suffers from performance degeneration when the initial labeled set has insufficient labels. To completely eliminate the dependence on the uncertainty evaluation sampling in AL, this paper proposes a divide-and-conquer idea that directly transfers the AL sampling as the geometric sampling over the clusters. By dividing the points of the clusters into cluster boundary and core points, we theoretically discuss their margin distance and hypothesis relationship. With the advantages of cluster boundary points in the above two properties, we propose a Geometric Active Learning (GAL) algorithm by knight's tour. Experimental studies on the two reported tasks, cluster boundary detection and AL classification, show that the proposed GAL method significantly outperforms the state-of-the-art baselines.

Submitted 25 September, 2020; v1 submitted 31 May, 2018; originally announced May 2018.

Comments: This paper has been withdrawn. The first author quit the PhD study at AAI, University of Technology Sydney. The manuscript is no longer being updated

arXiv:1803.06711 [pdf, other]

A Dynamic Additive and Multiplicative Effects Model with Application to the United Nations Voting Behaviors

Authors: Bomin Kim, Xiaoyue Niu, David R. Hunter, Xun Cao

Abstract: Motivated by a study of United Nations voting behaviors, we introduce a regression model for a series of networks that are correlated over time. Our model is a dynamic extension of the additive and multiplicative effects network model (AMEN) of Hoff (2019). In addition to incorporating a temporal structure, the model accommodates two types of missing data and thus allows the size of the network to vary over time. We demonstrate via simulations the necessity of various components of the model. We apply the model to the United Nations General Assembly voting data from 1983 to 2014 (Voeten (2013)) to answer interesting research questions regarding international voting behaviors. In addition to finding important factors that could explain the voting behaviors, the model-estimated additive effects, multiplicative effects, and their movements reveal meaningful foreign policy positions and alliances of various countries.

Submitted 21 March, 2023; v1 submitted 18 March, 2018; originally announced March 2018.

arXiv:1712.09520 [pdf, other]

Tensor Regression Networks with various Low-Rank Tensor Approximations

Authors: Xingwei Cao, Guillaume Rabusseau

Abstract: Tensor regression networks achieve a high compression rate of neural networks while having a slight impact on performance. They do so by imposing low tensor rank structure on the weight matrices of fully connected layers. In recent years, tensor regression networks have been investigated from the perspective of their compressive power; however, the regularization effect of enforcing low-rank tensor structure has not been investigated enough. We study tensor regression networks using various low-rank tensor approximations, aiming to compare the compressive and regularization power of different low-rank constraints. We evaluate the compressive and regularization performances of the proposed model with both deep and shallow convolutional neural networks. The outcome of our experiment suggests the superiority of the Global Average Pooling Layer over the Tensor Regression Layer when applied to a deep convolutional neural network with the CIFAR-10 dataset. On the contrary, shallow convolutional neural networks with a tensor regression layer and dropout achieved lower test error than both Global Average Pooling and a fully-connected layer with dropout when trained with a small number of samples.

Submitted 28 November, 2018; v1 submitted 27 December, 2017; originally announced December 2017.

arXiv:1711.06446 [pdf, other]

Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size

Authors: Ke Ma, Jinshan Zeng, Jiechao Xiong, Qianqian Xu, Xiaochun Cao, Wei Liu, Yuan Yao

Abstract: Learning representation from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most of the existing methods are batch methods designed mainly based on convex optimization, say, the projected gradient descent method. However, they are generally time-consuming because the singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) SVD-free via dropping convexity, with good scalability by the use of a stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) adaptive step size choice via introducing a new stabilized Barzilai-Borwein (SBB) method, as the original version for convex problems might fail for the considered stochastic non-convex optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate $\mathcal{O}(\frac{1}{T})$ in our setting, where $T$ is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm via comparing with the state-of-the-art methods, particularly, much lower computational cost with good prediction performance.

Submitted 30 January, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: 11 pages, 3 figures, 2 tables, accepted by AAAI2018

MSC Class: aaai.org

arXiv:1710.10772 [pdf, other]

Tensorizing Generative Adversarial Nets

Authors: Xingwei Cao, Xuyang Zhao, Qibin Zhao

Abstract: Generative Adversarial Network (GAN) and its variants exhibit state-of-the-art performance in the class of generative models. To capture higher-dimensional distributions, the common learning procedure requires high computational complexity and a large number of parameters. The problem of employing such a massive framework arises when deploying it on a platform with limited computational power such as mobile phones. In this paper, we present a new generative adversarial framework by representing each layer as a tensor structure connected by multilinear operations, aiming to reduce the number of model parameters by a large factor while preserving the generative performance and sample quality. To learn the model, we employ an efficient algorithm which alternately optimizes both discriminator and generator. Experimental outcomes demonstrate that our model can achieve a high compression rate for model parameters, up to $35$ times when compared to the original GAN for the MNIST dataset.

Submitted 29 March, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: 4 pages, 3 figures

arXiv:1709.05583 [pdf, other]

Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification

Authors: Xiaoyu Cao, Neil Zhenqiang Gong

Abstract: Deep neural networks (DNNs) have transformed several artificial intelligence research areas including computer vision, speech recognition, and natural language processing. However, recent studies demonstrated that DNNs are vulnerable to adversarial manipulations at testing time. Specifically, suppose we have a testing example, whose label can be correctly predicted by a DNN classifier. An attacker can add a small carefully crafted noise to the testing example such that the DNN classifier predicts an incorrect label, where the crafted testing example is called an adversarial example. Such attacks are called evasion attacks. Evasion attacks are one of the biggest challenges for deploying DNNs in safety and security critical applications such as self-driving cars. In this work, we develop new methods to defend against evasion attacks. Our key observation is that adversarial examples are close to the classification boundary. Therefore, we propose region-based classification to be robust to adversarial examples. For a benign/adversarial testing example, we ensemble information in a hypercube centered at the example to predict its label. In contrast, traditional classifiers are point-based, i.e., given a testing example, the classifier predicts its label based on the testing example alone. Our evaluation results on the MNIST and CIFAR-10 datasets demonstrate that our region-based classification can significantly mitigate evasion attacks without sacrificing classification accuracy on benign examples. Specifically, our region-based classification achieves the same classification accuracy on testing benign examples as point-based classification, but our region-based classification is significantly more robust than point-based classification to various evasion attacks.

Submitted 31 December, 2019; v1 submitted 16 September, 2017; originally announced September 2017.

Comments: 33rd Annual Computer Security Applications Conference (ACSAC), 2017
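
The abstract above describes predicting a label by ensembling information in a hypercube centered at the test example, rather than from the example alone. One plausible reading, sketched below, is a majority vote of a point-based classifier over points sampled uniformly from that hypercube. This is an illustration under stated assumptions, not the authors' implementation: the function name `region_predict`, the parameters `radius`, `n_samples`, and `seed`, the uniform sampling, and the plain majority vote are all hypothetical choices that the abstract does not specify.

```python
import random
from collections import Counter


def region_predict(point_predict, x, radius, n_samples=1000, seed=0):
    """Majority-vote label over points sampled from a hypercube around x.

    point_predict: any point-based classifier, mapping a list of floats to a label.
    x: the test example as a list of floats.
    radius: half the side length of the hypercube (a hypothetical parameter).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    votes = Counter()
    for _ in range(n_samples):
        # Perturb each coordinate independently and uniformly within [-radius, radius].
        sample = [xi + rng.uniform(-radius, radius) for xi in x]
        votes[point_predict(sample)] += 1
    # The most frequent label inside the region is the prediction.
    return votes.most_common(1)[0][0]
```

With a toy point-based classifier such as `lambda p: int(p[0] > 0)`, a call like `region_predict(clf, [0.5, 0.0], radius=0.1)` returns the same label as the point-based prediction, because the whole hypercube lies on one side of the boundary. The vote only changes the outcome for examples close to the boundary, which is where the abstract says adversarial examples tend to lie.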

arXiv:1611.01205 [pdf, ps, other]

Posterior Graph Selection and Estimation Consistency for High-dimensional Bayesian DAG Models

Authors: Xuan Cao, Kshitij Khare, Malay Ghosh

Abstract: Covariance estimation and selection for high-dimensional multivariate datasets is a fundamental problem in modern statistics. Gaussian directed acyclic graph (DAG) models are a popular class of models used for this purpose. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance matrix, and the sparsity pattern in turn corresponds to specific conditional independence assumptions on the underlying variables. A variety of priors have been developed in recent years for Bayesian inference in DAG models, yet crucial convergence and sparsity selection properties for these models have not been thoroughly investigated. Most of these priors are adaptations or generalizations of the Wishart distribution in the DAG context. In this paper, we consider a flexible and general class of these 'DAG-Wishart' priors with multiple shape parameters. Under mild regularity assumptions, we establish strong graph selection consistency and establish posterior convergence rates for estimation when the number of variables p is allowed to grow at an appropriate sub-exponential rate with the sample size n.

Submitted 11 October, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

MSC Class: 62F15; 62G20

arXiv:1605.05860 [pdf, other]

False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking

Authors: Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao

Abstract: With the rapid growth of crowdsourcing platforms it has become easy and relatively inexpensive to collect a dataset labeled by multiple annotators in a short time. However, due to the lack of control over the quality of the annotators, some abnormal annotators may be affected by position bias which can potentially degrade the quality of the final consensus labels. In this paper we introduce a statistical framework to model and detect annotators' position bias in order to control the false discovery rate (FDR) without prior knowledge of the number of biased annotators - the expected fraction of false discoveries among all discoveries being not too high, in order to assure that most of the discoveries are indeed true and replicable. The key technical development relies on some new knockoff filters adapted to our problem and new algorithms based on the Inverse Scale Space dynamics whose discretization is potentially suitable for large scale crowdsourcing data analysis. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides a useful tool for quantitatively studying annotators' abnormal behavior in crowdsourcing data arising from machine learning, sociology, computer vision, multimedia, etc.

Submitted 16 June, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

Comments: ICML 2016 accepted

arXiv:1408.3467 [pdf, other]

Evaluating Visual Properties via Robust HodgeRank

Authors: Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan Yao

Abstract: Nowadays, how to effectively evaluate visual properties has become a popular topic for fine-grained visual comprehension. In this paper we study the problem of how to estimate such visual properties from a ranking perspective with the help of the annotators from online crowdsourcing platforms. The main challenges of our task are two-fold. On one hand, the annotations often contain contaminated information, where a small fraction of label flips might ruin the global ranking of the whole dataset. On the other hand, considering the large data capacity, the annotations are often far from being complete. What is worse, there might even exist imbalanced annotations where a small subset of samples are frequently annotated. Facing such challenges, we propose a robust ranking framework based on the principle of Hodge decomposition of imbalanced and incomplete ranking data. According to the HodgeRank theory, we find that the major source of the contamination comes from the cyclic ranking component of the Hodge decomposition. This leads us to an outlier detection formulation as sparse approximations of the cyclic ranking projection. Taking a step further, it facilitates a novel outlier detection model as Huber's LASSO in robust statistics. Moreover, simple yet scalable algorithms are developed based on Linearized Bregman Iteration to achieve an even less biased estimator. Statistical consistency of outlier detection is established in both cases under nearly the same conditions. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a promising tool for robust ranking with large scale crowdsourcing data arising from computer vision.

Submitted 17 January, 2021; v1 submitted 15 August, 2014; originally announced August 2014.

Comments: 25 pages, 24 figures

arXiv:1405.1491 [pdf]

Demonstration of Enhanced Monte Carlo Computation of the Fisher Information for Complex Problems

Authors: Xumeng Cao

Abstract: The Fisher information matrix summarizes the amount of information in a set of data relative to the quantities of interest. There are many applications of the information matrix in statistical modeling, system identification and parameter estimation. This short paper reviews a feedback-based method and an independent perturbation approach for computing the information matrix for complex problems, where a closed form of the information matrix is not achievable. We show through numerical examples how these methods improve the accuracy of the estimate of the information matrix compared to the basic resampling-based approach. Some relevant theory is summarized.

Submitted 6 May, 2014; originally announced May 2014.

Showing 1–50 of 52 results for author: Cao, X