
Showing 1–50 of 52 results for author: Cao, X

  1. arXiv:2405.05545  [pdf, other]

    cs.LG stat.ML

    Deep Hierarchical Graph Alignment Kernels

    Authors: Shuhao Tang, Hao Tian, Xiaofeng Cao, Wei Ye

    Abstract: Typical R-convolution graph kernels invoke the kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking implicit similarities and topological position information between those substructures limits their performances. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relati…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2404.10004  [pdf]

    cs.LG physics.soc-ph stat.AP

    A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

    Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

    Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 20 pages, 9 figures

  3. arXiv:2404.08898  [pdf, other]

    stat.CO

    Using early rejection Markov chain Monte Carlo and Gaussian processes to accelerate ABC methods

    Authors: Xuefei Cao, Shijia Wang, Yongdao Zhou

    Abstract: Approximate Bayesian computation (ABC) is a class of Bayesian inference algorithms that targets problems with an intractable or unavailable likelihood function. It uses synthetic data drawn from the simulation model to approximate the posterior distribution. However, ABC is computationally intensive for complex models in which simulating synthetic data is very expensive. In this article, we pro…

    Submitted 13 April, 2024; originally announced April 2024.

  4. arXiv:2311.01435  [pdf, other]

    cs.LG math.PR stat.ML

    Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time

    Authors: Xinyuan Cao, Santosh S. Vempala

    Abstract: We give a polynomial-time algorithm for learning high-dimensional halfspaces with margins in $d$-dimensional space to within desired TV distance when the ambient distribution is an unknown affine transformation of the $d$-fold product of an (unknown) symmetric one-dimensional logconcave distribution, and the halfspace is introduced by deleting at least an $ε$ fraction of the data in one of the com…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Preliminary version in NeurIPS 2023

  5. arXiv:2310.07990  [pdf]

    q-bio.GN cs.IR cs.LG stat.AP

    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

    Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

    Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information f…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 figures

  6. arXiv:2310.00864  [pdf, other]

    stat.ME

    Multi-Label Residual Weighted Learning for Individualized Combination Treatment Rule

    Authors: Qi Xu, Xiaoke Cao, Geping Chen, Hanqi Zeng, Haoda Fu, Annie Qu

    Abstract: Individualized treatment rules (ITRs) have been widely applied in many fields such as precision medicine and personalized marketing. Beyond the extensive studies on ITR for binary or multiple treatments, there is considerable interest in applying combination treatments. This paper introduces a novel ITR estimation method for combination treatments incorporating interaction effects among treatments…

    Submitted 7 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  7. ggpicrust2: an R package for PICRUSt2 predicted functional profile analysis and visualization

    Authors: Chen Yang, Jiahao Mai, Xuan Cao, Aaron Burberry, Fabio Cominelli, Liangliang Zhang

    Abstract: Microbiome research is now moving beyond the compositional analysis of microbial taxa in a sample. Increasing evidence from large human microbiome studies suggests that functional consequences of changes in the intestinal microbiome may provide more power for studying their impact on inflammation and immune responses. Although 16S rRNA analysis is one of the most popular and a cost-effective metho…

    Submitted 9 April, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: 4 pages, 1 figure

  8. arXiv:2210.00415  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Metric Distribution to Vector: Constructing Data Representation via Broad-Scale Discrepancies

    Authors: Xue Liu, Dan Sun, Xiaobo Cao, Hao Ye, Wei Wei

    Abstract: Graph embedding provides a feasible methodology to conduct pattern classification for graph-structured data by mapping each data point into the vectorial space. Various pioneering works are essentially coding methods that concentrate on a vectorial representation of the inner properties of a graph in terms of the topological constitution, node attributions, link relations, etc. However, the classific…

    Submitted 1 October, 2022; originally announced October 2022.

  9. arXiv:2209.12611  [pdf, other]

    cs.LG cs.CV stat.ML

    MaxMatch: Semi-Supervised Learning with Worst-Case Consistency

    Authors: Yangbangyan Jiang, Xiaodan Li, Yuefeng Chen, Yuan He, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang

    Abstract: In recent years, great progress has been made to incorporate unlabeled data to overcome the inefficiently supervised problem via semi-supervised learning (SSL). Most state-of-the-art models are based on the idea of pursuing consistent model predictions over unlabeled data toward the input noise, which is called consistency regularization. Nonetheless, there is a lack of theoretical insights into t…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Accepted to IEEE TPAMI

  10. arXiv:2209.05742  [pdf, other]

    cs.LG cs.CR cs.GT stat.ML

    A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game

    Authors: Ke Ma, Qianqian Xu, Jinshan Zeng, Guorong Li, Xiaochun Cao, Qingming Huang

    Abstract: Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security issue of such algorithms, in contrast to numerous research works on the computational and statistical characteristics. Driven by huge profits, the potential adversary has strong motivation and in…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 33 pages, https://github.com/alphaprime/Target_Attack_Rank_Aggregation

    Journal ref: Early Access by TPAMI 2022 (https://ieeexplore.ieee.org/document/9830042)

  11. Optimizing Two-way Partial AUC with an End-to-end Framework

    Authors: Zhiyong Yang, Qianqian Xu, Shilong Bao, Yuan He, Xiaochun Cao, Qingming Huang

    Abstract: The Area Under the ROC Curve (AUC) is a crucial metric for machine learning, which evaluates the average performance over all possible True Positive Rates (TPRs) and False Positive Rates (FPRs). Based on the knowledge that a skillful classifier should simultaneously embrace a high TPR and a low FPR, we turn to study a more general variant called Two-way Partial AUC (TPAUC), where only the region w…

    Submitted 23 June, 2022; originally announced June 2022.

  12. arXiv:2203.07110  [pdf, other]

    stat.ME math.ST

    Bayesian inference on hierarchical nonlocal priors in generalized linear models

    Authors: Xuan Cao, Kyoungjae Lee

    Abstract: Variable selection methods with nonlocal priors have been widely studied in linear regression models, and their theoretical and empirical performances have been reported. However, the crucial model selection properties for hierarchical nonlocal priors in high-dimensional generalized linear regression have rarely been investigated. In this paper, we consider a hierarchical nonlocal prior for high-d…

    Submitted 14 March, 2022; originally announced March 2022.

  13. arXiv:2203.07108  [pdf, other]

    stat.ME math.ST

    Consistent and scalable Bayesian joint variable and graph selection for disease diagnosis leveraging functional brain network

    Authors: Xuan Cao, Kyoungjae Lee

    Abstract: We consider the joint inference of regression coefficients and the inverse covariance matrix for covariates in high-dimensional probit regression, where the predictors are both relevant to the binary response and functionally related to one another. A hierarchical model with spike and slab priors over regression coefficients and the elements in the inverse covariance matrix is employed to simultan…

    Submitted 14 March, 2022; originally announced March 2022.

  14. arXiv:2111.11676  [pdf, other]

    stat.ML cs.LG

    RIO: Rotation-equivariance supervised learning of robust inertial odometry

    Authors: Caifa Zhou, Xiya Cao, Dandan Zeng, Yongliang Wang

    Abstract: This paper introduces rotation-equivariance as a self-supervisor to train inertial odometry models. We demonstrate that the self-supervised scheme provides a powerful supervisory signal at training phase as well as at inference stage. It reduces the reliance on massive amounts of labeled data for training a robust model and makes it possible to update the model using various unlabeled data. Furthe…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: 12 pages, 17 figures, 2 tables

  15. arXiv:2110.14098  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Lifelong Learning of Representations

    Authors: Xinyuan Cao, Weiyang Liu, Santosh S. Vempala

    Abstract: In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a…

    Submitted 1 March, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted to AISTATS 2022

  16. arXiv:2110.00681  [pdf, other]

    q-bio.GN cs.LG stat.AP

    A systematic evaluation of methods for cell phenotype classification using single-cell RNA sequencing data

    Authors: Xiaowen Cao, Li Xing, Elham Majd, Hua He, Junhua Gu, Xuekui Zhang

    Abstract: Background: Single-cell RNA sequencing (scRNA-seq) yields valuable insights about gene expression and gives critical information about complex tissue cellular composition. In the analysis of single-cell RNA sequencing, the annotations of cell subtypes are often done manually, which is time-consuming and irreproducible. Garnett is a cell-type annotation software based on the elastic net method. Bes…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: 21 pages, 4 figures, 1 table

    Journal ref: Front. Genet. 13:836798 (2022)

  17. arXiv:2109.14142  [pdf, ps, other]

    cs.LG stat.ML

    On the Provable Generalization of Recurrent Neural Networks

    Authors: Lifu Wang, Bo Shen, Bo Hu, Xing Cao

    Abstract: Recurrent Neural Network (RNN) is a fundamental structure in deep learning. Recently, some works study the training process of over-parameterized neural networks, and show that over-parameterized networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze the training and generalization for RNNs with random initialization, and…

    Submitted 26 January, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: Accepted to NeurIPS 2021

  18. arXiv:2008.06190  [pdf, other]

    stat.ME

    Bayesian joint inference for multiple directed acyclic graphs

    Authors: Kyoungjae Lee, Xuan Cao

    Abstract: In many applications, data often arise from multiple groups that may share similar characteristics. A joint estimation method that models several groups simultaneously can be more efficient than estimating parameters in each group separately. We focus on unraveling the dependence structures of data based on directed acyclic graphs and propose a Bayesian joint inference method for multiple graphs.…

    Submitted 14 August, 2020; originally announced August 2020.

  19. arXiv:2007.08053  [pdf, other]

    cs.LG cs.SI stat.ML

    Inductive Link Prediction for Nodes Having Only Attribute Information

    Authors: Yu Hao, Xin Cao, Yixiang Fang, Xike Xie, Sibo Wang

    Abstract: Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductive prediction for new nodes having only attribute i…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: IJCAI2020

  20. arXiv:2006.09104  [pdf, other]

    cs.LG stat.ML

    New Interpretations of Normalization Methods in Deep Learning

    Authors: Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li

    Abstract: In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However, mathematical tools to analyze all these normalization methods are lacking. In this paper, we first propose a lemma to define some necessary tools. Then, we use these tools…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: Accepted by AAAI 2020

  21. arXiv:2005.03694  [pdf, other]

    stat.ME

    High-Dimensional Inference Based on the Leave-One-Covariate-Out LASSO Path

    Authors: Xiangyang Cao, Karl Gregory, Dewei Wang

    Abstract: We propose a new measure of variable importance in high-dimensional regression based on the change in the LASSO solution path when one covariate is left out. The proposed procedure provides a novel way to calculate variable importance and conduct variable screening. In addition, our procedure allows for the construction of P-values for testing whether each coefficient is equal to zero as well as f…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: 24 pages, 8 figures

    MSC Class: 62J07 (Primary) 62F40 (Secondary)

  22. arXiv:2004.13930  [pdf, other]

    cs.LG stat.ML

    Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction

    Authors: Zhiyong Yang, Qianqian Xu, Xiaochun Cao, Qingming Huang

    Abstract: As an effective learning paradigm against insufficient training samples, Multi-Task Learning (MTL) encourages knowledge sharing across multiple related tasks so as to improve the overall performance. In MTL, a major challenge springs from the phenomenon that sharing the knowledge with dissimilar and hard tasks, known as negative transfer, often results in a worsened performance. Though a substanti…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: To appear in T-PAMI

  23. arXiv:2004.09306  [pdf, ps, other]

    math.ST stat.ME

    Joint Bayesian Variable and DAG Selection Consistency for High-dimensional Regression Models with Network-structured Covariates

    Authors: Xuan Cao, Kyoungjae Lee

    Abstract: We consider the joint sparse estimation of regression coefficients and the covariance matrix for covariates in a high-dimensional regression model, where the predictors are both relevant to a response variable of interest and functionally related to one another via a Gaussian directed acyclic graph (DAG) model. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance…

    Submitted 20 April, 2020; originally announced April 2020.

  24. arXiv:2004.08237  [pdf, other]

    eess.IV cs.LG stat.ML

    CAggNet: Crossing Aggregation Network for Medical Image Segmentation

    Authors: Xu Cao, Yanghao Lin

    Abstract: In this paper, we present Crossing Aggregation Network (CAggNet), a novel densely connected semantic segmentation approach for medical image analysis. The crossing aggregation network improves the idea from deep layer aggregation and makes significant innovations in semantic and spatial information fusion. In CAggNet, the simple skip connection structure of general U-Net is replaced by aggregation…

    Submitted 7 November, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted by ICPR 2020

  25. arXiv:2002.06442  [pdf, other]

    cs.DB cs.LG stat.ML

    Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach

    Authors: Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, Makoto Onizuka

    Abstract: Due to the outstanding capability of capturing underlying data distributions, deep learning techniques have been recently utilized for a series of traditional database problems. In this paper, we investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. Answering this problem accurately and efficiently is essential to many data management applicat…

    Submitted 24 September, 2021; v1 submitted 15 February, 2020; originally announced February 2020.

    ACM Class: H.2.4; I.5.1

  26. arXiv:1912.09899  [pdf, other]

    cs.LG cs.CR stat.ML

    Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing

    Authors: Jinyuan Jia, Xiaoyu Cao, Binghui Wang, Neil Zhenqiang Gong

    Abstract: It is well-known that classifiers are vulnerable to adversarial perturbations. To defend against adversarial perturbations, various certified robustness results have been derived. However, existing certified robustnesses are limited to top-1 predictions. In many real-world applications, top-$k$ predictions are more relevant. In this work, we aim to derive certified robustness for top-$k$ predictio…

    Submitted 20 December, 2019; originally announced December 2019.

    Comments: ICLR 2020, code is available at https://github.com/jjy1994/Certify_Topk

  27. arXiv:1912.01833  [pdf, other]

    stat.ME

    Bayesian Group Selection in Logistic Regression with Application to MRI Data Analysis

    Authors: Kyoungjae Lee, Xuan Cao

    Abstract: We consider Bayesian logistic regression models with group-structured covariates. In high-dimensional settings, it is often assumed that only a small portion of groups are significant, thus consistent group selection is of significant importance. While consistent frequentist group selection methods have been proposed, theoretical properties of Bayesian group selection methods for logistic regression…

    Submitted 4 December, 2019; originally announced December 2019.

  28. arXiv:1912.00362  [pdf, other]

    cs.LG math.OC stat.ML

    Fast Stochastic Ordinal Embedding with Variance Reduction and Adaptive Step Size

    Authors: Ke Ma, Jinshan Zeng, Qianqian Xu, Xiaochun Cao, Wei Liu, Yuan Yao

    Abstract: Learning representation from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most of the existing methods are based on semi-definite programming (SDP), which is generally time-consuming and degrades the scalability, especially when confronting large-scale data. To overcome this challenge, we propose a stochastic algorithm called …

    Submitted 1 December, 2019; originally announced December 2019.

    Comments: 19 pages, 5 figures, accepted by IEEE Transaction on Knowledge and Data Engineering, Conference Version: arXiv:1711.06446

  29. arXiv:1910.05905  [pdf, other]

    cs.LG stat.ML

    iSplit LBI: Individualized Partial Ranking with Ties via Split LBI

    Authors: Qianqian Xu, Xinwei Sun, Zhiyong Yang, Xiaochun Cao, Qingming Huang, Yuan Yao

    Abstract: Due to the inherent uncertainty of data, the problem of predicting partial ranking from pairwise comparison data with ties has attracted increasing interest in recent years. However, in real-world scenarios, different individuals often hold distinct preferences. It might be misleading to merely look at a global partial ranking while ignoring personal diversity. In this paper, instead of learning a…

    Submitted 13 October, 2019; originally announced October 2019.

    Comments: Accepted by NeurIPS 2019

  30. arXiv:1909.04837  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Identifying and Resisting Adversarial Videos Using Temporal Consistency

    Authors: Xiaojun Jia, Xingxing Wei, Xiaochun Cao

    Abstract: Video classification is a challenging task in computer vision. Although Deep Neural Networks (DNNs) have achieved excellent performance in video classification, recent research shows adding imperceptible perturbations to clean videos can make the well-trained models output wrong labels with high confidence. In this paper, we propose an effective defense framework to characterize and defend adversa…

    Submitted 10 September, 2019; originally announced September 2019.

  31. arXiv:1906.07341  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning Personalized Attribute Preference via Multi-task AUC Optimization

    Authors: Zhiyong Yang, Qianqian Xu, Xiaochun Cao, Qingming Huang

    Abstract: Traditionally, most of the existing attribute learning methods are trained based on the consensus of annotations aggregated from a limited number of annotators. However, the consensus might fail in settings, especially when a wide spectrum of annotators with different interests and comprehension about the attribute words are involved. In this paper, we develop a novel multi-task method to understa…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: AAAI2019 oral

    Journal ref: AAAI2019 oral

  32. arXiv:1903.03531  [pdf, other]

    math.ST stat.ME

    Consistent Bayesian Sparsity Selection for High-dimensional Gaussian DAG Models with Multiplicative and Beta-mixture Priors

    Authors: Xuan Cao, Kshitij Khare, Malay Ghosh

    Abstract: Estimation of the covariance matrix for high-dimensional multivariate datasets is a challenging and important problem in modern statistics. In this paper, we focus on high-dimensional Gaussian DAG models where sparsity is induced on the Cholesky factor L of the inverse covariance matrix. In recent work (Cao, Khare, and Ghosh, 2019), we established high-dimensional sparsity selection consistency…

    Submitted 8 March, 2019; originally announced March 2019.

  33. arXiv:1902.09353  [pdf, other]

    stat.ME

    A permutation-based Bayesian approach for inverse covariance estimation

    Authors: Xuan Cao, Shaojun Zhang

    Abstract: Covariance estimation and selection for multivariate datasets in a high-dimensional regime is a fundamental problem in modern statistics. Gaussian graphical models are a popular class of models used for this purpose. Current Bayesian methods for inverse covariance matrix estimation under Gaussian graphical models require the underlying graph and hence the ordering of variables to be known. However…

    Submitted 25 February, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

    Comments: The proof for posterior convergence rate for DAG-Wishart priors in this work can be found in arXiv:1611.01205

  34. arXiv:1812.01945  [pdf, other]

    cs.LG stat.ML

    Robust Ordinal Embedding from Contaminated Relative Comparisons

    Authors: Ke Ma, Qianqian Xu, Xiaochun Cao

    Abstract: Existing ordinal embedding methods usually follow a two-stage routine: outlier detection is first employed to pick out the inconsistent comparisons; then an embedding is learned from the clean data. However, learning in a multi-stage manner is well-known to suffer from sub-optimal solutions. In this paper, we propose a unified framework to jointly identify the contaminated comparisons and derive r…

    Submitted 5 December, 2018; originally announced December 2018.

    Comments: Accepted by AAAI 2019

  35. arXiv:1812.01939  [pdf, other]

    cs.LG stat.ML

    Less but Better: Generalization Enhancement of Ordinal Embedding via Distributional Margin

    Authors: Ke Ma, Qianqian Xu, Zhiyong Yang, Xiaochun Cao

    Abstract: In the absence of prior knowledge, ordinal embedding methods obtain new representation for items in a low-dimensional Euclidean space via a set of quadruple-wise comparisons. These ordinal comparisons often come from human annotators, and sufficient comparisons induce the success of classical approaches. However, collecting a large number of labeled data is known as a hard task, and most of the ex…

    Submitted 5 December, 2018; originally announced December 2018.

    Comments: Accepted by AAAI 2019

  36. arXiv:1809.10962   

    cs.LG stat.ML

    Target-Independent Active Learning via Distribution-Splitting

    Authors: Xiaofeng Cao, Ivor W. Tsang, Xiaofeng Xu, Guandong Xu

    Abstract: To reduce the label complexity in Agnostic Active Learning (A^2 algorithm), volume-splitting splits the hypothesis edges to reduce the Vapnik-Chervonenkis (VC) dimension in version space. However, the effectiveness of volume-splitting critically depends on the initial hypothesis and this problem is also known as target-dependent label complexity gap. This paper attempts to minimize this gap by int…

    Submitted 25 September, 2020; v1 submitted 28 September, 2018; originally announced September 2018.

    Comments: This paper has been withdrawn. The first author quitted the PhD study from AAI, University of Technology Sydney. The manuscript stopped updating

  37. arXiv:1809.07694  [pdf]

    cs.SI stat.CO

    Improved Online Wilson Score Interval Method for Community Answer Quality Ranking

    Authors: Xin Cao

    Abstract: In this paper, a fast and easy-to-deploy method with strong interpretability for community answer quality ranking is proposed. This method is an improvement on the Wilson score interval method [Wilson, 1927], which retains its advantages and simultaneously improves the degree of satisfaction with the ranking of the high-quality answers. The improved answer quality score considers both Wilson scor…

    Submitted 4 September, 2018; originally announced September 2018.

    Comments: 8 pages, 13 figures

  38. arXiv:1809.01804  [pdf, other]

    cs.LG stat.ML

    Discovering Influential Factors in Variational Autoencoder

    Authors: Shiqi Liu, Jingxin Liu, Qian Zhao, Xiangyong Cao, Huibin Li, Hongying Meng, Sheng Liu, Deyu Meng

    Abstract: In the field of machine learning, it is still a critical issue to identify and supervise the learned representation without manually intervening or intuition assistance to extract useful knowledge or serve for the downstream tasks. In this work, we focus on supervising the influential factors extracted by the variational autoencoder (VAE). The VAE is proposed to learn independent low dimension repr…

    Submitted 5 April, 2019; v1 submitted 5 September, 2018; originally announced September 2018.

    Comments: 15 pages, 8 figures

  39. arXiv:1807.11014  [pdf, other]

    cs.LG cs.MM stat.ML

    A Margin-based MLE for Crowdsourced Partial Ranking

    Authors: Qianqian Xu, Jiechao Xiong, Xinwei Sun, Zhiyong Yang, Xiaochun Cao, Qingming Huang, Yuan Yao

    Abstract: A preference order or ranking aggregated from pairwise comparison data is commonly understood as a strict total order. However, in real-world scenarios, some items are intrinsically ambiguous in comparisons, which may very well be an inherent uncertainty of the data. In this case, the conventional total order ranking cannot capture such uncertainty with mere global ranking or utility scores. In t…

    Submitted 29 July, 2018; originally announced July 2018.

    Comments: 9 pages, Accepted by ACM Multimedia 2018 as a full paper

  40. arXiv:1807.08904   

    cs.LG stat.ML

    A Structured Perspective of Volumes on Active Learning

    Authors: Xiaofeng Cao, Ivor W. Tsang, Guandong Xu

    Abstract: Active Learning (AL) is a learning task that requires learners to interactively query the labels of the sampled unlabeled instances to minimize the training outputs with human supervisions. In theoretical study, learners approximate the version space which covers all possible classification hypothesis into a bounded convex body and try to shrink the volume of it into a half-space by a given cut size.…

    Submitted 25 September, 2020; v1 submitted 24 July, 2018; originally announced July 2018.

    Comments: This paper has been withdrawn. The first author quitted the PhD study from AAI, University of Technology Sydney. The manuscript stopped updating

  41. arXiv:1805.12321   

    cs.LG stat.ML

    A Divide-and-Conquer Approach to Geometric Sampling for Active Learning

    Authors: Xiaofeng Cao

    Abstract: Active learning (AL) repeatedly trains the classifier with the minimum labeling budget to improve the current classification model. The training process is usually supervised by an uncertainty evaluation strategy. However, the uncertainty evaluation always suffers from performance degeneration when the initial labeled set has insufficient labels. To completely eliminate the dependence on the uncer…

    Submitted 25 September, 2020; v1 submitted 31 May, 2018; originally announced May 2018.

    Comments: This paper has been withdrawn. The first author quitted the PhD study from AAI, University of Technology Sydney. The manuscript stopped updating

  42. arXiv:1803.06711  [pdf, other]

    stat.AP

    A Dynamic Additive and Multiplicative Effects Model with Application to the United Nations Voting Behaviors

    Authors: Bomin Kim, Xiaoyue Niu, David R. Hunter, Xun Cao

    Abstract: Motivated by a study of United Nations voting behaviors, we introduce a regression model for a series of networks that are correlated over time. Our model is a dynamic extension of the additive and multiplicative effects network model (AMEN) of Hoff (2019). In addition to incorporating a temporal structure, the model accommodates two types of missing data and thus allows the size of the network to var…

    Submitted 21 March, 2023; v1 submitted 18 March, 2018; originally announced March 2018.

  43. arXiv:1712.09520  [pdf, other]

    cs.LG stat.ML

    Tensor Regression Networks with various Low-Rank Tensor Approximations

    Authors: Xingwei Cao, Guillaume Rabusseau

    Abstract: Tensor regression networks achieve high compression rate of neural networks while having slight impact on performances. They do so by imposing low tensor rank structure on the weight matrices of fully connected layers. In recent years, tensor regression networks have been investigated from the perspective of their compressive power, however, the regularization effect of enforcing low-rank tensor s…

    Submitted 28 November, 2018; v1 submitted 27 December, 2017; originally announced December 2017.

  44. arXiv:1711.06446  [pdf, other]

    stat.ML cs.IR cs.LG math.OC

    Stochastic Non-convex Ordinal Embedding with Stabilized Barzilai-Borwein Step Size

    Authors: Ke Ma, Jinshan Zeng, Jiechao Xiong, Qianqian Xu, Xiaochun Cao, Wei Liu, Yuan Yao

    Abstract: Learning representation from relative similarity comparisons, often called ordinal embedding, gains rising attention in recent years. Most of the existing methods are batch methods designed mainly based on the convex optimization, say, the projected gradient descent method. However, they are generally time-consuming due to that the singular value decomposition (SVD) is commonly adopted during the…

    Submitted 30 January, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

    Comments: 11 pages, 3 figures, 2 tables, accepted by AAAI2018

    MSC Class: aaai.org

  45. arXiv:1710.10772  [pdf, other]

    cs.LG cs.AI stat.ML

    Tensorizing Generative Adversarial Nets

    Authors: Xingwei Cao, Xuyang Zhao, Qibin Zhao

    Abstract: Generative Adversarial Network (GAN) and its variants exhibit state-of-the-art performance in the class of generative models. To capture higher-dimensional distributions, the common learning procedure requires high computational complexity and a large number of parameters. The problem of employing such massive framework arises when deploying it on a platform with limited computational power such a…

    Submitted 29 March, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

    Comments: 4 pages, 3 figures

  46. arXiv:1709.05583  [pdf, other]

    cs.CR cs.LG stat.ML

    Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification

    Authors: Xiaoyu Cao, Neil Zhenqiang Gong

    Abstract: Deep neural networks (DNNs) have transformed several artificial intelligence research areas including computer vision, speech recognition, and natural language processing. However, recent studies demonstrated that DNNs are vulnerable to adversarial manipulations at testing time. Specifically, suppose we have a testing example, whose label can be correctly predicted by a DNN classifier. An attacker…

    Submitted 31 December, 2019; v1 submitted 16 September, 2017; originally announced September 2017.

    Comments: 33rd Annual Computer Security Applications Conference (ACSAC), 2017

  47. arXiv:1611.01205  [pdf, ps, other]

    stat.ME math.ST

    Posterior Graph Selection and Estimation Consistency for High-dimensional Bayesian DAG Models

    Authors: Xuan Cao, Kshitij Khare, Malay Ghosh

    Abstract: Covariance estimation and selection for high-dimensional multivariate datasets is a fundamental problem in modern statistics. Gaussian directed acyclic graph (DAG) models are a popular class of models used for this purpose. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance matrix, and the sparsity pattern in turn corresponds to specific conditional independenc…

    Submitted 11 October, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

    MSC Class: 62F15; 62G20

  48. arXiv:1605.05860  [pdf, other]

    stat.ML

    False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking

    Authors: Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Yuan Yao

    Abstract: With the rapid growth of crowdsourcing platforms it has become easy and relatively inexpensive to collect a dataset labeled by multiple annotators in a short time. However due to the lack of control over the quality of the annotators, some abnormal annotators may be affected by position bias which can potentially degrade the quality of the final consensus labels. In this paper we introduce a stati…

    Submitted 16 June, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

    Comments: ICML 2016 accepted

  49. arXiv:1408.3467  [pdf, other]

    stat.ME cs.LG stat.ML

    Evaluating Visual Properties via Robust HodgeRank

    Authors: Qianqian Xu, Jiechao Xiong, Xiaochun Cao, Qingming Huang, Yuan Yao

    Abstract: Nowadays, how to effectively evaluate visual properties has become a popular topic for fine-grained visual comprehension. In this paper we study the problem of how to estimate such visual properties from a ranking perspective with the help of the annotators from online crowdsourcing platforms. The main challenges of our task are two-fold. On one hand, the annotations often contain contaminated inf…

    Submitted 17 January, 2021; v1 submitted 15 August, 2014; originally announced August 2014.

    Comments: 25 pages, 24 figures

  50. arXiv:1405.1491  [pdf]

    stat.CO

    Demonstration of Enhanced Monte Carlo Computation of the Fisher Information for Complex Problems

    Authors: Xumeng Cao

    Abstract: The Fisher information matrix summarizes the amount of information in a set of data relative to the quantities of interest. There are many applications of the information matrix in statistical modeling, system identification and parameter estimation. This short paper reviews a feedback-based method and an independent perturbation approach for computing the information matrix for complex problems,…

    Submitted 6 May, 2014; originally announced May 2014.