Search | arXiv e-print repository

Efficient Nonparametric Tensor Decomposition for Binary and Count Data

Authors: Zerui Tao, Toshihisa Tanaka, Qibin Zhao

Abstract: In numerous applications, binary reactions or event counts are observed and stored within high-order tensors. Tensor decompositions (TDs) serve as a powerful tool to handle such high-dimensional and sparse data. However, many traditional TDs are explicitly or implicitly designed based on the Gaussian distribution, which is unsuitable for discrete data. Moreover, most TDs rely on predefined multi-linear structures, such as CP and Tucker formats. Therefore, they may not be effective enough to handle complex real-world datasets. To address these issues, we propose ENTED, an Efficient Nonparametric TEnsor Decomposition for binary and count tensors. Specifically, we first employ a nonparametric Gaussian process (GP) to replace traditional multi-linear structures. Next, we utilize the Pólya-Gamma augmentation, which provides a unified framework to establish conjugate models for binary and count distributions. Finally, to address the computational issue of GPs, we enhance the model by incorporating sparse orthogonal variational inference of inducing points, which offers a more effective covariance approximation within GPs and stochastic natural gradient updates for nonparametric models. We evaluate our model on several real-world tensor completion tasks, considering binary and count datasets. The results demonstrate both better performance and computational advantages of the proposed model.

Submitted 15 January, 2024; originally announced January 2024.

Comments: AAAI-24

arXiv:2306.00638 [pdf, other]

Byzantine-Robust Clustered Federated Learning

Authors: Zhixu Tao, Kun Yang, Sanjeev R. Kulkarni

Abstract: This paper focuses on the problem of adversarial attacks from Byzantine machines in a Federated Learning setting where non-Byzantine machines can be partitioned into disjoint clusters. In this setting, non-Byzantine machines in the same cluster have the same underlying data distribution, and different clusters of non-Byzantine machines have different learning tasks. Byzantine machines can adversarially attack any cluster and disturb the training process on clusters they attack. In the presence of Byzantine machines, the goal of our work is to identify cluster membership of non-Byzantine machines and optimize the models learned by each cluster. We adopt the Iterative Federated Clustering Algorithm (IFCA) framework of Ghosh et al. (2020) to alternately estimate cluster membership and optimize models. In order to make this framework robust against adversarial attacks from Byzantine machines, we use coordinate-wise trimmed mean and coordinate-wise median aggregation methods used by Yin et al. (2018). Specifically, we propose a new Byzantine-Robust Iterative Federated Clustering Algorithm to improve on the results in Ghosh et al. (2019). We prove a convergence rate for this algorithm for strongly convex loss functions. We compare our convergence rate with the convergence rate of an existing algorithm, and we demonstrate the performance of our algorithm on simulated data.

Submitted 1 June, 2023; originally announced June 2023.
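
The coordinate-wise median and coordinate-wise trimmed mean named in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the `num_trim` parameter (how many values to drop from each end of every coordinate), and the toy updates are all assumptions made for illustration.

```python
from statistics import median


def coordinate_wise_median(updates):
    """Return the per-coordinate median of a list of equal-length vectors."""
    return [median(coord) for coord in zip(*updates)]


def coordinate_wise_trimmed_mean(updates, num_trim):
    """Per coordinate, drop the num_trim smallest and num_trim largest
    values, then average what remains. num_trim is a hypothetical
    parameter name standing in for a trimming fraction."""
    m = len(updates)
    if 2 * num_trim >= m:
        raise ValueError("num_trim too large: nothing left after trimming")
    result = []
    for coord in zip(*updates):
        kept = sorted(coord)[num_trim:m - num_trim]
        result.append(sum(kept) / len(kept))
    return result


# Toy data (made up): four honest workers and one Byzantine worker
# that sends an extreme update.
updates = [
    [1.0, 2.0],
    [1.2, 1.8],
    [0.8, 2.2],
    [1.0, 2.0],
    [100.0, -100.0],
]

robust_median = coordinate_wise_median(updates)
robust_trimmed = coordinate_wise_trimmed_mean(updates, num_trim=1)
plain_mean = [sum(coord) / len(coord) for coord in zip(*updates)]

print("median:      ", robust_median)
print("trimmed mean:", robust_trimmed)
print("plain mean:  ", plain_mean)
```

In this toy case the plain mean is pulled far away by the single extreme update, while both robust aggregates stay close to the honest workers' values.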

arXiv:2303.17482 [pdf]

Three-way causal attribute partial order structure analysis

Authors: Xue Zaifa, Lu Huibin, Zhang Tao, Li Tao, Lu Xin

Abstract: As an emerging concept cognitive learning model, partial order formal structure analysis (POFSA) has been widely used in the field of knowledge processing. In this paper, we propose a method named three-way causal attribute partial order structure (3WCAPOS) to evolve POFSA from set coverage to causal coverage in order to increase the interpretability and classification performance of the model. First, the concept of causal factor (CF) is proposed to evaluate the causal correlation between attributes and decision attributes in the formal decision context. Then, by combining CF with attribute partial order structure, the concept of causal attribute partial order structure is defined, which makes set coverage evolve into causal coverage. Finally, combined with the idea of three-way decision, 3WCAPOS is formed, which makes the purity of nodes in the structure clearer and the changes between levels more obvious. In addition, experiments on six datasets evaluate the classification ability and the interpretability of the structure. From these experiments, it is concluded that the accuracy of 3WCAPOS is improved by 1% - 9% compared with classification and regression tree, and that it is more interpretable and processes knowledge more reasonably compared with attribute partial order structure.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2105.07767 [pdf, other]

Projections with logarithmic divergences

Authors: Zhixu Tao, Ting-Kam Leonard Wong

Abstract: In information geometry, generalized exponential families and statistical manifolds with curvature have been under active investigation in recent years. In this paper we consider the statistical manifold induced by a logarithmic $L^{(α)}$-divergence which generalizes the Bregman divergence. It is known that such a manifold is dually projectively flat with constant negative sectional curvature, and is closely related to the $\mathcal{F}^{(α)}$-family, a generalized exponential family introduced by the second author. Our main result constructs a dual foliation of the statistical manifold, i.e., an orthogonal decomposition consisting of primal and dual autoparallel submanifolds. This decomposition, which can be naturally interpreted in terms of primal and dual projections with respect to the logarithmic divergence, extends the dual foliation of a dually flat manifold studied by Amari. As an application, we formulate a new $L^{(α)}$-PCA problem which generalizes the exponential family PCA.

Submitted 8 May, 2021; originally announced May 2021.

Comments: 9 pages, 2 figures. To appear in GSI2021

arXiv:2008.02014 [pdf, other]

Optimizing AD Pruning of Sponsored Search with Reinforcement Learning

Authors: Yijiang Lian, Zhijie Chen, Xin Pei, Shuang Li, Yifei Wang, Yuefeng Qiu, Zhiheng Zhang, Zhipeng Tao, Liang Yuan, Hanju Guan, Kefeng Zhang, Zhigang Li, Xiaochun Liu

Abstract: An industrial sponsored search system (SSS) can be logically divided into three modules: keyword matching, ad retrieving, and ranking. During ad retrieving, the ad candidates grow exponentially. A query with high commercial value might retrieve more ad candidates than the ranking module can afford to process. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line to cut the SSS into two parts: upstream and downstream. The problem we are going to address is how to pick out the best $K$ items from $N$ candidates provided by the upstream to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction in this problem is that the selection scheme should adapt to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to this problem. Our approach considers the downstream as a black-box environment: the agent sequentially selects items and finally feeds them into the downstream, where revenue is estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is the first time that system optimization has been considered from a downstream adaptation view. It is also the first time reinforcement learning techniques have been used to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and a long-running online A/B test shows remarkable improvements in revenue.

Submitted 5 August, 2020; originally announced August 2020.

arXiv:2006.15632 [pdf, other]

doi 10.1109/TII.2020.3005969

FDA3: Federated Defense Against Adversarial Attacks for Cloud-Based IIoT Applications

Authors: Yunfei Song, Tian Liu, Tongquan Wei, Xiangfeng Wang, Zhe Tao, Mingsong Chen

Abstract: Along with the proliferation of Artificial Intelligence (AI) and Internet of Things (IoT) techniques, various kinds of adversarial attacks are increasingly emerging to fool Deep Neural Networks (DNNs) used by Industrial IoT (IIoT) applications. Due to biased training data or vulnerable underlying models, imperceptible modifications on inputs made by adversarial attacks may result in devastating consequences. Although existing methods are promising in defending against such malicious attacks, most of them can only deal with a limited set of existing attack types, which makes the deployment of large-scale IIoT devices a great challenge. To address this problem, we present an effective federated defense approach named FDA3 that can aggregate defense knowledge against adversarial examples from different sources. Inspired by federated learning, our proposed cloud-based architecture enables the sharing of defense capabilities against different attacks among IIoT devices. Comprehensive experimental results show that the DNNs generated by our approach can not only resist more malicious attacks than existing attack-specific adversarial training methods, but can also protect IIoT applications from new attacks.

Submitted 28 June, 2020; originally announced June 2020.

Journal ref: IEEE Transactions on Industrial Informatics, 2020

arXiv:2004.04520 [pdf, other]

Learnable Subspace Clustering

Authors: Jun Li, Hongfu Liu, Zhiqiang Tao, Handong Zhao, Yun Fu

Abstract: This paper studies the large-scale subspace clustering (LSSC) problem with millions of data points. Many popular subspace clustering methods cannot directly handle the LSSC problem although they have been considered state-of-the-art methods for small-scale data. A basic reason is that these methods often choose all data points as a big dictionary to build huge coding models, which results in high time and space complexity. In this paper, we develop a learnable subspace clustering paradigm to efficiently solve the LSSC problem. The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces instead of incurring the expensive costs of the classical coding models. Moreover, we propose a unified robust predictive coding machine (RPCM) to learn the parametric function, which can be solved by an alternating minimization algorithm. In addition, we provide a bounded contraction analysis of the parametric function. To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods. Experiments on million-scale datasets verify that our paradigm outperforms the related state-of-the-art methods in both efficiency and effectiveness.

Submitted 9 April, 2020; originally announced April 2020.

Comments: IEEE Transactions on Neural Networks and Learning Systems (accepted with minor revision)

arXiv:2001.00745 [pdf, other]

Automated Relational Meta-learning

Authors: Huaxiu Yao, Xian Wu, Zhiqiang Tao, Yaliang Li, Bolin Ding, Ruirui Li, Zhenhui Li

Abstract: In order to efficiently learn with a small amount of data on new tasks, meta-learning transfers knowledge learned from previous tasks to the new ones. However, a critical challenge in meta-learning is task heterogeneity, which cannot be well handled by traditional globally shared meta-learning methods. In addition, current task-specific meta-learning methods may either suffer from hand-crafted structure design or lack the capability to capture complex relations between tasks. In this paper, motivated by the way knowledge is organized in knowledge bases, we propose an automated relational meta-learning (ARML) framework that automatically extracts the cross-task relations and constructs the meta-knowledge graph. When a new task arrives, it can quickly find the most relevant structure and tailor the learned structure knowledge to the meta-learner. As a result, the proposed framework not only addresses the challenge of task heterogeneity by a learned meta-knowledge graph, but also increases the model interpretability. We conduct extensive experiments on 2D toy regression and few-shot image classification and the results demonstrate the superiority of ARML over state-of-the-art baselines.

Submitted 3 January, 2020; originally announced January 2020.

Comments: Accepted by ICLR 2020

arXiv:1911.11561 [pdf, other]

Correlative Channel-Aware Fusion for Multi-View Time Series Classification

Authors: Yue Bai, Lichen Wang, Zhiqiang Tao, Sheng Li, Yun Fu

Abstract: Multi-view time series classification (MVTSC) aims to improve the performance by fusing the distinctive temporal information from multiple views. Existing methods mainly focus on fusing multi-view information at an early stage, e.g., by learning a common feature subspace among multiple views. However, these early fusion methods may not fully exploit the unique temporal patterns of each view in complicated time series. Moreover, the label correlations of multiple views, which are critical to boosting, are usually under-explored for the MVTSC problem. To address the aforementioned issues, we propose a Correlative Channel-Aware Fusion (C2AF) network. First, C2AF extracts comprehensive and robust temporal patterns by a two-stream structured encoder for each view, and captures the intra-view and inter-view label correlations with a graph-based correlation matrix. Second, a channel-aware learnable fusion mechanism is implemented through convolutional neural networks to further explore the global correlative patterns. These two steps are trained end-to-end in the proposed C2AF network. Extensive experimental results on three real-world datasets demonstrate the superiority of our approach over the state-of-the-art methods. A detailed ablation study is also provided to show the effectiveness of each model component.

Submitted 20 November, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

arXiv:1906.00120 [pdf, ps, other]

Consensus Clustering: An Embedding Perspective, Extension and Beyond

Authors: Hongfu Liu, Zhiqiang Tao, Zhengming Ding

Abstract: Consensus clustering fuses diverse basic partitions (i.e., clustering results obtained from conventional clustering methods) into an integrated one, which has attracted increasing attention in both academic and industrial areas due to its robust and effective performance. Tremendous research efforts have been made to advance this domain in terms of algorithms and applications. Although there are some survey papers that summarize the existing literature, they neglect to explore the underlying connection among different categories. In contrast, in this paper we aim to provide an embedding perspective to illustrate the consensus mechanism, which transfers categorical basic partitions to other representations (e.g., binary coding, spectral embedding, etc.) for the clustering purpose. To this end, we not only unify two major categories of consensus clustering, but also build an intuitive connection between consensus clustering and graph embedding. Moreover, we elaborate several extensions of classical consensus clustering from different settings and problems. Beyond this, we demonstrate how to leverage consensus clustering to address other tasks, such as constrained clustering, domain adaptation, feature selection, and outlier detection. Finally, we conclude this survey with future work in terms of interpretability, learnability and theoretical analysis.

Submitted 31 May, 2019; originally announced June 2019.
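
The idea of re-encoding categorical basic partitions as a binary coding, mentioned in the abstract above, can be sketched as follows. This is an illustrative assumption about what such a coding looks like (a concatenation of one-hot cluster indicators per basic partition), not the survey's exact construction; the function names and toy partitions are hypothetical. The sketch also computes the standard co-association matrix (the fraction of basic partitions that place two objects in the same cluster) and shows that it equals the scaled inner product of the binary codes.

```python
def binary_coding(partitions):
    """Concatenate one-hot cluster indicators from every basic partition.

    partitions: list of label lists, one list per basic partition,
    each of length n (one label per object).
    Returns n rows of 0/1 values.
    """
    n = len(partitions[0])
    rows = [[] for _ in range(n)]
    for labels in partitions:
        clusters = sorted(set(labels))
        for i, label in enumerate(labels):
            rows[i].extend(1 if label == c else 0 for c in clusters)
    return rows


def co_association(partitions):
    """Fraction of basic partitions in which objects i and j share a cluster."""
    n = len(partitions[0])
    r = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / r for j in range(n)]
        for i in range(n)
    ]


# Toy input (made up): three basic partitions of four objects.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

codes = binary_coding(partitions)
co = co_association(partitions)

for row in codes:
    print(row)
for row in co:
    print(row)
```

Any standard clustering method could then be run on the rows of `codes`, or on `co` treated as a similarity matrix, to obtain a consensus partition.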

Showing 1–10 of 10 results for author: Tao, Z