-
Leveraging Contrastive Learning for Enhanced Node Representations in Tokenized Graph Transformers
Authors:
**song Chen,
Hanpeng Liu,
John E. Hopcroft,
Kun He
Abstract:
While tokenized graph Transformers have demonstrated strong performance in node classification tasks, their reliance on a limited subset of nodes with high similarity scores for constructing token sequences overlooks valuable information from other nodes, hindering their ability to fully harness graph information for learning optimal node representations. To address this limitation, we propose a n…
▽ More
While tokenized graph Transformers have demonstrated strong performance in node classification tasks, their reliance on a limited subset of nodes with high similarity scores for constructing token sequences overlooks valuable information from other nodes, hindering their ability to fully harness graph information for learning optimal node representations. To address this limitation, we propose a novel graph Transformer called GCFormer. Unlike previous approaches, GCFormer develops a hybrid token generator to create two types of token sequences, positive and negative, to capture diverse graph information. And a tailored Transformer-based backbone is adopted to learn meaningful node representations from these generated token sequences. Additionally, GCFormer introduces contrastive learning to extract valuable information from both positive and negative token sequences, enhancing the quality of learned node representations. Extensive experimental results across various datasets, including homophily and heterophily graphs, demonstrate the superiority of GCFormer in node classification, when compared to representative graph neural networks (GNNs) and graph Transformers.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Diversified Node Sampling based Hierarchical Transformer Pooling for Graph Representation Learning
Authors:
Gaichao Li,
**song Chen,
John E. Hopcroft,
Kun He
Abstract:
Graph pooling methods have been widely used on downsampling graphs, achieving impressive results on multiple graph-level tasks like graph classification and graph generation. An important line called node drop** pooling aims at exploiting learnable scoring functions to drop nodes with comparatively lower significance scores. However, existing node drop** methods suffer from two limitations: (1…
▽ More
Graph pooling methods have been widely used on downsampling graphs, achieving impressive results on multiple graph-level tasks like graph classification and graph generation. An important line called node drop** pooling aims at exploiting learnable scoring functions to drop nodes with comparatively lower significance scores. However, existing node drop** methods suffer from two limitations: (1) for each pooled node, these models struggle to capture long-range dependencies since they mainly take GNNs as the backbones; (2) pooling only the highest-scoring nodes tends to preserve similar nodes, thus discarding the affluent information of low-scoring nodes. To address these issues, we propose a Graph Transformer Pooling method termed GTPool, which introduces Transformer to node drop** pooling to efficiently capture long-range pairwise interactions and meanwhile sample nodes diversely. Specifically, we design a scoring module based on the self-attention mechanism that takes both global context and local context into consideration, measuring the importance of nodes more comprehensively. GTPool further utilizes a diversified sampling method named Roulette Wheel Sampling (RWS) that is able to flexibly preserve nodes across different scoring intervals instead of only higher scoring nodes. In this way, GTPool could effectively obtain long-range information and select more representative nodes. Extensive experiments on 11 benchmark datasets demonstrate the superiority of GTPool over existing popular graph pooling methods.
△ Less
Submitted 31 October, 2023;
originally announced October 2023.
-
SignGT: Signed Attention-based Graph Transformer for Graph Representation Learning
Authors:
**song Chen,
Gaichao Li,
John E. Hopcroft,
Kun He
Abstract:
The emerging graph Transformers have achieved impressive performance for graph representation learning over graph neural networks (GNNs). In this work, we regard the self-attention mechanism, the core module of graph Transformers, as a two-step aggregation operation on a fully connected graph. Due to the property of generating positive attention values, the self-attention mechanism is equal to con…
▽ More
The emerging graph Transformers have achieved impressive performance for graph representation learning over graph neural networks (GNNs). In this work, we regard the self-attention mechanism, the core module of graph Transformers, as a two-step aggregation operation on a fully connected graph. Due to the property of generating positive attention values, the self-attention mechanism is equal to conducting a smooth operation on all nodes, preserving the low-frequency information. However, only capturing the low-frequency information is inefficient in learning complex relations of nodes on diverse graphs, such as heterophily graphs where the high-frequency information is crucial. To this end, we propose a Signed Attention-based Graph Transformer (SignGT) to adaptively capture various frequency information from the graphs. Specifically, SignGT develops a new signed self-attention mechanism (SignSA) that produces signed attention values according to the semantic relevance of node pairs. Hence, the diverse frequency information between different node pairs could be carefully preserved. Besides, SignGT proposes a structure-aware feed-forward network (SFFN) that introduces the neighborhood bias to preserve the local topology information. In this way, SignGT could learn informative node representations from both long-range dependencies and local topology information. Extensive empirical results on both node-level and graph-level tasks indicate the superiority of SignGT against state-of-the-art graph Transformers as well as advanced GNNs.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
PIAT: Parameter Interpolation based Adversarial Training for Image Classification
Authors:
Kun He,
Xin Liu,
Yichen Yang,
Zhou Qin,
Weigao Wen,
Hui Xue,
John E. Hopcroft
Abstract:
Adversarial training has been demonstrated to be the most effective approach to defend against adversarial attacks. However, existing adversarial training methods show apparent oscillations and overfitting issue in the training process, degrading the defense efficacy. In this work, we propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use o…
▽ More
Adversarial training has been demonstrated to be the most effective approach to defend against adversarial attacks. However, existing adversarial training methods show apparent oscillations and overfitting issue in the training process, degrading the defense efficacy. In this work, we propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of the historical information during training. Specifically, at the end of each epoch, PIAT tunes the model parameters as the interpolation of the parameters of the previous and current epochs. Besides, we suggest to use the Normalized Mean Square Error (NMSE) to further improve the robustness by aligning the clean and adversarial examples. Compared with other regularization methods, NMSE focuses more on the relative magnitude of the logits rather than the absolute magnitude. Extensive experiments on several benchmark datasets and various networks show that our method could prominently improve the model robustness and reduce the generalization error. Moreover, our framework is general and could further boost the robust accuracy when combined with other adversarial training methods.
△ Less
Submitted 24 March, 2023;
originally announced March 2023.
-
Class-aware Information for Logit-based Knowledge Distillation
Authors:
Shuoxi Zhang,
Hanpeng Liu,
John E. Hopcroft,
Kun He
Abstract:
Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due to the cumbersome computation and storage of extra feature transformation, the training overhead of feature-based methods is much higher than that of logit-bas…
▽ More
Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions/features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due to the cumbersome computation and storage of extra feature transformation, the training overhead of feature-based methods is much higher than that of logit-based distillation. In this work, we revisit the logit-based knowledge distillation, and observe that the existing logit-based distillation methods treat the prediction logits only in the instance level, while many other useful semantic information is overlooked. To address this issue, we propose a Class-aware Logit Knowledge Distillation (CLKD) method, that extents the logit distillation in both instance-level and class-level. CLKD enables the student model mimic higher semantic information from the teacher model, hence improving the distillation performance. We further introduce a novel loss called Class Correlation Loss to force the student learn the inherent class-level correlation of the teacher. Empirical comparisons demonstrate the superiority of the proposed method over several prevailing logit-based methods and feature-based methods, in which CLKD achieves compelling results on various visual classification tasks and outperforms the state-of-the-art baselines.
△ Less
Submitted 27 November, 2022;
originally announced November 2022.
-
On the Complexity of Bayesian Generalization
Authors:
Yu-Zhe Shi,
Manjie Xu,
John E. Hopcroft,
Kun He,
Joshua B. Tenenbaum,
Song-Chun Zhu,
Ying Nian Wu,
Wenjuan Han,
Yixin Zhu
Abstract:
We consider concept generalization at a large scale in the diverse and natural visual spectrum. Established computational modes (i.e., rule-based or similarity-based) are primarily studied isolated and focus on confined and abstract problem spaces. In this work, we study these two modes when the problem space scales up, and the $complexity$ of concepts becomes diverse. Specifically, at the…
▽ More
We consider concept generalization at a large scale in the diverse and natural visual spectrum. Established computational modes (i.e., rule-based or similarity-based) are primarily studied isolated and focus on confined and abstract problem spaces. In this work, we study these two modes when the problem space scales up, and the $complexity$ of concepts becomes diverse. Specifically, at the $representational \ level$, we seek to answer how the complexity varies when a visual concept is mapped to the representation space. Prior psychology literature has shown that two types of complexities (i.e., subjective complexity and visual complexity) (Griffiths and Tenenbaum, 2003) build an inverted-U relation (Donderi, 2006; Sun and Firestone, 2021). Leveraging Representativeness of Attribute (RoA), we computationally confirm the following observation: Models use attributes with high RoA to describe visual concepts, and the description length falls in an inverted-U relation with the increment in visual complexity. At the $computational \ level$, we aim to answer how the complexity of representation affects the shift between the rule- and similarity-based generalization. We hypothesize that category-conditioned visual modeling estimates the co-occurrence frequency between visual and categorical attributes, thus potentially serving as the prior for the natural visual world. Experimental results show that representations with relatively high subjective complexity outperform those with relatively low subjective complexity in the rule-based generalization, while the trend is the opposite in the similarity-based generalization.
△ Less
Submitted 25 November, 2022; v1 submitted 20 November, 2022;
originally announced November 2022.
-
Local Magnification for Data and Feature Augmentation
Authors:
Kun He,
Chang Liu,
Stephen Lin,
John E. Hopcroft
Abstract:
In recent years, many data augmentation techniques have been proposed to increase the diversity of input data and reduce the risk of overfitting on deep neural networks. In this work, we propose an easy-to-implement and model-free data augmentation method called Local Magnification (LOMA). Different from other geometric data augmentation methods that perform global transformations on images, LOMA…
▽ More
In recent years, many data augmentation techniques have been proposed to increase the diversity of input data and reduce the risk of overfitting on deep neural networks. In this work, we propose an easy-to-implement and model-free data augmentation method called Local Magnification (LOMA). Different from other geometric data augmentation methods that perform global transformations on images, LOMA generates additional training data by randomly magnifying a local area of the image. This local magnification results in geometric changes that significantly broaden the range of augmentations while maintaining the recognizability of objects. Moreover, we extend the idea of LOMA and random crop** to the feature space to augment the feature map, which further boosts the classification accuracy considerably. Experiments show that our proposed LOMA, though straightforward, can be combined with standard data augmentation to significantly improve the performance on image classification and object detection. And further combination with our feature augmentation techniques, termed LOMA_IF&FO, can continue to strengthen the model and outperform advanced intensity transformation methods for data augmentation.
△ Less
Submitted 14 November, 2022;
originally announced November 2022.
-
Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
Authors:
Binghui Li,
Jikai **,
Han Zhong,
John E. Hopcroft,
Liwei Wang
Abstract:
It is well-known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon f…
▽ More
It is well-known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of expressive power for deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. This result holds even if the data is linear separable (which means achieving standard generalization is easy), and more generally for any parameterized function classes as long as their VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ for the network size to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a lower bound that grows exponentially with respect to $k$ -- the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network size for achieving low robust training and generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.
△ Less
Submitted 14 October, 2022; v1 submitted 27 May, 2022;
originally announced May 2022.
-
HoSIM: Higher-order Structural Importance based Method for Multiple Local Community Detection
Authors:
Boyu Li,
Meng Wang,
John E. Hopcroft,
Kun He
Abstract:
Local community detection has attracted much research attention recently, and many methods have been proposed for the single local community detection that finds a community containing the given set of query nodes. However, nodes may belong to several communities in the network, and detecting all the communities for the query node set, termed as the multiple local community detection (MLCD), is mo…
▽ More
Local community detection has attracted much research attention recently, and many methods have been proposed for the single local community detection that finds a community containing the given set of query nodes. However, nodes may belong to several communities in the network, and detecting all the communities for the query node set, termed as the multiple local community detection (MLCD), is more important as it could uncover more potential information. MLCD is also more challenging because when a query node belongs to multiple communities, it always locates in the complicated overlap** region and the marginal region of communities. Accordingly, detecting multiple communities for such nodes by applying seed expansion methods is insufficient.
In this work, we address the MLCD based on higher-order structural importance (HoSI). First, to effectively estimate the influence of higher-order structures, we propose a new variant of random walk called Active Random Walk to measure the HoSI score between nodes. Then, we propose two new metrics to evaluate the HoSI score of a subgraph to a node and the HoSI score of a node, respectively. Based on the proposed metrics, we present a novel algorithm called HoSIM to detect multiple local communities for a single query node. HoSIM enforces a three-stage processing, namely subgraph sampling, core member identification, and local community detection. The key idea is utilizing HoSI to find and identify the core members of communities relevant to the query node and optimize the generated communities. Extensive experiments illustrate the effectiveness of HoSIM.
△ Less
Submitted 24 May, 2022;
originally announced May 2022.
-
Uncovering the Local Hidden Community Structure in Social Networks
Authors:
Meng Wang,
Boyu Li,
Kun He,
John E. Hopcroft
Abstract:
Hidden community is a useful concept proposed recently for social network analysis. To handle the rapid growth of network scale, in this work, we explore the detection of hidden communities from the local perspective, and propose a new method that detects and boosts each layer iteratively on a subgraph sampled from the original network. We first expand the seed set from a single seed node based on…
▽ More
Hidden community is a useful concept proposed recently for social network analysis. To handle the rapid growth of network scale, in this work, we explore the detection of hidden communities from the local perspective, and propose a new method that detects and boosts each layer iteratively on a subgraph sampled from the original network. We first expand the seed set from a single seed node based on our modified local spectral method and detect an initial dominant local community. Then we temporarily remove the members of this community as well as their connections to other nodes, and detect all the neighborhood communities in the remaining subgraph, including some "broken communities" that only contain a fraction of members in the original network. The local community and neighborhood communities form a dominant layer, and by reducing the edge weights inside these communities, we weaken this layer's structure to reveal the hidden layers. Eventually, we repeat the whole process and all communities containing the seed node can be detected and boosted iteratively. We theoretically show that our method can avoid some situations that a broken community and the local community are regarded as one community in the subgraph, leading to the inaccuracy on detection which can be caused by global hidden community detection methods. Extensive experiments show that our method could significantly outperform the state-of-the-art baselines designed for either global hidden community detection or multiple local community detection.
△ Less
Submitted 7 December, 2021;
originally announced December 2021.
-
Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability
Authors:
Yifeng Xiong,
Jiadong Lin,
Min Zhang,
John E. Hopcroft,
Kun He
Abstract:
The black-box adversarial attack has attracted impressive attention for its practical use in the field of deep learning security. Meanwhile, it is very challenging as there is no access to the network architecture or internal weights of the target model. Based on the hypothesis that if an example remains adversarial for multiple models, then it is more likely to transfer the attack capability to o…
▽ More
The black-box adversarial attack has attracted impressive attention for its practical use in the field of deep learning security. Meanwhile, it is very challenging as there is no access to the network architecture or internal weights of the target model. Based on the hypothesis that if an example remains adversarial for multiple models, then it is more likely to transfer the attack capability to other models, the ensemble-based adversarial attack methods are efficient and widely used for black-box attacks. However, ways of ensemble attack are rather less investigated, and existing ensemble attacks simply fuse the outputs of all the models evenly. In this work, we treat the iterative ensemble attack as a stochastic gradient descent optimization process, in which the variance of the gradients on different models may lead to poor local optima. To this end, we propose a novel attack method called the stochastic variance reduced ensemble (SVRE) attack, which could reduce the gradient variance of the ensemble models and take full advantage of the ensemble attack. Empirical results on the standard ImageNet dataset demonstrate that the proposed method could boost the adversarial transferability and outperforms existing ensemble attacks significantly. Code is available at https://github.com/JHL-HUST/SVRE.
△ Less
Submitted 23 April, 2022; v1 submitted 21 November, 2021;
originally announced November 2021.
-
Structure Amplification on Multi-layer Stochastic Block Models
Authors:
Xiaodong Xin,
Kun He,
Jialu Bao,
Bart Selman,
John E. Hopcroft
Abstract:
Much of the complexity of social, biological, and engineered systems arises from a network of complex interactions connecting many basic components. Network analysis tools have been successful at uncovering latent structure termed communities in such networks. However, some of the most interesting structure can be difficult to uncover because it is obscured by the more dominant structure. Our prev…
▽ More
Much of the complexity of social, biological, and engineered systems arises from a network of complex interactions connecting many basic components. Network analysis tools have been successful at uncovering latent structure termed communities in such networks. However, some of the most interesting structure can be difficult to uncover because it is obscured by the more dominant structure. Our previous work proposes a general structure amplification technique called HICODE that uncovers many layers of functional hidden structure in complex networks. HICODE incrementally weakens dominant structure through randomization allowing the hidden functionality to emerge, and uncovers these hidden structure in real-world networks that previous methods rarely uncover. In this work, we conduct a comprehensive and systematic theoretical analysis on the hidden community structure. In what follows, we define multi-layer stochastic block model, and provide theoretical support using the model on why the existence of hidden structure will make the detection of dominant structure harder compared with equivalent random noise. We then provide theoretical proofs that the iterative reducing methods could help promote the uncovering of hidden structure as well as boosting the detection quality of dominant structure.
△ Less
Submitted 30 July, 2021;
originally announced August 2021.
-
Integrating Large Circular Kernels into CNNs through Neural Architecture Search
Authors:
Kun He,
Chao Li,
Yixiao Yang,
Gao Huang,
John E. Hopcroft
Abstract:
The square kernel is a standard unit for contemporary CNNs, as it fits well on the tensor computation for convolution operation. However, the retinal ganglion cells in the biological visual system have approximately concentric receptive fields. Motivated by this observation, we propose to use circular kernel with a concentric and isotropic receptive field as an option for the convolution operation…
▽ More
The square kernel is a standard unit for contemporary CNNs, as it fits well on the tensor computation for convolution operation. However, the retinal ganglion cells in the biological visual system have approximately concentric receptive fields. Motivated by this observation, we propose to use circular kernel with a concentric and isotropic receptive field as an option for the convolution operation. We first propose a simple yet efficient implementation of the convolution using circular kernels, and empirically show the significant advantages of large circular kernels over the counterpart square kernels. We then expand the operation space of several typical Neural Architecture Search (NAS) methods with the convolutions of large circular kernels. The searched new neural architectures do contain large circular kernels and outperform the original searched models considerably. Our additional analysis also reveals that large circular kernels could help the model to be more robust to the rotated or sheared images due to their better rotation invariance. Our work shows the potential of designing new convolutional kernels for CNNs, bringing up the prospect of expanding the search space of NAS with new variants of convolutions.
△ Less
Submitted 15 April, 2022; v1 submitted 6 July, 2021;
originally announced July 2021.
-
Hidden Community Detection on Two-layer Stochastic Models: a Theoretical Perspective
Authors:
Jialu Bao,
Kun He,
Xiaodong Xin,
Bart Selman,
John E. Hopcroft
Abstract:
Hidden community is a new graph-theoretical concept recently proposed [4], in which the authors also propose a meta-approach called HICODE (Hidden Community Detection) for detecting hidden communities. HICODE is demonstrated through experiments that it is able to uncover previously overshadowed weak layers and uncover both weak and strong layers at a higher accuracy. However, the authors provide n…
▽ More
Hidden community is a new graph-theoretical concept recently proposed [4], in which the authors also propose a meta-approach called HICODE (Hidden Community Detection) for detecting hidden communities. HICODE is demonstrated through experiments that it is able to uncover previously overshadowed weak layers and uncover both weak and strong layers at a higher accuracy. However, the authors provide no theoretical guarantee for the performance. In this work, we focus on the theoretical analysis of HICODE on synthetic two-layer networks, where layers are independent of each other and each layer is generated by stochastic block model. We bridge their gap through two-layer stochastic block model networks in the following aspects: 1) we show that partitions that locally optimize modularity correspond to grounded layers, indicating modularity-optimizing algorithms can detect strong layers; 2) we prove that when reducing found layers, HICODE increases absolute modularities of all unreduced layers, showing its layer reduction step makes weak layers more detectable. Our work builds a solid theoretical base for HICODE, demonstrating that it is promising in uncovering both weak and strong layers of communities in two-layer networks.
△ Less
Submitted 12 March, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Single Image Reflection Removal through Cascaded Refinement
Authors:
Chao Li,
Yixiao Yang,
Kun He,
Stephen Lin,
John E. Hopcroft
Abstract:
We address the problem of removing undesirable reflections from a single image captured through a glass surface, which is an ill-posed, challenging but practically important problem for photo enhancement. Inspired by iterative structure reduction for hidden community detection in social networks, we propose an Iterative Boost Convolutional LSTM Network (IBCLN) that enables cascaded prediction for…
▽ More
We address the problem of removing undesirable reflections from a single image captured through a glass surface, which is an ill-posed, challenging but practically important problem for photo enhancement. Inspired by iterative structure reduction for hidden community detection in social networks, we propose an Iterative Boost Convolutional LSTM Network (IBCLN) that enables cascaded prediction for reflection removal. IBCLN is a cascaded network that iteratively refines the estimates of transmission and reflection layers in a manner that they can boost the prediction quality to each other, and information across steps of the cascade is transferred using an LSTM. The intuition is that the transmission is the strong, dominant structure while the reflection is the weak, hidden structure. They are complementary to each other in a single image and thus a better estimate and reduction on one side from the original image leads to a more accurate estimate on the other side. To facilitate training over multiple cascade steps, we employ LSTM to address the vanishing gradient problem, and propose residual reconstruction loss as further training guidance. Besides, we create a dataset of real-world images with reflection and ground-truth transmission layers to mitigate the problem of insufficient data. Comprehensive experiments demonstrate that the proposed method can effectively remove reflections in real and synthetic images compared with state-of-the-art reflection removal methods.
△ Less
Submitted 5 April, 2020; v1 submitted 15 November, 2019;
originally announced November 2019.
-
Robust Local Features for Improving the Generalization of Adversarial Training
Authors:
Chuanbiao Song,
Kun He,
Jiadong Lin,
Liwei Wang,
John E. Hopcroft
Abstract:
Adversarial training has been demonstrated as one of the most effective methods for training robust models to defend against adversarial examples. However, adversarially trained models often lack adversarially robust generalization on unseen testing data. Recent works show that adversarially trained models are more biased towards global structure features. Instead, in this work, we would like to i…
▽ More
Adversarial training has been demonstrated as one of the most effective methods for training robust models to defend against adversarial examples. However, adversarially trained models often lack adversarially robust generalization on unseen testing data. Recent works show that adversarially trained models are more biased towards global structure features. Instead, in this work, we would like to investigate the relationship between the generalization of adversarial training and the robust local features, as the robust local features generalize well for unseen shape variation. To learn the robust local features, we develop a Random Block Shuffle (RBS) transformation to break up the global structure features on normal adversarial examples. We continue to propose a new approach called Robust Local Features for Adversarial Training (RLFAT), which first learns the robust local features by adversarial training on the RBS-transformed adversarial examples, and then transfers the robust local features into the training of normal adversarial examples. To demonstrate the generality of our argument, we implement RLFAT in currently state-of-the-art adversarial training frameworks. Extensive experiments on STL-10, CIFAR-10 and CIFAR-100 show that RLFAT significantly improves both the adversarially robust generalization and the standard generalization of adversarial training. Additionally, we demonstrate that our models capture more local features of the object on the images, aligning better with human perception.
△ Less
Submitted 2 February, 2020; v1 submitted 23 September, 2019;
originally announced September 2019.
-
Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks
Authors:
Jiadong Lin,
Chuanbiao Song,
Kun He,
Liwei Wang,
John E. Hopcroft
Abstract:
Deep learning models are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations on benign inputs. However, under the black-box setting, most existing adversaries often have a poor transferability to attack other defense models. In this work, from the perspective of regarding the adversarial example generation as an optimization process, we propose two new methods…
▽ More
Deep learning models are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations on benign inputs. However, under the black-box setting, most existing adversaries often have a poor transferability to attack other defense models. In this work, from the perspective of regarding the adversarial example generation as an optimization process, we propose two new methods to improve the transferability of adversarial examples, namely Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) and Scale-Invariant attack Method (SIM). NI-FGSM aims to adapt Nesterov accelerated gradient into the iterative attacks so as to effectively look ahead and improve the transferability of adversarial examples. While SIM is based on our discovery on the scale-invariant property of deep learning models, for which we leverage to optimize the adversarial perturbations over the scale copies of the input images so as to avoid "overfitting" on the white-box model being attacked and generate more transferable adversarial examples. NI-FGSM and SIM can be naturally integrated to build a robust gradient-based attack to generate more transferable adversarial examples against the defense models. Empirical results on ImageNet dataset demonstrate that our attack methods exhibit higher transferability and achieve higher attack success rates than state-of-the-art gradient-based attacks.
△ Less
Submitted 2 February, 2020; v1 submitted 17 August, 2019;
originally announced August 2019.
-
A New Anchor Word Selection Method for the Separable Topic Discovery
Authors:
Kun He,
Wu Wang,
Xiaosen Wang,
John E. Hopcroft
Abstract:
Separable Non-negative Matrix Factorization (SNMF) is an important method for topic modeling, where "separable" assumes every topic contains at least one anchor word, defined as a word that has non-zero probability only on that topic. SNMF focuses on the word co-occurrence patterns to reveal topics by two steps: anchor word selection and topic recovery. The quality of the anchor words strongly inf…
▽ More
Separable Non-negative Matrix Factorization (SNMF) is an important method for topic modeling, where "separable" assumes every topic contains at least one anchor word, defined as a word that has non-zero probability only on that topic. SNMF focuses on the word co-occurrence patterns to reveal topics by two steps: anchor word selection and topic recovery. The quality of the anchor words strongly influences the quality of the extracted topics. Existing anchor word selection algorithm is to greedily find an approximate convex hull in a high-dimensional word co-occurrence space. In this work, we propose a new method for the anchor word selection by associating the word co-occurrence probability with the words similarity and assuming that the most different words on semantic are potential candidates for the anchor words. Therefore, if the similarity of a word-pair is very low, then the two words are very likely to be the anchor words. According to the statistical information of text corpora, we can get the similarity of all word-pairs. We build the word similarity graph where the nodes correspond to words and weights on edges stand for the word-pair similarity. Following this way, we design a greedy method to find a minimum edge-weight anchor clique of a given size in the graph for the anchor word selection. Extensive experiments on real-world corpus demonstrate the effectiveness of the proposed anchor word selection method that outperforms the common convex hull-based methods on the revealed topic quality. Meanwhile, our method is much faster than typical SNMF based method.
△ Less
Submitted 10 May, 2019;
originally announced May 2019.
-
AT-GAN: An Adversarial Generator Model for Non-constrained Adversarial Examples
Authors:
Xiaosen Wang,
Kun He,
Chuanbiao Song,
Liwei Wang,
John E. Hopcroft
Abstract:
Despite the rapid development of adversarial machine learning, most adversarial attack and defense researches mainly focus on the perturbation-based adversarial examples, which is constrained by the input images. In comparison with existing works, we propose non-constrained adversarial examples, which are generated entirely from scratch without any constraint on the input. Unlike perturbation-base…
▽ More
Despite the rapid development of adversarial machine learning, most adversarial attack and defense researches mainly focus on the perturbation-based adversarial examples, which is constrained by the input images. In comparison with existing works, we propose non-constrained adversarial examples, which are generated entirely from scratch without any constraint on the input. Unlike perturbation-based attacks, or the so-called unrestricted adversarial attack which is still constrained by the input noise, we aim to learn the distribution of adversarial examples to generate non-constrained but semantically meaningful adversarial examples. Following this spirit, we propose a novel attack framework called AT-GAN (Adversarial Transfer on Generative Adversarial Net). Specifically, we first develop a normal GAN model to learn the distribution of benign data, and then transfer the pre-trained GAN model to estimate the distribution of adversarial examples for the target model. In this way, AT-GAN can learn the distribution of adversarial examples that is very close to the distribution of real data. To our knowledge, this is the first work of building an adversarial generator model that could produce adversarial examples directly from any input noise. Extensive experiments and visualizations show that the proposed AT-GAN can very efficiently generate diverse adversarial examples that are more realistic to human perception. In addition, AT-GAN yields higher attack success rates against adversarially trained models under white-box attack setting and exhibits moderate transferability against black-box models.
△ Less
Submitted 7 February, 2020; v1 submitted 16 April, 2019;
originally announced April 2019.
-
Adaptive Wavelet Clustering for Highly Noisy Data
Authors:
Zengjian Chen,
Jiayi Liu,
Yihe Deng,
Kun He,
John E. Hopcroft
Abstract:
In this paper we make progress on the unsupervised task of mining arbitrarily shaped clusters in highly noisy datasets, which is a task present in many real-world applications. Based on the fundamental work that first applies a wavelet transform to data clustering, we propose an adaptive clustering algorithm, denoted as AdaWave, which exhibits favorable characteristics for clustering. By a self-ad…
▽ More
In this paper we make progress on the unsupervised task of mining arbitrarily shaped clusters in highly noisy datasets, which is a task present in many real-world applications. Based on the fundamental work that first applies a wavelet transform to data clustering, we propose an adaptive clustering algorithm, denoted as AdaWave, which exhibits favorable characteristics for clustering. By a self-adaptive thresholding technique, AdaWave is parameter free and can handle data in various situations. It is deterministic, fast in linear time, order-insensitive, shape-insensitive, robust to highly noisy data, and requires no pre-knowledge on data models. Moreover, AdaWave inherits the ability from the wavelet transform to cluster data in different resolutions. We adopt the "grid labeling" data structure to drastically reduce the memory consumption of the wavelet transform so that AdaWave can be used for relatively high dimensional data. Experiments on synthetic as well as natural datasets demonstrate the effectiveness and efficiency of our proposed method.
△ Less
Submitted 7 January, 2019; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Improving the Generalization of Adversarial Training with Domain Adaptation
Authors:
Chuanbiao Song,
Kun He,
Liwei Wang,
John E. Hopcroft
Abstract:
By injecting adversarial examples into training data, adversarial training is promising for improving the robustness of deep learning models. However, most existing adversarial training approaches are based on a specific type of adversarial attack. It may not provide sufficiently representative samples from the adversarial domain, leading to a weak generalization ability on adversarial examples fr…
▽ More
By injecting adversarial examples into training data, adversarial training is promising for improving the robustness of deep learning models. However, most existing adversarial training approaches are based on a specific type of adversarial attack. It may not provide sufficiently representative samples from the adversarial domain, leading to a weak generalization ability on adversarial examples from other attacks. Moreover, during the adversarial training, adversarial perturbations on inputs are usually crafted by fast single-step adversaries so as to scale to large datasets. This work is mainly focused on the adversarial training yet efficient FGSM adversary. In this scenario, it is difficult to train a model with great generalization due to the lack of representative adversarial samples, aka the samples are unable to accurately reflect the adversarial domain. To alleviate this problem, we propose a novel Adversarial Training with Domain Adaptation (ATDA) method. Our intuition is to regard the adversarial training on FGSM adversary as a domain adaption task with limited number of target domain samples. The main idea is to learn a representation that is semantically meaningful and domain invariant on the clean domain as well as the adversarial domain. Empirical evaluations on Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100 demonstrate that ATDA can greatly improve the generalization of adversarial training and the smoothness of the learned models, and outperforms state-of-the-art methods on standard benchmark datasets. To show the transfer ability of our method, we also extend ATDA to the adversarial training on iterative attacks such as PGD-Adversial Training (PAT) and the defense performance is improved considerably.
△ Less
Submitted 15 March, 2019; v1 submitted 1 October, 2018;
originally announced October 2018.
-
Curvature-based Comparison of Two Neural Networks
Authors:
Tao Yu,
Huan Long,
John E. Hopcroft
Abstract:
In this paper we show the similarities and differences of two deep neural networks by comparing the manifolds composed of activation vectors in each fully connected layer of them. The main contribution of this paper includes 1) a new data generating algorithm which is crucial for determining the dimension of manifolds; 2) a systematic strategy to compare manifolds. Especially, we take Riemann curv…
▽ More
In this paper we show the similarities and differences of two deep neural networks by comparing the manifolds composed of activation vectors in each fully connected layer of them. The main contribution of this paper includes 1) a new data generating algorithm which is crucial for determining the dimension of manifolds; 2) a systematic strategy to compare manifolds. Especially, we take Riemann curvature and sectional curvature as part of criterion, which can reflect the intrinsic geometric properties of manifolds. Some interesting results and phenomenon are given, which help in specifying the similarities and differences between the features extracted by two networks and demystifying the intrinsic mechanism of deep neural networks.
△ Less
Submitted 21 January, 2018;
originally announced January 2018.
-
Krylov Subspace Approximation for Local Community Detection in Large Networks
Authors:
Kun He,
Pan Shi,
David Bindel,
John E. Hopcroft
Abstract:
Community detection is an important information mining task to uncover modular structures in large networks. For increasingly common large network data sets, global community detection is prohibitively expensive, and attention has shifted to methods that mine local communities, i.e. identifying all latent members of a particular community from a few labeled seed members. To address such semi-super…
▽ More
Community detection is an important information mining task to uncover modular structures in large networks. For increasingly common large network data sets, global community detection is prohibitively expensive, and attention has shifted to methods that mine local communities, i.e. identifying all latent members of a particular community from a few labeled seed members. To address such semi-supervised mining task, we systematically develop a local spectral subspace-based community detection method, called LOSP. We define a family of local spectral subspaces based on Krylov subspaces, and seek a sparse indicator for the target community via an $\ell_1$ norm minimization over the Krylov subspace. Variants of LOSP depend on type of random walks with different diffusion speeds, type of random walks, dimension of the local spectral subspace and step of diffusions. The effectiveness of the proposed LOSP approach is theoretically analyzed based on Rayleigh quotients, and it is experimentally verified on a wide variety of real-world networks across social, production and biological domains, as well as on an extensive set of synthetic LFR benchmark datasets.
△ Less
Submitted 16 May, 2019; v1 submitted 13 December, 2017;
originally announced December 2017.
-
The Local Dimension of Deep Manifold
Authors:
Mengxiao Zhang,
Wangquan Wu,
Yanren Zhang,
Kun He,
Tao Yu,
Huan Long,
John E. Hopcroft
Abstract:
Based on our observation that there exists a dramatic drop for the singular values of the fully connected layers or a single feature map of the convolutional layer, and that the dimension of the concatenated feature vector almost equals the summation of the dimension on each feature map, we propose a singular value decomposition (SVD) based approach to estimate the dimension of the deep manifolds…
▽ More
Based on our observation that there exists a dramatic drop for the singular values of the fully connected layers or a single feature map of the convolutional layer, and that the dimension of the concatenated feature vector almost equals the summation of the dimension on each feature map, we propose a singular value decomposition (SVD) based approach to estimate the dimension of the deep manifolds for a typical convolutional neural network VGG19. We choose three categories from the ImageNet, namely Persian Cat, Container Ship and Volcano, and determine the local dimension of the deep manifolds of the deep layers through the tangent space of a target image. Through several augmentation methods, we found that the Gaussian noise method is closer to the intrinsic dimension, as by adding random noise to an image we are moving in an arbitrary dimension, and when the rank of the feature matrix of the augmented images does not increase we are very close to the local dimension of the manifold. We also estimate the dimension of the deep manifold based on the tangent space for each of the maxpooling layers. Our results show that the dimensions of different categories are close to each other and decline quickly along the convolutional layers and fully connected layers. Furthermore, we show that the dimensions decline quickly inside the Conv5 layer. Our work provides new insights for the intrinsic structure of deep neural networks and helps unveiling the inner organization of the black box of deep neural networks.
△ Less
Submitted 5 November, 2017;
originally announced November 2017.
-
Randomness in Deconvolutional Networks for Visual Representation
Authors:
Kun He,
**gbo Wang,
Haochuan Li,
Yao Shu,
Mengxiao Zhang,
Man Zhu,
Liwei Wang,
John E. Hopcroft
Abstract:
Toward a deeper understanding on the inner work of deep neural networks, we investigate CNN (convolutional neural network) using DCN (deconvolutional network) and randomization technique, and gain new insights for the intrinsic property of this network architecture. For the random representations of an untrained CNN, we train the corresponding DCN to reconstruct the input images. Compared with the…
▽ More
Toward a deeper understanding on the inner work of deep neural networks, we investigate CNN (convolutional neural network) using DCN (deconvolutional network) and randomization technique, and gain new insights for the intrinsic property of this network architecture. For the random representations of an untrained CNN, we train the corresponding DCN to reconstruct the input images. Compared with the image inversion on pre-trained CNN, our training converges faster and the yielding network exhibits higher quality for image reconstruction. It indicates there is rich information encoded in the random features; the pre-trained CNN may discard information irrelevant for classification and encode relevant features in a way favorable for classification but harder for reconstruction. We further explore the property of the overall random CNN-DCN architecture. Surprisingly, images can be inverted with satisfactory quality. Extensive empirical evidence as well as theoretical analysis are provided.
△ Less
Submitted 20 February, 2018; v1 submitted 2 April, 2017;
originally announced April 2017.
-
Snapshot Ensembles: Train 1, get M for free
Authors:
Gao Huang,
Yixuan Li,
Geoff Pleiss,
Zhuang Liu,
John E. Hopcroft,
Kilian Q. Weinberger
Abstract:
Ensembles of neural networks are known to be much more robust and accurate than individual networks. However, training multiple deep networks for model averaging is computationally expensive. In this paper, we propose a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost. We achieve this goal by training a single neural network, c…
▽ More
Ensembles of neural networks are known to be much more robust and accurate than individual networks. However, training multiple deep networks for model averaging is computationally expensive. In this paper, we propose a method to obtain the seemingly contradictory goal of ensembling multiple neural networks at no additional training cost. We achieve this goal by training a single neural network, converging to several local minima along its optimization path and saving the model parameters. To obtain repeated rapid convergence, we leverage recent work on cyclic learning rate schedules. The resulting technique, which we refer to as Snapshot Ensembling, is simple, yet surprisingly effective. We show in a series of experiments that our approach is compatible with diverse network architectures and learning tasks. It consistently yields lower error rates than state-of-the-art single models at no additional training cost, and compares favorably with traditional network ensembles. On CIFAR-10 and CIFAR-100 our DenseNet Snapshot Ensembles obtain error rates of 3.4% and 17.4% respectively.
△ Less
Submitted 31 March, 2017;
originally announced April 2017.
-
Hidden Community Detection in Social Networks
Authors:
Kun He,
Yingru Li,
Sucheta Soundarajan,
John E. Hopcroft
Abstract:
We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the members of the weak communities also belong to stronger communities, they are extremely hard to be uncovered. We call the weak communities the hidden…
▽ More
We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the members of the weak communities also belong to stronger communities, they are extremely hard to be uncovered. We call the weak communities the hidden community structure.
We present a novel approach called HICODE (HIdden COmmunity DEtection) that identifies the hidden community structure as well as the dominant community structure. By weakening the strength of the dominant structure, one can uncover the hidden structure beneath. Likewise, by reducing the strength of the hidden structure, one can more accurately identify the dominant structure. In this way, HICODE tackles both tasks simultaneously.
Extensive experiments on real-world networks demonstrate that HICODE outperforms several state-of-the-art community detection methods in uncovering both the dominant and the hidden structure. In the Facebook university social networks, we find multiple non-redundant sets of communities that are strongly associated with residential hall, year of registration or career position of the faculties or students, while the state-of-the-art algorithms mainly locate the dominant ground truth category. In the Due to the difficulty of labeling all ground truth communities in real-world datasets, HICODE provides a promising approach to pinpoint the existing latent communities and uncover communities for which there is no ground truth. Finding this unknown structure is an extremely important community detection problem.
△ Less
Submitted 23 February, 2017;
originally announced February 2017.
-
Deep Manifold Traversal: Changing Labels with Convolutional Features
Authors:
Jacob R. Gardner,
Paul Upchurch,
Matt J. Kusner,
Yixuan Li,
Kilian Q. Weinberger,
Kavita Bala,
John E. Hopcroft
Abstract:
Many tasks in computer vision can be cast as a "label changing" problem, where the goal is to make a semantic change to the appearance of an image or some subject in an image in order to alter the class membership. Although successful task-specific methods have been developed for some label changing applications, to date no general purpose method exists. Motivated by this we propose deep manifold…
▽ More
Many tasks in computer vision can be cast as a "label changing" problem, where the goal is to make a semantic change to the appearance of an image or some subject in an image in order to alter the class membership. Although successful task-specific methods have been developed for some label changing applications, to date no general purpose method exists. Motivated by this we propose deep manifold traversal, a method that addresses the problem in its most general form: it first approximates the manifold of natural images then morphs a test image along a traversal path away from a source class and towards a target class while staying near the manifold throughout. The resulting algorithm is surprisingly effective and versatile. It is completely data driven, requiring only an example set of images from the desired source and target domains. We demonstrate deep manifold traversal on highly diverse label changing tasks: changing an individual's appearance (age and hair color), changing the season of an outdoor image, and transforming a city skyline towards nighttime.
△ Less
Submitted 17 March, 2016; v1 submitted 19 November, 2015;
originally announced November 2015.
-
Detecting Overlap** Communities from Local Spectral Subspaces
Authors:
Kun He,
Yiwei Sun,
David Bindel,
John E Hopcroft,
Yixuan Li
Abstract:
Based on the definition of local spectral subspace, we propose a novel approach called LOSP for local overlap** community detection. Instead of using the invariant subspace spanned by the dominant eigenvectors of the entire network, we run the power method for a few steps to approximate the leading eigenvectors that depict the embedding of the local neighborhood structure around seeds of interes…
▽ More
Based on the definition of local spectral subspace, we propose a novel approach called LOSP for local overlap** community detection. Instead of using the invariant subspace spanned by the dominant eigenvectors of the entire network, we run the power method for a few steps to approximate the leading eigenvectors that depict the embedding of the local neighborhood structure around seeds of interest. We then seek a sparse approximate indicator vector in the local spectral subspace spanned by these vectors such that the seeds are in its support.
We evaluate LOSP on five large real world networks across various domains with labeled ground-truth communities and compare the results with the state-of-the-art community detection approaches. LOSP identifies the members of a target community with high accuracy from very few seed members, and outperforms the local Heat Kernel or PageRank diffusions as well as the global baselines.
Two candidate definitions of the local spectral subspace are analyzed, and different community scoring functions for determining the community boundary, including two new metrics, are thoroughly evaluated. The structural properties of different seed sets and the impact of the seed set size are discussed. We observe low degree seeds behave better, and LOSP is robust even when started from a single random seed.
Using LOSP as a subroutine and starting from each ego connected component, we try the harder yet significant task of identifying all communities a single vertex is in. Experiments show that the proposed method achieves high F1 measures on the detected multiple local overlap** communities containing the seed vertex.
△ Less
Submitted 27 September, 2015;
originally announced September 2015.