-
Feature-based Learning for Diverse and Privacy-Preserving Counterfactual Explanations
Authors:
Vy Vo,
Trung Le,
Van Nguyen,
He Zhao,
Edwin Bonilla,
Gholamreza Haffari,
Dinh Phung
Abstract:
Interpretable machine learning seeks to understand the reasoning process of complex black-box systems that are long notorious for lack of explainability. One flourishing approach is through counterfactual explanations, which provide suggestions on what a user can do to alter an outcome. Not only must a counterfactual example counter the original prediction from the black-box classifier but it shou…
▽ More
Interpretable machine learning seeks to understand the reasoning process of complex black-box systems that are long notorious for lack of explainability. One flourishing approach is through counterfactual explanations, which provide suggestions on what a user can do to alter an outcome. Not only must a counterfactual example counter the original prediction from the black-box classifier but it should also satisfy various constraints for practical applications. Diversity is one of the critical constraints that however remains less discussed. While diverse counterfactuals are ideal, it is computationally challenging to simultaneously address some other constraints. Furthermore, there is a growing privacy concern over the released counterfactual data. To this end, we propose a feature-based learning framework that effectively handles the counterfactual constraints and contributes itself to the limited pool of private explanation models. We demonstrate the flexibility and effectiveness of our method in generating diverse counterfactuals of actionability and plausibility. Our counterfactual engine is more efficient than counterparts of the same capacity while yielding the lowest re-identification risks.
△ Less
Submitted 31 May, 2023; v1 submitted 27 September, 2022;
originally announced September 2022.
-
Statement-Level Vulnerability Detection: Learning Vulnerability Patterns Through Information Theory and Contrastive Learning
Authors:
Van Nguyen,
Trung Le,
Chakkrit Tantithamthavorn,
Michael Fu,
John Grundy,
Hung Nguyen,
Seyit Camtepe,
Paul Quirk,
Dinh Phung
Abstract:
Software vulnerabilities are a serious and crucial concern. Typically, in a program or function consisting of hundreds or thousands of source code statements, there are only a few statements causing the corresponding vulnerabilities. Most current approaches to vulnerability labelling are done on a function or program level by experts with the assistance of machine learning tools. Extending this ap…
▽ More
Software vulnerabilities are a serious and crucial concern. Typically, in a program or function consisting of hundreds or thousands of source code statements, there are only a few statements causing the corresponding vulnerabilities. Most current approaches to vulnerability labelling are done on a function or program level by experts with the assistance of machine learning tools. Extending this approach to the code statement level is much more costly and time-consuming and remains an open problem. In this paper, we propose a novel end-to-end deep learning-based approach to identify the vulnerability-relevant code statements of a specific function. Inspired by the specific structures observed in real-world vulnerable code, we first leverage mutual information for learning a set of latent variables representing the relevance of the source code statements to the corresponding function's vulnerability. We then propose novel clustered spatial contrastive learning in order to further improve the representation learning and the robust selection process of vulnerability-relevant code statements. Experimental results on real-world datasets of 200k+ C/C++ functions show the superiority of our method over other state-of-the-art baselines. In general, our method obtains a higher performance in VCP, VCA, and Top-10 ACC measures of between 3% to 14% over the baselines when running on real-world datasets in an unsupervised setting. Our released source code samples are publicly available at \href{https://github.com/vannguyennd/livuitcl}{https://github.com/vannguyennd/livuitcl.}
△ Less
Submitted 11 June, 2024; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Cross Project Software Vulnerability Detection via Domain Adaptation and Max-Margin Principle
Authors:
Van Nguyen,
Trung Le,
Chakkrit Tantithamthavorn,
John Grundy,
Hung Nguyen,
Dinh Phung
Abstract:
Software vulnerabilities (SVs) have become a common, serious and crucial concern due to the ubiquity of computer software. Many machine learning-based approaches have been proposed to solve the software vulnerability detection (SVD) problem. However, there are still two open and significant issues for SVD in terms of i) learning automatic representations to improve the predictive performance of SV…
▽ More
Software vulnerabilities (SVs) have become a common, serious and crucial concern due to the ubiquity of computer software. Many machine learning-based approaches have been proposed to solve the software vulnerability detection (SVD) problem. However, there are still two open and significant issues for SVD in terms of i) learning automatic representations to improve the predictive performance of SVD, and ii) tackling the scarcity of labeled vulnerabilities datasets that conventionally need laborious labeling effort by experts. In this paper, we propose a novel end-to-end approach to tackle these two crucial issues. We first exploit the automatic representation learning with deep domain adaptation for software vulnerability detection. We then propose a novel cross-domain kernel classifier leveraging the max-margin principle to significantly improve the transfer learning process of software vulnerabilities from labeled projects into unlabeled ones. The experimental results on real-world software datasets show the superiority of our proposed method over state-of-the-art baselines. In short, our method obtains a higher performance on F1-measure, the most important measure in SVD, from 1.83% to 6.25% compared to the second highest method in the used datasets. Our released source code samples are publicly available at https://github.com/vannguyennd/dam2p
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
Authors:
Chuanxia Zheng,
Long Tung Vuong,
Jianfei Cai,
Dinh Phung
Abstract:
Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalizatio…
▽ More
Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate the spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information to the embedded index maps, encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
Stag hunt game-based approach for cooperative UAVs
Authors:
L. V. Nguyen,
I. Torres Herrera,
T. H. Le,
M. D. Phung,
R. P. Aguilera,
Q. P. Ha
Abstract:
Unmanned aerial vehicles (UAVs) are being employed in many areas such as photography, emergency, entertainment, defence, agriculture, forestry, mining and construction. Over the last decade, UAV technology has found applications in numerous construction project phases, ranging from site map**, progress monitoring, building inspection, damage assessments, and material delivery. While extensive st…
▽ More
Unmanned aerial vehicles (UAVs) are being employed in many areas such as photography, emergency, entertainment, defence, agriculture, forestry, mining and construction. Over the last decade, UAV technology has found applications in numerous construction project phases, ranging from site map**, progress monitoring, building inspection, damage assessments, and material delivery. While extensive studies have been conducted on the advantages of UAVs for various construction-related processes, studies on UAV collaboration to improve the task capacity and efficiency are still scarce. This paper proposes a new cooperative path planning algorithm for multiple UAVs based on the stag hunt game and particle swarm optimization (PSO). First, a cost function for each UAV is defined, incorporating multiple objectives and constraints. The UAV game framework is then developed to formulate the multi-UAV path planning into the problem of finding payoff-dominant equilibrium. Next, a PSO-based algorithm is proposed to obtain optimal paths for the UAVs. Simulation results for a large construction site inspected by three UAVs indicate the effectiveness of the proposed algorithm in generating feasible and efficient flight paths for UAV formation during the inspection task.
△ Less
Submitted 28 August, 2022;
originally announced August 2022.
-
Quasi-resonant diffusion of wave packets in one-dimensional disordered mosaic lattices
Authors:
Ba Phi Nguyen,
Duy Khuong Phung,
Kihong Kim
Abstract:
We investigate numerically the time evolution of wave packets incident on one-dimensional semi-infinite lattices with mosaic modulated random on-site potentials, which are characterized by the integer-valued modulation period $κ$ and the disorder strength $W$. For Gaussian wave packets with the central energy $E_0$ and a small spectral width, we perform extensive numerical calculations of the diso…
▽ More
We investigate numerically the time evolution of wave packets incident on one-dimensional semi-infinite lattices with mosaic modulated random on-site potentials, which are characterized by the integer-valued modulation period $κ$ and the disorder strength $W$. For Gaussian wave packets with the central energy $E_0$ and a small spectral width, we perform extensive numerical calculations of the disorder-averaged time-dependent reflectance, $\langle R(t)\rangle$, for various values of $E_0$, $κ$, and $W$. We find that the long-time behavior of $\langle R(t)\rangle$ obeys a power law of the form $t^{-γ}$ in all cases. In the presence of the mosaic modulation, $γ$ is equal to 2 for almost all values of $E_0$, implying the onset of the Anderson localization, while at a finite number of discrete values of $E_0$ dependent on $κ$, $γ$ approaches 3/2, implying the onset of the classical diffusion. This phenomenon is independent of the disorder strength and arises in a quasi-resonant manner such that $γ$ varies rapidly from 3/2 to 2 in a narrow energy range as $E_0$ varies away from the quasi-resonance values. We deduce a simple analytical formula for the quasi-resonance energies and provide an explanation of the delocalization phenomenon based on the interplay between randomness and band structure and the node structure of the wave functions. We explore the nature of the states at the quasi-resonance energies using a finite-size scaling analysis of the average participation ratio and find that the states are neither extended nor exponentially localized, but ciritical states..
△ Less
Submitted 15 September, 2022; v1 submitted 28 July, 2022;
originally announced July 2022.
-
An Additive Instance-Wise Approach to Multi-class Model Interpretation
Authors:
Vy Vo,
Van Nguyen,
Trung Le,
Quan Hung Tran,
Gholamreza Haffari,
Seyit Camtepe,
Dinh Phung
Abstract:
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpreting methods focus on identifying explanatory input features, which generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an addi…
▽ More
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpreting methods focus on identifying explanatory input features, which generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an additive manner. The process is thus inefficient and susceptible to poorly-conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework, thereby being capable of leveraging global information from other inputs. However, they can only interpret single-class predictions and many suffer from inconsistency across different settings, due to a strict reliance on a pre-defined number of features selected. This work exploits the strengths of both methods and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness with more compact and comprehensible explanations. We also demonstrate the capacity to select stable and important features through extensive experiments on various data sets and black-box model architectures.
△ Less
Submitted 9 February, 2023; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Classification of EEG Motor Imagery Using Deep Learning for Brain-Computer Interface Systems
Authors:
Alessandro Gallo,
Manh Duong Phung
Abstract:
A trained T1 class Convolutional Neural Network (CNN) model will be used to examine its ability to successfully identify motor imagery when fed pre-processed electroencephalography (EEG) data. In theory, and if the model has been trained accurately, it should be able to identify a class and label it accordingly. The CNN model will then be restored and used to try and identify the same class of mot…
▽ More
A trained T1 class Convolutional Neural Network (CNN) model will be used to examine its ability to successfully identify motor imagery when fed pre-processed electroencephalography (EEG) data. In theory, and if the model has been trained accurately, it should be able to identify a class and label it accordingly. The CNN model will then be restored and used to try and identify the same class of motor imagery data using much smaller sampled data in an attempt to simulate live data.
△ Less
Submitted 31 May, 2022;
originally announced June 2022.
-
Stochastic Multiple Target Sampling Gradient Descent
Authors:
Hoang Phan,
Ngoc Tran,
Trung Le,
Toan Tran,
Nhat Ho,
Dinh Phung
Abstract:
Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of particles to approximate the distribution of interest. Furthermore, when analysing its asymptotic properties, SVGD reduces exactly to a single-objective optimiz…
▽ More
Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of particles to approximate the distribution of interest. Furthermore, when analysing its asymptotic properties, SVGD reduces exactly to a single-objective optimization problem and can be viewed as a probabilistic version of this single-objective optimization problem. A natural question then arises: "Can we derive a probabilistic version of the multi-objective optimization?". To answer this question, we propose Stochastic Multiple Target Sampling Gradient Descent (MT-SGD), enabling us to sample from multiple unnormalized target distributions. Specifically, our MT-SGD conducts a flow of intermediate distributions gradually orienting to multiple target distributions, which allows the sampled particles to move to the joint high-likelihood region of the target distributions. Interestingly, the asymptotic analysis shows that our approach reduces exactly to the multiple-gradient descent algorithm for multi-objective optimization, as expected. Finally, we conduct comprehensive experiments to demonstrate the merit of our approach to multi-task learning.
△ Less
Submitted 10 February, 2023; v1 submitted 4 June, 2022;
originally announced June 2022.
-
Enhanced Teaching-Learning-based Optimization for 3D Path Planning of Multicopter UAVs
Authors:
Van Truong Hoang,
Manh Duong Phung
Abstract:
This paper introduces a new path planning algorithm for unmanned aerial vehicles (UAVs) based on the teaching-learning-based optimization (TLBO) technique. We first define an objective function that incorporates requirements on the path length and constraints on the movement and safe operation of UAVs to convert the path planning into an optimization problem. The optimization algorithm named Multi…
▽ More
This paper introduces a new path planning algorithm for unmanned aerial vehicles (UAVs) based on the teaching-learning-based optimization (TLBO) technique. We first define an objective function that incorporates requirements on the path length and constraints on the movement and safe operation of UAVs to convert the path planning into an optimization problem. The optimization algorithm named Multi-subject TLBO is then proposed to minimize the formulated objective function. The algorithm is developed based on TLBO but enhanced with new operations including mutation, elite selection and multi-subject training to improve the solution quality and speed up the convergence rate. Comparison with state-of-the-art algorithms and experiments with real UAVs have been conducted to evaluate the performance of the proposed algorithm. The results confirm its validity and effectiveness in generating optimal, collision-free and flyable paths for UAVs in complex operating environments.
△ Less
Submitted 31 May, 2022;
originally announced May 2022.
-
High-Quality Pluralistic Image Completion via Code Shared VQGAN
Authors:
Chuanxia Zheng,
Guoxian Song,
Tat-Jen Cham,
Jianfei Cai,
Dinh Phung,
Linjie Luo
Abstract:
PICNet pioneered the generation of multiple and diverse results for image completion task, but it required a careful balance between $\mathcal{KL}$ loss (diversity) and reconstruction loss (quality), resulting in a limited diversity and quality . Separately, iGPT-based architecture has been employed to infer distributions in a discrete space derived from a pixel-level pre-clustered palette, which…
▽ More
PICNet pioneered the generation of multiple and diverse results for image completion task, but it required a careful balance between $\mathcal{KL}$ loss (diversity) and reconstruction loss (quality), resulting in a limited diversity and quality . Separately, iGPT-based architecture has been employed to infer distributions in a discrete space derived from a pixel-level pre-clustered palette, which however cannot generate high-quality results directly. In this work, we present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed. The core of our design lies in a simple yet effective code sharing mechanism that leads to a very compact yet expressive image representation in a discrete latent domain. The compactness and the richness of the representation further facilitate the subsequent deployment of a transformer to effectively learn how to composite and complete a masked image at the discrete code domain. Based on the global context well-captured by the transformer and the available visual regions, we are able to sample all tokens simultaneously, which is completely different from the prevailing autoregressive approach of iGPT-based works, and leads to more than 100$\times$ faster inference speed. Experiments show that our framework is able to learn semantically-rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality. Our diverse image completion framework significantly outperforms the state-of-the-art both quantitatively and qualitatively on multiple benchmark datasets.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Global-Local Regularization Via Distributional Robustness
Authors:
Hoang Phan,
Trung Le,
Trung Phung,
Tuan Anh Bui,
Nhat Ho,
Dinh Phung
Abstract:
Despite superior performance in many situations, deep neural networks are often vulnerable to adversarial examples and distribution shifts, limiting model generalization ability in real-world applications. To alleviate these problems, recent approaches leverage distributional robustness optimization (DRO) to find the most challenging distribution, and then minimize loss function over this most cha…
▽ More
Despite superior performance in many situations, deep neural networks are often vulnerable to adversarial examples and distribution shifts, limiting model generalization ability in real-world applications. To alleviate these problems, recent approaches leverage distributional robustness optimization (DRO) to find the most challenging distribution, and then minimize loss function over this most challenging distribution. Regardless of achieving some improvements, these DRO approaches have some obvious limitations. First, they purely focus on local regularization to strengthen model robustness, missing a global regularization effect which is useful in many real-world applications (e.g., domain adaptation, domain generalization, and adversarial machine learning). Second, the loss functions in the existing DRO approaches operate in only the most challenging distribution, hence decouple with the original distribution, leading to a restrictive modeling capability. In this paper, we propose a novel regularization technique, following the veins of Wasserstein-based DRO framework. Specifically, we define a particular joint distribution and Wasserstein-based uncertainty, allowing us to couple the original and most challenging distributions for enhancing modeling capability and applying both local and global regularizations. Empirical studies on different learning problems demonstrate that our proposed approach significantly outperforms the existing regularization approaches in various domains: semi-supervised learning, domain adaptation, domain generalization, and adversarial machine learning.
△ Less
Submitted 12 February, 2023; v1 submitted 1 March, 2022;
originally announced March 2022.
-
A Unified Wasserstein Distributional Robustness Framework for Adversarial Training
Authors:
Tuan Anh Bui,
Trung Le,
Quan Tran,
He Zhao,
Dinh Phung
Abstract:
It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As the result, adversarial training (AT) method, by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and T…
▽ More
It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As the result, adversarial training (AT) method, by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and TRADES, typically seek a pointwise adversary that generates the worst-case adversarial example by independently perturbing each data sample, as a way to "probe" the vulnerability of the classifier. Arguably, there are unexplored benefits in considering such adversarial effects from an entire distribution. To this end, this paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework. This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms. Extensive experiments show that our distributional robustness AT algorithms robustify further their standard AT counterparts in various settings.
△ Less
Submitted 27 February, 2022;
originally announced February 2022.
-
Sobolev Transport: A Scalable Metric for Probability Measures with Graph Metrics
Authors:
Tam Le,
Truyen Nguyen,
Dinh Phung,
Viet Anh Nguyen
Abstract:
Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers a few drawbacks such as (i) a high complexity for computation, (ii) indefiniteness which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport me…
▽ More
Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers a few drawbacks such as (i) a high complexity for computation, (ii) indefiniteness which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport metric yields a closed-form formula for fast computation and it is negative definite. We show that the space of probability measures endowed with this transport distance is isometric to a bounded convex set in a Euclidean space with a weighted $\ell_p$ distance. We further exploit the negative definiteness of the Sobolev transport to design positive-definite kernels, and evaluate their performances against other baselines in document classification with word embeddings and in topological data analysis.
△ Less
Submitted 22 February, 2022;
originally announced February 2022.
-
Two-view Graph Neural Networks for Knowledge Graph Completion
Authors:
Vinh Tong,
Dai Quoc Nguyen,
Dinh Phung,
Dat Quoc Nguyen
Abstract:
We present an effective graph neural network (GNN)-based knowledge graph embedding model, which we name WGE, to capture entity- and relation-focused graph structures. Given a knowledge graph, WGE builds a single undirected entity-focused graph that views entities as nodes. WGE also constructs another single undirected graph from relation-focused constraints, which views entities and relations as n…
▽ More
We present an effective graph neural network (GNN)-based knowledge graph embedding model, which we name WGE, to capture entity- and relation-focused graph structures. Given a knowledge graph, WGE builds a single undirected entity-focused graph that views entities as nodes. WGE also constructs another single undirected graph from relation-focused constraints, which views entities and relations as nodes. WGE then proposes a GNN-based architecture to better learn vector representations of entities and relations from these two single entity- and relation-focused graphs. WGE feeds the learned entity and relation representations into a weighted score function to return the triple scores for knowledge graph completion. Experimental results show that WGE outperforms strong baselines on seven benchmark datasets for knowledge graph completion.
△ Less
Submitted 11 March, 2023; v1 submitted 16 December, 2021;
originally announced December 2021.
-
On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources
Authors:
Trung Phung,
Trung Le,
Long Vuong,
Toan Tran,
Anh Tran,
Hung Bui,
Dinh Phung
Abstract:
Domain adaptation (DA) benefits from the rigorous theoretical works that study its insightful characteristics and various aspects, e.g., learning domain-invariant representations and its trade-off. However, it seems not the case for the multiple source DA and domain generalization (DG) settings which are remarkably more complicated and sophisticated due to the involvement of multiple source domain…
▽ More
Domain adaptation (DA) benefits from the rigorous theoretical works that study its insightful characteristics and various aspects, e.g., learning domain-invariant representations and its trade-off. However, it seems not the case for the multiple source DA and domain generalization (DG) settings which are remarkably more complicated and sophisticated due to the involvement of multiple source domains and potential unavailability of target domain during training. In this paper, we develop novel upper-bounds for the target general loss which appeal to us to define two kinds of domain-invariant representations. We further study the pros and cons as well as the trade-offs of enforcing learning each domain-invariant representation. Finally, we conduct experiments to inspect the trade-off of these representations for offering practical hints regarding how to use them in practice and explore other interesting properties of our developed theory.
△ Less
Submitted 27 November, 2021;
originally announced November 2021.
-
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks
Authors:
Dang Nguyen,
Trang Nguyen,
Khai Nguyen,
Dinh Phung,
Hung Bui,
Nhat Ho
Abstract:
Layer-wise model fusion via optimal transport, named OTFusion, applies soft neuron association for unifying different pre-trained networks to save computational resources. While enjoying its success, OTFusion requires the input networks to have the same number of layers. To address this issue, we propose a novel model fusion framework, named CLAFusion, to fuse neural networks with a different numb…
▽ More
Layer-wise model fusion via optimal transport, named OTFusion, applies soft neuron association for unifying different pre-trained networks to save computational resources. While enjoying its success, OTFusion requires the input networks to have the same number of layers. To address this issue, we propose a novel model fusion framework, named CLAFusion, to fuse neural networks with a different number of layers, which we refer to as heterogeneous neural networks, via cross-layer alignment. The cross-layer alignment problem, which is an unbalanced assignment problem, can be solved efficiently using dynamic programming. Based on the cross-layer alignment, our framework balances the number of layers of neural networks before applying layer-wise model fusion. Our experiments indicate that CLAFusion, with an extra finetuning process, improves the accuracy of residual networks on the CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Furthermore, we explore its practical usage for model compression and knowledge distillation when applying to the teacher-student setting.
△ Less
Submitted 19 February, 2023; v1 submitted 29 October, 2021;
originally announced October 2021.
-
On Label Shift in Domain Adaptation via Wasserstein Distance
Authors:
Trung Le,
Dat Do,
Tuan Nguyen,
Huy Nguyen,
Hung Bui,
Nhat Ho,
Dinh Phung
Abstract:
We study the label shift problem between the source and target domains in general domain adaptation (DA) settings. We consider transformations transporting the target to source domains, which enable us to align the source and target examples. Through those transformations, we define the label shift between two domains via optimal transport and develop theory to investigate the properties of DA und…
▽ More
We study the label shift problem between the source and target domains in general domain adaptation (DA) settings. We consider transformations transporting the target to source domains, which enable us to align the source and target examples. Through those transformations, we define the label shift between two domains via optimal transport and develop theory to investigate the properties of DA under various DA settings (e.g., closed-set, partial-set, open-set, and universal settings). Inspired from the developed theory, we propose Label and Data Shift Reduction via Optimal Transport (LDROT) which can mitigate the data and label shifts simultaneously. Finally, we conduct comprehensive experiments to verify our theoretical findings and compare LDROT with state-of-the-art baselines.
△ Less
Submitted 1 March, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Exploiting Domain-Specific Features to Enhance Domain Generalization
Authors:
Manh-Ha Bui,
Toan Tran,
Anh Tuan Tran,
Dinh Phung
Abstract:
Domain Generalization (DG) aims to train a model, from multiple observed source domains, in order to perform well on unseen target domains. To obtain the generalization capability, prior DG approaches have focused on extracting domain-invariant information across sources to generalize on target domains, while useful domain-specific information which strongly correlates with labels in individual do…
▽ More
Domain Generalization (DG) aims to train a model, from multiple observed source domains, in order to perform well on unseen target domains. To obtain the generalization capability, prior DG approaches have focused on extracting domain-invariant information across sources to generalize on target domains, while useful domain-specific information which strongly correlates with labels in individual domains and the generalization to target domains is usually ignored. In this paper, we propose meta-Domain Specific-Domain Invariant (mDSDI) - a novel theoretically sound framework that extends beyond the invariance view to further capture the usefulness of domain-specific information. Our key insight is to disentangle features in the latent space while jointly learning both domain-invariant and domain-specific features in a unified framework. The domain-specific representation is optimized through the meta-learning framework to adapt from source domains, targeting a robust generalization on unseen domains. We empirically show that mDSDI provides competitive results with state-of-the-art techniques in DG. A further ablation study with our generated dataset, Background-Colored-MNIST, confirms the hypothesis that domain-specific is essential, leading to better results when compared with only using domain-invariant.
△ Less
Submitted 18 October, 2021;
originally announced October 2021.
-
ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection
Authors:
Van-Anh Nguyen,
Dai Quoc Nguyen,
Van Nguyen,
Trung Le,
Quan Hung Tran,
Dinh Phung
Abstract:
Identifying vulnerabilities in the source code is essential to protect the software systems from cyber security attacks. It, however, is also a challenging step that requires specialized expertise in security and code representation. To this end, we aim to develop a general, practical, and programming language-independent model capable of running on various source codes and libraries without diffi…
▽ More
Identifying vulnerabilities in the source code is essential to protect the software systems from cyber security attacks. It, however, is also a challenging step that requires specialized expertise in security and code representation. To this end, we aim to develop a general, practical, and programming language-independent model capable of running on various source codes and libraries without difficulty. Therefore, we consider vulnerability detection as an inductive text classification problem and propose ReGVD, a simple yet effective graph neural network-based model for the problem. In particular, ReGVD views each raw source code as a flat sequence of tokens to build a graph, wherein node features are initialized by only the token embedding layer of a pre-trained programming language (PL) model. ReGVD then leverages residual connection among GNN layers and examines a mixture of graph-level sum and max poolings to return a graph embedding for the source code. ReGVD outperforms the existing state-of-the-art models and obtains the highest accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection. Our code is available at: \url{https://github.com/daiquocnguyen/GNN-ReGVD}.
△ Less
Submitted 4 February, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection
Authors:
Thuy-Trang Vu,
Xuanli He,
Dinh Phung,
Gholamreza Haffari
Abstract:
This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT), where we assume the access to only monolingual text in either the source or target language in the new domain. We propose a cross-lingual data selection method to extract in-domain sentences in the missing language side from a large generic monolingual corpus. Our proposed method trains an adaptiv…
▽ More
This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT), where we assume the access to only monolingual text in either the source or target language in the new domain. We propose a cross-lingual data selection method to extract in-domain sentences in the missing language side from a large generic monolingual corpus. Our proposed method trains an adaptive layer on top of multilingual BERT by contrastive learning to align the representation between the source and target language. This then enables the transferability of the domain classifier between the languages in a zero-shot manner. Once the in-domain data is detected by the classifier, the NMT model is then adapted to the new domain by jointly learning translation and domain discrimination tasks. We evaluate our cross-lingual data selection method on NMT across five diverse domains in three language pairs, as well as a real-world scenario of translation for COVID-19. The results show that our proposed method outperforms other selection baselines up to +1.5 BLEU score.
△ Less
Submitted 9 September, 2021;
originally announced September 2021.
-
Exponential decay toward equilibrium via log convexity for a degenerate reaction-diffusion system
Authors:
Laurent Desvillettes,
Kim Dang Phung
Abstract:
We consider a system of two reaction-diffusion equations coming out of reversible chemistry. When the reaction happens on the totality of the domain, it is known that exponential convergence to equilibrium holds. We show in this paper that this exponential convergence also holds when the reaction holds only on a given open set of a ball, thanks to an observation estimate deduced by logarithmic con…
▽ More
We consider a system of two reaction-diffusion equations coming out of reversible chemistry. When the reaction happens on the totality of the domain, it is known that exponential convergence to equilibrium holds. We show in this paper that this exponential convergence also holds when the reaction holds only on a given open set of a ball, thanks to an observation estimate deduced by logarithmic convexity.
△ Less
Submitted 30 August, 2021;
originally announced August 2021.
-
Rapid Elastic Architecture Search under Specialized Classes and Resource Constraints
Authors:
**g Liu,
Bohan Zhuang,
Mingkui Tan,
Xu Liu,
Dinh Phung,
Yuanqing Li,
Jianfei Cai
Abstract:
In many real-world applications, we often need to handle various deployment scenarios, where the resource constraint and the superclass of interest corresponding to a group of classes are dynamically specified. How to efficiently deploy deep models for diverse deployment scenarios is a new challenge. Previous NAS approaches seek to design architectures for all classes simultaneously, which may not…
▽ More
In many real-world applications, we often need to handle various deployment scenarios, where the resource constraint and the superclass of interest corresponding to a group of classes are dynamically specified. How to efficiently deploy deep models for diverse deployment scenarios is a new challenge. Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual superclasses. A straightforward solution is to search an architecture from scratch for each deployment scenario, which however is computation-intensive and impractical. To address this, we present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse superclasses with various resource constraints. To this end, we first propose to effectively train an over-parameterized network via a superclass dropout strategy during training. In this way, the resulting model is robust to the subsequent superclasses drop** at inference time. Based on the well-trained over-parameterized network, we then propose an efficient architecture generator to obtain promising architectures within a single forward pass. Experiments on three image classification datasets show that EAS is able to find more compact networks with better performance while remarkably being orders of magnitude faster than state-of-the-art NAS methods, e.g., outperforming OFA (once-for-all) by 1.3% on Top-1 accuracy at a budget around 361M #MAdds on ImageNet-10. More critically, EAS is able to find compact architectures within 0.1 second for 50 deployment scenarios.
△ Less
Submitted 15 March, 2022; v1 submitted 2 August, 2021;
originally announced August 2021.
-
Multi-Label Image Classification with Contrastive Learning
Authors:
Son D. Dao,
Ethan Zhao,
Dinh Phung,
Jianfei Cai
Abstract:
Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains. The success of constrastive learning in single-label classifications motivates us to leverage this learning framework to enhance distinctiveness for better performance in multi-label image classification. In this paper, we show that a direct applic…
▽ More
Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains. The success of constrastive learning in single-label classifications motivates us to leverage this learning framework to enhance distinctiveness for better performance in multi-label image classification. In this paper, we show that a direct application of contrastive learning can hardly improve in multi-label cases. Accordingly, we propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting, which learns multiple representations of an image under the context of different labels. This facilities a simple yet intuitive adaption of contrastive learning into our model to boost its performance in multi-label image classification. Extensive experiments on two benchmark datasets show that the proposed framework achieves state-of-the-art performance in the comparison with the advanced methods in multi-label classification.
△ Less
Submitted 24 July, 2021;
originally announced July 2021.
-
Observation estimate for the heat equations with Neumann boundary condition via logarithmic convexity
Authors:
Rémi Buffe,
Kim Dang Phung
Abstract:
We prove an inequality of Hölder type traducing the unique continuation property at one time for the heat equation with a potential and Neumann boundary condition. The main feature of the proof is to overcome the propagation of smallness by a global approach using a refined parabolic frequency function method. It relies with a Carleman commutator estimate to obtain the logarithmic convexity proper…
▽ More
We prove an inequality of Hölder type traducing the unique continuation property at one time for the heat equation with a potential and Neumann boundary condition. The main feature of the proof is to overcome the propagation of smallness by a global approach using a refined parabolic frequency function method. It relies with a Carleman commutator estimate to obtain the logarithmic convexity property of the frequency function.
△ Less
Submitted 27 May, 2021;
originally announced May 2021.
-
Mobile Robot Localization Using Fuzzy Neural Network Based Extended Kalman Filter
Authors:
Thi Thanh Van Nguyen,
Manh Duong Phung,
Thuan Hoang Tran,
Quang Vinh Tran
Abstract:
This paper proposes a novel approach to improve the performance of the extended Kalman filter (EKF) for the problem of mobile robot localization. A fuzzy logic system is employed to continuous-ly adjust the noise covariance matrices of the filter. A neural network is implemented to regulate the membership functions of the antecedent and consequent parts of the fuzzy rules. The aim is to gain the a…
▽ More
This paper proposes a novel approach to improve the performance of the extended Kalman filter (EKF) for the problem of mobile robot localization. A fuzzy logic system is employed to continuous-ly adjust the noise covariance matrices of the filter. A neural network is implemented to regulate the membership functions of the antecedent and consequent parts of the fuzzy rules. The aim is to gain the accuracy and avoid the divergence of the EKF when the noise covariance matrices are fixed or wrongly determined. Simulations and experiments have been conducted. The results show that the proposed filter is better than the EKF in localizing the mobile robot.
△ Less
Submitted 6 May, 2021;
originally announced May 2021.
-
Development of a Fast and Robust Gaze Tracking System for Game Applications
Authors:
Manh Duong Phung,
Cong Hoang Quach,
Quang Vinh Tran
Abstract:
In this study, a novel eye tracking system using a visual camera is developed to extract human's gaze, and it can be used in modern game machines to bring new and innovative interactive experience to players. Central to the components of the system, is a robust iris-center and eye-corner detection algorithm basing on it the gaze is continuously and adaptively extracted. Evaluation tests were appli…
▽ More
In this study, a novel eye tracking system using a visual camera is developed to extract human's gaze, and it can be used in modern game machines to bring new and innovative interactive experience to players. Central to the components of the system, is a robust iris-center and eye-corner detection algorithm basing on it the gaze is continuously and adaptively extracted. Evaluation tests were applied to nine people to evaluate the accuracy of the system and the results were 2.50 degrees (view angle) in horizontal direction and 3.07 degrees in vertical direction.
△ Less
Submitted 6 May, 2021;
originally announced May 2021.
-
Text Generation with Deep Variational GAN
Authors:
Mahmoud Hossam,
Trung Le,
Michael Papasimeon,
Viet Huynh,
Dinh Phung
Abstract:
Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, the issue of mode-collapsing remains a main issue for the current models. In this paper we propose a GAN-based generic framework to address the problem of mode-collapse in a principled approach…
▽ More
Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, the issue of mode-collapsing remains a main issue for the current models. In this paper we propose a GAN-based generic framework to address the problem of mode-collapse in a principled approach. We change the standard GAN objective to maximize a variational lower-bound of the log-likelihood while minimizing the Jensen-Shanon divergence between data and model distributions. We experiment our model with text generation task and show that it can generate realistic text with high diversity.
△ Less
Submitted 27 April, 2021;
originally announced April 2021.
-
Improved and Efficient Text Adversarial Attacks using Target Information
Authors:
Mahmoud Hossam,
Trung Le,
He Zhao,
Viet Huynh,
Dinh Phung
Abstract:
There has been recently a growing interest in studying adversarial examples on natural language models in the black-box setting. These methods attack natural language classifiers by perturbing certain important words until the classifier label is changed. In order to find these important words, these methods rank all words by importance by querying the target model word by word for each input sent…
▽ More
There has been recently a growing interest in studying adversarial examples on natural language models in the black-box setting. These methods attack natural language classifiers by perturbing certain important words until the classifier label is changed. In order to find these important words, these methods rank all words by importance by querying the target model word by word for each input sentence, resulting in high query inefficiency. A new interesting approach was introduced that addresses this problem through interpretable learning to learn the word ranking instead of previous expensive search. The main advantage of using this approach is that it achieves comparable attack rates to the state-of-the-art methods, yet faster and with fewer queries, where fewer queries are desirable to avoid suspicion towards the attacking agent. Nonetheless, this approach sacrificed the useful information that could be leveraged from the target classifier for that sake of query efficiency. In this paper we study the effect of leveraging the target model outputs and data on both attack rates and average number of queries, and we show that both can be improved, with a limited overhead of additional queries.
△ Less
Submitted 2 May, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Hierarchical Convolutional Neural Network with Feature Preservation and Autotuned Thresholding for Crack Detection
Authors:
Qiuchen Zhu,
Tran Hiep Dinh,
Manh Duong Phung,
Quang Phuc Ha
Abstract:
Drone imagery is increasingly used in automated inspection for infrastructure surface defects, especially in hazardous or unreachable environments. In machine vision, the key to crack detection rests with robust and accurate algorithms for image processing. To this end, this paper proposes a deep learning approach using hierarchical convolutional neural networks with feature preservation (HCNNFP)…
▽ More
Drone imagery is increasingly used in automated inspection for infrastructure surface defects, especially in hazardous or unreachable environments. In machine vision, the key to crack detection rests with robust and accurate algorithms for image processing. To this end, this paper proposes a deep learning approach using hierarchical convolutional neural networks with feature preservation (HCNNFP) and an intercontrast iterative thresholding algorithm for image binarization. First, a set of branch networks is proposed, wherein the output of previous convolutional blocks is half-sizedly concatenated to the current ones to reduce the obscuration in the down-sampling stage taking into account the overall information loss. Next, to extract the feature map generated from the enhanced HCNN, a binary contrast-based autotuned thresholding (CBAT) approach is developed at the post-processing step, where patterns of interest are clustered within the probability map of the identified features. The proposed technique is then applied to identify surface cracks on the surface of roads, bridges or pavements. An extensive comparison with existing techniques is conducted on various datasets and subject to a number of evaluation criteria including the average F-measure (AF\b{eta}) introduced here for dynamic quantification of the performance. Experiments on crack images, including those captured by unmanned aerial vehicles inspecting a monorail bridge. The proposed technique outperforms the existing methods on various tested datasets especially for GAPs dataset with an increase of about 1.4% in terms of AF\b{eta} while the mean percentage error drops by 2.2%. Such performance demonstrates the merits of the proposed HCNNFP architecture for surface defect inspection.
△ Less
Submitted 21 April, 2021;
originally announced April 2021.
-
Safety-enhanced UAV Path Planning with Spherical Vector-based Particle Swarm Optimization
Authors:
Manh Duong Phung,
Quang Phuc Ha
Abstract:
This paper presents a new algorithm named spherical vector-based particle swarm optimization (SPSO) to deal with the problem of path planning for unmanned aerial vehicles (UAVs) in complicated environments subjected to multiple threats. A cost function is first formulated to convert the path planning into an optimization problem that incorporates requirements and constraints for the feasible and s…
▽ More
This paper presents a new algorithm named spherical vector-based particle swarm optimization (SPSO) to deal with the problem of path planning for unmanned aerial vehicles (UAVs) in complicated environments subjected to multiple threats. A cost function is first formulated to convert the path planning into an optimization problem that incorporates requirements and constraints for the feasible and safe operation of the UAV. SPSO is then used to find the optimal path that minimizes the cost function by efficiently searching the configuration space of the UAV via the correspondence between the particle position and the speed, turn angle and climb/dive angle of the UAV. To evaluate the performance of SPSO, eight benchmarking scenarios have been generated from real digital elevation model maps. The results show that the proposed SPSO outperforms not only other particle swarm optimization (PSO) variants including the classic PSO, phase angle-encoded PSO and quantum-behave PSO but also other state-of-the-art metaheuristic optimization algorithms including the genetic algorithm (GA), artificial bee colony (ABC), and differential evolution (DE) in most scenarios. In addition, experiments have been conducted to demonstrate the validity of the generated paths for real UAV operations. Source code of the algorithm can be found at https://github.com/duongpm/SPSO.
△ Less
Submitted 13 April, 2021;
originally announced April 2021.
-
Node Co-occurrence based Graph Neural Networks for Knowledge Graph Link Prediction
Authors:
Dai Quoc Nguyen,
Vinh Tong,
Dinh Phung,
Dat Quoc Nguyen
Abstract:
We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction). Given a knowledge graph, NoGE constructs a single graph considering entities and relations as individual nodes. NoGE then computes weights for edges among nodes based on the co-occurrence of en…
▽ More
We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction). Given a knowledge graph, NoGE constructs a single graph considering entities and relations as individual nodes. NoGE then computes weights for edges among nodes based on the co-occurrence of entities and relations. Next, NoGE proposes Dual Quaternion Graph Neural Networks (DualQGNN) and utilizes DualQGNN to update vector representations for entity and relation nodes. NoGE then adopts a score function to produce the triple scores. Comprehensive experimental results show that NoGE obtains state-of-the-art results on three new and difficult benchmark datasets CoDEx for knowledge graph completion.
△ Less
Submitted 25 December, 2021; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Bridging Global Context Interactions for High-Fidelity Image Completion
Authors:
Chuanxia Zheng,
Tat-Jen Cham,
Jianfei Cai,
Dinh Phung
Abstract:
Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a…
▽ More
Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence in the encoder. Crucially, we employ a restrictive CNN with small and non-overlap** RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets.
△ Less
Submitted 22 November, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
Topic Modelling Meets Deep Neural Networks: A Survey
Authors:
He Zhao,
Dinh Phung,
Viet Huynh,
Yuan **,
Lan Du,
Wray Buntine
Abstract:
Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with over a hundred models developed and a wide range of applications in neural language understanding such as text generation, summarisation and language models. There is a need to…
▽ More
Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with over a hundred models developed and a wide range of applications in neural language understanding such as text generation, summarisation and language models. There is a need to summarise research developments and discuss open problems and future directions. In this paper, we provide a focused yet comprehensive overview of neural topic models for interested researchers in the AI community, so as to facilitate them to navigate and innovate in this fast-growing research area. To the best of our knowledge, ours is the first review focusing on this specific topic.
△ Less
Submitted 28 February, 2021;
originally announced March 2021.
-
On Transportation of Mini-batches: A Hierarchical Approach
Authors:
Khai Nguyen,
Dang Nguyen,
Quoc Nguyen,
Tung Pham,
Hung Bui,
Dinh Phung,
Trung Le,
Nhat Ho
Abstract:
Mini-batch optimal transport (m-OT) has been successfully used in practical applications that involve probability measures with a very high number of supports. The m-OT solves several smaller optimal transport problems and then returns the average of their costs and transportation plans. Despite its scalability advantage, the m-OT does not consider the relationship between mini-batches which leads…
▽ More
Mini-batch optimal transport (m-OT) has been successfully used in practical applications that involve probability measures with a very high number of supports. The m-OT solves several smaller optimal transport problems and then returns the average of their costs and transportation plans. Despite its scalability advantage, the m-OT does not consider the relationship between mini-batches which leads to undesirable estimation. Moreover, the m-OT does not approximate a proper metric between probability measures since the identity property is not satisfied. To address these problems, we propose a novel mini-batch scheme for optimal transport, named Batch of Mini-batches Optimal Transport (BoMb-OT), that finds the optimal coupling between mini-batches and it can be seen as an approximation to a well-defined distance on the space of probability measures. Furthermore, we show that the m-OT is a limit of the entropic regularized version of the BoMb-OT when the regularized parameter goes to infinity. Finally, we carry out experiments on various applications including deep generative models, deep domain adaptation, approximate Bayesian computation, color transfer, and gradient flow to show that the BoMb-OT can be widely applied and performs well in various applications.
△ Less
Submitted 6 June, 2022; v1 submitted 11 February, 2021;
originally announced February 2021.
-
Understanding and Achieving Efficient Robustness with Adversarial Supervised Contrastive Learning
Authors:
Anh Bui,
Trung Le,
He Zhao,
Paul Montague,
Seyit Camtepe,
Dinh Phung
Abstract:
Contrastive learning (CL) has recently emerged as an effective approach to learning representation in a range of downstream tasks. Central to this approach is the selection of positive (similar) and negative (dissimilar) sets to provide the model the opportunity to `contrast' between data and class representation in the latent space. In this paper, we investigate CL for improving model robustness…
▽ More
Contrastive learning (CL) has recently emerged as an effective approach to learning representation in a range of downstream tasks. Central to this approach is the selection of positive (similar) and negative (dissimilar) sets to provide the model the opportunity to `contrast' between data and class representation in the latent space. In this paper, we investigate CL for improving model robustness using adversarial samples. We first designed and performed a comprehensive study to understand how adversarial vulnerability behaves in the latent space. Based on this empirical evidence, we propose an effective and efficient supervised contrastive learning to achieve model robustness against adversarial attacks. Moreover, we propose a new sample selection strategy that optimizes the positive/negative sets by removing redundancy and improving correlation with the anchor. Extensive experiments show that our Adversarial Supervised Contrastive Learning (ASCL) approach achieves comparable performance with the state-of-the-art defenses while significantly outperforms other CL-based defense methods by using only $42.8\%$ positives and $6.3\%$ negatives.
△ Less
Submitted 22 October, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Structural and Functional Decomposition for Personality Image Captioning in a Communication Game
Authors:
Thu Nguyen,
Duy Phung,
Minh Hoai,
Thien Huu Nguyen
Abstract:
Personality image captioning (PIC) aims to describe an image with a natural language caption given a personality trait. In this work, we introduce a novel formulation for PIC based on a communication game between a speaker and a listener. The speaker attempts to generate natural language captions while the listener encourages the generated captions to contain discriminative information about the i…
▽ More
Personality image captioning (PIC) aims to describe an image with a natural language caption given a personality trait. In this work, we introduce a novel formulation for PIC based on a communication game between a speaker and a listener. The speaker attempts to generate natural language captions while the listener encourages the generated captions to contain discriminative information about the input images and personality traits. In this way, we expect that the generated captions can be improved to naturally represent the images and express the traits. In addition, we propose to adapt the language model GPT2 to perform caption generation for PIC. This enables the speaker and listener to benefit from the language encoding capacity of GPT2. Our experiments show that the proposed model achieves the state-of-the-art performance for PIC.
△ Less
Submitted 17 November, 2020;
originally announced November 2020.
-
Remarks on results by Müger and Tuset on the moments of polynomials
Authors:
Greg Markowsky,
Dylan Phung
Abstract:
Let $f(x)$ be a non-zero polynomial with complex coefficients, and $M_p = \int_{0}^1 f(x)^p dx$ for $p$ a positive integer. In a recent paper, Müger and Tuset showed that $\limsup_{p \to \infty} |M_p|^{1/p} > 0$, and conjectured that this limit is equal to the maximum amongst the critical values of $f$ together with the values $|f(0)|$ and $|f(1)|$. We give an example that shows that this conjectu…
▽ More
Let $f(x)$ be a non-zero polynomial with complex coefficients, and $M_p = \int_{0}^1 f(x)^p dx$ for $p$ a positive integer. In a recent paper, Müger and Tuset showed that $\limsup_{p \to \infty} |M_p|^{1/p} > 0$, and conjectured that this limit is equal to the maximum amongst the critical values of $f$ together with the values $|f(0)|$ and $|f(1)|$. We give an example that shows that this conjecture is false. It also may be natural to guess that $\limsup_{p \to \infty} |M_p|^{1/p}$ is equal to the maximum of $|f(x)|$ on $[0,1]$. However, we give a counterexample to this as well. We also provide a few more guesses as to the behaviour of the quantity $\limsup_{p \to \infty} |M_p|^{1/p}$.
△ Less
Submitted 17 November, 2020; v1 submitted 12 November, 2020;
originally announced November 2020.
-
Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
Authors:
Quan Tran,
Nhan Dam,
Tuan Lai,
Franck Dernoncourt,
Trung Le,
Nham Le,
Dinh Phung
Abstract:
Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, human tends to rely on simil…
▽ More
Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, human tends to rely on similar situations and/or associations in the past. Hence arguably, a promising approach to make the model transparent is to design it in a way such that the model explicitly connects the current sample with the seen ones, and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidences to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e. TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications.
△ Less
Submitted 5 November, 2020;
originally announced November 2020.
-
Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability
Authors:
Mahmoud Hossam,
Trung Le,
He Zhao,
Dinh Phung
Abstract:
Training robust deep learning models for down-stream tasks is a critical challenge. Research has shown that down-stream models can be easily fooled with adversarial inputs that look like the training data, but slightly perturbed, in a way imperceptible to humans. Understanding the behavior of natural language models under these attacks is crucial to better defend these models against such attacks.…
▽ More
Training robust deep learning models for down-stream tasks is a critical challenge. Research has shown that down-stream models can be easily fooled with adversarial inputs that look like the training data, but slightly perturbed, in a way imperceptible to humans. Understanding the behavior of natural language models under these attacks is crucial to better defend these models against such attacks. In the black-box attack setting, where no access to model parameters is available, the attacker can only query the output information from the targeted model to craft a successful attack. Current black-box state-of-the-art models are costly in both computational complexity and number of queries needed to craft successful adversarial examples. For real world scenarios, the number of queries is critical, where less queries are desired to avoid suspicion towards an attacking agent. In this paper, we propose Explain2Attack, a black-box adversarial attack on text classification task. Instead of searching for important words to be perturbed by querying the target model, Explain2Attack employs an interpretable substitute model from a similar domain to learn word importance scores. We show that our framework either achieves or out-performs attack rates of the state-of-the-art models, yet with lower queries cost and higher efficiency.
△ Less
Submitted 16 January, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Learning to Attack with Fewer Pixels: A Probabilistic Post-hoc Framework for Refining Arbitrary Dense Adversarial Attacks
Authors:
He Zhao,
Thanh Nguyen,
Trung Le,
Paul Montague,
Olivier De Vel,
Tamas Abraham,
Dinh Phung
Abstract:
Deep neural network image classifiers are reported to be susceptible to adversarial evasion attacks, which use carefully crafted images created to mislead a classifier. Many adversarial attacks belong to the category of dense attacks, which generate adversarial examples by perturbing all the pixels of a natural image. To generate sparse perturbations, sparse attacks have been recently developed, w…
▽ More
Deep neural network image classifiers are reported to be susceptible to adversarial evasion attacks, which use carefully crafted images created to mislead a classifier. Many adversarial attacks belong to the category of dense attacks, which generate adversarial examples by perturbing all the pixels of a natural image. To generate sparse perturbations, sparse attacks have been recently developed, which are usually independent attacks derived by modifying a dense attack's algorithm with sparsity regularisations, resulting in reduced attack efficiency. In this paper, we aim to tackle this task from a different perspective. We select the most effective perturbations from the ones generated from a dense attack, based on the fact we find that a considerable amount of the perturbations on an image generated by dense attacks may contribute little to attacking a classifier. Accordingly, we propose a probabilistic post-hoc framework that refines given dense attacks by significantly reducing the number of perturbed pixels but kee** their attack power, trained with mutual information maximisation. Given an arbitrary dense attack, the proposed model enjoys appealing compatibility for making its adversarial images more realistic and less detectable with fewer perturbations. Moreover, our framework performs adversarial attacks much faster than existing sparse attacks.
△ Less
Submitted 21 February, 2022; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Motion-Encoded Particle Swarm Optimization for Moving Target Search Using UAVs
Authors:
Manh Duong Phung,
Quang Phuc Ha
Abstract:
This paper presents a novel algorithm named the motion-encoded particle swarm optimization (MPSO) for finding a moving target with unmanned aerial vehicles (UAVs). From the Bayesian theory, the search problem can be converted to the optimization of a cost function that represents the probability of detecting the target. Here, the proposed MPSO is developed to solve that problem by encoding the sea…
▽ More
This paper presents a novel algorithm named the motion-encoded particle swarm optimization (MPSO) for finding a moving target with unmanned aerial vehicles (UAVs). From the Bayesian theory, the search problem can be converted to the optimization of a cost function that represents the probability of detecting the target. Here, the proposed MPSO is developed to solve that problem by encoding the search trajectory as a series of UAV motion paths evolving over the generation of particles in a PSO algorithm. This motion-encoded approach allows for preserving important properties of the swarm including the cognitive and social coherence, and thus resulting in better solutions. Results from extensive simulations with existing methods show that the proposed MPSO improves the detection performance by 24\% and time performance by 4.71 times compared to the original PSO, and moreover, also outperforms other state-of-the-art metaheuristic optimization algorithms including the artificial bee colony (ABC), ant colony optimization (ACO), genetic algorithm (GA), differential evolution (DE), and tree-seed algorithm (TSA) in most search scenarios. Experiments have been conducted with real UAVs in searching for a dynamic target in different scenarios to demonstrate MPSO merits in a practical application.
△ Less
Submitted 5 October, 2020;
originally announced October 2020.
-
Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models
Authors:
Thuy-Trang Vu,
Dinh Phung,
Gholamreza Haffari
Abstract:
Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of \emph{randomly} masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language…
▽ More
Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of \emph{randomly} masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over \emph{subsets} of tokens, which we tackle efficiently through relaxation to a variational lowerbound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.
△ Less
Submitted 4 October, 2020;
originally announced October 2020.
-
QuatRE: Relation-Aware Quaternions for Knowledge Graph Embeddings
Authors:
Dai Quoc Nguyen,
Thanh Vu,
Tu Dinh Nguyen,
Dinh Phung
Abstract:
We propose a simple yet effective embedding model to learn quaternion embeddings for entities and relations in knowledge graphs. Our model aims to enhance correlations between head and tail entities given a relation within the Quaternion space with Hamilton product. The model achieves this goal by further associating each relation with two relation-aware rotations, which are used to rotate quatern…
▽ More
We propose a simple yet effective embedding model to learn quaternion embeddings for entities and relations in knowledge graphs. Our model aims to enhance correlations between head and tail entities given a relation within the Quaternion space with Hamilton product. The model achieves this goal by further associating each relation with two relation-aware rotations, which are used to rotate quaternion embeddings of the head and tail entities, respectively. Experimental results show that our proposed model produces state-of-the-art performances on well-known benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QuatRE}.
△ Less
Submitted 8 March, 2022; v1 submitted 26 September, 2020;
originally announced September 2020.
-
Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness
Authors:
Anh Bui,
Trung Le,
He Zhao,
Paul Montague,
Olivier deVel,
Tamas Abraham,
Dinh Phung
Abstract:
Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets…
▽ More
Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, hence help us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines, at the same time can detect a wide range of adversarial examples with a nearly perfect accuracy. Our code is available at: https://github.com/tuananhbui89/Crossing-Collaborative-Ensemble.
△ Less
Submitted 4 February, 2022; v1 submitted 21 September, 2020;
originally announced September 2020.
-
Neural Topic Model via Optimal Transport
Authors:
He Zhao,
Dinh Phung,
Viet Huynh,
Trung Le,
Wray Buntine
Abstract:
Recently, Neural Topic Models (NTMs) inspired by variational autoencoders have obtained increasingly research interest due to their promising results on text analysis. However, it is usually hard for existing NTMs to achieve good document representation and coherent/diverse topics at the same time. Moreover, they often degrade their performance severely on short documents. The requirement of repar…
▽ More
Recently, Neural Topic Models (NTMs) inspired by variational autoencoders have obtained increasingly research interest due to their promising results on text analysis. However, it is usually hard for existing NTMs to achieve good document representation and coherent/diverse topics at the same time. Moreover, they often degrade their performance severely on short documents. The requirement of reparameterisation could also comprise their training quality and model flexibility. To address these shortcomings, we present a new neural topic model via the theory of optimal transport (OT). Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distributions. Importantly, the cost matrix of the OT distance models the weights between topics and words, which is constructed by the distances between topics and words in an embedding space. Our proposed model can be trained efficiently with a differentiable loss. Extensive experiments show that our framework significantly outperforms the state-of-the-art NTMs on discovering more coherent and diverse topics and deriving better document representations for both regular and short texts.
△ Less
Submitted 31 May, 2022; v1 submitted 12 August, 2020;
originally announced August 2020.
-
Quaternion Graph Neural Networks
Authors:
Dai Quoc Nguyen,
Tu Dinh Nguyen,
Dinh Phung
Abstract:
Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, w…
▽ More
Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, we propose Quaternion Graph Neural Networks (QGNN) to learn graph representations within the Quaternion space. As demonstrated, the Quaternion space, a hyper-complex vector space, provides highly meaningful computations and analogical calculus through Hamilton product compared to the Euclidean and complex vector spaces. Our QGNN obtains state-of-the-art results on a range of benchmark datasets for graph classification and node classification. Besides, regarding knowledge graphs, our QGNN-based embedding model achieves state-of-the-art results on three new and challenging benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QGNN}.
△ Less
Submitted 6 October, 2021; v1 submitted 11 August, 2020;
originally announced August 2020.
-
MED-TEX: Transferring and Explaining Knowledge with Less Data from Pretrained Medical Imaging Models
Authors:
Thanh Nguyen-Duc,
He Zhao,
Jianfei Cai,
Dinh Phung
Abstract:
Deep learning methods usually require a large amount of training data and lack interpretability. In this paper, we propose a novel knowledge distillation and model interpretation framework for medical image classification that jointly solves the above two issues. Specifically, to address the data-hungry issue, a small student model is learned with less data by distilling knowledge from a cumbersom…
▽ More
Deep learning methods usually require a large amount of training data and lack interpretability. In this paper, we propose a novel knowledge distillation and model interpretation framework for medical image classification that jointly solves the above two issues. Specifically, to address the data-hungry issue, a small student model is learned with less data by distilling knowledge from a cumbersome pretrained teacher model. To interpret the teacher model and assist the learning of the student, an explainer module is introduced to highlight the regions of an input that are important for the predictions of the teacher model. Furthermore, the joint framework is trained by a principled way derived from the information-theoretic perspective. Our framework outperforms on the knowledge distillation and model interpretation tasks compared to state-of-the-art methods on a fundus dataset.
△ Less
Submitted 12 January, 2022; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Improving Adversarial Robustness by Enforcing Local and Global Compactness
Authors:
Anh Bui,
Trung Le,
He Zhao,
Paul Montague,
Olivier deVel,
Tamas Abraham,
Dinh Phung
Abstract:
The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representa…
▽ More
The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolating behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher unperturbed and adversarial predictive performances.
△ Less
Submitted 9 July, 2020;
originally announced July 2020.
-
A Self-Attention Network based Node Embedding Model
Authors:
Dai Quoc Nguyen,
Tu Dinh Nguyen,
Dinh Phung
Abstract:
Despite several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes -- a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performances of downstream tasks such as node classification, link prediction or community extracti…
▽ More
Despite several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes -- a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performances of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE -- a novel unsupervised embedding model -- whose central idea is to employ a transformer self-attention network to iteratively aggregate vector representations of nodes in random walks. Our SANNE aims to produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that the proposed SANNE obtains state-of-the-art results for the node classification task on well-known benchmark datasets.
△ Less
Submitted 22 June, 2020;
originally announced June 2020.