-
Robust Contrastive Learning With Theory Guarantee
Authors:
Ngoc N. Tran,
Lam Tran,
Hoang Phan,
Anh Bui,
Tung Pham,
Toan Tran,
Dinh Phung,
Trung Le
Abstract:
Contrastive learning (CL) is a self-supervised training paradigm that allows us to extract meaningful features without any label information. A typical CL framework is divided into two phases, where it first tries to learn the features from unlabelled data, and then uses those features to train a linear classifier with the labeled data. While a fair number of existing theoretical works have analyzed how the unsupervised loss in the first phase can support the supervised loss in the second phase, none has examined the connection between the unsupervised loss and the robust supervised loss, which can shed light on how to construct an effective unsupervised loss for the first phase of CL. To fill this gap, our work develops rigorous theories to dissect and identify which components in the unsupervised loss can help improve the robust supervised loss, and conducts proper experiments to verify our findings.
Submitted 16 November, 2023;
originally announced November 2023.
-
Sharpness & Shift-Aware Self-Supervised Learning
Authors:
Ngoc N. Tran,
Son Duong,
Hoang Phan,
Tung Pham,
Dinh Phung,
Trung Le
Abstract:
Self-supervised learning aims to extract meaningful features from unlabeled data for further downstream tasks. In this paper, we consider classification as a downstream task in phase 2 and develop rigorous theories to identify the factors that implicitly influence the general loss of this classification task. Our theories indicate that sharpness-aware feature extractors benefit the classification task in phase 2, and that the data shift between the ideal (i.e., the one used in theory development) and practical (i.e., the one used in implementation) distributions for generating positive pairs also remarkably affects this classification task. Building on these theoretical findings, we propose to minimize the sharpness of the feature extractor and introduce a new Fourier-based data augmentation technique to relieve the data shift in the distributions generating positive pairs, arriving at Sharpness & Shift-Aware Contrastive Learning (SSA-CLR). We conduct extensive experiments to verify our theoretical findings and demonstrate that sharpness & shift-aware contrastive learning can remarkably boost performance and yield more robust extracted features compared with the baselines.
Submitted 17 May, 2023;
originally announced May 2023.
-
Multiple Perturbation Attack: Attack Pixelwise Under Different $\ell_p$-norms For Better Adversarial Performance
Authors:
Ngoc N. Tran,
Anh Tuan Bui,
Dinh Phung,
Trung Le
Abstract:
Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over time. On one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, in order to achieve that, we need to devise even stronger adversarial attacks to challenge these defense models. Most existing attacks employ a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and perform steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. These $\ell_p$ attacks each have their own pros and cons, and there is no single attack that can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we come up with a natural approach: combining various $\ell_p$ gradient projections on a pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize the attack performance, while maintaining the overall visual imperceptibility of adversarial examples. Finally, through various experiments with standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms, while keeping the adversarial examples visually clean.
Submitted 7 December, 2022; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Improving Multi-task Learning via Seeking Task-based Flat Regions
Authors:
Hoang Phan,
Lam Tran,
Ngoc N. Tran,
Nhat Ho,
Dinh Phung,
Trung Le
Abstract:
Multi-Task Learning (MTL) is a widely-used and powerful learning paradigm for training deep neural networks that allows learning more than one objective by a single backbone. Compared to training tasks separately, MTL significantly reduces computational costs, improves data efficiency, and potentially enhances model performance by leveraging knowledge across tasks. Hence, it has been adopted in a variety of applications, ranging from computer vision to natural language processing and speech recognition. Among them, there is an emerging line of work in MTL that focuses on manipulating the task gradient to derive an ultimate gradient descent direction to benefit all tasks. Despite achieving impressive results on many benchmarks, directly applying these approaches without using appropriate regularization techniques might lead to suboptimal solutions on real-world problems. In particular, standard training that minimizes the empirical loss on the training data can easily suffer from overfitting to low-resource tasks or be spoiled by noisy-labeled ones, which can cause negative transfer between tasks and overall performance drop. To alleviate such problems, we propose to leverage a recently introduced training method, named Sharpness-aware Minimization, which can enhance model generalization ability on single-task learning. Accordingly, we present a novel MTL training methodology, encouraging the model to find task-based flat minima for coherently improving its generalization capability on all tasks. Finally, we conduct comprehensive experiments on a variety of applications to demonstrate the merit of our proposed approach to existing gradient-based MTL methods, as suggested by our developed theory.
Submitted 29 September, 2023; v1 submitted 24 November, 2022;
originally announced November 2022.
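The abstract above builds on Sharpness-aware Minimization (SAM). As a rough illustration of the underlying idea only, the sketch below shows one plain single-task SAM update in NumPy: perturb the weights toward higher loss within a small L2 ball, then descend using the gradient taken at the perturbed point. It is not the task-based MTL procedure the paper proposes; the function name `sam_step`, the step size `lr`, the radius `rho`, and the toy quadratic loss are all assumptions made for illustration.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One plain SAM update (illustrative; lr and rho are assumed values)."""
    g = grad_fn(w)
    norm = np.linalg.norm(g)
    # Ascent step: move to the approximate worst-case point in an L2 ball of radius rho.
    eps = rho * g / norm if norm > 0 else np.zeros_like(w)
    # Descent step: apply the gradient measured at the perturbed weights.
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

# Toy loss 0.5 * ||w||^2, whose gradient is w itself (hypothetical example).
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda v: v)
```

On this toy loss the iterates settle within roughly `lr * rho` of the minimizer rather than exactly on it, because the perturbation keeps the sharpness-aware gradient nonzero near the optimum.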
-
ReINTEL Challenge 2020: A Comparative Study of Hybrid Deep Neural Network for Reliable Intelligence Identification on Vietnamese SNSs
Authors:
Hoang Viet Trinh,
Tung Tien Bui,
Tam Minh Nguyen,
Huy Quang Dao,
Quang Huu Pham,
Ngoc N. Tran,
Ta Minh Thanh
Abstract:
The overwhelming abundance of data has created a misinformation crisis. Unverified sensationalism that is designed to grab the readers' short attention span, when crafted with malice, has caused irreparable damage to our society's structure. As a result, determining the reliability of an article has become a crucial task. After various ablation studies, we propose a multi-input model that can effectively leverage both tabular metadata and post content for the task. Applying state-of-the-art finetuning techniques for the pretrained component and training strategies for our complete model, we have achieved a 0.9462 ROC-score on the VLSP private test set.
Submitted 26 September, 2021;
originally announced September 2021.
-
Efficient Low-Latency Dynamic Licensing for Deep Neural Network Deployment on Edge Devices
Authors:
Toan Pham Van,
Ngoc N. Tran,
Hoang Pham Minh,
Tam Nguyen Minh,
Thanh Ta Minh
Abstract:
Along with the rapid development in the field of artificial intelligence, especially deep learning, deep neural network applications are becoming more and more popular in practice. To be able to withstand the heavy load from mainstream users, deployment techniques are essential in bringing neural network models from research to production. The two popular computing topologies for deploying neural network models in production are cloud computing and edge computing. Recent advances in communication technologies, along with the great increase in the number of mobile devices, have made edge computing gradually become an inevitable trend. In this paper, we propose an architecture for deploying and processing deep neural networks on edge devices by leveraging their synergy with the cloud and the access-control mechanisms of the database. Adopting this architecture allows low-latency DNN model updates on devices. At the same time, with only one model deployed, we can easily make different versions of it by setting access permissions on the model weights. This method allows for dynamic model licensing, which benefits commercial applications.
Submitted 24 February, 2021;
originally announced February 2021.
-
From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection
Authors:
Quang Huu Pham,
Viet Anh Nguyen,
Linh Bao Doan,
Ngoc N. Tran,
Ta Minh Thanh
Abstract:
Natural language processing is a fast-growing field of artificial intelligence. Since the Transformer was introduced by Google in 2017, a large number of language models such as BERT, GPT, and ELMo have been inspired by this architecture. These models were trained on huge datasets and achieved state-of-the-art results on natural language understanding. However, fine-tuning a pre-trained language model on much smaller datasets for downstream tasks requires a carefully-designed pipeline to mitigate problems of the datasets such as lack of training data and imbalanced data. In this paper, we propose a pipeline to adapt the general-purpose RoBERTa language model to a specific text classification task: Vietnamese Hate Speech Detection. We first tune the PhoBERT on our dataset by re-training the model on the Masked Language Model task; then, we employ its encoder for text classification. In order to preserve pre-trained weights while learning new feature representations, we further utilize different training techniques: layer freezing, block-wise learning rate, and label smoothing. Our experiments proved that our proposed pipeline boosts the performance significantly, achieving a new state-of-the-art on Vietnamese Hate Speech Detection campaign with 0.7221 F1 score.
Submitted 24 February, 2021;
originally announced February 2021.
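Label smoothing, one of the training techniques named in the abstract above, can be sketched in a few lines. This is the standard uniform formulation and not necessarily the exact variant used in the paper; the function name `smooth_labels` and the smoothing factor `eps=0.1` are assumptions, since the abstract gives no value.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Turn integer class labels into smoothed target distributions (eps is an assumed value)."""
    one_hot = np.eye(num_classes)[labels]
    # Keep 1 - eps on the true class and spread eps uniformly over all classes.
    return one_hot * (1.0 - eps) + eps / num_classes

# Two hypothetical samples with true classes 0 and 2 out of 3 classes.
targets = smooth_labels(np.array([0, 2]), num_classes=3, eps=0.1)
```

Each row of `targets` still sums to 1, so it can be used directly as the target of a cross-entropy loss.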
-
Interpreting the Latent Space of Generative Adversarial Networks using Supervised Learning
Authors:
Toan Pham Van,
Tam Minh Nguyen,
Ngoc N. Tran,
Hoai Viet Nguyen,
Linh Bao Doan,
Huy Quang Dao,
Thanh Ta Minh
Abstract:
With great progress in the development of Generative Adversarial Networks (GANs) in recent years, the quest for insights into understanding and manipulating the latent space of GANs has gained more and more attention due to its wide range of applications. While most research on this task has focused on unsupervised learning methods, which induce difficulties in training and limitations in results, our work takes another direction, encoding human prior knowledge to discover more about the hidden space of GANs. With this supervised approach, we produce promising results, demonstrated by accurate manipulation of generated images. Even though our model is more suitable for task-specific problems, we hope that its ease of implementation, precision, robustness, and allowance for a richer set of properties (compared to other approaches) for image manipulation can enhance the results of many current applications.
Submitted 24 February, 2021;
originally announced February 2021.
-
Efficient Palm-Line Segmentation with U-Net Context Fusion Module
Authors:
Toan Pham Van,
Son Trung Nguyen,
Linh Bao Doan,
Ngoc N. Tran,
Ta Minh Thanh
Abstract:
Many cultures around the world believe that palm reading can be used to predict the future life of a person. Palmistry uses features of the hand such as palm lines, hand shape, or fingertip position. However, research on palm-line detection is still scarce, and much of it applies traditional image processing techniques. In most real-world scenarios, images are usually not well-conditioned, causing these methods to severely under-perform. In this paper, we propose an algorithm to extract principal palm lines from an image of a person's hand. Our method applies deep learning networks (DNNs) to improve performance. Another challenge of this problem is the lack of training data. To deal with this issue, we handcrafted a dataset from scratch. On this dataset, we compare the performance of readily available methods with ours. Furthermore, based on the UNet segmentation neural network architecture and the knowledge of attention mechanisms, we propose a highly efficient architecture to detect palm lines. We propose the Context Fusion Module to capture the most important context features, which aims to improve segmentation accuracy. The experimental results show that it outperforms the other methods, with the highest F1 score of about 99.42% and an mIoU of 0.584 on the same dataset.
Submitted 24 February, 2021;
originally announced February 2021.
-
Deep Learning Approach for Singer Voice Classification of Vietnamese Popular Music
Authors:
Toan Pham Van,
Ngoc N. Tran,
Ta Minh Thanh
Abstract:
Singer voice classification is a meaningful task in the digital era. With a huge number of songs today, identifying a singer is very helpful for music information retrieval, music properties indexing, and so on. In this paper, we propose a new method to identify the singer's name based on analysis of Vietnamese popular music. We employ the use of vocal segment detection and singing voice separation as the pre-processing steps. The purpose of these steps is to extract the singer's voice from the mixture sound. In order to build a singer classifier, we propose a neural network architecture working with Mel Frequency Cepstral Coefficient as extracted input features from said vocal. To verify the accuracy of our methods, we evaluate on a dataset of 300 Vietnamese songs from 18 famous singers. We achieve an accuracy of 92.84% with 5-fold stratified cross-validation, the best result compared to other methods on the same data set.
Submitted 24 February, 2021;
originally announced February 2021.
-
Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm
Authors:
Quang Pham Huu,
Thoi Hoang Dinh,
Ngoc N. Tran,
Toan Pham Van,
Thanh Ta Minh
Abstract:
In the last few years, steganography has attracted increasing attention from a large number of researchers since its applications are expanding further than just the field of information security. The most traditional method is based on digital signal processing, such as least significant bit encoding. Recently, there have been some new approaches employing deep learning to address the problem of steganography. However, most of the existing approaches are designed for image-in-image steganography. In this paper, the use of deep learning techniques to hide secret audio into the digital images is proposed. We employ a joint deep neural network architecture consisting of two sub-models: the first network hides the secret audio into an image, and the second one is responsible for decoding the image to obtain the original audio. Extensive experiments are conducted with a set of 24K images and the VIVOS Corpus audio dataset. Through experimental results, it can be seen that our method is more effective than traditional approaches. The integrity of both image and audio is well preserved, while the maximum length of the hidden audio is significantly improved.
Submitted 18 February, 2021;
originally announced February 2021.
-
Designing constraint-based false data injection attacks against the unbalanced distribution smart grids
Authors:
Nam N. Tran,
Hemanshu R. Pota,
Quang N. Tran,
Jiankun Hu
Abstract:
The advent of smart power grid which plays a vital role in the upcoming smart city era is accompanied with the implementation of a monitoring tool, called state estimation. For the case of the unbalanced residential distribution grid, the state estimating operation which is conducted at a regional scale is considered as an application of the edge computing-based Internet of Things (IoT). While the outcome of the state estimation is important to the subsequent control activities, its accuracy heavily depends on the data integrity of the information collected from the scattered measurement devices. This fact exposes the vulnerability of the state estimation module under the effect of data-driven attacks. Among these, false data injection attack (FDI) is attracting much attention due to its capability to interfere with the normal operation of the network without being detected. This paper presents an attack design scheme based on a nonlinear physical-constraint model that is able to produce an FDI attack with theoretically stealthy characteristic. To demonstrate the effectiveness of the proposed design scheme, simulations with the IEEE 13-node test feeder and the WSCC 9-bus system are conducted. The experimental results indicate that not only the false positive rate of the bad data detection mechanism is 100 per cent but the physical consequence of the attack is severe. These results pose a serious challenge for operators in maintaining the integrity of measurement data.
Submitted 1 February, 2021; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Designing False Data Injection attacks penetrating AC-based Bad Data Detection System and FDI Dataset generation
Authors:
Nam N. Tran,
Hemanshu R. Pota,
Quang N. Tran,
Xuefei Yin,
Jiankun Hu
Abstract:
The evolution of the traditional power system towards the modern smart grid has posed many new cybersecurity challenges to this critical infrastructure. One of the most dangerous cybersecurity threats is the False Data Injection (FDI) attack, especially when it is capable of completely bypassing the widely deployed Bad Data Detector of State Estimation and interrupting the normal operation of the power system. Most of the simulated FDI attacks are designed using simplified linearized DC model while most industry standard State Estimation systems are based on the nonlinear AC model. In this paper, a comprehensive FDI attack scheme is presented based on the nonlinear AC model. A case study of the nine-bus Western System Coordinated Council (WSCC)'s power system is provided, using an industry standard package to assess the outcomes of the proposed design scheme. A public FDI dataset is generated as a test set for the community to develop and evaluate new detection algorithms, which are lacking in the field. The FDI's stealthy quality of the dataset is assessed and proven through a preliminary analysis based on both physical power law and statistical analysis.
Submitted 10 March, 2020;
originally announced March 2020.
-
Minc's generating function and a Segal conjecture for Thom spectra. La fonction generatrice de Minc et une conjecture de Segal pour certains spectres de Thom
Authors:
Dang Ho Hai Nguyen,
Lionel Schwartz,
Ngoc Nam Tran
Abstract:
One constructs minimal injective resolutions for certain unstable modules that appear to be the mod 2 cohomology of Thom spectra. The terms of the resolution are tensor products of Brown-Gitler modules and Steinberg modules introduced by S. Mitchell and S. Priddy. A combinatorial result of Andrews shows that the alternating sum of the Poincare series of the considered modules is zero. One gives homotopical applications of this result.
Submitted 21 July, 2009;
originally announced July 2009.