Search | arXiv e-print repository

Model and Feature Diversity for Bayesian Neural Networks in Mutual Learning

Authors: Cuong Pham, Cuong C. Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do

Abstract: Bayesian Neural Networks (BNNs) offer probability distributions for model parameters, enabling uncertainty quantification in predictions. However, they often underperform compared to deterministic neural networks. Utilizing mutual learning can effectively enhance the performance of peer BNNs. In this paper, we propose a novel approach to improve BNNs performance through deep mutual learning. The p… ▽ More Bayesian Neural Networks (BNNs) offer probability distributions for model parameters, enabling uncertainty quantification in predictions. However, they often underperform compared to deterministic neural networks. Utilizing mutual learning can effectively enhance the performance of peer BNNs. In this paper, we propose a novel approach to improve BNNs performance through deep mutual learning. The proposed approaches aim to increase diversity in both network parameter distributions and feature distributions, promoting peer networks to acquire distinct features that capture different characteristics of the input, which enhances the effectiveness of mutual learning. Experimental results demonstrate significant improvements in the classification accuracy, negative log-likelihood, and expected calibration error when compared to traditional mutual learning for BNNs. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted to NeurIPS 2023

arXiv:2406.09008 [pdf, other]

LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

Authors: Xiaohao Yang, He Zhao, Dinh Phung, Wray Buntine, Lan Du

Abstract: Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the ov… ▽ More Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the overall model performance. In this paper, we propose WALM (Words Agreement with Language Model), a new evaluation method for topic modeling that comprehensively considers the semantic quality of document representations and topics in a joint manner, leveraging the power of large language models (LLMs). With extensive experiments involving different types of topic models, WALM is shown to align with human judgment and can serve as a complementary evaluation method to the existing ones, bringing a new perspective to topic modeling. Our software package will be available at https://github.com/Xiaohao-Yang/Topic_Model_Evaluation, which can be integrated with many widely used topic models. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07107 [pdf, other]

Agnostic Sharpness-Aware Minimization

Authors: Van-Anh Nguyen, Quyen Tran, Tuan Truong, Thanh-Toan Do, Dinh Phung, Trung Le

Abstract: Sharpness-aware minimization (SAM) has been instrumental in improving deep neural network training by minimizing both the training loss and the sharpness of the loss landscape, leading the model into flatter minima that are associated with better generalization properties. In another aspect, Model-Agnostic Meta-Learning (MAML) is a framework designed to improve the adaptability of models. MAML opt… ▽ More Sharpness-aware minimization (SAM) has been instrumental in improving deep neural network training by minimizing both the training loss and the sharpness of the loss landscape, leading the model into flatter minima that are associated with better generalization properties. In another aspect, Model-Agnostic Meta-Learning (MAML) is a framework designed to improve the adaptability of models. MAML optimizes a set of meta-models that are specifically tailored for quick adaptation to multiple tasks with minimal fine-tuning steps and can generalize well with limited data. In this work, we explore the connection between SAM and MAML, particularly in terms of enhancing model generalization. We introduce Agnostic-SAM, a novel approach that combines the principles of both SAM and MAML. Agnostic-SAM adapts the core idea of SAM by optimizing the model towards wider local minima using training data, while concurrently maintaining low loss values on validation data. By doing so, it seeks flatter minima that are not only robust to small perturbations but also less vulnerable to data distributional shift problems. Our experimental results demonstrate that Agnostic-SAM significantly improves generalization over baselines across a range of datasets and under challenging conditions such as noisy labels and data limitation. △ Less

Submitted 11 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2406.05641 [pdf, other]

PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction

Authors: Shangyu Chen, Zizheng Pan, Jianfei Cai, Dinh Phung

Abstract: Personalizing a large-scale pretrained Text-to-Image (T2I) diffusion model is challenging as it typically struggles to make an appropriate trade-off between its training data distribution and the target distribution, i.e., learning a novel concept with only a few target images to achieve personalization (aligning with the personalized target) while preserving text editability (aligning with divers… ▽ More Personalizing a large-scale pretrained Text-to-Image (T2I) diffusion model is challenging as it typically struggles to make an appropriate trade-off between its training data distribution and the target distribution, i.e., learning a novel concept with only a few target images to achieve personalization (aligning with the personalized target) while preserving text editability (aligning with diverse text prompts). In this paper, we propose PaRa, an effective and efficient Parameter Rank Reduction approach for T2I model personalization by explicitly controlling the rank of the diffusion model parameters to restrict its initial diverse generation space into a small and well-balanced target space. Our design is motivated by the fact that taming a T2I model toward a novel concept such as a specific art style implies a small generation space. To this end, by reducing the rank of model parameters during finetuning, we can effectively constrain the space of the denoising sampling trajectories towards the target. With comprehensive experiments, we show that PaRa achieves great advantages over existing finetuning approaches on single/multi-subject generation as well as single-image editing. Notably, compared to the prevailing fine-tuning technique LoRA, PaRa achieves better parameter efficiency (2x fewer learnable parameters) and much better target image alignment. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.12578 [pdf, ps, other]

Trend to equilibrium for degenerate reaction-diffusion systems coming out of chemistry

Authors: Laurent Desvillettes, Kim Dang Phung, Bao Quoc Tang

Abstract: The trend to equilibrium for reaction-diffusion systems coming out of chemistry is investigated, in the case when reaction processes might happen only on some open subsets of the domain. A special case has been studied recently in [Desvillettes, L., \\& Phung, K. D. (2022). Journal of Differential Equations, 338, 227-255] using log convexity technique from controllability theory, which in turn req… ▽ More The trend to equilibrium for reaction-diffusion systems coming out of chemistry is investigated, in the case when reaction processes might happen only on some open subsets of the domain. A special case has been studied recently in [Desvillettes, L., \\& Phung, K. D. (2022). Journal of Differential Equations, 338, 227-255] using log convexity technique from controllability theory, which in turn requires some amount of regularity for the solutions, and is difficult to generalise to more general systems. In this paper, we prove the convergence to equilibrium directly using vector-valued functional inequalities. One major advantage of our approach is that it allows to deal with nonlinearities of arbitrary orders, for which only global renormalised solutions are known to globally exist. For a specific situation where solutions are known to be bounded, we also prove the convergence to equilibrium when the diffusion as well as the reaction rates are degenerate. For this situation, we also treat the case of reactions happening in a set of strictly positive measure which may have an empty interior. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.10084 [pdf, other]

Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation

Authors: Manh Luong, Khai Nguyen, Nhat Ho, Reza Haf, Dinh Phung, Lizhen Qu

Abstract: The Learning-to-match (LTM) framework proves to be an effective inverse optimal transport approach for learning the underlying ground metric between two sources of data, facilitating subsequent matching. However, the conventional LTM framework faces scalability challenges, necessitating the use of the entire dataset each time the parameters of the ground metric are updated. In adapting LTM to the… ▽ More The Learning-to-match (LTM) framework proves to be an effective inverse optimal transport approach for learning the underlying ground metric between two sources of data, facilitating subsequent matching. However, the conventional LTM framework faces scalability challenges, necessitating the use of the entire dataset each time the parameters of the ground metric are updated. In adapting LTM to the deep learning context, we introduce the mini-batch Learning-to-match (m-LTM) framework for audio-text retrieval problems. This framework leverages mini-batch subsampling and Mahalanobis-enhanced family of ground metrics. Moreover, to cope with misaligned training data in practice, we propose a variant using partial optimal transport to mitigate the harm of misaligned data pairs in training data. We conduct extensive experiments on audio-text matching problems using three datasets: AudioCaps, Clotho, and ESC-50. Results demonstrate that our proposed method is capable of learning rich and expressive joint embedding space, which achieves SOTA performance. Beyond this, the proposed m-LTM framework is able to close the modality gap across audio and text embedding, which surpasses both triplet and contrastive loss in the zero-shot sound event detection task on the ESC-50 dataset. Notably, our strategy of employing partial optimal transport with m-LTM demonstrates greater noise tolerance than contrastive loss, especially under varying noise ratios in training data on the AudioCaps dataset. Our code is available at https://github.com/v-manhlt3/m-LTM-Audio-Text-Retrieval △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2404.13583 [pdf, other]

doi 10.1017/S0263574724000559

Radial Basis Function Neural Networks for Formation Control of Unmanned Aerial Vehicles

Authors: Duy-Nam Bui, Manh Duong Phung

Abstract: This paper addresses the problem of controlling multiple unmanned aerial vehicles (UAVs) cooperating in a formation to carry out a complex task such as surface inspection. We first use the virtual leader-follower model to determine the topology and trajectory of the formation. A double-loop control system combining backstep** and sliding mode control techniques is then designed for the UAVs to t… ▽ More This paper addresses the problem of controlling multiple unmanned aerial vehicles (UAVs) cooperating in a formation to carry out a complex task such as surface inspection. We first use the virtual leader-follower model to determine the topology and trajectory of the formation. A double-loop control system combining backstep** and sliding mode control techniques is then designed for the UAVs to track the trajectory. A radial basis function neural network (RBFNN) capable of estimating external disturbances is developed to enhance the robustness of the controller. The stability of the controller is proven by using the Lyapunov theorem. A number of comparisons and software-in-the-loop (SIL) tests have been conducted to evaluate the performance of the proposed controller. The results show that our controller not only outperforms other state-of-the-art controllers but is also sufficient for complex tasks of UAVs such as collecting surface data for inspection. The source code of our controller can be found at https://github.com/duynamrcv/rbf_bsmc △ Less

Submitted 21 April, 2024; originally announced April 2024.

Journal ref: Robotica, 2024

arXiv:2404.07949 [pdf, other]

Taming Stable Diffusion for Text to 360° Panorama Image Generation

Authors: Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

Abstract: Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to gen… ▽ More Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt. We leverage the stable diffusion model as one branch to provide prior knowledge in natural image generation and register it to another panorama branch for holistic image generation. We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process. Our experiments validate that PanFusion surpasses existing methods and, thanks to its dual-branch structure, can integrate additional constraints like room layout for customized panorama outputs. Code is available at https://chengzhag.github.io/publication/panfusion. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project Page: https://chengzhag.github.io/publication/panfusion Code: https://github.com/chengzhag/PanFusion

arXiv:2404.01226 [pdf, other]

Stable Code Technical Report

Authors: Nikhil Pinnaparaju, Reshinth Adithyan, Duy Phung, Jonathan Tow, James Baicoianu, Ashish Datta, Maksym Zhuravinskyi, Dakota Mahan, Marco Bellagente, Carlos Riquelme, Nathan Cooper

Abstract: We introduce Stable Code, the first in our new-generation of code language models series, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering-based tasks. Additionally, we introduce an instruction variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for performing quest… ▽ More We introduce Stable Code, the first in our new-generation of code language models series, which serves as a general-purpose base code language model targeting code completion, reasoning, math, and other software engineering-based tasks. Additionally, we introduce an instruction variant named Stable Code Instruct that allows conversing with the model in a natural chat interface for performing question-answering and instruction-based tasks. In this technical report, we detail the data and training procedure leading to both models. Their weights are available via Hugging Face for anyone to download and use at https://huggingface.co/stabilityai/stable-code-3b and https://huggingface.co/stabilityai/stable-code-instruct-3b. This report contains thorough evaluations of the models, including multilingual programming benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of its release, Stable Code is the state-of-the-art open model under 3B parameters and even performs comparably to larger models of sizes 7 billion and 15 billion parameters on the popular Multi-PL benchmark. Stable Code Instruct also exhibits state-of-the-art performance on the MT-Bench coding tasks and on Multi-PL completion compared to other instruction tuned models. Given its appealing small size, we also provide throughput measurements on a number of edge devices. In addition, we open source several quantized checkpoints and provide their performance metrics compared to the original model. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.14101 [pdf, other]

Text-Enhanced Data-free Approach for Federated Class-Incremental Learning

Authors: Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Dinh Phung

Abstract: Federated Class-Incremental Learning (FCIL) is an underexplored yet pivotal issue, involving the dynamic addition of new classes in the context of federated learning. In this field, Data-Free Knowledge Transfer (DFKT) plays a crucial role in addressing catastrophic forgetting and data privacy problems. However, prior approaches lack the crucial synergy between DFKT and the model training phases, c… ▽ More Federated Class-Incremental Learning (FCIL) is an underexplored yet pivotal issue, involving the dynamic addition of new classes in the context of federated learning. In this field, Data-Free Knowledge Transfer (DFKT) plays a crucial role in addressing catastrophic forgetting and data privacy problems. However, prior approaches lack the crucial synergy between DFKT and the model training phases, causing DFKT to encounter difficulties in generating high-quality data from a non-anchored latent space of the old task model. In this paper, we introduce LANDER (Label Text Centered Data-Free Knowledge Transfer) to address this issue by utilizing label text embeddings (LTE) produced by pretrained language models. Specifically, during the model training phase, our approach treats LTE as anchor points and constrains the feature embeddings of corresponding training samples around them, enriching the surrounding area with more meaningful information. In the DFKT phase, by using these LTE anchors, LANDER can synthesize more meaningful samples, thereby effectively addressing the forgetting problem. Additionally, instead of tightly constraining embeddings toward the anchor, the Bounding Loss is introduced to encourage sample embeddings to remain flexible within a defined radius. This approach preserves the natural differences in sample embeddings and mitigates the embedding overlap caused by heterogeneous federated settings. Extensive experiments conducted on CIFAR100, Tiny-ImageNet, and ImageNet demonstrate that LANDER significantly outperforms previous methods and achieves state-of-the-art performance in FCIL. The code is available at https://github.com/tmtuan1307/lander. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.13204 [pdf, other]

Diversity-Aware Agnostic Ensemble of Sharpness Minimizers

Authors: Anh Bui, Vy Vo, Tung Pham, Dinh Phung, Trung Le

Abstract: There has long been plenty of theoretical and empirical evidence supporting the success of ensemble learning. Deep ensembles in particular take advantage of training randomness and expressivity of individual neural networks to gain prediction diversity, ultimately leading to better generalization, robustness and uncertainty estimation. In respect of generalization, it is found that pursuing wider… ▽ More There has long been plenty of theoretical and empirical evidence supporting the success of ensemble learning. Deep ensembles in particular take advantage of training randomness and expressivity of individual neural networks to gain prediction diversity, ultimately leading to better generalization, robustness and uncertainty estimation. In respect of generalization, it is found that pursuing wider local minima result in models being more robust to shifts between training and testing sets. A natural research question arises out of these two approaches as to whether a boost in generalization ability can be achieved if ensemble learning and loss sharpness minimization are integrated. Our work investigates this connection and proposes DASH - a learning algorithm that promotes diversity and flatness within deep ensembles. More concretely, DASH encourages base learners to move divergently towards low-loss regions of minimal sharpness. We provide a theoretical backbone for our method along with extensive empirical evidence demonstrating an improvement in ensemble generalizability. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12326 [pdf, other]

Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts

Authors: Anh Bui, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

Abstract: Generative models have demonstrated remarkable potential in generating visually impressive content from textual descriptions. However, training these models on unfiltered internet data poses the risk of learning and subsequently propagating undesirable concepts, such as copyrighted or unethical content. In this paper, we propose a novel method to remove undesirable concepts from text-to-image gene… ▽ More Generative models have demonstrated remarkable potential in generating visually impressive content from textual descriptions. However, training these models on unfiltered internet data poses the risk of learning and subsequently propagating undesirable concepts, such as copyrighted or unethical content. In this paper, we propose a novel method to remove undesirable concepts from text-to-image generative models by incorporating a learnable prompt into the cross-attention module. This learnable prompt acts as additional memory to transfer the knowledge of undesirable concepts into it and reduce the dependency of these concepts on the model parameters and corresponding textual inputs. Because of this knowledge transfer into the prompt, erasing these undesirable concepts is more stable and has minimal negative impact on other concepts. We demonstrate the effectiveness of our method on the Stable Diffusion model, showcasing its superiority over state-of-the-art erasure methods in terms of removing undesirable content while preserving other unrelated elements. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.05894 [pdf, other]

Frequency Attention for Knowledge Distillation

Authors: Cuong Pham, Van-Anh Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do

Abstract: Knowledge distillation is an attractive approach for learning compact deep neural networks, which learns a lightweight student model by distilling knowledge from a complex teacher model. Attention-based knowledge distillation is a specific form of intermediate feature-based knowledge distillation that uses attention mechanisms to encourage the student to better mimic the teacher. However, most of… ▽ More Knowledge distillation is an attractive approach for learning compact deep neural networks, which learns a lightweight student model by distilling knowledge from a complex teacher model. Attention-based knowledge distillation is a specific form of intermediate feature-based knowledge distillation that uses attention mechanisms to encourage the student to better mimic the teacher. However, most of the previous attention-based distillation approaches perform attention in the spatial domain, which primarily affects local regions in the input image. This may not be sufficient when we need to capture the broader context or global information necessary for effective knowledge transfer. In frequency domain, since each frequency is determined from all pixels of the image in spatial domain, it can contain global information about the image. Inspired by the benefits of the frequency domain, we propose a novel module that functions as an attention mechanism in the frequency domain. The module consists of a learnable global filter that can adjust the frequencies of student's features under the guidance of the teacher's features, which encourages the student's features to have patterns similar to the teacher's features. We then propose an enhanced knowledge review-based distillation model by leveraging the proposed frequency attention module. The extensive experiments with various teacher and student architectures on image classification and object detection benchmark datasets show that the proposed approach outperforms other knowledge distillation methods. △ Less

Submitted 9 March, 2024; originally announced March 2024.

Comments: Appear to WACV 2024

arXiv:2402.17834 [pdf, other]

Stable LM 2 1.6B Technical Report

Authors: Marco Bellagente, Jonathan Tow, Dakota Mahan, Duy Phung, Maksym Zhuravinskyi, Reshinth Adithyan, James Baicoianu, Ben Brooks, Nathan Cooper, Ashish Datta, Meng Lee, Emad Mostaque, Michael Pieler, Nikhil Pinnaparju, Paulo Rocha, Harry Saini, Hannah Teufel, Niccolo Zanichelli, Carlos Riquelme

Abstract: We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including z… ▽ More We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters by a significant margin. Given its appealing small size, we also provide throughput measurements on a number of edge devices. In addition, we open source several quantized checkpoints and provide their performance metrics compared to the original model. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 23 pages, 6 figures

arXiv:2402.15255 [pdf, other]

Optimal Transport for Structure Learning Under Missing Data

Authors: Vy Vo, He Zhao, Trung Le, Edwin V. Bonilla, Dinh Phung

Abstract: Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or, preferably, causal relations among variables. Merely filling in missing values with existing imputation methods and subsequently applying structure learning on the complete data is empirically show… ▽ More Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or, preferably, causal relations among variables. Merely filling in missing values with existing imputation methods and subsequently applying structure learning on the complete data is empirically shown to be sub-optimal. To address this problem, we propose a score-based algorithm for learning causal structures from missing data based on optimal transport. This optimal transport viewpoint diverges from existing score-based approaches that are dominantly based on expectation maximization. We formulate structure learning as a density fitting problem, where the goal is to find the causal model that induces a distribution of minimum Wasserstein distance with the observed data distribution. Our framework is shown to recover the true causal graphs more effectively than competing methods in most simulations and real-data settings. Empirical evidence also shows the superior scalability of our approach, along with the flexibility to incorporate any off-the-shelf causal discovery methods for complete data. △ Less

Submitted 1 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Journal ref: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

arXiv:2402.11199 [pdf, other]

Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Authors: Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari

Abstract: Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers. However, previous research on evaluating LLMs has solely focused on answer accuracy, neglecting the correctness of the generated CoT. In this paper, we delve deeper into the CoT reasoning capabilities of LLMs in multi-hop question answering by utilizi… ▽ More Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers. However, previous research on evaluating LLMs has solely focused on answer accuracy, neglecting the correctness of the generated CoT. In this paper, we delve deeper into the CoT reasoning capabilities of LLMs in multi-hop question answering by utilizing knowledge graphs (KGs). We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT. Through experiments conducted on 5 different families of LLMs across 2 multi-hop question-answering datasets, we find that LLMs possess sufficient knowledge to perform reasoning. However, there exists a significant disparity between answer accuracy and faithfulness of the CoT reasoning generated by LLMs, indicating that they often arrive at correct answers through incorrect reasoning. △ Less

Submitted 19 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: Minh-Vuong Nguyen and Linhao Luo are co-first authors and contributed equally to the preparation of this manuscript. Accepted to ACL24-Findings

arXiv:2402.08251 [pdf, other]

doi 10.1109/SII58957.2024.10417611

Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles

Authors: Minh Dang Tu, Kieu Trang Le, Manh Duong Phung

Abstract: This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts, the backbone, the neck, and the prediction head. The backbone is developed based on the structure of YOLOv5 combined with the use of a transformer encoder at the end. The neck includes a BI-FPN block combined with the us… ▽ More This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts, the backbone, the neck, and the prediction head. The backbone is developed based on the structure of YOLOv5 combined with the use of a transformer encoder at the end. The neck includes a BI-FPN block combined with the use of a sliding window and a transformer to increase the information fed into the prediction head. The prediction head carries out the detection by evaluating feature maps with the Sigmoid function. The use of transformers with attention and sliding windows increases recognition accuracy while kee** the model at a reasonable number of parameters and computation requirements for embedded systems. Experiments conducted on public dataset VEDAI and our collected datasets show that our model has a higher accuracy than state-of-the-art methods such as ResNet, Faster RCNN, ComNet, ViT, YOLOv5, SMPNet, and DPNetV3. Experiments on the embedded computer Jetson AGX show that our model achieves a real-time computation speed with a stability rate of over 90%. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: Published in: 2024 IEEE/SICE International Symposium on System Integration (SII)

arXiv:2402.08246 [pdf, other]

doi 10.1109/SII58957.2024.10417512

Ant Colony Optimization for Cooperative Inspection Path Planning Using Multiple Unmanned Aerial Vehicles

Authors: Duy Nam Bui, Thuy Ngan Duong, Manh Duong Phung

Abstract: This paper presents a new swarm intelligence-based approach to deal with the cooperative path planning problem of unmanned aerial vehicles (UAVs), which is essential for the automatic inspection of infrastructure. The approach uses a 3D model of the structure to generate viewpoints for the UAVs. The calculation of the viewpoints considers the constraints related to the UAV formation model, camera… ▽ More This paper presents a new swarm intelligence-based approach to deal with the cooperative path planning problem of unmanned aerial vehicles (UAVs), which is essential for the automatic inspection of infrastructure. The approach uses a 3D model of the structure to generate viewpoints for the UAVs. The calculation of the viewpoints considers the constraints related to the UAV formation model, camera parameters, and requirements for data post-processing. The viewpoints are then used as input to formulate the path planning as an extended traveling salesman problem and the definition of a new cost function. Ant colony optimization is finally used to solve the problem to yield optimal inspection paths. Experiments with 3D models of real structures have been conducted to evaluate the performance of the proposed approach. The results show that our system is not only capable of generating feasible inspection paths for UAVs but also reducing the path length by 29.47\% for complex structures when compared with another heuristic approach. The source code of the algorithm can be found at https://github.com/duynamrcv/aco_3d_ipp. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: Published in: 2024 IEEE/SICE International Symposium on System Integration (SII)

arXiv:2402.08245 [pdf, other]

doi 10.1109/SII58957.2024.10417519

Self-Reconfigurable V-shape Formation of Multiple UAVs in Narrow Space Environments

Authors: Duy Nam Bui, Manh Duong Phung, Hung Pham Duy

Abstract: This paper presents the design and implementation of a self-reconfigurable V-shape formation controller for multiple unmanned aerial vehicles (UAVs) navigating through narrow spaces in a dense obstacle environment. The selection of the V-shape formation is motivated by its maneuverability and visibility advantages. The main objective is to develop an effective formation control strategy that allow… ▽ More This paper presents the design and implementation of a self-reconfigurable V-shape formation controller for multiple unmanned aerial vehicles (UAVs) navigating through narrow spaces in a dense obstacle environment. The selection of the V-shape formation is motivated by its maneuverability and visibility advantages. The main objective is to develop an effective formation control strategy that allows UAVs to autonomously adjust their positions to form the desired formation while navigating through obstacles. To achieve this, we propose a distributed behavior-based control algorithm that combines the behaviors designed for individual UAVs so that they together navigate the UAVs to their desired positions. The reconfiguration process is automatic, utilizing individual UAV sensing within the formation, allowing for dynamic adaptations such as opening/closing wings or merging into a straight line. Simulation results show that the self-reconfigurable V-shape formation offers adaptability and effectiveness for UAV formations in complex operational scenarios. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: Published in: 2024 IEEE/SICE International Symposium on System Integration (SII)

arXiv:2401.15952 [pdf, other]

A Class-aware Optimal Transport Approach with Higher-Order Moment Matching for Unsupervised Domain Adaptation

Authors: Tuan Nguyen, Van Nguyen, Trung Le, He Zhao, Quan Hung Tran, Dinh Phung

Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. In this paper, we introduce a novel approach called class-aware optimal transport (OT), which measures the OT distance between a distribution over the source class-conditional distributions and a mixture of source and target data distribution. Our class-aware OT leverages a c… ▽ More Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. In this paper, we introduce a novel approach called class-aware optimal transport (OT), which measures the OT distance between a distribution over the source class-conditional distributions and a mixture of source and target data distribution. Our class-aware OT leverages a cost function that determines the matching extent between a given data example and a source class-conditional distribution. By optimizing this cost function, we find the optimal matching between target examples and source class-conditional distributions, effectively addressing the data and label shifts that occur between the two domains. To handle the class-aware OT efficiently, we propose an amortization solution that employs deep neural networks to formulate the transportation probabilities and the cost function. Additionally, we propose minimizing class-aware Higher-order Moment Matching (HMM) to align the corresponding class regions on the source and target domains. The class-aware HMM component offers an economical computational approach for accurately evaluating the HMM distance between the two distributions. Extensive experiments on benchmark datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art baselines. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: 18 pages

arXiv:2312.06710 [pdf, other]

Class-Prototype Conditional Diffusion Model with Gradient Projection for Continual Learning

Authors: Khanh Doan, Quyen Tran, Tung Lam Tran, Tuan Nguyen, Dinh Phung, Trung Le

Abstract: Mitigating catastrophic forgetting is a key hurdle in continual learning. Deep Generative Replay (GR) provides techniques focused on generating samples from prior tasks to enhance the model's memory capabilities using generative AI models ranging from Generative Adversarial Networks (GANs) to the more recent Diffusion Models (DMs). A major issue is the deterioration in the quality of generated dat… ▽ More Mitigating catastrophic forgetting is a key hurdle in continual learning. Deep Generative Replay (GR) provides techniques focused on generating samples from prior tasks to enhance the model's memory capabilities using generative AI models ranging from Generative Adversarial Networks (GANs) to the more recent Diffusion Models (DMs). A major issue is the deterioration in the quality of generated data compared to the original, as the generator continuously self-learns from its outputs. This degradation can lead to the potential risk of catastrophic forgetting (CF) occurring in the classifier. To address this, we propose the Gradient Projection Class-Prototype Conditional Diffusion Model (GPPDM), a GR-based approach for continual learning that enhances image quality in generators and thus reduces the CF in classifiers. The cornerstone of GPPDM is a learnable class prototype that captures the core characteristics of images in a given class. This prototype, integrated into the diffusion model's denoising process, ensures the generation of high-quality images of the old tasks, hence reducing the risk of CF in classifiers. Moreover, to further mitigate the CF of diffusion models, we propose a gradient projection technique tailored for the cross-attention layer of diffusion models to maximally maintain and preserve the representations of old task data in the current task as close as possible to their representations when they first arrived. Our empirical studies on diverse datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art models, highlighting its satisfactory ability to preserve image quality and enhance the model's memory retention. △ Less

Submitted 21 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.15414 [pdf, other]

KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All

Authors: Quyen Tran, Lam Tran, Khoat Than, Toan Tran, Dinh Phung, Trung Le

Abstract: Drawing inspiration from prompt tuning techniques applied to Large Language Models, recent methods based on pre-trained ViT networks have achieved remarkable results in the field of Continual Learning. Specifically, these approaches propose to maintain a set of prompts and allocate a subset of them to learn each task using a key-query matching strategy. However, they may encounter limitations when… ▽ More Drawing inspiration from prompt tuning techniques applied to Large Language Models, recent methods based on pre-trained ViT networks have achieved remarkable results in the field of Continual Learning. Specifically, these approaches propose to maintain a set of prompts and allocate a subset of them to learn each task using a key-query matching strategy. However, they may encounter limitations when lacking control over the correlations between old task queries and keys of future tasks, the shift of features in the latent space, and the relative separation of latent vectors learned in independent tasks. In this work, we introduce a novel key-query learning strategy based on orthogonal projection, inspired by model-agnostic meta-learning, to enhance prompt matching efficiency and address the challenge of shifting features. Furthermore, we introduce a One-Versus-All (OVA) prototype-based component that enhances the classification head distinction. Experimental results on benchmark datasets demonstrate that our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%. △ Less

Submitted 30 November, 2023; v1 submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.09671 [pdf, ps, other]

Robust Contrastive Learning With Theory Guarantee

Authors: Ngoc N. Tran, Lam Tran, Hoang Phan, Anh Bui, Tung Pham, Toan Tran, Dinh Phung, Trung Le

Abstract: Contrastive learning (CL) is a self-supervised training paradigm that allows us to extract meaningful features without any label information. A typical CL framework is divided into two phases, where it first tries to learn the features from unlabelled data, and then uses those features to train a linear classifier with the labeled data. While a fair amount of existing theoretical works have analyz… ▽ More Contrastive learning (CL) is a self-supervised training paradigm that allows us to extract meaningful features without any label information. A typical CL framework is divided into two phases, where it first tries to learn the features from unlabelled data, and then uses those features to train a linear classifier with the labeled data. While a fair amount of existing theoretical works have analyzed how the unsupervised loss in the first phase can support the supervised loss in the second phase, none has examined the connection between the unsupervised loss and the robust supervised loss, which can shed light on how to construct an effective unsupervised loss for the first phase of CL. To fill this gap, our work develops rigorous theories to dissect and identify which components in the unsupervised loss can help improve the robust supervised loss and conduct proper experiments to verify our findings. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: 27 pages, 0 figures. arXiv admin note: text overlap with arXiv:2305.10252

arXiv:2311.02945 [pdf, ps, other]

PhoGPT: Generative Pre-training for Vietnamese

Authors: Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui

Abstract: We open-source a state-of-the-art 4B-parameter generative model series for Vietnamese, which includes the base pre-trained monolingual model PhoGPT-4B and its chat variant, PhoGPT-4B-Chat. The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types. The chat vari… ▽ More We open-source a state-of-the-art 4B-parameter generative model series for Vietnamese, which includes the base pre-trained monolingual model PhoGPT-4B and its chat variant, PhoGPT-4B-Chat. The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types. The chat variant, PhoGPT-4B-Chat, is the modeling output obtained by fine-tuning PhoGPT-4B on a dataset of 70K instructional prompts and their responses, along with an additional 290K conversations. In addition, we also demonstrate its superior performance compared to previous open-source models. Our PhoGPT models are available at: https://github.com/VinAIResearch/PhoGPT △ Less

Submitted 22 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: PhoGPT-4B Technical Report - 5 pages

arXiv:2310.11638 [pdf, other]

Systematic Assessment of Factual Knowledge in Large Language Models

Authors: Linhao Luo, Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari

Abstract: Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging… ▽ More Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate the state-of-the-art LLMs with KGs in generic and specific domains. The experiment shows that ChatGPT is consistently the top performer across all domains. We also find that LLMs performance depends on the instruction finetuning, domain and question complexity and is prone to adversarial context. △ Less

Submitted 30 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP 2023 Findings

arXiv:2310.01176 [pdf, other]

Cross-adversarial local distribution regularization for semi-supervised medical image segmentation

Authors: Thanh Nguyen-Duc, Trung Le, Roland Bammer, He Zhao, Jianfei Cai, Dinh Phung

Abstract: Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data. Existing semi-supervised segmentation methods are usually based on the smoothness assumption. This assumption implies that the model output distributions of two similar data samples are encouraged to be invariant. In other words, the smoothness… ▽ More Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data. Existing semi-supervised segmentation methods are usually based on the smoothness assumption. This assumption implies that the model output distributions of two similar data samples are encouraged to be invariant. In other words, the smoothness assumption states that similar samples (e.g., adding small perturbations to an image) should have similar outputs. In this paper, we introduce a novel cross-adversarial local distribution (Cross-ALD) regularization to further enhance the smoothness assumption for semi-supervised medical image segmentation task. We conducted comprehensive experiments that the Cross-ALD archives state-of-the-art performance against many recent methods on the public LA and ACDC datasets. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: MICCAI 2023

arXiv:2310.00258 [pdf, other]

NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation

Authors: Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Quan Hung Tran, Dinh Phung

Abstract: Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models s… ▽ More Data-Free Knowledge Distillation (DFKD) has made significant recent strides by transferring knowledge from a teacher neural network to a student neural network without accessing the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to effectively map this noise to the ground-truth sample distribution, resulting in prolonging training times and low-quality outputs. In this paper, we propose a novel Noisy Layer Generation method (NAYER) which relocates the random source from the input to a noisy layer and utilizes the meaningful constant label-text embedding (LTE) as the input. LTE is generated by using the language model once, and then it is stored in memory for all subsequent training processes. The significance of LTE lies in its ability to contain substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. Simultaneously, the noisy layer plays a key role in addressing the issue of diversity in sample generation by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to facilitate the generation of diverse samples while still retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments carried out on multiple datasets demonstrate that our NAYER not only outperforms the state-of-the-art methods but also achieves speeds 5 to 15 times faster than previous approaches. The code is available at https://github.com/tmtuan1307/nayer. △ Less

Submitted 21 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: Accepted at CVPR 2024

arXiv:2309.17215 [pdf, other]

RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization

Authors: Tuan Truong, Hoang-Phi Nguyen, Tung Pham, Minh-Tuan Tran, Mehrtash Harandi, Dinh Phung, Trung Le

Abstract: Nowadays, understanding the geometry of the loss landscape shows promise in enhancing a model's generalization ability. In this work, we draw upon prior works that apply geometric principles to optimization and present a novel approach to improve robustness and generalization ability for constrained optimization problems. Indeed, this paper aims to generalize the Sharpness-Aware Minimization (SAM)… ▽ More Nowadays, understanding the geometry of the loss landscape shows promise in enhancing a model's generalization ability. In this work, we draw upon prior works that apply geometric principles to optimization and present a novel approach to improve robustness and generalization ability for constrained optimization problems. Indeed, this paper aims to generalize the Sharpness-Aware Minimization (SAM) optimizer to Riemannian manifolds. In doing so, we first extend the concept of sharpness and introduce a novel notion of sharpness on manifolds. To support this notion of sharpness, we present a theoretical analysis characterizing generalization capabilities with respect to manifold sharpness, which demonstrates a tighter bound on the generalization gap, a result not known before. Motivated by this analysis, we introduce our algorithm, Riemannian Sharpness-Aware Minimization (RSAM). To demonstrate RSAM's ability to enhance generalization ability, we evaluate and contrast our algorithm on a broad set of problems, such as image classification and contrastive learning across different datasets, including CIFAR100, CIFAR10, and FGVCAircraft. Our code is publicly available at \url{https://t.ly/RiemannianSAM}. △ Less

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.15491 [pdf, ps, other]

An optimal spectral inequality for degenerate operators

Authors: Rémi Buffe, Kim Dang Phung, Amine Slimani

Abstract: In this paper we establish a Lebeau-Robbiano spectral inequality for a degenerate one dimensional elliptic operator. Carleman techniques and moment method are combined. Application to null controllability on a measurable set in time for the degenerated heat equation is described. In this paper we establish a Lebeau-Robbiano spectral inequality for a degenerate one dimensional elliptic operator. Carleman techniques and moment method are combined. Application to null controllability on a measurable set in time for the degenerated heat equation is described. △ Less

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2307.12564 [pdf, other]

Towards Generalising Neural Topical Representations

Authors: Xiaohao Yang, He Zhao, Dinh Phung, Lan Du

Abstract: Topic models have evolved from conventional Bayesian probabilistic models to recent Neural Topic Models (NTMs). Although NTMs have shown promising performance when trained and tested on a specific corpus, their generalisation ability across corpora has yet to be studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representation (i.e., late… ▽ More Topic models have evolved from conventional Bayesian probabilistic models to recent Neural Topic Models (NTMs). Although NTMs have shown promising performance when trained and tested on a specific corpus, their generalisation ability across corpora has yet to be studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representation (i.e., latent distribution over topics) for the document from different target corpora to a certain degree. In this work, we aim to improve NTMs further so that their representation power for documents generalises reliably across corpora and tasks. To do so, we propose to enhance NTMs by narrowing the semantic distance between similar documents, with the underlying assumption that documents from different corpora may share similar semantics. Specifically, we obtain a similar document for each training document by text data augmentation. Then, we optimise NTMs further by minimising the semantic distance between each pair, measured by the Topical Optimal Transport (TopicalOT) distance, which computes the optimal transport distance between their topical representations. Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalisation ability regarding neural topical representation across corpora. Our code and datasets are available at: https://github.com/Xiaohao-Yang/Topic_Model_Generalisation. △ Less

Submitted 12 June, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2306.06109 [pdf, other]

Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities

Authors: Michael Fu, Trung Le, Van Nguyen, Chakkrit Tantithamthavorn, Dinh Phung

Abstract: Deep learning (DL) models have become increasingly popular in identifying software vulnerabilities. Prior studies found that vulnerabilities across different vulnerable programs may exhibit similar vulnerable scopes, implicitly forming discernible vulnerability patterns that can be learned by DL models through supervised training. However, vulnerable scopes still manifest in various spatial locati… ▽ More Deep learning (DL) models have become increasingly popular in identifying software vulnerabilities. Prior studies found that vulnerabilities across different vulnerable programs may exhibit similar vulnerable scopes, implicitly forming discernible vulnerability patterns that can be learned by DL models through supervised training. However, vulnerable scopes still manifest in various spatial locations and formats within a program, posing challenges for models to accurately identify vulnerable statements. Despite this challenge, state-of-the-art vulnerability detection approaches fail to exploit the vulnerability patterns that arise in vulnerable programs. To take full advantage of vulnerability patterns and unleash the ability of DL models, we propose a novel vulnerability-matching approach in this paper, drawing inspiration from program analysis tools that locate vulnerabilities based on pre-defined patterns. Specifically, a vulnerability codebook is learned, which consists of quantized vectors representing various vulnerability patterns. During inference, the codebook is iterated to match all learned patterns and predict the presence of potential vulnerabilities within a given program. Our approach was extensively evaluated on a real-world dataset comprising more than 188,000 C/C++ functions. The evaluation results show that our approach achieves an F1-score of 94% (6% higher than the previous best) and 82% (19% higher than the previous best) for function and statement-level vulnerability identification, respectively. These substantial enhancements highlight the effectiveness of our approach to identifying vulnerabilities. The training code and pre-trained models are available at https://github.com/optimatch/optimatch. △ Less

Submitted 26 May, 2023; originally announced June 2023.

arXiv:2306.04178 [pdf, other]

Optimal Transport Model Distributional Robustness

Authors: Van-Anh Nguyen, Trung Le, Anh Tuan Bui, Thanh-Toan Do, Dinh Phung

Abstract: Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in the data space. In this work, we explore an optimal transport-based distributional robustness framework in model spaces. Specifically, we examine a model dist… ▽ More Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in the data space. In this work, we explore an optimal transport-based distributional robustness framework in model spaces. Specifically, we examine a model distribution within a Wasserstein ball centered on a given model distribution that maximizes the loss. We have developed theories that enable us to learn the optimal robust center model distribution. Interestingly, our developed theories allow us to flexibly incorporate the concept of sharpness awareness into training, whether it's a single model, ensemble models, or Bayesian Neural Networks, by considering specific forms of the center model distribution. These forms include a Dirac delta distribution over a single model, a uniform distribution over several models, and a general Bayesian Neural Network. Furthermore, we demonstrate that Sharpness-Aware Minimization (SAM) is a specific case of our framework when using a Dirac delta distribution over a single model, while our framework can be seen as a probabilistic extension of SAM. To validate the effectiveness of our framework in the aforementioned settings, we conducted extensive experiments, and the results reveal remarkable improvements compared to the baselines. △ Less

Submitted 1 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted at NeurIPs 2023

Journal ref: Advances in Neural Information Processing Systems, 2023

arXiv:2305.16615 [pdf, other]

AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing Software Vulnerabilities

Authors: Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Yuki Kume, Van Nguyen, Dinh Phung, John Grundy

Abstract: Many ML-based approaches have been proposed to automatically detect, localize, and repair software vulnerabilities. While ML-based methods are more effective than program analysis-based vulnerability analysis tools, few have been integrated into modern IDEs, hindering practical adoption. To bridge this critical gap, we propose AIBugHunter, a novel ML-based software vulnerability analysis tool for… ▽ More Many ML-based approaches have been proposed to automatically detect, localize, and repair software vulnerabilities. While ML-based methods are more effective than program analysis-based vulnerability analysis tools, few have been integrated into modern IDEs, hindering practical adoption. To bridge this critical gap, we propose AIBugHunter, a novel ML-based software vulnerability analysis tool for C/C++ languages that is integrated into Visual Studio Code. AIBugHunter helps software developers to achieve real-time vulnerability detection, explanation, and repairs during programming. In particular, AIBugHunter scans through developers' source code to (1) locate vulnerabilities, (2) identify vulnerability types, (3) estimate vulnerability severity, and (4) suggest vulnerability repairs. In this article, we propose a novel multi-objective optimization (MOO)-based vulnerability classification approach and a transformer-based estimation approach to help AIBugHunter accurately identify vulnerability types and estimate severity. Our empirical experiments on a large dataset consisting of 188K+ C/C++ functions confirm that our proposed approaches are more accurate than other state-of-the-art baseline methods for vulnerability classification and estimation. Furthermore, we conduct qualitative evaluations including a survey study and a user study to obtain software practitioners' perceptions of our AIBugHunter tool and assess the impact that AIBugHunter may have on developers' productivity in security aspects. Our survey study shows that our AIBugHunter is perceived as useful where 90% of the participants consider adopting our AIBugHunter. Last but not least, our user study shows that our AIBugHunter could possibly enhance developers' productivity in combating cybersecurity issues during software development. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 34 pages, Accepted at Empirical Software Engineering Journal

Journal ref: Empirical Software Engineering (EMSE), 2023

arXiv:2305.15927 [pdf, other]

Parameter Estimation in DAGs from Incomplete Data via Optimal Transport

Authors: Vy Vo, Trung Le, Tung-Long Vuong, He Zhao, Edwin Bonilla, Dinh Phung

Abstract: Estimating the parameters of a probabilistic directed graphical model from incomplete data is a long-standing challenge. This is because, in the presence of latent variables, both the likelihood function and posterior distribution are intractable without assumptions about structural dependencies or model classes. While existing learning methods are fundamentally based on likelihood maximization, h… ▽ More Estimating the parameters of a probabilistic directed graphical model from incomplete data is a long-standing challenge. This is because, in the presence of latent variables, both the likelihood function and posterior distribution are intractable without assumptions about structural dependencies or model classes. While existing learning methods are fundamentally based on likelihood maximization, here we offer a new view of the parameter learning problem through the lens of optimal transport. This perspective licenses a general framework that operates on any directed graphs without making unrealistic assumptions on the posterior over the latent variables or resorting to variational approximations. We develop a theoretical framework and support it with extensive empirical evidence demonstrating the versatility and robustness of our approach. Across experiments, we show that not only can our method effectively recover the ground-truth parameters but it also performs comparably or better than competing baselines on downstream applications. △ Less

Submitted 1 June, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Journal ref: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

arXiv:2305.10252 [pdf, other]

Sharpness & Shift-Aware Self-Supervised Learning

Authors: Ngoc N. Tran, Son Duong, Hoang Phan, Tung Pham, Dinh Phung, Trung Le

Abstract: Self-supervised learning aims to extract meaningful features from unlabeled data for further downstream tasks. In this paper, we consider classification as a downstream task in phase 2 and develop rigorous theories to realize the factors that implicitly influence the general loss of this classification task. Our theories signify that sharpness-aware feature extractors benefit the classification ta… ▽ More Self-supervised learning aims to extract meaningful features from unlabeled data for further downstream tasks. In this paper, we consider classification as a downstream task in phase 2 and develop rigorous theories to realize the factors that implicitly influence the general loss of this classification task. Our theories signify that sharpness-aware feature extractors benefit the classification task in phase 2 and the existing data shift between the ideal (i.e., the ideal one used in theory development) and practical (i.e., the practical one used in implementation) distributions to generate positive pairs also remarkably affects this classification task. Further harvesting these theoretical findings, we propose to minimize the sharpness of the feature extractor and a new Fourier-based data augmentation technique to relieve the data shift in the distributions generating positive pairs, reaching Sharpness & Shift-Aware Contrastive Learning (SSA-CLR). We conduct extensive experiments to verify our theoretical findings and demonstrate that sharpness & shift-aware contrastive learning can remarkably boost the performance as well as obtaining more robust extracted features compared with the baselines. △ Less

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.03923 [pdf, other]

Active Continual Learning: On Balancing Knowledge Retention and Learnability

Authors: Thuy-Trang Vu, Shahram Khadivi, Mahsa Ghorbanali, Dinh Phung, Gholamreza Haffari

Abstract: Acquiring new knowledge without forgetting what has been learned in a sequence of tasks is the central focus of continual learning (CL). While tasks arrive sequentially, the training data are often prepared and annotated independently, leading to the CL of incoming supervised learning tasks. This paper considers the under-explored problem of active continual learning (ACL) for a sequence of active… ▽ More Acquiring new knowledge without forgetting what has been learned in a sequence of tasks is the central focus of continual learning (CL). While tasks arrive sequentially, the training data are often prepared and annotated independently, leading to the CL of incoming supervised learning tasks. This paper considers the under-explored problem of active continual learning (ACL) for a sequence of active learning (AL) tasks, where each incoming task includes a pool of unlabelled data and an annotation budget. We investigate the effectiveness and interplay between several AL and CL algorithms in the domain, class and task-incremental scenarios. Our experiments reveal the trade-off between two contrasting goals of not forgetting the old knowledge and the ability to quickly learn new knowledge in CL and AL, respectively. While conditioning the AL query strategy on the annotations collected for the previous tasks leads to improved task performance on the domain and task incremental learning, our proposed forgetting-learning profile suggests a gap in balancing the effect of AL and CL for the class-incremental scenario. △ Less

Submitted 30 January, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2304.13229 [pdf, other]

Generating Adversarial Examples with Task Oriented Multi-Objective Optimization

Authors: Anh Bui, Trung Le, He Zhao, Quan Tran, Paul Montague, Dinh Phung

Abstract: Deep learning models, even the-state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods to improve the model's robustness. The key factor for the success of adversarial training is the capability to generate qualified and divergent adversarial examples which satisfy some objectives/goals (e.g., finding adversarial examples that… ▽ More Deep learning models, even the-state-of-the-art ones, are highly vulnerable to adversarial examples. Adversarial training is one of the most efficient methods to improve the model's robustness. The key factor for the success of adversarial training is the capability to generate qualified and divergent adversarial examples which satisfy some objectives/goals (e.g., finding adversarial examples that maximize the model losses for simultaneously attacking multiple models). Therefore, multi-objective optimization (MOO) is a natural tool for adversarial example generation to achieve multiple objectives/goals simultaneously. However, we observe that a naive application of MOO tends to maximize all objectives/goals equally, without caring if an objective/goal has been achieved yet. This leads to useless effort to further improve the goal-achieved tasks, while putting less focus on the goal-unachieved tasks. In this paper, we propose \emph{Task Oriented MOO} to address this issue, in the context where we can explicitly define the goal achievement for a task. Our principle is to only maintain the goal-achieved tasks, while letting the optimizer spend more effort on improving the goal-unachieved tasks. We conduct comprehensive experiments for our Task Oriented MOO on various adversarial example generation schemes. The experimental results firmly demonstrate the merit of our proposed approach. Our code is available at \url{https://github.com/tuananhbui89/TAMOO}. △ Less

Submitted 1 June, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.10764 [pdf, other]

Hyperbolic Geometry in Computer Vision: A Survey

Authors: Pengfei Fang, Mehrtash Harandi, Trung Le, Dinh Phung

Abstract: Hyperbolic geometry, a Riemannian manifold endowed with constant sectional negative curvature, has been considered an alternative embedding space in many learning scenarios, \eg, natural language processing, graph learning, \etc, as a result of its intriguing property of encoding the data's hierarchical structure (like irregular graph or tree-likeness data). Recent studies prove that such data hie… ▽ More Hyperbolic geometry, a Riemannian manifold endowed with constant sectional negative curvature, has been considered an alternative embedding space in many learning scenarios, \eg, natural language processing, graph learning, \etc, as a result of its intriguing property of encoding the data's hierarchical structure (like irregular graph or tree-likeness data). Recent studies prove that such data hierarchy also exists in the visual dataset, and investigate the successful practice of hyperbolic geometry in the computer vision (CV) regime, ranging from the classical image classification to advanced model adaptation learning. This paper presents the first and most up-to-date literature review of hyperbolic spaces for CV applications. To this end, we first introduce the background of hyperbolic geometry, followed by a comprehensive investigation of algorithms, with geometric prior of hyperbolic space, in the context of visual applications. We also conclude this manuscript and identify possible future directions. △ Less

Submitted 21 April, 2023; originally announced April 2023.

Comments: First survey paper for the hyperbolic geometry in CV applications

arXiv:2304.00483 [pdf, other]

doi 10.18653/v1/2023.insights-1.3

A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

Authors: Iva Bojic, Josef Halim, Verena Suharman, Sreeja Tar, Qi Chwen Ong, Duy Phung, Mathieu Ravaut, Shafiq Joty, Josip Car

Abstract: Low-quality data can cause downstream problems in high-stakes applications. Data-centric approach emphasizes on improving dataset quality to enhance model performance. High-quality datasets are needed for general-purpose Large Language Models (LLMs) training, as well as for domain-specific models, which are usually small in size as it is costly to engage a large number of domain experts for their… ▽ More Low-quality data can cause downstream problems in high-stakes applications. Data-centric approach emphasizes on improving dataset quality to enhance model performance. High-quality datasets are needed for general-purpose Large Language Models (LLMs) training, as well as for domain-specific models, which are usually small in size as it is costly to engage a large number of domain experts for their creation. Thus, it is vital to ensure high-quality domain-specific training data. In this paper, we propose a framework for enhancing the data quality of original datasets. We applied the proposed framework to four biomedical datasets and showed relative improvement of up to 33%/40% for fine-tuning of retrieval/reader models on the BioASQ dataset when using back translation to enhance the original dataset quality. △ Less

Submitted 26 May, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Journal ref: 2023.In The Fourth Workshop on Insights from Negative Results in NLP, pages 19-32, Dubrovnik, Croatia. Association for Computational Linguistics

arXiv:2302.05917 [pdf, other]

Vector Quantized Wasserstein Auto-Encoder

Authors: Tung-Long Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung

Abstract: Learning deep discrete latent presentations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE), most of work in learning deep discrete representations has mainly focused on improving the original VQ-VAE form and none of them has studied learning deep discrete… ▽ More Learning deep discrete latent presentations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE), most of work in learning deep discrete representations has mainly focused on improving the original VQ-VAE form and none of them has studied learning deep discrete representations from the generative viewpoint. In this work, we study learning deep discrete representations from the generative viewpoint. Specifically, we endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution via minimizing a WS distance between them. We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution. Finally, we empirically evaluate our method on several well-known benchmarks, where it achieves better qualitative and quantitative performances than the other VQ-VAE variants in terms of the codebook utilization and image reconstruction/generation. △ Less

Submitted 17 June, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

arXiv:2302.02713 [pdf, other]

Flat Seeking Bayesian Neural Networks

Authors: Van-Anh Nguyen, Tung-Long Vuong, Hoang Phan, Thanh-Toan Do, Dinh Phung, Trung Le

Abstract: Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a posterior distribution based on observed data. The model sampled from the posterior distribution can be used for providing ensemble predictions and quantifying prediction uncertainty. It is well-known that deep learning models with l… ▽ More Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a posterior distribution based on observed data. The model sampled from the posterior distribution can be used for providing ensemble predictions and quantifying prediction uncertainty. It is well-known that deep learning models with lower sharpness have better generalization ability. However, existing posterior inferences are not aware of sharpness/flatness in terms of formulation, possibly leading to high sharpness for the models sampled from them. In this paper, we develop theories, the Bayesian setting, and the variational inference approach for the sharpness-aware posterior. Specifically, the models sampled from our sharpness-aware posterior, and the optimal approximate posterior estimating this sharpness-aware posterior, have better flatness, hence possibly possessing higher generalization ability. We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks, showing that the flat-seeking counterparts outperform their baselines in all metrics of interest. △ Less

Submitted 6 November, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: Accepted at NeurIPS 2023

Journal ref: Advances in Neural Information Processing Systems, 2023

arXiv:2301.07336 [pdf, other]

Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation

Authors: Son Duy Dao, Hengcan Shi, Dinh Phung, Jianfei Cai

Abstract: Recent mask proposal models have significantly improved the performance of zero-shot semantic segmentation. However, the use of a `background' embedding during training in these methods is problematic as the resulting model tends to over-learn and assign all unseen classes as the background class instead of their correct labels. Furthermore, they ignore the semantic relationship of text embeddings… ▽ More Recent mask proposal models have significantly improved the performance of zero-shot semantic segmentation. However, the use of a `background' embedding during training in these methods is problematic as the resulting model tends to over-learn and assign all unseen classes as the background class instead of their correct labels. Furthermore, they ignore the semantic relationship of text embeddings, which arguably can be highly informative for zero-shot prediction as seen classes may have close relationship with unseen classes. To this end, this paper proposes novel class enhancement losses to bypass the use of the background embbedding during training, and simultaneously exploit the semantic relationship between text embeddings and mask proposals by ranking the similarity scores. To further capture the relationship between seen and unseen classes, we propose an effective pseudo label generation pipeline using pretrained vision-language model. Extensive experiments on several benchmark datasets show that our method achieves overall the best performance for zero-shot semantic segmentation. Our method is flexible, and can also be applied to the challenging open-vocabulary semantic segmentation problem. △ Less

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2212.13218 [pdf, other]

Multisensor Data Fusion for Reliable Obstacle Avoidance

Authors: Thanh Nguyen Canh, Truong Son Nguyen, Cong Hoang Quach, Xiem HoangVan, Manh Duong Phung

Abstract: In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors include two depth cameras and a LiDAR arranged so that they can capture the whole 3D area in front of the robot and a 2D slide around it. To fuse the data from these sensors, we first use an external camera as a reference to combine data from two depth cameras. A projection… ▽ More In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors include two depth cameras and a LiDAR arranged so that they can capture the whole 3D area in front of the robot and a 2D slide around it. To fuse the data from these sensors, we first use an external camera as a reference to combine data from two depth cameras. A projection technique is then introduced to convert the 3D point cloud data of the cameras to its 2D correspondence. An obstacle avoidance algorithm is then developed based on the dynamic window approach. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively avoid static and dynamic obstacles of different shapes and sizes in different environments. △ Less

Submitted 26 December, 2022; originally announced December 2022.

Comments: In the 11th International Conference on Control, Automation and Information Sciences (ICCAIS 2022), Hanoi, Vietnam

arXiv:2212.13209 [pdf, other]

Deployment of UAVs for Optimal Multihop Ad-hoc Networks Using Particle Swarm Optimization and Behavior-based Control

Authors: Ngan Duong Thi Thuy, Duy Nam Bui, Manh Duong Phung, Hung Pham Duy

Abstract: This study proposes an approach for establishing an optimal multihop ad-hoc network using multiple unmanned aerial vehicles (UAVs) to provide emergency communication in disaster areas. The approach includes two stages, one uses particle swarm optimization (PSO) to find optimal positions to deploy UAVs, and the other uses a behavior-based controller to navigate the UAVs to their assigned positions… ▽ More This study proposes an approach for establishing an optimal multihop ad-hoc network using multiple unmanned aerial vehicles (UAVs) to provide emergency communication in disaster areas. The approach includes two stages, one uses particle swarm optimization (PSO) to find optimal positions to deploy UAVs, and the other uses a behavior-based controller to navigate the UAVs to their assigned positions without colliding with obstacles in an unknown environment. Several constraints related to the UAVs' sensing and communication ranges have been imposed to ensure the applicability of the proposed approach in real-world scenarios. A number of simulation experiments with data loaded from real environments have been conducted. The results show that our proposed approach is not only successful in establishing multihop ad-hoc routes but also meets the requirements for real-time deployment of UAVs. △ Less

Submitted 26 December, 2022; originally announced December 2022.

Comments: In the 11th International Conference on Control, Automation and Information Sciences (ICCAIS 2022), Hanoi, Vietnam

arXiv:2212.03069 [pdf, other]

Multiple Perturbation Attack: Attack Pixelwise Under Different $\ell_p$-norms For Better Adversarial Performance

Authors: Ngoc N. Tran, Anh Tuan Bui, Dinh Phung, Trung Le

Abstract: Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over the time. On one hand, the goal is to develop strong and robust deep networks that are resistant to maliciou… ▽ More Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over the time. On one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, in order to achieve that, we need to devise even stronger adversarial attacks to challenge these defense models. Most of existing attacks employs a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and performs steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. These $\ell_p$ attacks each has its own pros and cons; and there is no single attack that can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we come up with a natural approach: combining various $\ell_p$ gradient projections on a pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize the attack performance, while maintaining the overall visual imperceptibility of adversarial examples. Finally, through various experiments with standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms, while retaining its ability to remain clean visually. △ Less

Submitted 7 December, 2022; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: 18 pages, 8 figures, 7 tables

arXiv:2211.16780 [pdf, other]

Continual Learning with Optimal Transport based Mixture Model

Authors: Quyen Tran, Hoang Phan, Khoat Than, Dinh Phung, Trung Le

Abstract: Online Class Incremental learning (CIL) is a challenging setting in Continual Learning (CL), wherein data of new tasks arrive in incoming streams and online learning models need to handle incoming data streams without revisiting previous ones. Existing works used a single centroid adapted with incoming data streams to characterize a class. This approach possibly exposes limitations when the incomi… ▽ More Online Class Incremental learning (CIL) is a challenging setting in Continual Learning (CL), wherein data of new tasks arrive in incoming streams and online learning models need to handle incoming data streams without revisiting previous ones. Existing works used a single centroid adapted with incoming data streams to characterize a class. This approach possibly exposes limitations when the incoming data stream of a class is naturally multimodal. To address this issue, in this work, we first propose an online mixture model learning approach based on nice properties of the mature optimal transport theory (OT-MM). Specifically, the centroids and covariance matrices of the mixture model are adapted incrementally according to incoming data streams. The advantages are two-fold: (i) we can characterize more accurately complex data streams and (ii) by using centroids for each class produced by OT-MM, we can estimate the similarity of an unseen example to each class more reasonably when doing inference. Moreover, to combat the catastrophic forgetting in the CIL scenario, we further propose Dynamic Preservation. Particularly, after performing the dynamic preservation technique across data streams, the latent representations of the classes in the old and new tasks become more condensed themselves and more separate from each other. Together with a contraction feature extractor, this technique facilitates the model in mitigating the catastrophic forgetting. The experimental results on real-world datasets show that our proposed method can significantly outperform the current state-of-the-art baselines. △ Less

Submitted 5 December, 2022; v1 submitted 30 November, 2022; originally announced November 2022.

arXiv:2211.13723 [pdf, other]

Improving Multi-task Learning via Seeking Task-based Flat Regions

Authors: Hoang Phan, Lam Tran, Ngoc N. Tran, Nhat Ho, Dinh Phung, Trung Le

Abstract: Multi-Task Learning (MTL) is a widely-used and powerful learning paradigm for training deep neural networks that allows learning more than one objective by a single backbone. Compared to training tasks separately, MTL significantly reduces computational costs, improves data efficiency, and potentially enhances model performance by leveraging knowledge across tasks. Hence, it has been adopted in a… ▽ More Multi-Task Learning (MTL) is a widely-used and powerful learning paradigm for training deep neural networks that allows learning more than one objective by a single backbone. Compared to training tasks separately, MTL significantly reduces computational costs, improves data efficiency, and potentially enhances model performance by leveraging knowledge across tasks. Hence, it has been adopted in a variety of applications, ranging from computer vision to natural language processing and speech recognition. Among them, there is an emerging line of work in MTL that focuses on manipulating the task gradient to derive an ultimate gradient descent direction to benefit all tasks. Despite achieving impressive results on many benchmarks, directly applying these approaches without using appropriate regularization techniques might lead to suboptimal solutions on real-world problems. In particular, standard training that minimizes the empirical loss on the training data can easily suffer from overfitting to low-resource tasks or be spoiled by noisy-labeled ones, which can cause negative transfer between tasks and overall performance drop. To alleviate such problems, we propose to leverage a recently introduced training method, named Sharpness-aware Minimization, which can enhance model generalization ability on single-task learning. Accordingly, we present a novel MTL training methodology, encouraging the model to find task-based flat minima for coherently improving its generalization capability on all tasks. Finally, we conduct comprehensive experiments on a variety of applications to demonstrate the merit of our proposed approach to existing gradient-based MTL methods, as suggested by our developed theory. △ Less

Submitted 29 September, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: 35 pages, 17 figures, 7 tables

arXiv:2210.15930 [pdf, other]

doi 10.1007/s42405-022-00545-5

Lyapunov-based Nonlinear Model Predictive Control for Attitude Trajectory Tracking of Unmanned Aerial Vehicles

Authors: Duy Nam Bui, Thi Thanh Van Nguyen, Manh Duong Phung

Abstract: This paper presents a new Lyapunov-based nonlinear model predictive controller (LNMPC) for the attitude control problem of unmanned aerial vehicles (UAVs), which is essential for their functioning operation. The controller is designed based on a quadratic cost function integrating UAV dynamics and system constraints. An additional contraction constraint is then introduced to ensure closed-loop sys… ▽ More This paper presents a new Lyapunov-based nonlinear model predictive controller (LNMPC) for the attitude control problem of unmanned aerial vehicles (UAVs), which is essential for their functioning operation. The controller is designed based on a quadratic cost function integrating UAV dynamics and system constraints. An additional contraction constraint is then introduced to ensure closed-loop system stability. That constraint is fulfilled via a Lyapunov function derived from a sliding mode controller (SMC). The feasibility and stability of the LNMPC are finally proved. Simulation and comparison results show that the proposed controller guarantees the system stability and outperforms other state-of-the-art nonlinear controllers such as the backstep** controller (BSC) and SMC. In addition, the proposed controller can be integrated into an existing UAV model in the Gazebo simulator to perform software-in-the-loop tests. The results show that the LNMPC is better than the built-in PID controller of the UAV, which confirms the validity and applicability of our proposed approach. △ Less

Submitted 28 October, 2022; originally announced October 2022.

Journal ref: International Journal of Aeronautical and Space Sciences, 2022

arXiv:2210.11628 [pdf, other]

Can Domains Be Transferred Across Languages in Multi-Domain Multilingual Neural Machine Translation?

Authors: Thuy-Trang Vu, Shahram Khadivi, Xuanli He, Dinh Phung, Gholamreza Haffari

Abstract: Previous works mostly focus on either multilingual or multi-domain aspects of neural machine translation (NMT). This paper investigates whether the domain information can be transferred across languages on the composition of multi-domain and multilingual NMT, particularly for the incomplete data condition where in-domain bitext is missing for some language pairs. Our results in the curated leave-o… ▽ More Previous works mostly focus on either multilingual or multi-domain aspects of neural machine translation (NMT). This paper investigates whether the domain information can be transferred across languages on the composition of multi-domain and multilingual NMT, particularly for the incomplete data condition where in-domain bitext is missing for some language pairs. Our results in the curated leave-one-domain-out experiments show that multi-domain multilingual (MDML) NMT can boost zero-shot translation performance up to +10 gains on BLEU, as well as aid the generalisation of multi-domain NMT to the missing domain. We also explore strategies for effective integration of multilingual and multi-domain NMT, including language and domain tag combination and auxiliary task training. We find that learning domain-aware representations and adding target-language tags to the encoder leads to effective MDML-NMT. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: WMT2022

arXiv:2210.07646 [pdf, other]

Vision Transformer Visualization: What Neurons Tell and How Neurons Behave?

Authors: Van-Anh Nguyen, Khanh Pham Dinh, Long Tung Vuong, Thanh-Toan Do, Quan Hung Tran, Dinh Phung, Trung Le

Abstract: Recently vision transformers (ViT) have been applied successfully for various tasks in computer vision. However, important questions such as why they work or how they behave still remain largely unknown. In this paper, we propose an effective visualization technique, to assist us in exposing the information carried in neurons and feature embeddings across the ViT's layers. Our approach departs fro… ▽ More Recently vision transformers (ViT) have been applied successfully for various tasks in computer vision. However, important questions such as why they work or how they behave still remain largely unknown. In this paper, we propose an effective visualization technique, to assist us in exposing the information carried in neurons and feature embeddings across the ViT's layers. Our approach departs from the computational process of ViTs with a focus on visualizing the local and global information in input images and the latent feature embeddings at multiple levels. Visualizations at the input and embeddings at level 0 reveal interesting findings such as providing support as to why ViTs are rather generally robust to image occlusions and patch shuffling; or unlike CNNs, level 0 embeddings already carry rich semantic details. Next, we develop a rigorous framework to perform effective visualizations across layers, exposing the effects of ViTs filters and grou**/clustering behaviors to object patches. Finally, we provide comprehensive experiments on real datasets to qualitatively and quantitatively demonstrate the merit of our proposed methods as well as our findings. https://github.com/byM1902/ViT_visualization △ Less

Submitted 17 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: The first two authors contributed equally to this work. Our code is available at https://github.com/byM1902/ViT_visualization

Showing 1–50 of 201 results for author: Phung, D