-
UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages
Authors:
Trinh Pham,
Khoi M. Le,
Luu Anh Tuan
Abstract:
In this paper, we introduce UniBridge (Cross-Lingual Transfer Learning with Optimized Embeddings and Vocabulary), a comprehensive approach developed to improve the effectiveness of Cross-Lingual Transfer Learning, particularly in languages with limited resources. Our approach tackles two essential elements of a language model: the initialization of embeddings and the optimal vocabulary size. Specifically, we propose a novel embedding initialization method that leverages both lexical and semantic alignment for a language. In addition, we present a method for systematically searching for the optimal vocabulary size, ensuring a balance between model complexity and linguistic coverage. Our experiments across multilingual datasets show that our approach greatly improves the F1-Score in several languages. UniBridge is a robust and adaptable solution for cross-lingual systems in various languages, highlighting the significance of initializing embeddings and choosing the right vocabulary size in cross-lingual environments.
Submitted 17 June, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension
Authors:
Khiem Le,
Zhichun Guo,
Kaiwen Dong,
Xiaobao Huang,
Bozhao Nan,
Roshni Iyer,
Xiangliang Zhang,
Olaf Wiest,
Wei Wang,
Nitesh V. Chawla
Abstract:
Recently, Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving professional molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e., SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by designing and equipping them with a multi-modal external module, namely MolX. In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. Moreover, a human-defined molecular fingerprint is incorporated to leverage its embedded domain knowledge. Then, to establish an alignment between MolX and the LLM's textual input space, the whole model, in which the LLM is frozen, is pre-trained with a versatile strategy including a diverse set of tasks. Extensive experimental evaluations demonstrate that our proposed method only introduces a small number of trainable parameters while outperforming baselines on various downstream molecule-related tasks ranging from molecule-to-text translation to retrosynthesis, with and without fine-tuning the LLM.
Submitted 27 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation
Authors:
Ziyi Guo,
Dang K Le,
Zhenpeng Lin,
Kyle Zeng,
Ruoyu Wang,
Tiffany Bao,
Yan Shoshitaishvili,
Adam Doupé,
Xinyu Xing
Abstract:
Recently, a novel method known as Page Spray has emerged, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation strategies have largely remained unanswered. In this paper, we conduct a systematic investigation into Page Spray, providing an in-depth understanding of this exploitation technique. We introduce a comprehensive exploit model termed the \sys model, elucidating its fundamental principles. Additionally, we conduct a thorough analysis of the root causes underlying Page Spray occurrences within the Linux Kernel. We design an analyzer based on the Page Spray analysis model to identify Page Spray callsites. Subsequently, we evaluate the stability, exploitability, and compatibility of Page Spray through meticulously designed experiments. Finally, we propose mitigation principles for addressing Page Spray and introduce our own lightweight mitigation approach. This research aims to assist security researchers and developers in gaining insights into Page Spray, ultimately enhancing our collective understanding of this emerging exploitation technique and making improvements to the community.
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective
Authors:
Khiem Le,
Nhan Luong-Ha,
Manh Nguyen-Duc,
Danh Le-Phuoc,
Cuong Do,
Kok-Seng Wong
Abstract:
Federated Learning (FL) is a promising paradigm that offers significant advancements in privacy-preserving, decentralized machine learning by enabling collaborative training of models across distributed devices without centralizing data. However, the practical deployment of FL systems faces a significant bottleneck: the communication overhead caused by frequently exchanging large model updates between numerous devices and a central server. This communication inefficiency can hinder training speed, model performance, and the overall feasibility of real-world FL applications. In this survey, we investigate various strategies and advancements made in communication-efficient FL, highlighting their impact and potential to overcome the communication challenges inherent in FL systems. Specifically, we define measures for communication efficiency, analyze sources of communication inefficiency in FL systems, and provide a taxonomy and comprehensive review of state-of-the-art communication-efficient FL methods. Additionally, we discuss promising future research directions for enhancing the communication efficiency of FL systems. By addressing the communication bottleneck, FL can be effectively applied and enable scalable and practical deployment across diverse applications that require privacy-preserving, decentralized machine learning, such as IoT, healthcare, or finance.
Submitted 30 May, 2024;
originally announced May 2024.
-
Nudging Users to Change Breached Passwords Using the Protection Motivation Theory
Authors:
Yixin Zou,
Khue Le,
Peter Mayer,
Alessandro Acquisti,
Adam J. Aviv,
Florian Schaub
Abstract:
We draw on the Protection Motivation Theory (PMT) to design nudges that encourage users to change breached passwords. Our online experiment ($n$ = 1,386) compared the effectiveness of a threat appeal (highlighting negative consequences of breached passwords) and a coping appeal (providing instructions on how to change the breached password) in a 2x2 factorial design. Compared to the control condition, participants receiving the threat appeal were more likely to intend to change their passwords, and participants receiving both appeals were more likely to end up changing their passwords; both comparisons have a small effect size. Participants' password change behaviors are further associated with other factors such as their security attitudes (SA-6) and time passed since the breach, suggesting that PMT-based nudges are useful but insufficient to fully motivate users to change their passwords. Our study contributes to PMT's application in security research and provides concrete design implications for improving compromised credential notifications.
Submitted 24 May, 2024;
originally announced May 2024.
-
Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization
Authors:
Khiem Le,
Long Ho,
Cuong Do,
Danh Le-Phuoc,
Kok-Seng Wong
Abstract:
Domain shift is a formidable issue in Machine Learning that causes a model to suffer from performance degradation when tested on unseen domains. Federated Domain Generalization (FedDG) attempts to train a global model using collaborative clients in a privacy-preserving manner that can generalize well to unseen clients possibly with domain shift. However, most existing FedDG methods either cause additional privacy risks of data leakage or induce significant costs in client communication and computation, which are major concerns in the Federated Learning paradigm. To circumvent these challenges, here we introduce a novel architectural method for FedDG, namely gPerXAN, which relies on a normalization scheme working with a guiding regularizer. In particular, we carefully design Personalized eXplicitly Assembled Normalization to enforce client models selectively filtering domain-specific features that are biased towards local data while retaining discrimination of those features. Then, we incorporate a simple yet effective regularizer to guide these models in directly capturing domain-invariant representations that the global model's classifier can leverage. Extensive experimental results on two benchmark datasets, i.e., PACS and Office-Home, and a real-world medical dataset, Camelyon17, indicate that our proposed method outperforms other existing methods in addressing this particular problem.
Submitted 22 March, 2024;
originally announced March 2024.
-
A Study of Vulnerability Repair in JavaScript Programs with Large Language Models
Authors:
Tan Khang Le,
Saba Alimadadi,
Steven Y. Ko
Abstract:
In recent years, JavaScript has become the most widely used programming language, especially in web development. However, writing secure JavaScript code is not trivial, and programmers often make mistakes that lead to security vulnerabilities in web applications. Large Language Models (LLMs) have demonstrated substantial advancements across multiple domains, and their evolving capabilities indicate their potential for automatic code generation based on a required specification, including automatic bug fixing. In this study, we explore the accuracy of LLMs, namely ChatGPT and Bard, in finding and fixing security vulnerabilities in JavaScript programs. We also investigate the impact of context in a prompt on directing LLMs to produce a correct patch of vulnerable JavaScript code. Our experiments on real-world software vulnerabilities show that while LLMs are promising in automatic program repair of JavaScript code, achieving a correct bug fix often requires an appropriate amount of context in the prompt.
Submitted 19 March, 2024;
originally announced March 2024.
-
ARtVista: Gateway To Empower Anyone Into Artist
Authors:
Trong-Vu Hoang,
Quang-Binh Nguyen,
Duy-Nam Ly,
Khanh-Duy Le,
Tam V. Nguyen,
Minh-Triet Tran,
Trung-Nghia Le
Abstract:
Drawing is an art that enables people to express their imagination and emotions. However, individuals usually face challenges in drawing, especially when translating conceptual ideas into visually coherent representations and bridging the gap between mental visualization and practical execution. In response, we propose ARtVista - a novel system integrating AR and generative AI technologies. ARtVista not only recommends reference images aligned with users' abstract ideas and generates sketches for users to draw but also goes beyond, crafting vibrant paintings in various painting styles. ARtVista also offers users an alternative approach to create striking paintings by simulating the paint-by-number concept on reference images, empowering users to create visually stunning artwork devoid of the necessity for advanced drawing skills. We perform a pilot study and reveal positive feedback on its usability, emphasizing its effectiveness in visualizing user ideas and aiding the painting process to achieve stunning pictures without requiring advanced drawing skills. The source code will be available at https://github.com/htrvu/ARtVista.
Submitted 13 March, 2024;
originally announced March 2024.
-
iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer
Authors:
Dinh-Khoi Vo,
Duy-Nam Ly,
Khanh-Duy Le,
Tam V. Nguyen,
Minh-Triet Tran,
Trung-Nghia Le
Abstract:
Creating thematic collections in industries demands innovative designs and cohesive concepts. Designers may face challenges in maintaining thematic consistency when drawing inspiration from existing objects, landscapes, or artifacts. While AI-powered graphic design tools offer help, they often fail to generate cohesive sets based on specific thematic concepts. In response, we introduce iCONTRA, an interactive CONcept TRAnsfer system. With a user-friendly interface, iCONTRA enables both experienced designers and novices to effortlessly explore creative design concepts and efficiently generate thematic collections. We also propose a zero-shot image editing algorithm, eliminating the need for fine-tuning models, which gradually integrates information from initial objects, ensuring consistency in the generation process without influencing the background. A pilot study suggests iCONTRA's potential to reduce designers' efforts. Experimental results demonstrate its effectiveness in producing consistent and high-quality object concept transfers. iCONTRA stands as a promising tool for innovation and creative exploration in thematic collection design. The source code will be available at: https://github.com/vdkhoi20/iCONTRA.
Submitted 13 March, 2024;
originally announced March 2024.
-
Object Detection in Thermal Images Using Deep Learning for Unmanned Aerial Vehicles
Authors:
Minh Dang Tu,
Kieu Trang Le,
Manh Duong Phung
Abstract:
This work presents a neural network model capable of recognizing small and tiny objects in thermal images collected by unmanned aerial vehicles. Our model consists of three parts: the backbone, the neck, and the prediction head. The backbone is developed based on the structure of YOLOv5 combined with the use of a transformer encoder at the end. The neck includes a BI-FPN block combined with the use of a sliding window and a transformer to increase the information fed into the prediction head. The prediction head carries out the detection by evaluating feature maps with the Sigmoid function. The use of transformers with attention and sliding windows increases recognition accuracy while keeping the model at a reasonable number of parameters and computation requirements for embedded systems. Experiments conducted on the public dataset VEDAI and our collected datasets show that our model has a higher accuracy than state-of-the-art methods such as ResNet, Faster RCNN, ComNet, ViT, YOLOv5, SMPNet, and DPNetV3. Experiments on the embedded computer Jetson AGX show that our model achieves a real-time computation speed with a stability rate of over 90%.
Submitted 13 February, 2024;
originally announced February 2024.
-
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
Authors:
Vinh Quoc Luu,
Duy Khanh Le,
Huy Thanh Nguyen,
Minh Thanh Nguyen,
Thinh Tien Nguyen,
Vinh Quang Dinh
Abstract:
Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. These challenges inhibit the development of more accurate and modern techniques to diagnose cancer relating to white blood cells. To address the first challenge, a semi-supervised learning framework should be devised to make efficient use of the scarce labeled data available. In this work, we address this issue by proposing a novel self-training pipeline that incorporates FixMatch. Self-training is a technique that uses the model trained on labeled data to generate pseudo-labels for the unlabeled data and then re-trains on both. FixMatch is a consistency-regularization algorithm that enforces the model's robustness against variations in the input image. We find that incorporating FixMatch in the self-training pipeline improves performance in the majority of cases. Our best performance was achieved with the self-training scheme with consistency on the DeepLab-V3 architecture and ResNet-50, reaching 90.69%, 87.37%, and 76.49% on the Zheng 1, Zheng 2, and LISC datasets, respectively.
Submitted 23 February, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
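The pseudo-labeling step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `select_pseudo_labels`, the per-sample (rather than per-pixel) formulation, and the 0.95 confidence threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]],
    threshold: float = 0.95,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) for confident predictions.

    `probs[i]` holds the class probabilities that a model trained on the
    labeled set assigns to unlabeled sample i. Only samples whose highest
    probability reaches `threshold` receive a pseudo-label.
    """
    selected = []
    for i, p in enumerate(probs):
        p = list(p)
        confidence = max(p)
        if confidence >= threshold:
            selected.append((i, p.index(confidence)))
    return selected


# Three unlabeled samples, two classes: only the first and third are confident.
unlabeled_probs = [[0.98, 0.02], [0.60, 0.40], [0.03, 0.97]]
pseudo = select_pseudo_labels(unlabeled_probs)
```

In a self-training loop, the selected pairs would be added to the labeled set before re-training; a segmentation model would apply the same test per pixel rather than per image.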
-
LAMPAT: Low-Rank Adaption for Multilingual Paraphrasing Using Adversarial Training
Authors:
Khoi M. Le,
Trinh Pham,
Tho Quan,
Anh Tuan Luu
Abstract:
Paraphrases are texts that convey the same meaning while using different words or sentence structures. They can be used as an automatic data augmentation tool for many Natural Language Processing tasks, especially when dealing with low-resource languages, where data shortage is a significant problem. To generate a paraphrase in multilingual settings, previous studies have leveraged knowledge from the machine translation field, i.e., forming a paraphrase through zero-shot machine translation in the same language. Despite good performance on human evaluation, those methods still require parallel translation datasets, making them inapplicable to languages that do not have parallel corpora. To mitigate that problem, we propose the first unsupervised multilingual paraphrasing model, LAMPAT ($\textbf{L}$ow-rank $\textbf{A}$daptation for $\textbf{M}$ultilingual $\textbf{P}$araphrasing using $\textbf{A}$dversarial $\textbf{T}$raining), for which a monolingual dataset is sufficient to generate human-like and diverse sentences. Throughout the experiments, we found that our method not only works well for English but also generalizes to unseen languages. Data and code are available at https://github.com/VinAIResearch/LAMPAT.
Submitted 23 June, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Toward a comprehensive simulation framework for hypergraphs: a Python-base approach
Authors:
Quoc Chuong Nguyen,
Trung Kien Le
Abstract:
Hypergraphs, generalizations of graphs in which edges can contain more than two nodes, have become increasingly prominent in complex network analysis. Unlike graphs, hypergraphs have relatively few supporting platforms, and this dearth presents a barrier to more widespread adoption of hypergraph computational toolboxes that could enable further research in several areas. Here, we introduce HyperRD, a Python package for hypergraph computation, simulation, and interoperability with other powerful Python packages in graph and hypergraph research. We then introduce two models on hypergraphs, the general Schelling's model and the SIR model, and simulate them with HyperRD.
Submitted 8 January, 2024;
originally announced January 2024.
-
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts
Authors:
Giang Do,
Khiem Le,
Quang Pham,
TrungTin Nguyen,
Thanh-Nam Doan,
Bint T. Nguyen,
Chenghao Liu,
Savitha Ramasamy,
Xiaoli Li,
Steven Hoi
Abstract:
By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be sub-optimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces HyperRouter, which dynamically generates the router's parameters through a fixed hypernetwork and trainable embeddings to achieve a balance between training the routers and freezing them to learn an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of HyperRouter compared to existing routing methods. Our implementation is publicly available at https://github.com/giangdip2410/HyperRouter.
Submitted 12 December, 2023;
originally announced December 2023.
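For readers unfamiliar with sparse routing, the idea of sending a token to only a few experts can be sketched as generic top-k gating. This is a standard illustration, not the HyperRouter method itself (it involves no hypernetwork); the function name `top_k_route` and the choice of k = 2 are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def top_k_route(logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k experts with the largest router logits for one token.

    Returns (expert_index, gate_weight) pairs, where the gate weights are a
    softmax taken over the selected logits only, so they sum to 1.
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Router scores for one token over three experts: experts 1 and 2 are chosen.
routes = top_k_route([1.0, 3.0, 2.0], k=2)
```

The token's output would then be the gate-weighted sum of the selected experts' outputs, while the remaining experts are skipped entirely.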
-
READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
Authors:
Thong Nguyen,
Xiaobao Wu,
Xinshuai Dong,
Khoi Le,
Zhiyuan Hu,
Cong-Duy Nguyen,
See-Kiong Ng,
Luu Anh Tuan
Abstract:
Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization. With a growing number of tasks and limited training data, such a full fine-tuning approach leads to costly model storage and unstable training. To overcome these shortcomings, we introduce lightweight adapters to the pre-trained model and only update them at fine-tuning time. However, existing adapters fail to capture intrinsic temporal relations among video frames or textual words. Moreover, they neglect the preservation of critical task-related information that flows from the raw video-language input into the adapter's low-dimensional space. To address these issues, we first propose a novel REcurrent ADapter (READ) that employs recurrent computation to enable temporal modeling capability. Second, we propose a Partial Video-Language Alignment (PVLA) objective that uses partial optimal transport to maintain task-related information flowing into our READ modules. We validate our READ-PVLA framework through extensive experiments where READ-PVLA significantly outperforms all existing fine-tuning strategies on multiple low-resource temporal language grounding and video-language summarization benchmarks.
Submitted 11 December, 2023;
originally announced December 2023.
-
Towards Robust Natural-Looking Mammography Lesion Synthesis on Ipsilateral Dual-Views Breast Cancer Analysis
Authors:
Thanh-Huy Nguyen,
Quang Hien Kha,
Thai Ngoc Toan Truong,
Ba Thinh Lam,
Ba Hung Ngo,
Quang Vinh Dinh,
Nguyen Quoc Khanh Le
Abstract:
In recent years, many mammographic image analysis methods have been introduced for improving cancer classification tasks. Two major issues of mammogram classification tasks are leveraging multi-view mammographic information and class-imbalance handling. For the first issue, many multi-view methods have been released that concatenate features of two or more views for the training and inference stages. However, most existing multi-view methods are not explainable in terms of feature fusion, and treat the views equally for diagnosis. Our work proposes a simple but novel method for enhancing the examined view (main view) by leveraging low-level feature information from the auxiliary view (ipsilateral view) before learning the high-level features that contain the cancerous features. For the second issue, we also propose a simple but novel malignant mammogram synthesis framework for upsampling minority-class samples. Our easy-to-implement, training-free framework eliminates the current limitations of the CutMix algorithm, namely unreliable synthesized images with randomly pasted patches, hard-contour problems, and domain shift problems. Our results on the VinDr-Mammo and CMMD datasets show the effectiveness of our two new frameworks for both multi-view training and synthesizing mammographic images, outperforming the previous conventional methods in our experimental settings.
Submitted 7 September, 2023;
originally announced September 2023.
-
DM-VTON: Distilled Mobile Real-time Virtual Try-On
Authors:
Khoi-Nguyen Nguyen-Ngoc,
Thanh-Tung Phan-Nguyen,
Khanh-Duy Le,
Tam V. Nguyen,
Minh-Triet Tran,
Trung-Nghia Le
Abstract:
The fashion e-commerce industry has witnessed significant growth in recent years, prompting the exploration of image-based virtual try-on techniques to incorporate Augmented Reality (AR) experiences into online shopping platforms. However, existing research has largely overlooked a crucial aspect: the runtime of the underlying machine-learning model. While existing methods prioritize enhancing output quality, they often disregard the execution time, which restricts their applications to a limited range of devices. To address this gap, we propose Distilled Mobile Real-time Virtual Try-On (DM-VTON), a novel virtual try-on framework designed to achieve simplicity and efficiency. Our approach is based on a knowledge distillation scheme that leverages a strong Teacher network as supervision to guide a Student network without relying on human parsing. Notably, we introduce an efficient Mobile Generative Module within the Student network, significantly reducing the runtime while ensuring high-quality output. Additionally, we propose Virtual Try-on-guided Pose for Data Synthesis to address the limited pose variation observed in training images. Experimental results show that the proposed method can achieve 40 frames per second on a single Nvidia Tesla T4 GPU and only take up 37 MB of memory while producing almost the same output quality as other state-of-the-art methods. DM-VTON stands poised to facilitate the advancement of real-time AR applications, in addition to the generation of lifelike attired human figures tailored for diverse specialized training tasks. https://sites.google.com/view/ltnghia/research/DMVTON
Submitted 26 August, 2023;
originally announced August 2023.
-
VIDES: Virtual Interior Design via Natural Language and Visual Guidance
Authors:
Minh-Hien Le,
Chi-Bien Chu,
Khanh-Duy Le,
Tam V. Nguyen,
Minh-Triet Tran,
Trung-Nghia Le
Abstract:
Interior design is crucial in creating aesthetically pleasing and functional indoor spaces. However, developing and editing interior design concepts requires significant time and expertise. We propose the Virtual Interior DESign (VIDES) system in response to this challenge. Leveraging cutting-edge technology in generative AI, our system can assist users in generating and editing indoor scene concepts quickly, given user text description and visual guidance. Using both visual guidance and language as the conditional inputs significantly enhances the accuracy and coherence of the generated scenes, resulting in visually appealing designs. Through extensive experimentation, we demonstrate the effectiveness of VIDES in developing new indoor concepts, changing indoor styles, and replacing and removing interior objects. The system successfully captures the essence of users' descriptions while providing flexibility for customization. Consequently, this system can potentially reduce the entry barrier for indoor design, making it more accessible to users with limited technical skills and reducing the time required to create high-quality images. Individuals who have a background in design can now easily communicate their ideas visually and effectively present their design concepts. https://sites.google.com/view/ltnghia/research/VIDES
Submitted 26 August, 2023;
originally announced August 2023.
-
Advancing Wound Filling Extraction on 3D Faces: Auto-Segmentation and Wound Face Regeneration Approach
Authors:
Duong Q. Nguyen,
Thinh D. Le,
Phuong D. Nguyen,
Nga T. K. Le,
H. Nguyen-Xuan
Abstract:
Facial wound segmentation plays a crucial role in preoperative planning and optimizing patient outcomes in various medical applications. In this paper, we propose an efficient approach for automating 3D facial wound segmentation using a two-stream graph convolutional network. Our method leverages the Cir3D-FaIR dataset and addresses the challenge of data imbalance through extensive experimentation with different loss functions. To achieve accurate segmentation, we conducted thorough experiments and selected a high-performing model from the trained models. The selected model demonstrates exceptional segmentation performance for complex 3D facial wounds. Furthermore, based on the segmentation model, we propose an improved approach for extracting 3D facial wound fillers and compare it to the results of the previous study. Our method achieved a remarkable accuracy of 0.9999986% on the test suite, surpassing the performance of the previous method. From this result, we use 3D printing technology to illustrate the shape of the wound filling. The outcomes of this study have significant implications for physicians involved in preoperative planning and intervention design. By automating facial wound segmentation and improving the accuracy of wound-filling extraction, our approach can assist in carefully assessing and optimizing interventions, leading to enhanced patient outcomes. Additionally, it contributes to advancing facial reconstruction techniques by utilizing machine learning and 3D bioprinting for printing skin tissue implants. Our source code is available at https://github.com/SIMOGroup/WoundFilling3D.
Submitted 12 July, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval
Authors:
Trung-Nghia Le,
Tam V. Nguyen,
Minh-Quan Le,
Trong-Thuan Nguyen,
Viet-Tham Huynh,
Trong-Le Do,
Khanh-Duy Le,
Mai-Khiem Tran,
Nhat Hoang-Xuan,
Thang-Long Nguyen-Ho,
Vinh-Tiep Nguyen,
Tuong-Nghiem Diep,
Khanh-Duy Ho,
Xuan-Hieu Nguyen,
Thien-Phuc Tran,
Tuan-Anh Yang,
Kim-Phat Tran,
Nhu-Vinh Hoang,
Minh-Quang Nguyen,
E-Ro Nguyen,
Minh-Khoi Nguyen-Nhat,
Tuan-An To,
Trung-Truc Huynh-Le,
Nham-Tan Nguyen,
Hoang-Chau Luong
, et al. (8 additional authors not shown)
Abstract:
3D object retrieval is an important yet challenging task that has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D animal models. Unlike previous SHREC challenge tracks, the proposed task is considerably more challenging, requiring participants to develop innovative approaches to tackle the problem of text-based retrieval. Despite the increased difficulty, we believe this task can potentially drive useful applications in practice and facilitate more intuitive interactions with 3D objects. Five groups participated in our competition, submitting a total of 114 runs. While the results obtained in our competition are satisfactory, we note that the challenges presented by this task are far from fully solved. As such, we provide insights into potential areas for future research and improvements. We believe we can help push the boundaries of 3D object retrieval and facilitate more user-friendly interactions via vision-language technologies. https://aichallenge.hcmus.edu.vn/textanimar
Submitted 9 August, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval
Authors:
Trung-Nghia Le,
Tam V. Nguyen,
Minh-Quan Le,
Trong-Thuan Nguyen,
Viet-Tham Huynh,
Trong-Le Do,
Khanh-Duy Le,
Mai-Khiem Tran,
Nhat Hoang-Xuan,
Thang-Long Nguyen-Ho,
Vinh-Tiep Nguyen,
Nhat-Quynh Le-Pham,
Huu-Phuc Pham,
Trong-Vu Hoang,
Quang-Binh Nguyen,
Trong-Hieu Nguyen-Mau,
Tuan-Luc Huynh,
Thanh-Danh Le,
Ngoc-Linh Nguyen-Ha,
Tuong-Vy Truong-Thuy,
Truong Hoai Phong,
Tuong-Nghiem Diep,
Khanh-Duy Ho,
Xuan-Hieu Nguyen,
Thien-Phuc Tran
, et al. (9 additional authors not shown)
Abstract:
The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this end, we introduce a novel SHREC challenge track that focuses on retrieving relevant 3D animal models from a dataset using sketch queries and expedites accessing 3D models through available sketches. Furthermore, a new dataset named ANIMAR was constructed in this study, comprising a collection of 711 unique 3D animal models and 140 corresponding sketch queries. Our contest requires participants to retrieve 3D models based on complex and detailed sketches. We receive satisfactory results from eight teams and 204 runs. Although further improvement is necessary, the proposed task has the potential to incentivize additional research in the domain of 3D object retrieval, potentially yielding benefits for a wide range of applications. We also provide insights into potential areas of future research, such as improving techniques for feature extraction and matching and creating more diverse datasets to evaluate retrieval performance. https://aichallenge.hcmus.edu.vn/sketchanimar
Submitted 9 August, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Distributed Coverage Control of Constrained Constant-Speed Unicycle Multi-Agent Systems
Authors:
Qingchen Liu,
Zengjie Zhang,
Nhan Khanh Le,
Jiahu Qin,
Fangzhou Liu,
Sandra Hirche
Abstract:
This paper proposes a novel distributed coverage controller for a multi-agent system with constant-speed unicycle robots (CSUR). The work is motivated by the limitation of the conventional method that does not ensure the satisfaction of hard state- and input-dependent constraints and leads to feasibility issues for multi-CSUR systems. In this paper, we solve these problems by designing a novel coverage cost function and a saturated gradient-search-based control law. Invariant set theory and Lyapunov-based techniques are used to prove the state-dependent confinement and the convergence of the system state to the optimal coverage configuration, respectively. The controller is implemented in a distributed manner based on a novel communication standard among the agents. A series of simulation case studies are conducted to validate the effectiveness of the proposed coverage controller in different initial conditions and with control parameters. A comparison study in simulation reveals the advantage of the proposed method in terms of avoiding infeasibility. The experiment study verifies the applicability of the method to real robots with uncertainties. The development procedure of the method from theoretical analysis to experimental validation provides a novel framework for multi-agent system coordinate control with complex agent dynamics.
Submitted 14 March, 2024; v1 submitted 12 April, 2023;
originally announced April 2023.
-
Improving Object Detection in Medical Image Analysis through Multiple Expert Annotators: An Empirical Investigation
Authors:
Hieu H. Pham,
Khiem H. Le,
Tuan V. Tran,
Ha Q. Nguyen
Abstract:
The work discusses the use of machine learning algorithms for anomaly detection in medical image analysis and how the performance of these algorithms depends on the number of annotators and the quality of labels. To address the issue of subjectivity in labeling with a single annotator, we introduce a simple and effective approach that aggregates annotations from multiple annotators with varying levels of expertise. We then aim to improve the efficiency of predictive models in abnormal detection tasks by estimating hidden labels from multiple annotations and using a re-weighted loss function to improve detection performance. Our method is evaluated on a real-world medical imaging dataset and outperforms relevant baselines that do not consider disagreements among annotators.
Submitted 29 March, 2023;
originally announced March 2023.
-
Learning for Amalgamation: A Multi-Source Transfer Learning Framework For Sentiment Classification
Authors:
Cuong V. Nguyen,
Khiem H. Le,
Anh M. Tran,
Quang H. Pham,
Binh T. Nguyen
Abstract:
Transfer learning plays an essential role in Deep Learning, which can remarkably improve the performance of the target domain, whose training data is not sufficient. Our work explores beyond the common practice of transfer learning with a single pre-trained model. We focus on the task of Vietnamese sentiment classification and propose LIFA, a framework to learn a unified embedding from several pre-trained models. We further propose two more LIFA variants that encourage the pre-trained models to either cooperate or compete with one another. Studying these variants sheds light on the success of LIFA by showing that sharing knowledge among the models is more beneficial for transfer learning. Moreover, we construct the AISIA-VN-Review-F dataset, the first large-scale Vietnamese sentiment classification database. We conduct extensive experiments on the AISIA-VN-Review-F and existing benchmarks to demonstrate the efficacy of LIFA compared to other techniques. To contribute to the Vietnamese NLP research, we publish our source code and datasets to the research community upon acceptance.
Submitted 16 March, 2023;
originally announced March 2023.
-
Location-based AR for Social Justice: Case Studies, Lessons, and Open Challenges
Authors:
Hope Schroeder,
Rob Tokanel,
Kyle Qian,
Khoi Le
Abstract:
Dear Visitor and Charleston Reconstructed were location-based augmented reality (AR) experiences created between 2018 and 2020 dealing with two controversial monument sites in the US. The projects were motivated by the ability of AR to 1) link layers of context to physical sites in ways that are otherwise difficult or impossible and 2) to visualize changes to physical spaces, potentially inspiring changes to the spaces themselves. We discuss the projects' motivations, designs, and deployments. We reflect on how physical changes to the projects' respective sites radically altered their outcomes, and we describe lessons for future work in location-based AR, particularly for projects in contested spaces.
Submitted 3 February, 2023;
originally announced February 2023.
-
Enhancing Deep Learning-based 3-lead ECG Classification with Heartbeat Counting and Demographic Data Integration
Authors:
Khiem H. Le,
Hieu H. Pham,
Thao B. T. Nguyen,
Tu A. Nguyen,
Cuong D. Do
Abstract:
Nowadays, an increasing number of people are being diagnosed with cardiovascular diseases (CVDs), the leading cause of death globally. The gold standard for identifying these heart problems is via electrocardiogram (ECG). The standard 12-lead ECG is widely used in clinical practice and the majority of current research. However, using a lower number of leads can make ECG more pervasive as it can be integrated with portable or wearable devices. This article introduces two novel techniques to improve the performance of the current deep learning system for 3-lead ECG classification, making it comparable with models that are trained using standard 12-lead ECG. Specifically, we propose a multi-task learning scheme in the form of the number of heartbeats regression and an effective mechanism to integrate patient demographic data into the system. With these two advancements, we got classification performance in terms of F1 scores of 0.9796 and 0.8140 on two large-scale ECG datasets, i.e., Chapman and CPSC-2018, respectively, which surpassed current state-of-the-art ECG classification methods, even those trained on 12-lead data. To encourage further development, our source code is publicly available at https://github.com/lhkhiem28/LightX3ECG.
Submitted 15 August, 2022;
originally announced August 2022.
-
Detecting COVID-19 from digitized ECG printouts using 1D convolutional neural networks
Authors:
Thao Nguyen,
Hieu H. Pham,
Huy Khiem Le,
Anh Tu Nguyen,
Ngoc Tien Thanh,
Cuong Do
Abstract:
The COVID-19 pandemic has exposed the vulnerability of healthcare services worldwide, raising the need to develop novel tools to provide rapid and cost-effective screening and diagnosis. Clinical reports indicated that COVID-19 infection may cause cardiac injury, and electrocardiograms (ECG) may serve as a diagnostic biomarker for COVID-19. This study aims to utilize ECG signals to detect COVID-19 automatically. We propose a novel method to extract ECG signals from ECG paper records, which are then fed into a one-dimensional convolution neural network (1D-CNN) to learn and diagnose the disease. To evaluate the quality of digitized signals, R peaks in the paper-based ECG images are labeled. Afterward, RR intervals calculated from each image are compared to RR intervals of the corresponding digitized signal. Experiments on the COVID-19 ECG images dataset demonstrate that the proposed digitization method is able to capture correctly the original signals, with a mean absolute error of 28.11 ms. Our proposed 1D-CNN model, which is trained on the digitized ECG signals, allows identifying individuals with COVID-19 and other subjects accurately, with classification accuracies of 98.42%, 95.63%, and 98.50% for classifying COVID-19 vs. Normal, COVID-19 vs. Abnormal Heartbeats, and COVID-19 vs. other classes, respectively. Furthermore, the proposed method also achieves a high-level of performance for the multi-classification task. Our findings indicate that a deep learning system trained on digitized ECG signals can serve as a potential tool for diagnosing COVID-19.
Submitted 5 October, 2022; v1 submitted 10 August, 2022;
originally announced August 2022.
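The signal-quality check described in the abstract above (RR intervals from labeled R peaks compared with RR intervals from the digitized signal, summarized as a mean absolute error) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the R-peak times are hypothetical, and it assumes both peak lists are already aligned one-to-one and expressed in milliseconds.

```python
from typing import List, Sequence


def rr_intervals_ms(r_peak_times_ms: Sequence[float]) -> List[float]:
    """Return the gaps between consecutive R-peak times, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(r_peak_times_ms, r_peak_times_ms[1:])]


def mean_absolute_error(reference: Sequence[float],
                        estimate: Sequence[float]) -> float:
    """Average absolute difference between two equally long sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs(r - e) for r, e in zip(reference, estimate))
    return total / len(reference)


# Hypothetical R-peak times (ms): labeled on the paper image vs. found in
# the digitized signal. These values are made up for illustration.
labeled_peaks = [0.0, 800.0, 1620.0, 2400.0]
digitized_peaks = [10.0, 790.0, 1640.0, 2410.0]

rr_labeled = rr_intervals_ms(labeled_peaks)      # [800.0, 820.0, 780.0]
rr_digitized = rr_intervals_ms(digitized_peaks)  # [780.0, 850.0, 770.0]
mae_ms = mean_absolute_error(rr_labeled, rr_digitized)
print(f"RR-interval MAE: {mae_ms:.2f} ms")
```

In this toy case the interval errors are 20, 30, and 10 ms, so the printed MAE is 20.00 ms; the 28.11 ms figure in the abstract is the authors' result on their own dataset and is not reproduced here.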
-
LightX3ECG: A Lightweight and eXplainable Deep Learning System for 3-lead Electrocardiogram Classification
Authors:
Khiem H. Le,
Hieu H. Pham,
Thao BT. Nguyen,
Tu A. Nguyen,
Tien N. Thanh,
Cuong D. Do
Abstract:
Cardiovascular diseases (CVDs) are a group of heart and blood vessel disorders that is one of the most serious dangers to human health, and the number of such patients is still growing. Early and accurate detection plays a key role in successful treatment and intervention. Electrocardiogram (ECG) is the gold standard for identifying a variety of cardiovascular abnormalities. In clinical practices and most of the current research, standard 12-lead ECG is mainly used. However, using a lower number of leads can make ECG more prevalent as it can be conveniently recorded by portable or wearable devices. In this research, we develop a novel deep learning system to accurately identify multiple cardiovascular abnormalities by using only three ECG leads.
Submitted 25 July, 2022;
originally announced July 2022.
-
Fast Conditional Network Compression Using Bayesian HyperNetworks
Authors:
Phuoc Nguyen,
Truyen Tran,
Ky Le,
Sunil Gupta,
Santu Rana,
Dang Nguyen,
Trong Nguyen,
Shannon Ryan,
Svetha Venkatesh
Abstract:
We introduce a conditional compression problem and propose a fast framework for tackling it. The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts, e.g. a context involving only a subset of classes or a context where only limited compute resource is available. To solve this, we propose an efficient Bayesian framework to compress a given large network into much smaller size tailored to meet each contextual requirement. We employ a hypernetwork to parameterize the posterior distribution of weights given conditional inputs and minimize a variational objective of this Bayesian neural network. To further reduce the network sizes, we propose a new input-output group sparsity factorization of weights to encourage more sparseness in the generated weights. Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
Submitted 12 May, 2022;
originally announced May 2022.
-
Learning from Multiple Expert Annotators for Enhancing Anomaly Detection in Medical Image Analysis
Authors:
Khiem H. Le,
Tuan V. Tran,
Hieu H. Pham,
Hieu T. Nguyen,
Tung T. Le,
Ha Q. Nguyen
Abstract:
Building an accurate computer-aided diagnosis system based on data-driven approaches requires a large amount of high-quality labeled data. In medical imaging analysis, multiple expert annotators often produce subjective estimates about "ground truth labels" during the annotation process, depending on their expertise and experience. As a result, the labeled data may contain a variety of human biases with a high rate of disagreement among annotators, which significantly affect the performance of supervised machine learning algorithms. To tackle this challenge, we propose a simple yet effective approach to combine annotations from multiple radiology experts for training a deep learning-based detector that aims to detect abnormalities on medical scans. The proposed method first estimates the ground truth annotations and confidence scores of training examples. The estimated annotations and their scores are then used to train a deep learning detector with a re-weighted loss function to localize abnormal findings. We conduct an extensive experimental evaluation of the proposed approach on both simulated and real-world medical imaging datasets. The experimental results show that our approach significantly outperforms baseline approaches that do not consider the disagreements among annotators, including methods in which all of the noisy annotations are treated equally as ground truth and the ensemble of different models trained on different label sets provided separately by annotators.
Submitted 20 March, 2022;
originally announced March 2022.
-
Machine Learning for Food Review and Recommendation
Authors:
Tan Khang Le,
Siu Cheung Hui
Abstract:
Food reviews and recommendations have always been important for online food service websites. However, reviewing and recommending food is not simple as it is likely to be overwhelmed by disparate contexts and meanings. In this paper, we use different deep learning approaches to address the problems of sentiment analysis, automatic review tag generation, and retrieval of food reviews. We propose to develop a web-based food review system at Nanyang Technological University (NTU) named NTU Food Hunter, which incorporates different deep learning approaches that help users with food selection. First, we implement the BERT and LSTM deep learning models into the system for sentiment analysis of food reviews. Then, we develop a Part-of-Speech (POS) algorithm to automatically identify and extract adjective-noun pairs from the review content for review tag generation based on POS tagging and dependency parsing. Finally, we also train a RankNet model for the re-ranking of the retrieval results to improve the accuracy in our Solr-based food reviews search system. The experimental results show that our proposed deep learning approaches are promising for the applications of real-world problems.
Submitted 14 January, 2022;
originally announced January 2022.
-
Towards 6G Internet of Things: Recent Advances, Use Cases, and Open Challenges
Authors:
Zakria Qadir,
Hafiz Suliman Munawar,
Nasir Saeed,
Khoa Le
Abstract:
Smart services based on the Internet of Everything (IoE) are gaining considerable popularity due to the ever-increasing demands of wireless networks. This demands the appraisal of the wireless networks with enhanced properties as next-generation communication systems. Although 5G networks show great potential to support numerous IoE based services, it is not adequate to meet the complete requirements of the new smart applications. Therefore, there is an increased demand for envisioning the 6G wireless communication systems to overcome the major limitations in the existing 5G networks. Moreover, incorporating artificial intelligence in 6G will provide solutions for very complex problems relevant to network optimization. Furthermore, to add further value to the future 6G networks, researchers are investigating new technologies, such as THz and quantum communications. The requirements of future 6G wireless communications demand to support massive data-driven applications and the increasing number of users. This paper presents recent advances in the 6G wireless networks, including the evolution from 1G to 5G communications, the research trends for 6G, enabling technologies, and state-of-the-art 6G projects.
Submitted 12 November, 2021;
originally announced November 2021.
-
Entropic Gromov-Wasserstein between Gaussian Distributions
Authors:
Khang Le,
Dung Le,
Huy Nguyen,
Dat Do,
Tung Pham,
Nhat Ho
Abstract:
We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. When the metric is the inner product, which we refer to as inner product Gromov-Wasserstein (IGW), we demonstrate that the optimal transportation plans of entropic IGW and its unbalanced variant are (unbalanced) Gaussian distributions. Via an application of von Neumann's trace inequality, we obtain closed-form expressions for the entropic IGW between these Gaussian distributions. Finally, we consider an entropic inner product Gromov-Wasserstein barycenter of multiple Gaussian distributions. We prove that the barycenter is a Gaussian distribution when the entropic regularization parameter is small. We further derive a closed-form expression for the covariance matrix of the barycenter.
Submitted 24 February, 2022; v1 submitted 24 August, 2021;
originally announced August 2021.
-
On Multimarginal Partial Optimal Transport: Equivalent Forms and Computational Complexity
Authors:
Khang Le,
Huy Nguyen,
Tung Pham,
Nhat Ho
Abstract:
We study the multi-marginal partial optimal transport (POT) problem between $m$ discrete (unbalanced) measures with at most $n$ supports. We first prove that we can obtain two equivalence forms of the multimarginal POT problem in terms of the multimarginal optimal transport problem via novel extensions of the cost tensor. The first equivalence form is derived under the assumption that the total masses of the measures are sufficiently close, while the second equivalence form does not require any conditions on these masses but comes at the price of a more sophisticated extended cost tensor. Our proof techniques for obtaining these equivalence forms rely on novel procedures of moving mass in graph theory to push the transportation plan into appropriate regions. Finally, based on the equivalence forms, we develop an optimization algorithm, named ApproxMPOT, that builds upon the Sinkhorn algorithm for solving the entropic regularized multimarginal optimal transport. We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of the multimarginal POT problem with a computational complexity upper bound of the order $\tilde{\mathcal{O}}(m^3(n+1)^{m}/ \varepsilon^2)$, where $\varepsilon > 0$ stands for the desired tolerance.
△ Less
Submitted 24 February, 2022; v1 submitted 18 August, 2021;
originally announced August 2021.
-
FedXGBoost: Privacy-Preserving XGBoost for Federated Learning
Authors:
Nhan Khanh Le,
Yang Liu,
Quang Minh Nguyen,
Qingchen Liu,
Fangzhou Liu,
Quanwei Cai,
Sandra Hirche
Abstract:
Federated learning is a distributed machine learning framework that enables collaborative training across multiple parties while ensuring data privacy. Practical adaptation of XGBoost, the state-of-the-art tree boosting framework, to federated learning remains limited due to the high cost incurred by conventional privacy-preserving methods. To address the problem, we propose two variants of federated XGBoost with privacy guarantees: FedXGBoost-SMM and FedXGBoost-LDP. Our first protocol, FedXGBoost-SMM, deploys an enhanced secure matrix multiplication method to preserve privacy with lossless accuracy and lower overhead than encryption-based techniques. Developed independently, the second protocol, FedXGBoost-LDP, is heuristically designed with noise perturbation for local differential privacy, and empirically evaluated on real-world and synthetic datasets.
Submitted 12 August, 2021; v1 submitted 20 June, 2021;
originally announced June 2021.
-
VXSlate: Combining Head Movement and Mobile Touch for Large Virtual Display Interaction
Authors:
Khanh-Duy Le,
Tanh Quang Tran,
Karol Chlasta,
Krzysztof Krejtz,
Morten Fjeld,
Andreas Kunz
Abstract:
Virtual Reality (VR) headsets can open opportunities for users to accomplish complex tasks on large virtual displays, using compact setups. However, interacting with large virtual displays using existing interaction techniques might cause fatigue, especially for precise manipulations, due to the lack of physical surfaces. We designed VXSlate, an interaction technique that uses a large virtual display as an expansion of a tablet. VXSlate combines a user's head movement, as tracked by the VR headset, and touch interaction on the tablet. The user's head movement positions both a virtual representation of the tablet and of the user's hand on the large virtual display. The user's multi-touch interactions perform finely-tuned content manipulations.
Submitted 26 March, 2021; v1 submitted 14 March, 2021;
originally announced March 2021.
-
On Robust Optimal Transport: Computational Complexity and Barycenter Computation
Authors:
Khang Le,
Huy Nguyen,
Quang Nguyen,
Tung Pham,
Hung Bui,
Nhat Ho
Abstract:
We consider robust variants of the standard optimal transport, named robust optimal transport, where marginal constraints are relaxed via Kullback-Leibler divergence. We show that Sinkhorn-based algorithms can approximate the optimal cost of robust optimal transport in $\widetilde{\mathcal{O}}(\frac{n^2}{\varepsilon})$ time, in which $n$ is the number of supports of the probability distributions and $\varepsilon$ is the desired error. Furthermore, we investigate a fixed-support robust barycenter problem between $m$ discrete probability distributions with at most $n$ supports and develop an approximating algorithm based on iterative Bregman projections (IBP). For the specific case $m = 2$, we show that this algorithm can approximate the optimal barycenter value in $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon})$ time, thus being better than the previous complexity $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon^2})$ of the IBP algorithm for approximating the Wasserstein barycenter.
Submitted 27 October, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Interpretation of smartphone-captured radiographs utilizing a deep learning-based approach
Authors:
Hieu X. Le,
Phuong D. Nguyen,
Thang H. Nguyen,
Khanh N. Q. Le,
Thanh T. Nguyen
Abstract:
Computer-aided diagnostic systems (CADs) that can automatically and effectively interpret medical images have recently become an emerging subject of academic attention. For radiographs, several deep learning-based systems or models have been developed to study multi-label disease recognition tasks. However, none of them have been trained to work on smartphone-captured chest radiographs. In this study, we propose a system that comprises a sequence of deep learning-based neural networks trained on the newly released CheXphoto dataset to tackle this issue. The proposed approach achieved promising results of 0.684 in AUC and 0.699 in average F1 score. To the best of our knowledge, this is the first published study shown to be capable of processing smartphone-captured radiographs.
Submitted 13 September, 2020;
originally announced September 2020.
-
A novel approach to remove foreign objects from chest X-ray images
Authors:
Hieu X. Le,
Phuong D. Nguyen,
Thang H. Nguyen,
Khanh N. Q. Le,
Thanh T. Nguyen
Abstract:
We initially proposed a deep learning approach for inpainting foreign objects in smartphone-camera-captured chest radiographs, utilizing the CheXphoto dataset. Foreign objects, which can significantly affect the quality of a computer-aided diagnostic prediction, are captured under various settings. In this paper, we use multiple methods to tackle both the removal of foreign objects and the inpainting of chest radiographs. Firstly, an object detection model is trained to separate the foreign objects from the given image. Subsequently, the binary mask of each object is extracted utilizing a segmentation model. Each pair of binary mask and extracted object is then used for inpainting. Finally, the inpainted regions are merged back into the original image, resulting in a clean output free of foreign objects. To conclude, we achieved state-of-the-art accuracy. The experimental results suggest possible applications of this method to chest X-ray image detection.
Submitted 15 August, 2020;
originally announced August 2020.
-
On Unbalanced Optimal Transport: An Analysis of Sinkhorn Algorithm
Authors:
Khiem Pham,
Khang Le,
Nhat Ho,
Tung Pham,
Hung Bui
Abstract:
We provide a computational complexity analysis for the Sinkhorn algorithm that solves the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most $n$ components. We show that the complexity of the Sinkhorn algorithm for finding an $\varepsilon$-approximate solution to the UOT problem is of order $\widetilde{\mathcal{O}}(n^2/ \varepsilon)$, which is near-linear time. To the best of our knowledge, this complexity is better than the complexity of the Sinkhorn algorithm for solving the Optimal Transport (OT) problem, which is of order $\widetilde{\mathcal{O}}(n^2/\varepsilon^2)$. Our proof technique is based on the geometric convergence of the Sinkhorn updates to the optimal dual solution of the entropic regularized UOT problem and some properties of the primal solution. It is also different from the proof for the complexity of the Sinkhorn algorithm for approximating the OT problem since the UOT solution does not have to meet the marginal constraints.
Submitted 18 November, 2020; v1 submitted 9 February, 2020;
originally announced February 2020.
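The Sinkhorn updates analysed in the abstract above can be sketched in their commonly used scaling form for entropic UOT with KL-relaxed marginals. This is a minimal illustration, not the authors' implementation: the function name `sinkhorn_uot`, the parameter names `eta` (entropic regularization) and `tau` (marginal penalty), the fixed iteration count, and the toy cost matrix are all assumptions made here.

```python
import math

def sinkhorn_uot(cost, a, b, eta=0.1, tau=1.0, n_iters=200):
    """Scaling-form Sinkhorn iterations for entropic unbalanced OT (illustrative).

    cost: n x m list of lists, a: n source masses, b: m target masses.
    eta and tau are hypothetical names for the entropic and KL penalties.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eta)
    K = [[math.exp(-cost[i][j] / eta) for j in range(m)] for i in range(n)]
    # The exponent tau / (tau + eta) damps the update, so the marginals
    # are only softly matched instead of enforced exactly.
    power = tau / (tau + eta)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(m))) ** power
             for i in range(n)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(n))) ** power
             for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example: two measures with different total masses (1.2 vs 1.0).
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn_uot(cost, a=[0.6, 0.6], b=[0.5, 0.5], eta=0.5, tau=1.0)
```

Because the marginal constraints are relaxed, the row and column sums of `plan` need not equal `a` and `b`, which is the structural difference from balanced OT that the abstract points to.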
-
Simultaneous Feature Aggregating and Hashing for Compact Binary Code Learning
Authors:
Thanh-Toan Do,
Khoa Le,
Tuan Hoang,
Huu Le,
Tam V. Nguyen,
Ngai-Man Cheung
Abstract:
Representing images by compact hash codes is an attractive approach for large-scale content-based image retrieval. In most state-of-the-art hashing-based image retrieval systems, for each image, local descriptors are first aggregated as a global representation vector. This global vector is then subjected to a hashing function to generate a binary hash code. In previous works, the aggregating and the hashing processes are designed independently. Hence these frameworks may generate suboptimal hash codes. In this paper, we first propose a novel unsupervised hashing framework in which feature aggregating and hashing are designed simultaneously and optimized jointly. Specifically, our joint optimization generates aggregated representations that can be better reconstructed by some binary codes. This leads to more discriminative binary hash codes and improved retrieval accuracy. In addition, the proposed method is flexible. It can be extended for supervised hashing. When the data label is available, the framework can be adapted to learn binary codes which minimize the reconstruction loss w.r.t. label vectors. Furthermore, we also propose a fast version of the state-of-the-art hashing method Binary Autoencoder to be used in our proposed frameworks. Extensive experiments on benchmark datasets under various settings show that the proposed methods outperform state-of-the-art unsupervised and supervised hashing methods.
Submitted 24 April, 2019;
originally announced April 2019.
-
Exploring Combinations of Ontological Features and Keywords for Text Retrieval
Authors:
Tru H. Cao,
Khanh C. Le,
Vuong M. Ngo
Abstract:
Named entities have been considered and combined with keywords to enhance information retrieval performance. However, there is not yet a formal and complete model that takes into account entity names, classes, and identifiers together. Our work explores various adaptations of the traditional Vector Space Model that combine different ontological features with keywords, and in different ways. It shows better performance of the proposed models as compared to the keyword-based Lucene, and their advantages for both text retrieval and representation of documents and queries.
Submitted 20 July, 2018;
originally announced July 2018.
-
Connected greedy colouring in claw-free graphs
Authors:
Ngoc Khang Le,
Nicolas Trotignon
Abstract:
An ordering of the vertices of a graph is \emph{connected} if every vertex (but the first) has a neighbor among its predecessors. The greedy colouring algorithm of a graph with a connected order consists in taking the vertices in order, and assigning to each vertex the smallest available colour. A graph is \emph{good} if the greedy algorithm on every connected order gives every connected induced subgraph of it an optimal colouring. We give the characterization of good claw-free graphs in terms of minimal forbidden induced subgraphs.
Submitted 4 May, 2018;
originally announced May 2018.
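The two definitions in the abstract above (a connected order and greedy colouring along it) can be illustrated with a short sketch. The helper names `connected_order` and `greedy_colouring`, the use of breadth-first search to produce one particular connected order, and the 5-cycle example are assumptions chosen here for illustration; the paper considers every connected order, not just a BFS one.

```python
from collections import deque

def connected_order(adj, start):
    """Return a BFS order; each vertex but the first has an earlier neighbour."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                order.append(v)
                queue.append(v)
    return order

def greedy_colouring(adj, order):
    """Assign each vertex, in order, the smallest colour unused by its neighbours."""
    colour = {}
    for v in order:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour

# Example graph: the cycle on 5 vertices, whose chromatic number is 3.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
order = connected_order(adj, 0)
colours = greedy_colouring(adj, order)
num_colours = max(colours.values()) + 1
```

On this example the greedy algorithm uses three colours, which is optimal for an odd cycle; checking a graph for the "good" property would require repeating this over all connected orders of all connected induced subgraphs.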
-
Coloring even-hole-free graphs with no star cutset
Authors:
Ngoc Khang Le
Abstract:
A \emph{hole} is a chordless cycle of length at least $4$. A graph is \emph{even-hole-free} if it does not contain any hole of even length as an induced subgraph. In this paper, we study the class of even-hole-free graphs with no star cutset. We give the optimal upper bound for its chromatic number in terms of clique number and a polynomial-time algorithm to color any graph in this class. The latter is, in fact, a direct consequence of our proof that this class has bounded rank-width.
Submitted 4 May, 2018;
originally announced May 2018.
-
Analysis and Design of Cost-Effective, High-Throughput LDPC Decoders
Authors:
Thien Truong Nguyen-Ly,
Valentin Savin,
Khoa Le,
David Declercq,
Fakhreddine Ghaffari,
Oana Boncalo
Abstract:
This paper introduces a new approach to cost-effective, high-throughput hardware designs for Low Density Parity Check (LDPC) decoders. The proposed approach, called Non-Surjective Finite Alphabet Iterative Decoders (NS-FAIDs), exploits the robustness of message-passing LDPC decoders to inaccuracies in the calculation of exchanged messages, and it is shown to provide a unified framework for several designs previously proposed in the literature. NS-FAIDs are optimized by density evolution for regular and irregular LDPC codes, and are shown to provide different trade-offs between hardware complexity and decoding performance. Two hardware architectures targeting high-throughput applications are also proposed, integrating both Min-Sum (MS) and NS-FAID decoding kernels. ASIC post-synthesis implementation results on 65nm CMOS technology show that NS-FAIDs yield significant improvements in the throughput-to-area ratio, by up to 58.75% with respect to the MS decoder, with even better or only slightly degraded error correction performance.
Submitted 23 August, 2017;
originally announced September 2017.
-
Detecting induced subdivision of $K_4$
Authors:
Ngoc Khang Le
Abstract:
In this paper, we propose a polynomial-time algorithm to test whether a given graph contains a subdivision of $K_4$ as an induced subgraph.
Submitted 14 March, 2017;
originally announced March 2017.
-
On rank-width of even-hole-free graphs
Authors:
Isolde Adler,
Ngoc Khang Le,
Haiko Müller,
Marko Radovanović,
Nicolas Trotignon,
Kristina Vušković
Abstract:
We present a class of (diamond, even hole)-free graphs with no clique cutset that has unbounded rank-width. In general, even-hole-free graphs have unbounded rank-width, because chordal graphs are even-hole-free. A.A. da Silva, A. Silva and C. Linhares-Sales (2010) showed that planar even-hole-free graphs have bounded rank-width, and N.K. Le (2016) showed that even-hole-free graphs with no star cutset have bounded rank-width. A natural question is whether even-hole-free graphs with no clique cutset have bounded rank-width. Our result gives a negative answer. Hence we cannot apply Courcelle and Makowsky's meta-theorem, which would provide efficient algorithms for a large number of problems, including the maximum independent set problem, whose complexity remains open for (diamond, even hole)-free graphs.
Submitted 1 September, 2017; v1 submitted 29 November, 2016;
originally announced November 2016.
-
Chromatic number of ISK4-free graphs
Authors:
Ngoc Khang Le
Abstract:
A graph $G$ is said to be ISK4-free if it does not contain any subdivision of $K_4$ as an induced subgraph. In this paper, we propose new upper bounds for the chromatic number of ISK4-free graphs and $\{$ISK4, triangle$\}$-free graphs.
Submitted 14 November, 2016;
originally announced November 2016.
-
Construction of the generalized Cech complex
Authors:
Ngoc Khuyen Le,
Philippe Martins,
Laurent Decreusefond,
Anais Vergne
Abstract:
In this paper, we introduce an algorithm which constructs the generalized Cech complex. The generalized Cech complex represents the topology of a wireless network whose cells are different in size. This complex is often used in applications to locate boundary holes or to reduce energy consumption in wireless networks. The complexity of constructing the Cech complex to analyze the coverage structure is found to be polynomial in time.
Submitted 31 January, 2015; v1 submitted 29 September, 2014;
originally announced September 2014.