Search | arXiv e-print repository

Greedy Heuristics for Sampling-based Motion Planning in High-Dimensional State Spaces

Authors: Phone Thiha Kyaw, Anh Vu Le, Lim Yi, Prabakaran Veerajagadheswar, Mohan Rajesh Elara, Dinh Tung Vo, Minh Bui Vu

Abstract: Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces as they do not require prior approximations of the problem domain compared to traditional discrete graph-based searches. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approxima… ▽ More Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces as they do not require prior approximations of the problem domain compared to traditional discrete graph-based searches. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approximating and searching the problem domain through random sampling. However, due to its low sampling efficiency and slow convergence rate, research has proposed many variants of RRT*, incorporating different heuristics and sampling strategies to overcome the constraints in complex planning problems. Yet, these approaches address specific convergence aspects of RRT* limitations, leaving a need for a sampling-based algorithm that can quickly find better solutions in complex high-dimensional state spaces with a faster convergence rate for practical motion planning applications. This article unifies and leverages the greedy search and heuristic techniques used in various RRT* variants to develop a greedy version of the anytime Rapidly-exploring Random Trees algorithm, denoted as Greedy RRT* (G-RRT*). It improves the initial solution-finding time of RRT* by maintaining two trees rooted at both the start and goal ends, advancing toward each other using greedy connection heuristics. It also accelerates the convergence rate of RRT* by introducing a greedy version of direct informed sampling procedure, which guides the sampling towards the promising region of the problem domain based on heuristics. We validate our approach on simulated planning problems, manipulation problems on Barrett WAM Arms, and on a self-reconfigurable robot, Panthera. Results show that G-RRT* produces asymptotically optimal solution paths and outperforms state-of-the-art RRT* variants, especially in high-dimensional planning problems. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: To be published at the International Journal of Robotics Research (IJRR)

arXiv:2305.04183 [pdf, other]

doi 10.1016/j.inffus.2023.101868

OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese

Authors: Nghia Hieu Nguyen, Duong T. D. Vo, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: In recent years, visual question answering (VQA) has attracted attention from the research community because of its highly potential applications (such as virtual assistance on intelligent cars, assistant devices for blind people, or information retrieval from document images using natural language as queries) and challenge. The VQA task requires methods that have the ability to fuse the informati… ▽ More In recent years, visual question answering (VQA) has attracted attention from the research community because of its highly potential applications (such as virtual assistance on intelligent cars, assistant devices for blind people, or information retrieval from document images using natural language as queries) and challenge. The VQA task requires methods that have the ability to fuse the information from questions and images to produce appropriate answers. Neural visual question answering models have achieved tremendous growth on large-scale datasets which are mostly for resource-rich languages such as English. However, available datasets narrow the VQA task as the answers selection task or answer classification task. We argue that this form of VQA is far from human ability and eliminates the challenge of the answering aspect in the VQA task by just selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset, the first large-scale dataset for VQA with open-ended answers in Vietnamese, consists of 11,000+ images associated with 37,000+ question-answer pairs (QAs). Moreover, we proposed FST, QuMLAG, and MLPAG which fuse information from images and answers, then use these fused features to construct answers as humans iteratively. Our proposed methods achieve results that are competitive with SOTA models such as SAAA, MCAN, LORA, and M4C. The dataset is available to encourage the research community to develop more generalized algorithms including transformers for low-resource languages such as Vietnamese. △ Less

Submitted 6 May, 2023; originally announced May 2023.

Comments: submitted to Elsevier

arXiv:2302.11752 [pdf, other]

doi 10.15625/1813-9663/18157

EVJVQA Challenge: Multilingual Visual Question Answering

Authors: Ngan Luu-Thuy Nguyen, Nghia Hieu Nguyen, Duong T. D Vo, Khanh Quoc Tran, Kiet Van Nguyen

Abstract: Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a resource-rich language that has witnessed various developments in datasets and models for visual question answering. Visual question answering in other languages also would be developed for resources and models. In addi… ▽ More Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a resource-rich language that has witnessed various developments in datasets and models for visual question answering. Visual question answering in other languages also would be developed for resources and models. In addition, there is no multilingual dataset targeting the visual content of a particular country with its own objects and cultural characteristics. To address the weakness, we provide the research community with a benchmark dataset named EVJVQA, including 33,000+ pairs of question-answer over three languages: Vietnamese, English, and Japanese, on approximately 5,000 images taken from Vietnam for evaluating multilingual VQA systems or models. EVJVQA is used as a benchmark dataset for the challenge of multilingual visual question answering at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). This task attracted 62 participant teams from various universities and organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLUE on the private test set. The multilingual QA systems proposed by the top 2 teams use ViT for the pre-trained vision model and mT5 for the pre-trained language model, a powerful pre-trained language model based on the transformer architecture. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore the multilingual models or systems for visual question answering systems. We released the challenge on the Codalab evaluation system for further research. △ Less

Submitted 17 April, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: VLSP2022 EVJVQA challenge

arXiv:2212.14353 [pdf, other]

Sheaf-theoretic self-filtering network of low-cost sensors for local air quality monitoring: A causal approach

Authors: Anh-Duy Pham, Chuong Dinh Le, Hoang Viet Pham, Thinh Gia Tran, Dat Thanh Vo, Chau Long Tran, An Dinh Le, Hien Bich Vo

Abstract: Sheaf theory, which is a complex but powerful tool supported by topological theory, offers more flexibility and precision than traditional graph theory when it comes to modeling relationships between multiple features. In the realm of air quality monitoring, this can be incredibly useful in detecting sudden changes in local dust particle density, which can be difficult to accurately measure using… ▽ More Sheaf theory, which is a complex but powerful tool supported by topological theory, offers more flexibility and precision than traditional graph theory when it comes to modeling relationships between multiple features. In the realm of air quality monitoring, this can be incredibly useful in detecting sudden changes in local dust particle density, which can be difficult to accurately measure using commercial instruments. Traditional methods for air quality measurement often rely on calibrating the measurement with public standard instruments or calculating the measurements moving average over a constant period. However, this can lead to an incorrect index at the measurement location, as well as an oversmoothing effect on the signal. In this study, we propose a compact device that uses sheaf theory to detect and count vehicles as a local air quality change-causing factor. By inferring the number of vehicles into the PM2.5 index and propagating it into the recorded PM2.5 index from low-cost air monitoring sensors such as PMS7003 and BME280, we can achieve self-correction in real-time. Plus, the sheaf-theoretic method allows for easy scaling to multiple nodes for further filtering effects. By implementing sheaf theory in air quality monitoring, we can overcome the limitations of traditional methods and provide more accurate and reliable results. △ Less

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2211.05407 [pdf, other]

doi 10.1109/RIVF55975.2022.10013898

UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese

Authors: Nghia Hieu Nguyen, Duong T. D. Vo, Kiet Van Nguyen

Abstract: Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that draw confusion to state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, there are not ma… ▽ More Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that draw confusion to state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, there are not many datasets for researching handwriting recognition in Vietnamese, which makes handwriting recognition in this language have a barrier for researchers to approach. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset constructed by connecting pen stroke coordinates without further processing. This approach obviously can not measure the ability of recognition methods effectively, as it is trivial and may be lack of features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that associates crucial natural attributes required for offline handwriting images. Using our method, we provide a first high-quality synthetic dataset which is complex and natural for efficiently evaluating handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to figure out the challenge to reach the solution for handwriting recognition in Vietnamese. △ Less

Submitted 10 November, 2022; originally announced November 2022.

Comments: Accepted for publishing at the 16th International Conference on Computing and Communication Technologies (RIVF)

arXiv:2211.05405 [pdf, other]

VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning

Authors: Nghia Hieu Nguyen, Duong T. D. Vo, Minh-Quan Ha

Abstract: Image captioning is currently a challenging task that requires the ability to both understand visual information and use human language to describe this visual information in the image. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based method by extending Object Relation Transformer architecture with Attention on Attention mechanism. Experim… ▽ More Image captioning is currently a challenging task that requires the ability to both understand visual information and use human language to describe this visual information in the image. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based method by extending Object Relation Transformer architecture with Attention on Attention mechanism. Experiments on the VieCap4H dataset show that our proposed method significantly outperforms its original structure on both the public test and private test of the Image Captioning shared task held by VLSP. △ Less

Submitted 20 March, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: Accepted for publishing at the VNU Journal of Science: Computer Science and Communication Engineering

arXiv:1910.06748 [pdf, other]

Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN

Authors: Duy Tin Vo, Richard Khoury

Abstract: Language Identification (LID) is a challenging task, especially when the input texts are short and noisy such as posts and statuses on social media or chat logs on gaming forums. The task has been tackled by either designing a feature set for a traditional classifier (e.g. Naive Bayes) or applying a deep neural network classifier (e.g. Bi-directional Gated Recurrent Unit, Encoder-Decoder). These m… ▽ More Language Identification (LID) is a challenging task, especially when the input texts are short and noisy such as posts and statuses on social media or chat logs on gaming forums. The task has been tackled by either designing a feature set for a traditional classifier (e.g. Naive Bayes) or applying a deep neural network classifier (e.g. Bi-directional Gated Recurrent Unit, Encoder-Decoder). These methods are usually trained and tested on a huge amount of private data, then used and evaluated as off-the-shelf packages by other researchers using their own datasets, and consequently the various results published are not directly comparable. In this paper, we first create a new massive labelled dataset based on one year of Twitter data. We use this dataset to test several existing language identification systems, in order to obtain a set of coherent benchmarks, and we make our dataset publicly available so that others can add to this set of benchmarks. Finally, we propose a shallow but efficient neural LID system, which is a ngram-regional convolution neural network enhanced with an attention mechanism. Experimental results show that our architecture is able to predict tens of thousands of samples per second and surpasses all state-of-the-art systems with an improvement of 5%. △ Less

Submitted 15 October, 2019; originally announced October 2019.

Comments: 9 pages, 5 tables, 1 figure

Showing 1–7 of 7 results for author: Vo, D T