Search | arXiv e-print repository

Greedy Heuristics for Sampling-based Motion Planning in High-Dimensional State Spaces

Authors: Phone Thiha Kyaw, Anh Vu Le, Lim Yi, Prabakaran Veerajagadheswar, Mohan Rajesh Elara, Dinh Tung Vo, Minh Bui Vu

Abstract: Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces as they do not require prior approximations of the problem domain compared to traditional discrete graph-based searches. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approxima… ▽ More Sampling-based motion planning algorithms are very effective at finding solutions in high-dimensional continuous state spaces as they do not require prior approximations of the problem domain compared to traditional discrete graph-based searches. The anytime version of the Rapidly-exploring Random Trees (RRT) algorithm, denoted as RRT*, often finds high-quality solutions by incrementally approximating and searching the problem domain through random sampling. However, due to its low sampling efficiency and slow convergence rate, research has proposed many variants of RRT*, incorporating different heuristics and sampling strategies to overcome the constraints in complex planning problems. Yet, these approaches address specific convergence aspects of RRT* limitations, leaving a need for a sampling-based algorithm that can quickly find better solutions in complex high-dimensional state spaces with a faster convergence rate for practical motion planning applications. This article unifies and leverages the greedy search and heuristic techniques used in various RRT* variants to develop a greedy version of the anytime Rapidly-exploring Random Trees algorithm, denoted as Greedy RRT* (G-RRT*). It improves the initial solution-finding time of RRT* by maintaining two trees rooted at both the start and goal ends, advancing toward each other using greedy connection heuristics. It also accelerates the convergence rate of RRT* by introducing a greedy version of direct informed sampling procedure, which guides the sampling towards the promising region of the problem domain based on heuristics. We validate our approach on simulated planning problems, manipulation problems on Barrett WAM Arms, and on a self-reconfigurable robot, Panthera. Results show that G-RRT* produces asymptotically optimal solution paths and outperforms state-of-the-art RRT* variants, especially in high-dimensional planning problems. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: To be published at the International Journal of Robotics Research (IJRR)

arXiv:2403.08746 [pdf, other]

iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer

Authors: Dinh-Khoi Vo, Duy-Nam Ly, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: Creating thematic collections in industries demands innovative designs and cohesive concepts. Designers may face challenges in maintaining thematic consistency when drawing inspiration from existing objects, landscapes, or artifacts. While AI-powered graphic design tools offer help, they often fail to generate cohesive sets based on specific thematic concepts. In response, we introduce iCONTRA, an… ▽ More Creating thematic collections in industries demands innovative designs and cohesive concepts. Designers may face challenges in maintaining thematic consistency when drawing inspiration from existing objects, landscapes, or artifacts. While AI-powered graphic design tools offer help, they often fail to generate cohesive sets based on specific thematic concepts. In response, we introduce iCONTRA, an interactive CONcept TRAnsfer system. With a user-friendly interface, iCONTRA enables both experienced designers and novices to effortlessly explore creative design concepts and efficiently generate thematic collections. We also propose a zero-shot image editing algorithm, eliminating the need for fine-tuning models, which gradually integrates information from initial objects, ensuring consistency in the generation process without influencing the background. A pilot study suggests iCONTRA's potential to reduce designers' efforts. Experimental results demonstrate its effectiveness in producing consistent and high-quality object concept transfers. iCONTRA stands as a promising tool for innovation and creative exploration in thematic collection design. The source code will be available at: https://github.com/vdkhoi20/iCONTRA. △ Less

Submitted 13 March, 2024; originally announced March 2024.

Comments: CHI 2024

arXiv:2402.03805 [pdf, other]

Automated Description Generation for Software Patches

Authors: Thanh Trong Vu, Tuan-Dung Bui, Thanh-Dat Do, Thu-Trang Nguyen, Hieu Dinh Vo, Son Nguyen

Abstract: Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an ap… ▽ More Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an approach that addresses these challenges by framing patch description generation as a machine translation task. In PATCHEXPLAINER, we leverage explicit representations of critical elements, historical context, and syntactic conventions. Moreover, the translation model in PATCHEXPLAINER is designed with an awareness of description similarity. Particularly, the model is explicitly trained to recognize and incorporate similarities present in patch descriptions clustered into groups, improving its ability to generate accurate and consistent descriptions across similar patches. The dual objectives maximize similarity and accurately predict affiliating groups. Our experimental results on a large dataset of real-world software patches show that PATCHEXPLAINER consistently outperforms existing methods, with improvements up to 189% in BLEU, 5.7X in Exact Match rate, and 154% in Semantic Similarity, affirming its effectiveness in generating software patch descriptions. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: Pre-print version of PATCHEXPLAINER

arXiv:2312.06797 [pdf, other]

Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input

Authors: Trung-Hieu Hoang, Mona Zehni, Huy Phan, Duc Minh Vo, Minh N. Do

Abstract: Despite the promising performance of current 3D human pose estimation techniques, understanding and enhancing their generalization on challenging in-the-wild videos remain an open problem. In this work, we focus on the robustness of 2D-to-3D pose lifters. To this end, we develop two benchmark datasets, namely Human3.6M-C and HumanEva-I-C, to examine the robustness of video-based 3D pose lifters to… ▽ More Despite the promising performance of current 3D human pose estimation techniques, understanding and enhancing their generalization on challenging in-the-wild videos remain an open problem. In this work, we focus on the robustness of 2D-to-3D pose lifters. To this end, we develop two benchmark datasets, namely Human3.6M-C and HumanEva-I-C, to examine the robustness of video-based 3D pose lifters to a wide range of common video corruptions including temporary occlusion, motion blur, and pixel-level noise. We observe the poor generalization of state-of-the-art 3D pose lifters in the presence of corruption and establish two techniques to tackle this issue. First, we introduce Temporal Additive Gaussian Noise (TAGN) as a simple yet effective 2D input pose data augmentation. Additionally, to incorporate the confidence scores output by the 2D pose detectors, we design a confidence-aware convolution (CA-Conv) block. Extensively tested on corrupted videos, the proposed strategies consistently boost the robustness of 3D pose lifters and serve as new baselines for future research. △ Less

Submitted 15 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.18193 [pdf, other]

Persistent Test-time Adaptation in Episodic Testing Scenarios

Authors: Trung-Hieu Hoang, Duc Minh Vo, Minh N. Do

Abstract: Current test-time adaptation (TTA) approaches aim to adapt to environments that change continuously. Yet, when the environments not only change but also recur in a correlated manner over time, such as in the case of day-night surveillance cameras, it is unclear whether the adaptability of these methods is sustained after a long run. This study aims to examine the error accumulation of TTA models w… ▽ More Current test-time adaptation (TTA) approaches aim to adapt to environments that change continuously. Yet, when the environments not only change but also recur in a correlated manner over time, such as in the case of day-night surveillance cameras, it is unclear whether the adaptability of these methods is sustained after a long run. This study aims to examine the error accumulation of TTA models when they are repeatedly exposed to previous testing environments, proposing a novel testing setting called episodic TTA. To study this phenomenon, we design a simulation of TTA process on a simple yet representative $ε$-perturbed Gaussian Mixture Model Classifier and derive the theoretical findings revealing the dataset- and algorithm-dependent factors that contribute to the gradual degeneration of TTA methods through time. Our investigation has led us to propose a method, named persistent TTA (PeTTA). PeTTA senses the model divergence towards a collapsing and adjusts the adaptation strategy of TTA, striking a balance between two primary objectives: adaptation and preventing model collapse. The stability of PeTTA in the face of episodic TTA scenarios has been demonstrated through a set of comprehensive experiments on various benchmarks. △ Less

Submitted 16 January, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15879 [pdf, other]

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Authors: Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama

Abstract: Large language models (LLMs)-based image captioning has the capability of describing objects not explicitly observed in training data; yet novel objects occur frequently, necessitating the requirement of sustaining up-to-date object knowledge for open-world comprehension. Instead of relying on large amounts of data and/or scaling up network parameters, we introduce a highly effective retrieval-aug… ▽ More Large language models (LLMs)-based image captioning has the capability of describing objects not explicitly observed in training data; yet novel objects occur frequently, necessitating the requirement of sustaining up-to-date object knowledge for open-world comprehension. Instead of relying on large amounts of data and/or scaling up network parameters, we introduce a highly effective retrieval-augmented image captioning method that prompts LLMs with object names retrieved from External Visual--name memory (EVCap). We build ever-changing object knowledge memory using objects' visuals and names, enabling us to (i) update the memory at a minimal cost and (ii) effortlessly augment LLMs with retrieved object names by utilizing a lightweight and fast-to-train model. Our model, which was trained only on the COCO dataset, can adapt to out-of-domain without requiring additional fine-tuning or re-training. Our experiments conducted on benchmarks and synthetic commonsense-violating data show that EVCap, with only 3.97M trainable parameters, exhibits superior performance compared to other methods based on frozen pre-trained LLMs. Its performance is also competitive to specialist SOTAs that require extensive training. △ Less

Submitted 7 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: CVPR 2024

arXiv:2311.12897 [pdf, other]

An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes

Authors: Kai Katsumata, Duc Minh Vo, Hideki Nakayama

Abstract: In novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerou… ▽ More In novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerous observations per time step, due to the onus of storing 3D Gaussian parameters per time step. In this study, we present an efficient 3D Gaussian representation tailored for dynamic scenes in which we define positions and rotations as functions of time while leaving other time-invariant properties of the static 3D Gaussian unchanged. Notably, our representation reduces memory usage, which is consistent regardless of the input sequence length. Additionally, it mitigates the risk of overfitting observed frames by accounting for temporal changes. The optimization of our Gaussian representation based on image and flow reconstruction results in a powerful framework for dynamic scene view synthesis in both monocular and multi-view cases. We obtain the highest rendering speed of $118$ frames per second (FPS) at a resolution of $1352 \times 1014$ with a single GPU, showing the practical usability and effectiveness of our proposed method in dynamic scene rendering scenarios. △ Less

Submitted 21 November, 2023; originally announced November 2023.

Comments: 10 pages, 10 figures

arXiv:2309.08225 [pdf, other]

Silent Vulnerability-fixing Commit Identification Based on Graph Neural Networks

Authors: Hieu Dinh Vo, Thanh Trong Vu, Son Nguyen

Abstract: The growing dependence of software projects on external libraries has generated apprehensions regarding the security of these libraries because of concealed vulnerabilities. Handling these vulnerabilities presents difficulties due to the temporal delay between remediation and public exposure. Furthermore, a substantial fraction of open-source projects covertly address vulnerabilities without any f… ▽ More The growing dependence of software projects on external libraries has generated apprehensions regarding the security of these libraries because of concealed vulnerabilities. Handling these vulnerabilities presents difficulties due to the temporal delay between remediation and public exposure. Furthermore, a substantial fraction of open-source projects covertly address vulnerabilities without any formal notification, influencing vulnerability management. Established solutions like OWASP predominantly hinge on public announcements, limiting their efficacy in uncovering undisclosed vulnerabilities. To address this challenge, the automated identification of vulnerability-fixing commits has come to the forefront. In this paper, we present VFFINDER, a novel graph-based approach for automated silent vulnerability fix identification. VFFINDER captures structural changes using Abstract Syntax Trees (ASTs) and represents them in annotated ASTs. To precisely capture the meaning of code changes, the changed code is represented in connection with the related unchanged code. In VFFINDER, the structure of the changed code and related unchanged code are captured and the structural changes are represented in annotated Abstract Syntax Trees (aAST). VFFINDER distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models to extract structural features expressed in aASTs. We conducted experiments to evaluate VFFINDER on a dataset of 11K+ vulnerability fixing commits in 507 real-world C/C++ projects. Our results show that VFFINDER significantly improves the state-of-the-art methods by 272-420% in Precision, 22-70% in Recall, and 3.2X-8.2X in F1. Especially, VFFINDER speeds up the silent fix identification process by up to 121% with the same effort reviewing 50K LOC compared to the existing approaches. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2304.08396, arXiv:2309.01971

arXiv:2309.01971 [pdf, other]

VFFINDER: A Graph-based Approach for Automated Silent Vulnerability-Fix Identification

Authors: Son Nguyen, Thanh Trong Vu, Hieu Dinh Vo

Abstract: The increasing reliance of software projects on third-party libraries has raised concerns about the security of these libraries due to hidden vulnerabilities. Managing these vulnerabilities is challenging due to the time gap between fixes and public disclosures. Moreover, a significant portion of open-source projects silently fix vulnerabilities without disclosure, impacting vulnerability manageme… ▽ More The increasing reliance of software projects on third-party libraries has raised concerns about the security of these libraries due to hidden vulnerabilities. Managing these vulnerabilities is challenging due to the time gap between fixes and public disclosures. Moreover, a significant portion of open-source projects silently fix vulnerabilities without disclosure, impacting vulnerability management. Existing tools like OWASP heavily rely on public disclosures, hindering their effectiveness in detecting unknown vulnerabilities. To tackle this problem, automated identification of vulnerability-fixing commits has emerged. However, identifying silent vulnerability fixes remains challenging. This paper presents VFFINDER, a novel graph-based approach for automated silent vulnerability fix identification. VFFINDER captures structural changes using Abstract Syntax Trees (ASTs) and represents them in annotated ASTs. VFFINDER distinguishes vulnerability-fixing commits from non-fixing ones using attention-based graph neural network models to extract structural features. We conducted experiments to evaluate VFFINDER on a dataset of 36K+ fixing and non-fixing commits in 507 real-world C/C++ projects. Our results show that VFFINDER significantly improves the state-of-the-art methods by 39-83% in Precision, 19-148% in Recall, and 30-109% in F1. Especially, VFFINDER speeds up the silent fix identification process by up to 47% with the same review effort of 5% compared to the existing approaches. △ Less

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by IEEE KSE 2023

arXiv:2308.10005 [pdf, other]

Partition-and-Debias: Agnostic Biases Mitigation via A Mixture of Biases-Specific Experts

Authors: Jiaxuan Li, Duc Minh Vo, Hideki Nakayama

Abstract: Bias mitigation in image classification has been widely researched, and existing methods have yielded notable results. However, most of these methods implicitly assume that a given image contains only one type of known or unknown bias, failing to consider the complexities of real-world biases. We introduce a more challenging scenario, agnostic biases mitigation, aiming at bias removal regardless o… ▽ More Bias mitigation in image classification has been widely researched, and existing methods have yielded notable results. However, most of these methods implicitly assume that a given image contains only one type of known or unknown bias, failing to consider the complexities of real-world biases. We introduce a more challenging scenario, agnostic biases mitigation, aiming at bias removal regardless of whether the type of bias or the number of types is unknown in the datasets. To address this difficult task, we present the Partition-and-Debias (PnD) method that uses a mixture of biases-specific experts to implicitly divide the bias space into multiple subspaces and a gating module to find a consensus among experts to achieve debiased classification. Experiments on both public and constructed benchmarks demonstrated the efficacy of the PnD. Code is available at: https://github.com/Jiaxuan-Li/PnD. △ Less

Submitted 19 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2307.16169 [pdf, other]

doi 10.24132/CSRN.3301.9

StarSRGAN: Improving Real-World Blind Super-Resolution

Authors: Khoa D. Vo, Len T. Bui

Abstract: The aim of blind super-resolution (SR) in computer vision is to improve the resolution of an image without prior knowledge of the degradation process that caused the image to be low-resolution. The State of the Art (SOTA) model Real-ESRGAN has advanced perceptual loss and produced visually compelling outcomes using more complex degradation models to simulate real-world degradations. However, there… ▽ More The aim of blind super-resolution (SR) in computer vision is to improve the resolution of an image without prior knowledge of the degradation process that caused the image to be low-resolution. The State of the Art (SOTA) model Real-ESRGAN has advanced perceptual loss and produced visually compelling outcomes using more complex degradation models to simulate real-world degradations. However, there is still room to improve the super-resolved quality of Real-ESRGAN by implementing recent techniques. This research paper introduces StarSRGAN, a novel GAN model designed for blind super-resolution tasks that utilize 5 various architectures. Our model provides new SOTA performance with roughly 10% better on the MANIQA and AHIQ measures, as demonstrated by experimental comparisons with Real-ESRGAN. In addition, as a compact version, StarSRGAN Lite provides approximately 7.5 times faster reconstruction speed (real-time upsampling from 540p to 4K) but can still keep nearly 90% of image quality, thereby facilitating the development of a real-time SR experience for future research. Our codes are released at https://github.com/kynthesis/StarSRGAN. △ Less

Submitted 30 July, 2023; originally announced July 2023.

Comments: 11 pages, 7 figures, 2 tables, accepted for oral presentation at WSCG 2023

arXiv:2307.08995 [pdf, other]

Revisiting Latent Space of GAN Inversion for Real Image Editing

Authors: Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

Abstract: The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly capable latent spaces to build combined spaces that faithfully invert real images while maint… ▽ More The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly capable latent spaces to build combined spaces that faithfully invert real images while maintaining the quality of edited images. More specifically, we propose $\mathcal{F}/\mathcal{Z}^{+}$ space consisting of two subspaces: $\mathcal{F}$ space of an intermediate feature map of StyleGANs enabling faithful reconstruction and $\mathcal{Z}^{+}$ space of an extended StyleGAN prior supporting high editing quality. We project the real images into the proposed space to obtain the inverted codes, by which we then move along $\mathcal{Z}^{+}$, enabling semantic editing without sacrificing image quality. Comprehensive experiments show that $\mathcal{Z}^{+}$ can replace the most commonly-used $\mathcal{W}$, $\mathcal{W}^{+}$, and $\mathcal{S}$ spaces while preserving reconstruction quality, resulting in reduced distortion of edited images. △ Less

Submitted 18 July, 2023; originally announced July 2023.

Comments: 10 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2306.00241

arXiv:2307.08319 [pdf, other]

Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data

Authors: Kai Katsumata, Duc Minh Vo, Tatsuya Harada, Hideki Nakayama

Abstract: Label-noise or curated unlabeled data is used to compensate for the assumption of clean labeled data in training the conditional generative adversarial network; however, satisfying such an extended assumption is occasionally laborious or impractical. As a step towards generative modeling accessible to everyone, we introduce a novel conditional image generation framework that accepts noisy-labeled… ▽ More Label-noise or curated unlabeled data is used to compensate for the assumption of clean labeled data in training the conditional generative adversarial network; however, satisfying such an extended assumption is occasionally laborious or impractical. As a step towards generative modeling accessible to everyone, we introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated unlabeled data during training: (i) closed-set and open-set label noise in labeled data and (ii) closed-set and open-set unlabeled data. To combat it, we propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data and correcting wrong labels for labeled data. Unlike popular curriculum learning, which uses a threshold to pick the training samples, our soft curriculum controls the effect of each training instance by using the weights predicted by the auxiliary classifier, resulting in the preservation of useful samples while ignoring harmful ones. Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance. In particular, the proposed approach is able to match the performance of (semi-) supervised GANs even with less than half the labeled data. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: 10 pages, 13 figures

arXiv:2306.14726 [pdf, other]

Can An Old Fashioned Feature Extraction and A Light-weight Model Improve Vulnerability Type Identification Performance?

Authors: Hieu Dinh Vo, Son Nguyen

Abstract: Recent advances in automated vulnerability detection have achieved potential results in hel** developers determine vulnerable components. However, after detecting vulnerabilities, investigating to fix vulnerable code is a non-trivial task. In fact, the types of vulnerability, such as buffer overflow or memory corruption, could help developers quickly understand the nature of the weaknesses and l… ▽ More Recent advances in automated vulnerability detection have achieved potential results in hel** developers determine vulnerable components. However, after detecting vulnerabilities, investigating to fix vulnerable code is a non-trivial task. In fact, the types of vulnerability, such as buffer overflow or memory corruption, could help developers quickly understand the nature of the weaknesses and localize vulnerabilities for security analysis. In this work, we investigate the problem of vulnerability type identification (VTI). The problem is modeled as the multi-label classification task, which could be effectively addressed by "pre-training, then fine-tuning" framework with deep pre-trained embedding models. We evaluate the performance of the well-known and advanced pre-trained models for VTI on a large set of vulnerabilities. Surprisingly, their performance is not much better than that of the classical baseline approach with an old-fashioned bag-of-word, TF-IDF. Meanwhile, these deep neural network approaches cost much more resources and require GPU. We also introduce a lightweight independent component to refine the predictions of the baseline approach. Our idea is that the types of vulnerabilities could strongly correlate to certain code tokens (distinguishing tokens) in several crucial parts of programs. The distinguishing tokens for each vulnerability type are statistically identified based on their prevalence in the type versus the others. Our results show that the baseline approach enhanced by our component can outperform the state-of-the-art deep pre-trained approaches while retaining very high efficiency. Furthermore, the proposed component could also improve the neural network approaches by up to 92.8% in macro-average F1. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.14418 [pdf, other]

Context-Encoded Code Change Representation for Automated Commit Message Generation

Authors: Thanh Trong Vu, Thanh-Dat Do, Hieu Dinh Vo

Abstract: Changes in source code are an inevitable part of software development. They are the results of indispensable activities such as fixing bugs or improving functionality. Descriptions for code changes (commit messages) help people better understand the changes. However, due to a lack of motivation and time pressure, writing high-quality commit messages remains reluctantly considered. Several methods… ▽ More Changes in source code are an inevitable part of software development. They are the results of indispensable activities such as fixing bugs or improving functionality. Descriptions for code changes (commit messages) help people better understand the changes. However, due to a lack of motivation and time pressure, writing high-quality commit messages remains reluctantly considered. Several methods have been proposed with the aim of automated commit message generation. However, the existing methods are still limited because they only utilise either the changed code or the changed code combined with surrounding statements. This paper proposes a method to represent code changes by combining the changed code and the unchanged code which have program dependence on the changed code. This method overcomes the limitations of current representations while improving the performance of 5/6 of state-of-the-art commit message generation methods by up to 15% in METEOR, 14% in ROUGE-L, and 10% in BLEU-4. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: 16 pages

arXiv:2306.06620 [pdf, other]

ARIST: An Effective API Argument Recommendation Approach

Authors: Son Nguyen, Cuong Tran Manh, Kien T. Tran, Tan M. Nguyen, Thu-Trang Nguyen, Kien-Tuan Ngo, Hieu Dinh Vo

Abstract: Learning and remembering to use APIs are difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers'… ▽ More Learning and remembering to use APIs are difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers' expectations when they define and use API methods. To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context. In ARIST, the LMs and the recommending features are used to suggest the promising candidates identified by PA. Meanwhile, PA navigates the LMs and the features working on the set of the valid candidates which satisfy syntax, accessibility, and type-compatibility constraints defined by the programming language in use. Our evaluation on a large dataset of real-world projects shows that ARIST improves the state-of-the-art approach by 19% and 18% in top-1 precision and recall for recommending arguments of frequently-used libraries. For general argument recommendation task, i.e., recommending arguments for every method call, ARIST outperforms the baseline approaches by up to 125% top-1 accuracy. Moreover, for newly-encountered projects, ARIST achieves more than 60% top-3 accuracy when evaluating on a larger dataset. For working/maintaining projects, with a personalized LM to capture developers' coding practice, ARIST can productively rank the expected arguments at the top-1 position in 7/10 requests. △ Less

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2306.00241 [pdf, other]

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

Authors: Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

Abstract: The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and $\mathcal{Z}^+$ and integrate them into seminal GAN inversion methods to improve editing quality. Besides faithful r… ▽ More The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem. In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and $\mathcal{Z}^+$ and integrate them into seminal GAN inversion methods to improve editing quality. Besides faithful reconstruction, our extensions achieve sophisticated editing quality with the aid of the StyleGAN prior. We project the real images into the proposed space to obtain the inverted codes, by which we then move along $\mathcal{Z}^{+}$, enabling semantic editing without sacrificing image quality. Comprehensive experiments show that $\mathcal{Z}^{+}$ can replace the most commonly-used $\mathcal{W}$, $\mathcal{W}^{+}$, and $\mathcal{S}$ spaces while preserving reconstruction quality, resulting in reduced distortion of edited images. △ Less

Submitted 31 May, 2023; originally announced June 2023.

Comments: 5 pages, 9 figures, AI4CC Workshop at CVPR 2023

arXiv:2305.04183 [pdf, other]

doi 10.1016/j.inffus.2023.101868

OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese

Authors: Nghia Hieu Nguyen, Duong T. D. Vo, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: In recent years, visual question answering (VQA) has attracted attention from the research community because of its highly potential applications (such as virtual assistance on intelligent cars, assistant devices for blind people, or information retrieval from document images using natural language as queries) and challenge. The VQA task requires methods that have the ability to fuse the informati… ▽ More In recent years, visual question answering (VQA) has attracted attention from the research community because of its highly potential applications (such as virtual assistance on intelligent cars, assistant devices for blind people, or information retrieval from document images using natural language as queries) and challenge. The VQA task requires methods that have the ability to fuse the information from questions and images to produce appropriate answers. Neural visual question answering models have achieved tremendous growth on large-scale datasets which are mostly for resource-rich languages such as English. However, available datasets narrow the VQA task as the answers selection task or answer classification task. We argue that this form of VQA is far from human ability and eliminates the challenge of the answering aspect in the VQA task by just selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset, the first large-scale dataset for VQA with open-ended answers in Vietnamese, consists of 11,000+ images associated with 37,000+ question-answer pairs (QAs). Moreover, we proposed FST, QuMLAG, and MLPAG which fuse information from images and answers, then use these fused features to construct answers as humans iteratively. Our proposed methods achieve results that are competitive with SOTA models such as SAAA, MCAN, LORA, and M4C. The dataset is available to encourage the research community to develop more generalized algorithms including transformers for low-resource languages such as Vietnamese. △ Less

Submitted 6 May, 2023; originally announced May 2023.

Comments: submitted to Elsevier

arXiv:2304.08396 [pdf, other]

Code-centric Learning-based Just-In-Time Vulnerability Detection

Authors: Son Nguyen, Thu-Trang Nguyen, Thanh Trong Vu, Thanh-Dat Do, Kien-Tuan Ngo, Hieu Dinh Vo

Abstract: Attacks against computer systems exploiting software vulnerabilities can cause substantial damage to the cyber-infrastructure of our modern society and economy. To minimize the consequences, it is vital to detect and fix vulnerabilities as soon as possible. Just-in-time vulnerability detection (JIT-VD) discovers vulnerability-prone ("dangerous") commits to prevent them from being merged into sourc… ▽ More Attacks against computer systems exploiting software vulnerabilities can cause substantial damage to the cyber-infrastructure of our modern society and economy. To minimize the consequences, it is vital to detect and fix vulnerabilities as soon as possible. Just-in-time vulnerability detection (JIT-VD) discovers vulnerability-prone ("dangerous") commits to prevent them from being merged into source code and causing vulnerabilities. By JIT-VD, the commits' authors, who understand the commits properly, can review these dangerous commits and fix them if necessary while the relevant modifications are still fresh in their minds. In this paper, we propose CodeJIT, a novel code-centric learning-based approach for just-in-time vulnerability detection. The key idea of CodeJIT is that the meaning of the code changes of a commit is the direct and deciding factor for determining if the commit is dangerous for the code. Based on that idea, we design a novel graph-based representation to represent the semantics of code changes in terms of both code structures and program dependencies. A graph neural network model is developed to capture the meaning of the code changes represented by our graph-based representation and learn to discriminate between dangerous and safe commits. We conducted experiments to evaluate the JIT-VD performance of CodeJIT on a dataset of 20K+ dangerous and safe commits in 506 real-world projects from 1998 to 2022. Our results show that CodeJIT significantly improves the state-of-the-art JIT-VD methods by up to 66% in Recall, 136% in Precision, and 68% in F1. Moreover, CodeJIT correctly classifies nearly 9/10 of dangerous/safe (benign) commits and even detects 69 commits that fix a vulnerability yet produce other issues in source code △ Less

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.06602 [pdf, other]

A-CAP: Anticipation Captioning with Commonsense Knowledge

Authors: Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama

Abstract: Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time. In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonse… ▽ More Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time. In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonsense knowledge into a pre-trained vision-language model, allowing it to anticipate the caption. Through both qualitative and quantitative evaluations on a customized visual storytelling dataset, A-CAP outperforms other image captioning methods and establishes a strong baseline for anticipation captioning. We also address the challenges inherent in this task. △ Less

Submitted 13 April, 2023; originally announced April 2023.

Comments: Accepted to CVPR 2023

arXiv:2302.11752 [pdf, other]

doi 10.15625/1813-9663/18157

EVJVQA Challenge: Multilingual Visual Question Answering

Authors: Ngan Luu-Thuy Nguyen, Nghia Hieu Nguyen, Duong T. D Vo, Khanh Quoc Tran, Kiet Van Nguyen

Abstract: Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a resource-rich language that has witnessed various developments in datasets and models for visual question answering. Visual question answering in other languages also would be developed for resources and models. In addi… ▽ More Visual Question Answering (VQA) is a challenging task of natural language processing (NLP) and computer vision (CV), attracting significant attention from researchers. English is a resource-rich language that has witnessed various developments in datasets and models for visual question answering. Visual question answering in other languages also would be developed for resources and models. In addition, there is no multilingual dataset targeting the visual content of a particular country with its own objects and cultural characteristics. To address the weakness, we provide the research community with a benchmark dataset named EVJVQA, including 33,000+ pairs of question-answer over three languages: Vietnamese, English, and Japanese, on approximately 5,000 images taken from Vietnam for evaluating multilingual VQA systems or models. EVJVQA is used as a benchmark dataset for the challenge of multilingual visual question answering at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). This task attracted 62 participant teams from various universities and organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLUE on the private test set. The multilingual QA systems proposed by the top 2 teams use ViT for the pre-trained vision model and mT5 for the pre-trained language model, a powerful pre-trained language model based on the transformer architecture. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore the multilingual models or systems for visual question answering systems. We released the challenge on the Codalab evaluation system for further research. △ Less

Submitted 17 April, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: VLSP2022 EVJVQA challenge

arXiv:2212.14353 [pdf, other]

Sheaf-theoretic self-filtering network of low-cost sensors for local air quality monitoring: A causal approach

Authors: Anh-Duy Pham, Chuong Dinh Le, Hoang Viet Pham, Thinh Gia Tran, Dat Thanh Vo, Chau Long Tran, An Dinh Le, Hien Bich Vo

Abstract: Sheaf theory, which is a complex but powerful tool supported by topological theory, offers more flexibility and precision than traditional graph theory when it comes to modeling relationships between multiple features. In the realm of air quality monitoring, this can be incredibly useful in detecting sudden changes in local dust particle density, which can be difficult to accurately measure using… ▽ More Sheaf theory, which is a complex but powerful tool supported by topological theory, offers more flexibility and precision than traditional graph theory when it comes to modeling relationships between multiple features. In the realm of air quality monitoring, this can be incredibly useful in detecting sudden changes in local dust particle density, which can be difficult to accurately measure using commercial instruments. Traditional methods for air quality measurement often rely on calibrating the measurement with public standard instruments or calculating the measurements moving average over a constant period. However, this can lead to an incorrect index at the measurement location, as well as an oversmoothing effect on the signal. In this study, we propose a compact device that uses sheaf theory to detect and count vehicles as a local air quality change-causing factor. By inferring the number of vehicles into the PM2.5 index and propagating it into the recorded PM2.5 index from low-cost air monitoring sensors such as PMS7003 and BME280, we can achieve self-correction in real-time. Plus, the sheaf-theoretic method allows for easy scaling to multiple nodes for further filtering effects. By implementing sheaf theory in air quality monitoring, we can overcome the limitations of traditional methods and provide more accurate and reliable results. △ Less

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2211.05407 [pdf, other]

doi 10.1109/RIVF55975.2022.10013898

UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese

Authors: Nghia Hieu Nguyen, Duong T. D. Vo, Kiet Van Nguyen

Abstract: Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that draw confusion to state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, there are not ma… ▽ More Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that draw confusion to state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, there are not many datasets for researching handwriting recognition in Vietnamese, which makes handwriting recognition in this language have a barrier for researchers to approach. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset constructed by connecting pen stroke coordinates without further processing. This approach obviously can not measure the ability of recognition methods effectively, as it is trivial and may be lack of features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that associates crucial natural attributes required for offline handwriting images. Using our method, we provide a first high-quality synthetic dataset which is complex and natural for efficiently evaluating handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to figure out the challenge to reach the solution for handwriting recognition in Vietnamese. △ Less

Submitted 10 November, 2022; originally announced November 2022.

Comments: Accepted for publishing at the 16th International Conference on Computing and Communication Technologies (RIVF)

arXiv:2211.05405 [pdf, other]

VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning

Authors: Nghia Hieu Nguyen, Duong T. D. Vo, Minh-Quan Ha

Abstract: Image captioning is currently a challenging task that requires the ability to both understand visual information and use human language to describe this visual information in the image. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based method by extending Object Relation Transformer architecture with Attention on Attention mechanism. Experim… ▽ More Image captioning is currently a challenging task that requires the ability to both understand visual information and use human language to describe this visual information in the image. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based method by extending Object Relation Transformer architecture with Attention on Attention mechanism. Experiments on the VieCap4H dataset show that our proposed method significantly outperforms its original structure on both the public test and private test of the Image Captioning shared task held by VLSP. △ Less

Submitted 20 March, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: Accepted for publishing at the VNU Journal of Science: Computer Science and Communication Engineering

arXiv:2210.08459 [pdf, other]

StoryER: Automatic Story Evaluation via Ranking, Rating and Reasoning

Authors: Hong Chen, Duc Minh Vo, Hiroya Takamura, Yusuke Miyao, Hideki Nakayama

Abstract: Existing automatic story evaluation methods place a premium on story lexical level coherence, deviating from human preference. We go beyond this limitation by considering a novel \textbf{Story} \textbf{E}valuation method that mimics human preference when judging a story, namely \textbf{StoryER}, which consists of three sub-tasks: \textbf{R}anking, \textbf{R}ating and \textbf{R}easoning. Given eith… ▽ More Existing automatic story evaluation methods place a premium on story lexical level coherence, deviating from human preference. We go beyond this limitation by considering a novel \textbf{Story} \textbf{E}valuation method that mimics human preference when judging a story, namely \textbf{StoryER}, which consists of three sub-tasks: \textbf{R}anking, \textbf{R}ating and \textbf{R}easoning. Given either a machine-generated or a human-written story, StoryER requires the machine to output 1) a preference score that corresponds to human preference, 2) specific ratings and their corresponding confidences and 3) comments for various aspects (e.g., opening, character-sha**). To support these tasks, we introduce a well-annotated dataset comprising (i) 100k ranked story pairs; and (ii) a set of 46k ratings and comments on various aspects of the story. We finetune Longformer-Encoder-Decoder (LED) on the collected dataset, with the encoder responsible for preference score and aspect prediction and the decoder for comment generation. Our comprehensive experiments result in a competitive benchmark for each task, showing the high correlation to human preference. In addition, we have witnessed the joint learning of the preference scores, the aspect ratings, and the comments brings gain in each single task. Our dataset and benchmarks are publicly available to advance the research of story evaluation tasks.\footnote{Dataset and pre-trained model demo are available at anonymous website \url{http://storytelling-lab.com/eval} and \url{https://github.com/sairin1202/StoryER}} △ Less

Submitted 21 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

Comments: accepted by EMNLP 2022

arXiv:2209.12181 [pdf]

Using Multiple Code Representations to Prioritize Static Analysis Warnings

Authors: Thanh Trong Vu, Hieu Dinh Vo

Abstract: In order to ensure the quality of software and prevent attacks from hackers on critical systems, static analysis tools are frequently utilized to detect vulnerabilities in the early development phase. However, these tools often report a large number of warnings with a high false-positive rate, which causes many difficulties for developers. In this paper, we introduce VulRG, a novel approach to add… ▽ More In order to ensure the quality of software and prevent attacks from hackers on critical systems, static analysis tools are frequently utilized to detect vulnerabilities in the early development phase. However, these tools often report a large number of warnings with a high false-positive rate, which causes many difficulties for developers. In this paper, we introduce VulRG, a novel approach to address this problem. Specifically, VulRG predicts and ranks the warnings based on their likelihoods to be true positive. To predict that likelihood, VulRG combines two deep learning models CNN and BiGRU to capture the context of each warning in terms of program syntax, control flow, and program dependence. Our experimental results on a real-world dataset of 6,620 warnings show that VulRG's Recall at Top-50% is 90.9%. This means that using VulRG, 90% of the vulnerabilities can be found by examining only 50% of the warnings. Moreover, at Top-5%, VulRG can improve the state-of-the-art approach by +30% in both Precision and Recall. △ Less

Submitted 26 September, 2022; v1 submitted 25 September, 2022; originally announced September 2022.

Comments: 6 pages, 2 figures, 4 tables

MSC Class: 68N20 ACM Class: D.2.5

arXiv:2206.07636 [pdf, other]

doi 10.1016/j.cag.2022.07.004

SHREC 2022: Fitting and recognition of simple geometric primitives on point clouds

Authors: Chiara Romanengo, Andrea Raffo, Silvia Biasotti, Bianca Falcidieno, Vlassis Fotis, Ioannis Romanelis, Eleftheria Psatha, Konstantinos Moustakas, Ivan Sipiran, Quang-Thuc Nguyen, Chi-Bien Chu, Khoi-Nguyen Nguyen-Ngoc, Dinh-Khoi Vo, Tuan-An To, Nham-Tan Nguyen, Nhat-Quynh Le-Pham, Hai-Dang Nguyen, Minh-Triet Tran, Yifan Qie, Nabil Anwer

Abstract: This paper presents the methods that have participated in the SHREC 2022 track on the fitting and recognition of simple geometric primitives on point clouds. As simple primitives we mean the classical surface primitives derived from constructive solid geometry, i.e., planes, spheres, cylinders, cones and tori. The aim of the track is to evaluate the quality of automatic algorithms for fitting and… ▽ More This paper presents the methods that have participated in the SHREC 2022 track on the fitting and recognition of simple geometric primitives on point clouds. As simple primitives we mean the classical surface primitives derived from constructive solid geometry, i.e., planes, spheres, cylinders, cones and tori. The aim of the track is to evaluate the quality of automatic algorithms for fitting and recognising geometric primitives on point clouds. Specifically, the goal is to identify, for each point cloud, its primitive type and some geometric descriptors. For this purpose, we created a synthetic dataset, divided into a training set and a test set, containing segments perturbed with different kinds of point cloud artifacts. Among the six participants to this track, two are based on direct methods, while four are either fully based on deep learning or combine direct and neural approaches. The performance of the methods is evaluated using various classification and approximation measures. △ Less

Submitted 7 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

MSC Class: 68U05; 68U07; 65D18; 65D17 ACM Class: G.1.2; I.3.5; I.5.4

Journal ref: Computers & Graphics 107 (2022) 32-49

arXiv:2204.14249 [pdf, other]

OSSGAN: Open-Set Semi-Supervised Image Generation

Authors: Kai Katsumata, Duc Minh Vo, Hideki Nakayama

Abstract: We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation, where the training dataset consists of two parts: (i) labeled data and (ii) unlabeled data with samples belonging to one of the labeled data classes, namely, a closed-set, and samples not belonging to any of the labeled data classes, namely, an open-set. Unlike the existing semi-superv… ▽ More We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation, where the training dataset consists of two parts: (i) labeled data and (ii) unlabeled data with samples belonging to one of the labeled data classes, namely, a closed-set, and samples not belonging to any of the labeled data classes, namely, an open-set. Unlike the existing semi-supervised image generation task, where unlabeled data only contain closed-set samples, our task is more general and lowers the data collection cost in practice by allowing open-set samples to appear. Thanks to entropy regularization, the classifier that is trained on labeled data is able to quantify sample-wise importance to the training of cGAN as confidence, allowing us to use all samples in unlabeled data. We design OSSGAN, which provides decision clues to the discriminator on the basis of whether an unlabeled image belongs to one or none of the classes of interest, smoothly integrating labeled and unlabeled data during training. The results of experiments on Tiny ImageNet and ImageNet show notable improvements over supervised BigGAN and semi-supervised methods. Our code is available at https://github.com/raven38/OSSGAN. △ Less

Submitted 29 April, 2022; originally announced April 2022.

Comments: Accepted at CVPR 2022

arXiv:2203.14499 [pdf, other]

NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

Authors: Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama

Abstract: Novel object captioning aims at describing objects absent from training data, with the key ingredient being the provision of object vocabulary to the model. Although existing methods heavily rely on an object detection model, we view the detection step as vocabulary retrieval from an external knowledge in the form of embeddings for any object's definition from Wiktionary, where we use in the retri… ▽ More Novel object captioning aims at describing objects absent from training data, with the key ingredient being the provision of object vocabulary to the model. Although existing methods heavily rely on an object detection model, we view the detection step as vocabulary retrieval from an external knowledge in the form of embeddings for any object's definition from Wiktionary, where we use in the retrieval image region features learned from a transformers model. We propose an end-to-end Novel Object Captioning with Retrieved vocabulary from External Knowledge method (NOC-REK), which simultaneously learns vocabulary retrieval and caption generation, successfully describing novel objects outside of the training dataset. Furthermore, our model eliminates the requirement for model retraining by simply updating the external knowledge whenever a novel object appears. Our comprehensive experiments on held-out COCO and Nocaps datasets show that our NOC-REK is considerably effective against SOTAs. △ Less

Submitted 28 March, 2022; originally announced March 2022.

Comments: Accepted at CVPR 2022

arXiv:2203.08456 [pdf, other]

PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression

Authors: Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama

Abstract: We push forward neural network compression research by exploiting a novel challenging task of large-scale conditional generative adversarial networks (GANs) compression. To this end, we propose a gradually shrinking GAN (PPCD-GAN) by introducing progressive pruning residual block (PP-Res) and class-aware distillation. The PP-Res is an extension of the conventional residual block where each convolu… ▽ More We push forward neural network compression research by exploiting a novel challenging task of large-scale conditional generative adversarial networks (GANs) compression. To this end, we propose a gradually shrinking GAN (PPCD-GAN) by introducing progressive pruning residual block (PP-Res) and class-aware distillation. The PP-Res is an extension of the conventional residual block where each convolutional layer is followed by a learnable mask layer to progressively prune network parameters as training proceeds. The class-aware distillation, on the other hand, enhances the stability of training by transferring immense knowledge from a well-trained teacher model through instructive attention maps. We train the pruning and distillation processes simultaneously on a well-known GAN architecture in an end-to-end manner. After training, all redundant parameters as well as the mask layers are discarded, yielding a lighter network while retaining the performance. We comprehensively illustrate, on ImageNet 128x128 dataset, PPCD-GAN reduces up to 5.2x (81%) parameters against state-of-the-arts while kee** better performance. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: accepted at WACV 2022

arXiv:2201.02503 [pdf]

A Review of Deep Learning Techniques for Markerless Human Motion on Synthetic Datasets

Authors: Doan Duy Vo, Russell Butler

Abstract: Markerless motion capture has become an active field of research in computer vision in recent years. Its extensive applications are known in a great variety of fields, including computer animation, human motion analysis, biomedical research, virtual reality, and sports science. Estimating human posture has recently gained increasing attention in the computer vision community, but due to the depth… ▽ More Markerless motion capture has become an active field of research in computer vision in recent years. Its extensive applications are known in a great variety of fields, including computer animation, human motion analysis, biomedical research, virtual reality, and sports science. Estimating human posture has recently gained increasing attention in the computer vision community, but due to the depth of uncertainty and the lack of the synthetic datasets, it is a challenging task. Various approaches have recently been proposed to solve this problem, many of which are based on deep learning. They are primarily focused on improving the performance of existing benchmarks with significant advances, especially 2D images. Based on powerful deep learning techniques and recently collected real-world datasets, we explored a model that can predict the skeleton of an animation based solely on 2D images. Frames generated from different real-world datasets with synthesized poses using different body shapes from simple to complex. The implementation process uses DeepLabCut on its own dataset to perform many necessary steps, then use the input frames to train the model. The output is an animated skeleton for human movement. The composite dataset and other results are the "ground truth" of the deep model. △ Less

Submitted 7 January, 2022; originally announced January 2022.

Comments: 11 pages, 5 figures, 2 tables

arXiv:2111.00640 [pdf, other]

VSEC: Transformer-based Model for Vietnamese Spelling Correction

Authors: Dinh-Truong Do, Ha Thanh Nguyen, Thang Ngoc Bui, Dinh Hieu Vo

Abstract: Spelling error correction is one of topics which have a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the contex… ▽ More Spelling error correction is one of topics which have a long history in natural language processing. Although previous studies have achieved remarkable results, challenges still exist. In the Vietnamese language, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the context if two (or more) spelling mistakes stand near each other. In this paper, we propose a novel method to correct Vietnamese spelling errors. We tackle the problems of mistyped errors and misspelled errors by using a deep learning model. The embedding layer, in particular, is powered by the byte pair encoding technique. The sequence to sequence model based on the Transformer architecture makes our approach different from the previous works on the same problem. In the experiment, we train the model with a large synthetic dataset, which is randomly introduced spelling errors. We test the performance of the proposed method using a realistic dataset. This dataset contains 11,202 human-made misspellings in 9,341 different Vietnamese sentences. The experimental results show that our method achieves encouraging performance with 86.8% errors detected and 81.5% errors corrected, which improves the state-of-the-art approach 5.6% and 2.2%, respectively. △ Less

Submitted 8 November, 2021; v1 submitted 31 October, 2021; originally announced November 2021.

arXiv:2110.03296 [pdf, other]

doi 10.1109/APSEC53868.2021.00040

Ranking Warnings of Static Analysis Tools Using Representation Learning

Authors: Kien-Tuan Ngo, Dinh-Truong Do, Thu-Trang Nguyen, Hieu Dinh Vo

Abstract: Static analysis tools are frequently used to detect potential vulnerabilities in software systems. However, an inevitable problem of these tools is their large number of warnings with a high false positive rate, which consumes time and effort for investigating. In this paper, we present DeFP, a novel method for ranking static analysis warnings. Based on the intuition that warnings which have simil… ▽ More Static analysis tools are frequently used to detect potential vulnerabilities in software systems. However, an inevitable problem of these tools is their large number of warnings with a high false positive rate, which consumes time and effort for investigating. In this paper, we present DeFP, a novel method for ranking static analysis warnings. Based on the intuition that warnings which have similar contexts tend to have similar labels (true positive or false positive), DeFP is built with two BiLSTM models to capture the patterns associated with the contexts of labeled warnings. After that, for a set of new warnings, DeFP can calculate and rank them according to their likelihoods to be true positives (i.e., actual vulnerabilities). Our experimental results on a dataset of 10 real-world projects show that using DeFP, by investigating only 60% of the warnings, developers can find +90% of actual vulnerabilities. Moreover, DeFP improves the state-of-the-art approach 30% in both Precision and Recall. △ Less

Submitted 7 October, 2021; originally announced October 2021.

Comments: Published in Proceedings of the 28th Asia-Pacific Software Engineering Conference (APSEC'21)

arXiv:2109.10156 [pdf, other]

doi 10.1109/TSE.2021.3113859

A Variability Fault Localization Approach for Software Product Lines

Authors: Thu-Trang Nguyen, Kien-Tuan Ngo, Son Nguyen, Hieu Dinh Vo

Abstract: Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to variability of failures. These unexpected behaviors are induced by variability faults which can only be exposed under some combinations of system features. The interaction among these fe… ▽ More Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to variability of failures. These unexpected behaviors are induced by variability faults which can only be exposed under some combinations of system features. The interaction among these features causes the failures of the system. Although localizing bugs in single-system engineering has been studied in-depth, variability fault localization in SPL systems still remains mostly unexplored. In this article, we present VarCop, a novel and effective variability fault localization approach. For an SPL system failed by variability bugs, VarCop isolates suspicious code statements by analyzing the overall test results of the sampled products and their source code. The isolated suspicious statements are the statements related to the interaction among the features which are necessary for the visibility of the bugs in the system. The suspiciousness of each isolated statement is assessed based on both the overall test results of the products containing the statement as well as the detailed results of the test cases executed by the statement in these products. On a large dataset of buggy SPL systems, empirical evaluation shows that VarCop significantly improves two state-of-the-art fault localization techniques by 33% and 50% in ranking the incorrect statements in the systems containing a single bug each. In about two-thirds of the cases, VarCop ranks the buggy statements at the top-3 positions in the resulting lists. For multiple-bug cases, VarCop outperforms the state-of-the-art approaches 2 times and 10 times in the proportion of bugs localized at the top-1 positions. In 22% and 65% of the buggy versions, VarCop correctly ranks at least one bug in a system at the top-1 and top-5 positions. △ Less

Submitted 21 September, 2021; originally announced September 2021.

Comments: Published in IEEE Transactions on Software Engineering (Early Access)

arXiv:2108.02929 [pdf]

VinaFood21: A Novel Dataset for Evaluating Vietnamese Food Recognition

Authors: Thuan Trong Nguyen, Thuan Q. Nguyen, Dung Vo, Vi Nguyen, Ngoc Ho, Nguyen D. Vo, Kiet Van Nguyen, Khang Nguyen

Abstract: Vietnam is such an attractive tourist destination with its stunning and pristine landscapes and its top-rated unique food and drink. Among thousands of Vietnamese dishes, foreigners and native people are interested in easy-to-eat tastes and easy-to-do recipes, along with reasonable prices, mouthwatering flavors, and popularity. Due to the diversity and almost all the dishes have significant simila… ▽ More Vietnam is such an attractive tourist destination with its stunning and pristine landscapes and its top-rated unique food and drink. Among thousands of Vietnamese dishes, foreigners and native people are interested in easy-to-eat tastes and easy-to-do recipes, along with reasonable prices, mouthwatering flavors, and popularity. Due to the diversity and almost all the dishes have significant similarities and the lack of quality Vietnamese food datasets, it is hard to implement an auto system to classify Vietnamese food, therefore, make people easier to discover Vietnamese food. This paper introduces a new Vietnamese food dataset named VinaFood21, which consists of 13,950 images corresponding to 21 dishes. We use 10,044 images for model training and 6,682 test images to classify each food in the VinaFood21 dataset and achieved an average accuracy of 74.81% when fine-tuning CNN EfficientNet-B0. (https://github.com/nguyenvd-uit/uit-together-dataset) △ Less

Submitted 5 August, 2021; originally announced August 2021.

arXiv:2107.04741 [pdf, other]

doi 10.1145/3461001.3473058

Variability Fault Localization: A Benchmark

Authors: Kien-Tuan Ngo, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

Abstract: Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to the variability of failures in SPL systems. These unexpected behaviors are caused by variability faults which can only be exposed under some combinations of system features. Although loc… ▽ More Software fault localization is one of the most expensive, tedious, and time-consuming activities in program debugging. This activity becomes even much more challenging in Software Product Line (SPL) systems due to the variability of failures in SPL systems. These unexpected behaviors are caused by variability faults which can only be exposed under some combinations of system features. Although localizing bugs in non-configurable code has been investigated in-depth, variability fault localization in SPL systems still remains mostly unexplored. To approach this challenge, we propose a benchmark for variability fault localization with a large set of 1,570 buggy versions of six SPL systems and baseline variability fault localization performance results. Our hope is to engage the community to propose new and better approaches to the problem of variability fault localization in SPL systems. △ Less

Submitted 21 September, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: Published in Proceedings of the 25th ACM International Systems and Software Product Line Conference - Volume A (SPLC '21)

arXiv:1911.08079 [pdf, other]

doi 10.1007/s00138-020-01086-1

Two-Stream FCNs to Balance Content and Style for Style Transfer

Authors: Duc Minh Vo, Akihiro Sugimoto

Abstract: Style transfer is to render given image contents in given styles, and it has an important role in both computer vision fundamental research and industrial applications. Following the success of deep learning based approaches, this problem has been re-launched recently, but still remains a difficult task because of trade-off between preserving contents and faithful rendering of styles. Indeed, how… ▽ More Style transfer is to render given image contents in given styles, and it has an important role in both computer vision fundamental research and industrial applications. Following the success of deep learning based approaches, this problem has been re-launched recently, but still remains a difficult task because of trade-off between preserving contents and faithful rendering of styles. Indeed, how well-balanced content and style are is crucial in evaluating the quality of stylized images. In this paper, we propose an end-to-end two-stream Fully Convolutional Networks (FCNs) aiming at balancing the contributions of the content and the style in rendered images. Our proposed network consists of the encoder and decoder parts. The encoder part utilizes a FCN for content and a FCN for style where the two FCNs have feature injections and are independently trained to preserve the semantic content and to learn the faithful style representation in each. The semantic content feature and the style representation feature are then concatenated adaptively and fed into the decoder to generate style-transferred (stylized) images. In order to train our proposed network, we employ a loss network, the pre-trained VGG-16, to compute content loss and style loss, both of which are efficiently used for the feature injection as well as the feature concatenation. Our intensive experiments show that our proposed model generates more balanced stylized images in content and style than state-of-the-art methods. Moreover, our proposed network achieves efficiency in speed. △ Less

Submitted 7 May, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

Comments: published in Machine Vision and Applications

arXiv:1910.06748 [pdf, other]

Language Identification on Massive Datasets of Short Message using an Attention Mechanism CNN

Authors: Duy Tin Vo, Richard Khoury

Abstract: Language Identification (LID) is a challenging task, especially when the input texts are short and noisy such as posts and statuses on social media or chat logs on gaming forums. The task has been tackled by either designing a feature set for a traditional classifier (e.g. Naive Bayes) or applying a deep neural network classifier (e.g. Bi-directional Gated Recurrent Unit, Encoder-Decoder). These m… ▽ More Language Identification (LID) is a challenging task, especially when the input texts are short and noisy such as posts and statuses on social media or chat logs on gaming forums. The task has been tackled by either designing a feature set for a traditional classifier (e.g. Naive Bayes) or applying a deep neural network classifier (e.g. Bi-directional Gated Recurrent Unit, Encoder-Decoder). These methods are usually trained and tested on a huge amount of private data, then used and evaluated as off-the-shelf packages by other researchers using their own datasets, and consequently the various results published are not directly comparable. In this paper, we first create a new massive labelled dataset based on one year of Twitter data. We use this dataset to test several existing language identification systems, in order to obtain a set of coherent benchmarks, and we make our dataset publicly available so that others can add to this set of benchmarks. Finally, we propose a shallow but efficient neural LID system, which is a ngram-regional convolution neural network enhanced with an attention mechanism. Experimental results show that our architecture is able to predict tens of thousands of samples per second and surpasses all state-of-the-art systems with an improvement of 5%. △ Less

Submitted 15 October, 2019; originally announced October 2019.

Comments: 9 pages, 5 tables, 1 figure

arXiv:1908.01741 [pdf, other]

Visual-Relation Conscious Image Generation from Structured-Text

Authors: Duc Minh Vo, Akihiro Sugimoto

Abstract: We propose an end-to-end network for image generation from given structured-text that consists of the visual-relation layout module and the pyramid of GANs, namely stacking-GANs. Our visual-relation layout module uses relations among entities in the structured-text in two ways: comprehensive usage and individual usage. We comprehensively use all available relations together to localize initial bou… ▽ More We propose an end-to-end network for image generation from given structured-text that consists of the visual-relation layout module and the pyramid of GANs, namely stacking-GANs. Our visual-relation layout module uses relations among entities in the structured-text in two ways: comprehensive usage and individual usage. We comprehensively use all available relations together to localize initial bounding-boxes of all the entities. We also use individual relation separately to predict from the initial bounding-boxes relation-units for all the relations in the input text. We then unify all the relation-units to produce the visual-relation layout, i.e., bounding-boxes for all the entities so that each of them uniquely corresponds to each entity while kee** its involved relations. Our visual-relation layout reflects the scene structure given in the input text. The stacking-GANs is the stack of three GANs conditioned on the visual-relation layout and the output of previous GAN, consistently capturing the scene structure. Our network realistically renders entities' details in high resolution while kee** the scene structure. Experimental results on two public datasets show outperformances of our method against state-of-the-art methods. △ Less

Submitted 18 July, 2020; v1 submitted 5 August, 2019; originally announced August 2019.

Comments: accepted at ECCV 2020

arXiv:1801.07804 [pdf]

Vietnamese Open Information Extraction

Authors: Diem Truong, Duc-Thuan Vo, U. T Nguyen

Abstract: Open information extraction (OIE) is the process to extract relations and their arguments automatically from textual documents without the need to restrict the search to predefined relations. In recent years, several OIE systems for the English language have been created but there is not any system for the Vietnamese language. In this paper, we propose a method of OIE for Vietnamese using a clause… ▽ More Open information extraction (OIE) is the process to extract relations and their arguments automatically from textual documents without the need to restrict the search to predefined relations. In recent years, several OIE systems for the English language have been created but there is not any system for the Vietnamese language. In this paper, we propose a method of OIE for Vietnamese using a clause-based approach. Accordingly, we exploit Vietnamese dependency parsing using grammar clauses that strives to consider all possible relations in a sentence. The corresponding clause types are identified by their propositions as extractable relations based on their grammatical functions of constituents. As a result, our system is the first OIE system named vnOIE for the Vietnamese language that can generate open relations and their arguments from Vietnamese text with highly scalable extraction while being domain independent. Experimental results show that our OIE system achieves promising results with a precision of 83.71%. △ Less

Submitted 23 January, 2018; originally announced January 2018.

arXiv:1704.03402 [pdf, other]

Next Generation Business Intelligence and Analytics: A Survey

Authors: Quoc Duy Vo, Jaya Thomas, Shinyoung Cho, Pradipta De, Bong Jun Choi, Lee Sael

Abstract: Business Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many challenges and opportunities for the BI. Especially with the… ▽ More Business Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many challenges and opportunities for the BI. Especially with the growing number of data sources, traditional BI\&A are evolving to provide intelligence at different scales and perspectives - operational BI, situational BI, self-service BI. In this survey, we review the evolution of business intelligence systems in full scale from back-end architecture to and front-end applications. We focus on the changes in the back-end architecture that deals with the collection and organization of the data. We also review the changes in the front-end applications, where analytic services and visualization are the core components. Using a uses case from BI in Healthcare, which is one of the most complex enterprises, we show how BI\&A will play an important role beyond the traditional usage. The survey provides a holistic view of Business Intelligence and Analytics for anyone interested in getting a complete picture of the different pieces in the emerging next generation BI\&A solutions. △ Less

Submitted 11 April, 2017; originally announced April 2017.

Comments: 11 pages, 4 figures

arXiv:1701.03942 [pdf, other]

doi 10.1145/2908131.2908165

Can We Find Documents in Web Archives without Knowing their Contents?

Authors: Khoi Duy Vo, Tuan Tran, Tu Ngoc Nguyen, Xiaofei Zhu, Wolfgang Nejdl

Abstract: Recent advances of preservation technologies have led to an increasing number of Web archive systems and collections. These collections are valuable to explore the past of the Web, but their value can only be uncovered with effective access and exploration mechanisms. Ideal search and rank- ing methods must be robust to the high redundancy and the temporal noise of contents, as well as scalable to… ▽ More Recent advances of preservation technologies have led to an increasing number of Web archive systems and collections. These collections are valuable to explore the past of the Web, but their value can only be uncovered with effective access and exploration mechanisms. Ideal search and rank- ing methods must be robust to the high redundancy and the temporal noise of contents, as well as scalable to the huge amount of data archived. Despite several attempts in Web archive search, facilitating access to Web archive still remains a challenging problem. In this work, we conduct a first analysis on different ranking strategies that exploit evidences from metadata instead of the full content of documents. We perform a first study to compare the usefulness of non-content evidences to Web archive search, where the evidences are mined from the metadata of file headers, links and URL strings only. Based on these findings, we propose a simple yet surprisingly effective learning model that combines multiple evidences to distinguish "good" from "bad" search results. We conduct empirical experiments quantitatively as well as qualitatively to confirm the validity of our proposed method, as a first step towards better ranking in Web archives taking meta- data into account. △ Less

Submitted 14 January, 2017; originally announced January 2017.

Comments: Published via ACM to Websci 2015

ACM Class: H.3.1

arXiv:1607.02784 [pdf]

Open Information Extraction

Authors: Duc-Thuan Vo, Ebrahim Bagheri

Abstract: Open Information Extraction (Open IE) systems aim to obtain relation tuples with highly scalable extraction in portable across domain by identifying a variety of relation phrases and their arguments in arbitrary sentences. The first generation of Open IE learns linear chain models based on unlexicalized features such as Part-of-Speech (POS) or shallow tags to label the intermediate words between p… ▽ More Open Information Extraction (Open IE) systems aim to obtain relation tuples with highly scalable extraction in portable across domain by identifying a variety of relation phrases and their arguments in arbitrary sentences. The first generation of Open IE learns linear chain models based on unlexicalized features such as Part-of-Speech (POS) or shallow tags to label the intermediate words between pair of potential arguments for identifying extractable relations. Open IE currently is developed in the second generation that is able to extract instances of the most frequently observed relation types such as Verb, Noun and Prep, Verb and Prep, and Infinitive with deep linguistic analysis. They expose simple yet principled ways in which verbs express relationships in linguistics such as verb phrase-based extraction or clause-based extraction. They obtain a significantly higher performance over previous systems in the first generation. In this paper, we describe an overview of two Open IE generations including strengths, weaknesses and application areas. △ Less

Submitted 10 July, 2016; originally announced July 2016.

Comments: This paper will appear in the Encyclopedia for Semantic Computing

arXiv:1512.04785 [pdf, other]

On Deep Representation Learning from Noisy Web Images

Authors: Phong D. Vo, Alexandru Ginsca, Hervé Le Borgne, Adrian Popescu

Abstract: The keep-growing content of Web images may be the next important data source to scale up deep neural networks, which recently obtained a great success in the ImageNet classification challenge and related tasks. This prospect, however, has not been validated on convolutional networks (convnet) -- one of best performing deep models -- because of their supervised regime. While unsupervised alternativ… ▽ More The keep-growing content of Web images may be the next important data source to scale up deep neural networks, which recently obtained a great success in the ImageNet classification challenge and related tasks. This prospect, however, has not been validated on convolutional networks (convnet) -- one of best performing deep models -- because of their supervised regime. While unsupervised alternatives are not so good as convnet in generalizing the learned model to new domains, we use convnet to leverage semi-supervised representation learning. Our approach is to use massive amounts of unlabeled and noisy Web images to train convnets as general feature detectors despite challenges coming from data such as high level of mislabeled data, outliers, and data biases. Extensive experiments are conducted at several data scales, different network architectures, and data reranking techniques. The learned representations are evaluated on nine public datasets of various topics. The best results obtained by our convnets, trained on 3.14 million Web images, outperform AlexNet trained on 1.2 million clean images of ILSVRC 2012 and is closing the gap with VGG-16. These prominent results suggest a budget solution to use deep learning in practice and motivate more research in semi-supervised representation learning. △ Less

Submitted 15 July, 2016; v1 submitted 15 December, 2015; originally announced December 2015.

Showing 1–44 of 44 results for author: Vo, D