Search | arXiv e-print repository

arXiv:2312.09000

ComOM at VLSP 2023: A Dual-Stage Framework with BERTology and Unified Multi-Task Instruction Tuning Model for Vietnamese Comparative Opinion Mining

Authors: Dang Van Thin, Duong Ngoc Hao, Ngan Luu-Thuy Nguyen

Abstract: The ComOM shared task aims to extract comparative opinions from Vietnamese product reviews. It has two sub-tasks: (1) Comparative Sentence Identification (CSI) and (2) Comparative Element Extraction (CEE). The first task is to identify whether the input is a comparative review, and the second is to extract the quintuplets mentioned in the comparative review. To address this task, our team proposes a two-stage system based on fine-tuning a BERTology model for the CSI task and unified multi-task instruction tuning for the CEE task. In addition, we apply a simple data augmentation technique to increase the size of the dataset used to train our second-stage model. Experimental results show that our approach outperforms the other competitors and achieved the top score on the official private test.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted manuscript at VLSP 2023

arXiv:2311.11732

Study change of the performance of airfoil of small wind turbine under low wind speed by CFD simulation

Authors: Le Quang Sang, Dinh Van Thin, Nguyen Huu Duc, Nguyen Duc Minh, Doan Hong Quan, Le Thi Thuy Hang

Abstract: Renewable energy has received strong attention and investment as a way to replace fossil energy sources and reduce greenhouse gas emissions. Areas with good and fairly good wind speeds have seen investment in large-capacity wind farms for many years. Low wind speed regions cover a very large area of the world and have attracted interest for wind energy exploitation in recent years. In this study, the original S1010 airfoil, operated at low wind speed, was redesigned using XFLR5 software to increase its aerodynamic efficiency. The new VAST-EPU-S1010 airfoil model was then adjusted in its maximum thickness and maximum thickness position, and simulated by CFD in low wind speed conditions of 4-6 m/s. The lift coefficient, drag coefficient, and $C_L/C_D$ ratio were evaluated under the effect of the angle of attack and the maximum thickness using the $k$-$\varepsilon$ model. Simulation results show that the VAST-EPU-S1010 airfoil achieved its greatest aerodynamic efficiency at an angle of attack of $3^{\circ}$, a maximum thickness of 8%, and a maximum thickness position of 20.32%. The maximum value of $C_L/C_D$ for the new airfoil at 6 m/s is about 6.25% higher than at 4 m/s.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 19 pages, 21 figures

MSC Class: 76B10 (Primary); 65B05 (Secondary)

ACM Class: G.1.6; I.6.3; I.4.0

arXiv:2110.00156

Span Labeling Approach for Vietnamese and Chinese Word Segmentation

Authors: Duc-Vu Nguyen, Linh-Bao Vo, Dang Van Thin, Ngan Luu-Thuy Nguyen

Abstract: In this paper, we propose a span labeling approach, named SpanSeg, to model n-gram information for Vietnamese word segmentation. We compare the span labeling approach with the conditional random field by using encoders with the same architecture. Since Vietnamese and Chinese have similar linguistic phenomena, we evaluated the proposed method on the Vietnamese treebank benchmark dataset and five Chinese benchmark datasets. In our experiments, SpanSeg achieves higher performance than the sequence tagging approach, with a state-of-the-art F-score of 98.31% on the Vietnamese treebank benchmark, when both apply the contextual pre-trained language model XLM-RoBERTa and the predicted word boundary information. In addition, we run fine-tuning experiments for the span labeling approach on the BERT and ZEN pre-trained language models for Chinese, obtaining fewer parameters, faster inference time, and competitive or higher F-scores than the previous state-of-the-art approach, word segmentation with word-hood memory networks, on five Chinese benchmarks.

Submitted 30 September, 2021; originally announced October 2021.

Comments: In Proceedings of the 18th Pacific Rim International Conference on Artificial Intelligence (PRICAI 2021)

arXiv:2103.09519

Investigating Monolingual and Multilingual BERT Models for Vietnamese Aspect Category Detection

Authors: Dang Van Thin, Lac Si Le, Vu Xuan Hoang, Ngan Luu-Thuy Nguyen

Abstract: Aspect category detection (ACD) is one of the challenging tasks in the Aspect-Based Sentiment Analysis problem. The purpose of this task is to identify the aspect categories mentioned in user-generated reviews from a set of pre-defined categories. In this paper, we investigate the performance of various monolingual pre-trained language models compared with multilingual models on the Vietnamese aspect category detection problem. We conduct the experiments on two benchmark datasets for the restaurant and hotel domains. The experimental results show that the monolingual PhoBERT model is more effective than the others on both datasets. We also evaluate the performance of the multilingual model based on the combination of the whole SemEval-2016 datasets in other languages with the Vietnamese dataset. To the best of our knowledge, our study is the first attempt to apply various available pre-trained language models to the aspect category detection task and to utilize datasets from other languages based on multilingual models.

Submitted 17 March, 2021; originally announced March 2021.

Comments: 6 pages, 1 figure

arXiv:2006.07804

Vietnamese Word Segmentation with SVM: Ambiguity Reduction and Suffix Capture

Authors: Duc-Vu Nguyen, Dang Van Thin, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: In this paper, we approach Vietnamese word segmentation as a binary classification problem using a Support Vector Machine classifier. We inherit features from prior work, such as n-grams of syllables, n-grams of syllable types, and checking the conjunction of adjacent syllables in the dictionary. We propose two novel feature extraction methods, one to reduce overlap ambiguity and the other to improve the prediction of unknown words containing suffixes. Unlike UETsegmenter and RDRsegmenter, two state-of-the-art Vietnamese word segmentation methods, we do not employ the longest matching algorithm as an initial processing step or any post-processing technique. According to experimental results on benchmark Vietnamese datasets, our proposed method obtained a better F1-score than the prior state-of-the-art methods UETsegmenter and RDRsegmenter.

Submitted 14 June, 2020; originally announced June 2020.

Comments: In Proceedings of the 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019)

Showing 1–5 of 5 results for author: Van Thin, D