
Showing 1–14 of 14 results for author: Le-Hong, P

  1. arXiv:2010.03424  [pdf, other]

    cs.CL

    Cross-lingual Extended Named Entity Classification of Wikipedia Articles

    Authors: The Viet Bui, Phuong Le-Hong

    Abstract: The FPT.AI team participated in the SHINRA2020-ML subtask of the NTCIR-15 SHINRA task. This paper describes our method for solving the problem and discusses the official results. Our method focuses on learning cross-lingual representations, both on the word level and document level for page classification. We propose a three-stage approach including multilingual model pre-training, monolingual mode…

    Submitted 17 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to NTCIR-15

  2. arXiv:2006.15994  [pdf, ps, other]

    cs.CL

    Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models

    Authors: Viet Bui The, Oanh Tran Thi, Phuong Le-Hong

    Abstract: This paper describes our study on using multilingual BERT embeddings and some new neural models for improving sequence tagging tasks for the Vietnamese language. We propose new model architectures and evaluate them extensively on two named entity recognition datasets of VLSP 2016 and VLSP 2018, and on two part-of-speech tagging datasets of VLSP 2010 and VLSP 2013. Our proposed models outperform exi…

    Submitted 25 September, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Accepted at the Conference PACLIC 2020

  3. arXiv:1909.02265  [pdf, ps, other]

    cs.CL

    Towards Task-Oriented Dialogue in Mixed Domains

    Authors: Tho Luong Chi, Phuong Le-Hong

    Abstract: This work investigates the task-oriented dialogue problem in mixed-domain settings. We study the effect of alternating between different domains in sequences of dialogue turns using two related state-of-the-art dialogue systems. We first show that a specialized state tracking component in multiple domains plays an important role and gives better results than an end-to-end task-oriented dialogue sy…

    Submitted 5 September, 2019; originally announced September 2019.

    Comments: Accepted for conference PACLING 2019

  4. arXiv:1810.01656  [pdf, ps, other]

    cs.CL

    A Comparative Study of Neural Network Models for Sentence Classification

    Authors: Phuong Le-Hong, Anh-Cuong Le

    Abstract: This paper presents an extensive comparative study of four neural network models, including feed-forward networks, convolutional networks, recurrent networks and long short-term memory networks, on two sentence classification datasets of English and Vietnamese text. We show that on the English dataset, the convolutional network models without any feature engineering outperform some competitive sen…

    Submitted 3 October, 2018; originally announced October 2018.

    Comments: To appear in the 5th NAFOSTED Conference on Information and Computer Science

  5. A Factoid Question Answering System for Vietnamese

    Authors: Phuong Le-Hong, Duc-Thien Bui

    Abstract: In this paper, we describe the development of an end-to-end factoid question answering system for the Vietnamese language. This system combines both statistical models and ontology-based methods in a chain of processing modules to provide high-quality mappings from natural language text to entities. We present the challenges in the development of such an intelligent user interface for an isolating…

    Submitted 28 March, 2018; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: In the proceedings of the HQA'18 workshop, The Web Conference Companion, Lyon, France

  6. arXiv:1711.10124  [pdf, ps, other]

    cs.CL

    Vietnamese Semantic Role Labelling

    Authors: Phuong Le-Hong, Thai Hoang Pham, Xuan Khoai Pham, Thi Minh Huyen Nguyen, Thi Luong Nguyen, Minh Hiep Nguyen

    Abstract: In this paper, we study semantic role labelling (SRL), a subtask of semantic parsing of natural language sentences and its application for the Vietnamese language. We present our effort in building Vietnamese PropBank, the first Vietnamese SRL corpus and a software system for labelling semantic roles of Vietnamese texts. In particular, we present a novel constituent extraction algorithm in the arg…

    Submitted 27 November, 2017; originally announced November 2017.

    Comments: Accepted to the VNU Journal of Science

  7. arXiv:1709.07104  [pdf, ps, other]

    cs.CL

    On the Use of Machine Translation-Based Approaches for Vietnamese Diacritic Restoration

    Authors: Thai-Hoang Pham, Xuan-Khoai Pham, Phuong Le-Hong

    Abstract: This paper presents an empirical study of two machine translation-based approaches for the Vietnamese diacritic restoration problem, including phrase-based and neural-based machine translation models. This is the first work that applies a neural-based machine translation method to this problem and gives a thorough comparison to the phrase-based machine translation method, which is the current state-of-th…

    Submitted 26 October, 2017; v1 submitted 20 September, 2017; originally announced September 2017.

    Comments: 4 pages, 2 figures, 4 tables, accepted to IALP 2017

  8. arXiv:1708.09163  [pdf, ps, other]

    cs.CL

    An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing

    Authors: Phuong Le-Hong, Minh Pham Quang Nhat, Thai-Hoang Pham, Tuan-Anh Tran, Dang-Minh Nguyen

    Abstract: This paper presents an empirical study of two widely-used sequence prediction models, Conditional Random Fields (CRFs) and Long Short-Term Memory Networks (LSTMs), on two fundamental tasks for Vietnamese text processing, including part-of-speech tagging and named entity recognition. We show that a strong lower bound for labeling accuracy can be obtained by relying only on simple word-based feature…

    Submitted 30 August, 2017; originally announced August 2017.

    Comments: To appear in the Proceedings of the 9th International Conference on Knowledge and Systems Engineering (KSE) 2017

  9. arXiv:1708.07241  [pdf, other]

    cs.CL

    NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit

    Authors: Thai-Hoang Pham, Xuan-Khoai Pham, Tuan-Anh Nguyen, Phuong Le-Hong

    Abstract: This paper demonstrates a neural network-based toolkit, namely NNVLP, for essential Vietnamese language processing tasks including part-of-speech (POS) tagging, chunking, and named entity recognition (NER). Our toolkit is a combination of bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Field (CRF), using pre-trained word embeddings as input, which ach…

    Submitted 19 October, 2017; v1 submitted 23 August, 2017; originally announced August 2017.

    Comments: 4 pages, 5 figures, 6 tables, accepted to IJCNLP 2017

  10. arXiv:1705.10610  [pdf, ps, other]

    cs.CL

    The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition

    Authors: Thai-Hoang Pham, Phuong Le-Hong

    Abstract: This paper presents a state-of-the-art system for Vietnamese Named Entity Recognition (NER). By incorporating automatic syntactic features with word embeddings as input for bidirectional Long Short-Term Memory (Bi-LSTM), our system, although simpler than some deep learning architectures, achieves a much better result for Vietnamese NER. The proposed method achieves an overall F1 score of 92.05% on…

    Submitted 27 August, 2017; v1 submitted 29 May, 2017; originally announced May 2017.

    Comments: 7 pages, 9 tables, 3 figures, accepted to PACLIC 2017

  11. arXiv:1705.04044  [pdf, ps, other]

    cs.CL

    End-to-end Recurrent Neural Network Models for Vietnamese Named Entity Recognition: Word-level vs. Character-level

    Authors: Thai-Hoang Pham, Phuong Le-Hong

    Abstract: This paper demonstrates end-to-end neural network architectures for Vietnamese named entity recognition. Our best model is a combination of bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), and Conditional Random Field (CRF), using pre-trained word embeddings as input, which achieves an F1 score of 88.59% on a standard test set. Our system is able to achieve a compara…

    Submitted 20 July, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

    Comments: 14 pages, 5 figures, 7 tables, accepted to PACLING 2017, fix CRF formula

  12. arXiv:1705.04038  [pdf, ps, other]

    cs.CL

    Building a Semantic Role Labelling System for Vietnamese

    Authors: Thai-Hoang Pham, Xuan-Khoai Pham, Phuong Le-Hong

    Abstract: Semantic role labelling (SRL) is a task in natural language processing which detects and classifies the semantic arguments associated with the predicates of a sentence. It is an important step towards understanding the meaning of a natural language. There exist SRL systems for well-studied languages like English, Chinese or Japanese but there is not any such system for the Vietnamese language. In…

    Submitted 11 May, 2017; originally announced May 2017.

    Comments: 8 pages, ICDIM 2015

  13. arXiv:1705.04003  [pdf, ps, other]

    cs.CL

    Content-based Approach for Vietnamese Spam SMS Filtering

    Authors: Thai-Hoang Pham, Phuong Le-Hong

    Abstract: Short Message Service (SMS) spam is a serious problem in Vietnam because of the availability of very cheap pre-paid SMS packages. There are some systems to detect and filter spam messages for English, most of which use machine learning techniques to analyze the content of messages and classify them. For Vietnamese, there is some research on spam email filtering but none focused on SMS. In this wor…

    Submitted 11 May, 2017; originally announced May 2017.

    Comments: 4 pages, IALP 2016

  14. arXiv:1610.05652  [pdf, ps, other]

    cs.CL

    Vietnamese Named Entity Recognition using Token Regular Expressions and Bidirectional Inference

    Authors: Phuong Le-Hong

    Abstract: This paper describes an efficient approach to improve the accuracy of a named entity recognition system for Vietnamese. The approach combines regular expressions over tokens and a bidirectional inference method in a sequence labelling model. The proposed method achieves an overall $F_1$ score of 89.66% on a test set of an evaluation campaign, organized in late 2016 by the Vietnamese Language and S…

    Submitted 19 October, 2016; v1 submitted 18 October, 2016; originally announced October 2016.

    Comments: Submitted to the VLSP Workshop 2016