-
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Authors:
Florian Lux,
Sarina Meyer,
Lyonel Behringer,
Frank Zalkow,
Phat Do,
Matt Coler,
Emanuël A. P. Habets,
Ngoc Thang Vu
Abstract:
In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
Submitted 10 June, 2024;
originally announced June 2024.
-
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
Authors:
Kim Hoang Tran,
Phuc Vuong Do,
Ngoc Quoc Ly,
Ngan Le
Abstract:
Sports videos pose complex challenges, including cluttered backgrounds, changing camera angles, small action-representing objects, and imbalanced action class distributions. Existing methods for detecting actions in sports videos rely heavily on global features, using a backbone network as a black box that encompasses the entire spatial frame. However, these approaches tend to overlook the nuances of the scene and struggle to detect actions that occupy a small portion of the frame. In particular, they have difficulty with action classes involving small objects, such as balls or yellow/red cards in soccer, which occupy only a fraction of the screen. To address these challenges, we introduce a novel approach that analyzes and models scene entities using an adaptive attention mechanism. Specifically, our model disentangles the scene content into a global environment feature and a local relevant-scene-entity feature. To extract environmental features efficiently while considering temporal information at lower computational cost, we propose using a 2D backbone network with a time-shift mechanism. To accurately capture relevant scene entities, we employ a Vision-Language model in conjunction with the adaptive attention mechanism. Our model has demonstrated outstanding performance, securing 1st place in the SoccerNet-v2 Action Spotting, FineDiving, and FineGym challenges with improvements of 1.6, 2.0, and 1.3 points in avg-mAP over the runner-up methods. Furthermore, unlike other deep learning models, which are often designed as black boxes, our approach offers interpretability. Our code and models are released at: https://github.com/Fsoft-AIC/unifying-global-local-feature.
Submitted 15 April, 2024;
originally announced April 2024.
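The time-shift mechanism mentioned in the abstract above lets a 2D backbone exchange information across neighbouring frames without 3D convolutions. The following is a minimal sketch of a TSM-style temporal shift, assuming a frames-by-channels feature layout and a hypothetical 1/8 shift fraction per direction; the abstract does not state the exact variant or fraction the authors use.

```python
from typing import List


def temporal_shift(features: List[List[float]], shift_div: int = 8) -> List[List[float]]:
    """Shift a fraction of channels along the time axis (TSM-style sketch).

    features: T frames x C channels (hypothetical layout for illustration).
    The first C // shift_div channels take their value from the next frame,
    the next C // shift_div channels take it from the previous frame, and
    the remaining channels are copied unchanged. Vacated slots are zero-padded.
    """
    num_frames = len(features)
    num_channels = len(features[0]) if num_frames else 0
    fold = num_channels // shift_div
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                # Look ahead: channel c at time t comes from time t + 1.
                out[t][c] = features[t + 1][c] if t + 1 < num_frames else 0.0
            elif c < 2 * fold:
                # Look back: channel c at time t comes from time t - 1.
                out[t][c] = features[t - 1][c] if t >= 1 else 0.0
            else:
                out[t][c] = features[t][c]
    return out
```

Because the shift is pure memory movement, a per-frame 2D convolution applied afterwards sees features from adjacent frames at essentially no extra parameter or FLOP cost.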
-
VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding
Authors:
Phong Nguyen-Thuan Do,
Son Quoc Tran,
Phu Gia Hoang,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
The success of Natural Language Understanding (NLU) benchmarks in various languages, such as GLUE for English, CLUE for Chinese, KLUE for Korean, and IndoNLU for Indonesian, has facilitated the evaluation of new NLU models across a wide range of tasks. To establish a standardized set of benchmarks for Vietnamese NLU, we introduce the first Vietnamese Language Understanding Evaluation (VLUE) benchmark. The VLUE benchmark encompasses five datasets covering different NLU tasks, including text classification, span extraction, and natural language understanding. To provide an insightful overview of the current state of Vietnamese NLU, we then evaluate seven state-of-the-art pre-trained models, including both multilingual and Vietnamese monolingual models, on the proposed VLUE benchmark. Furthermore, we present CafeBERT, a new state-of-the-art pre-trained model that achieves superior results across all tasks in the VLUE benchmark. Our model combines the proficiency of a multilingual pre-trained model with Vietnamese linguistic knowledge. CafeBERT is built on the XLM-RoBERTa model, with an additional pretraining step on a large amount of Vietnamese text to improve its adaptation to the Vietnamese language. To support future research, CafeBERT is publicly available for research purposes.
Submitted 23 March, 2024;
originally announced March 2024.
-
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Authors:
Dinh Phat Do,
Taehoon Kim,
Jaemin Na,
Jiwon Kim,
Keonho Lee,
Kyunghwan Cho,
Wonjun Hwang
Abstract:
Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain. However, there are limited studies on adapting from the visible to the thermal domain, because the domain gap between the visible and thermal domains is much larger than expected, and traditional domain adaptation cannot successfully facilitate learning in this situation. To overcome this challenge, we propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain. Specifically, we separate the source and target training sets to build dual teachers and successively apply the exponential moving average (EMA) of the student model to the individual teacher of each domain. The framework further incorporates a zigzag learning method between the dual teachers, facilitating a gradual transition from the visible to the thermal domain during training. We validate the superiority of our method through newly designed experimental protocols with well-known thermal datasets, namely FLIR and KAIST. Source code is available at https://github.com/EdwardDo69/D3T .
Submitted 14 March, 2024;
originally announced March 2024.
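The D3T abstract above describes two teachers, one per domain, each refreshed as an exponential moving average of the student. A minimal sketch of that update, assuming parameters stored as flat lists of floats and a hypothetical decay of 0.999. The schedule here (strictly alternating the updated teacher per epoch) is a simplifying assumption; the paper's zigzag additionally shifts gradually toward the thermal domain, which this sketch does not model.

```python
from typing import Dict, List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.999) -> List[float]:
    """Element-wise EMA: new_teacher = decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def zigzag_domain(epoch: int) -> str:
    """Hypothetical schedule: alternate between the visible (RGB) and thermal teachers."""
    return "rgb" if epoch % 2 == 0 else "thermal"


def update_teachers(teachers: Dict[str, List[float]], student: List[float],
                    epoch: int, decay: float = 0.999) -> None:
    """Refresh only the teacher of the domain active at this epoch (in place)."""
    domain = zigzag_domain(epoch)
    teachers[domain] = ema_update(teachers[domain], student, decay)
```

Keeping one EMA target per domain means pseudo-labels for thermal images never come from a teacher averaged only over visible-domain updates, which is the intuition behind separating the teachers.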
-
AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions
Authors:
Son Quoc Tran,
Gia-Huy Do,
Phong Nguyen-Thuan Do,
Matt Kretchmar,
Xinya Du
Abstract:
The development of large, high-quality datasets and high-performing models has led to significant advancements in Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable questions within the EQA domain. Training EQA models with unanswerable questions helps them avoid extracting misleading or incorrect answers for queries that lack valid responses. However, manually annotating unanswerable questions is labor-intensive. To address this, we propose AGent, a novel pipeline that automatically creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer. In this paper, we demonstrate the usefulness of the AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA. These created question sets exhibit low error rates. Additionally, models fine-tuned on these questions show performance comparable to that of models fine-tuned on the SQuAD 2.0 dataset on multiple EQA benchmarks.
Submitted 10 September, 2023;
originally announced September 2023.
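The core idea in the AGent abstract above is re-matching a question with a context that lacks its answer. The sketch below illustrates only that idea under a crude, assumed filter: a candidate context is accepted if it does not contain the original answer string. The actual pipeline's candidate selection and filtering are not described in the abstract, and all names here are hypothetical.

```python
from typing import Dict, List, Tuple


def rematch_unanswerable(examples: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each question with a different context that does not contain its answer.

    examples: dicts with hypothetical keys "question", "context", "answer".
    Returns (question, new_context) pairs intended as unanswerable examples.
    """
    pairs = []
    for i, ex in enumerate(examples):
        for j, other in enumerate(examples):
            if i == j:
                continue  # skip the question's own (answerable) context
            # Crude filter: reject any context that mentions the answer string.
            if ex["answer"].lower() not in other["context"].lower():
                pairs.append((ex["question"], other["context"]))
                break  # keep one unanswerable pairing per question
    return pairs
```

A practical pipeline would likely prefer topically related contexts so that the resulting questions are hard to reject, and would need a stronger check than string matching; both are outside this sketch.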
-
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
Authors:
Phat Do,
Matt Coler,
Jelske Dijkstra,
Esther Klabbers
Abstract:
We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings' applicability. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve the output quality and the latter performs better, but these effects also depend on the specific language combination. We also compare the recently-proposed Angular Similarity of Phone Frequencies (ASPF) with a family tree-based distance measure as a criterion to select source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have expected effects.
Submitted 21 June, 2023;
originally announced June 2023.
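The source-language criterion in the abstract above, Angular Similarity of Phone Frequencies (ASPF), compares languages by the angle between their phone-frequency vectors. The sketch below is a minimal illustration assuming raw phone counts from phone-transcribed text and the angular-similarity variant for non-negative vectors, 1 − 2θ/π, which maps onto [0, 1]; the original ASPF definition may normalise frequencies differently.

```python
import math
from collections import Counter
from typing import Iterable


def phone_frequencies(transcriptions: Iterable[Iterable[str]]) -> Counter:
    """Count phone occurrences over a corpus of phone-transcribed utterances."""
    counts: Counter = Counter()
    for utterance in transcriptions:
        counts.update(utterance)
    return counts


def angular_similarity(freq_a: Counter, freq_b: Counter) -> float:
    """Angular similarity of two non-negative phone-frequency vectors, in [0, 1]."""
    phones = set(freq_a) | set(freq_b)
    dot = sum(freq_a[p] * freq_b[p] for p in phones)  # missing phones count as 0
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # clamp rounding error
    theta = math.acos(cosine)
    return 1.0 - 2.0 * theta / math.pi  # assumes non-negative vectors (theta <= pi/2)
```

To rank candidate source languages, one would compute this score between the target language and each candidate and prefer the highest; the abstract reports that ASPF is effective when label-based phone input is used.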
-
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
Authors:
Phat Do,
Matt Coler,
Jelske Dijkstra,
Esther Klabbers
Abstract:
We compare phone labels and articulatory features as input for cross-lingual transfer learning in text-to-speech (TTS) for low-resource languages (LRLs). Experiments with FastSpeech 2 and the LRL West Frisian show that using articulatory features outperformed using phone labels in both intelligibility and naturalness. For LRLs without pronunciation dictionaries, we propose two novel approaches: a) using a massively multilingual grapheme-to-phone (G2P) model in both training and synthesis, and b) using a universal phone recognizer to create a makeshift dictionary. Results show that the G2P approach performs largely on par with using a ground-truth dictionary, and that the phone recognition approach, while generally performing worse, remains a viable option for LRLs less suited to the G2P approach. Within each approach, using articulatory features as input outperforms using phone labels.
Submitted 1 June, 2023;
originally announced June 2023.
-
Resource-Efficient Fine-Tuning Strategies for Automatic MOS Prediction in Text-to-Speech for Low-Resource Languages
Authors:
Phat Do,
Matt Coler,
Jelske Dijkstra,
Esther Klabbers
Abstract:
We train a MOS prediction model based on wav2vec 2.0 using the open-access data sets BVCC and SOMOS. Our test with neural TTS data in the low-resource language (LRL) West Frisian shows that pre-training on BVCC before fine-tuning on SOMOS leads to the best accuracy for both fine-tuned and zero-shot prediction. Further fine-tuning experiments show that using more than 30 percent of the total data does not lead to significant improvements. In addition, fine-tuning with data from a single listener shows promising system-level accuracy, supporting the viability of one-participant pilot tests. These findings can all assist the resource-conscious development of TTS for LRLs by progressing towards better zero-shot MOS prediction and informing the design of listening tests, especially in early-stage evaluation.
Submitted 30 May, 2023;
originally announced May 2023.
-
Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension
Authors:
Son Quoc Tran,
Phong Nguyen-Thuan Do,
Kiet Van Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers still have to rely on multilingual models to develop state-of-the-art systems for Vietnamese Machine Reading Comprehension. This difficulty stems from the limited number of high-quality works on developing Vietnamese language models. To encourage more work in this research field, we present a comprehensive analysis of the linguistic weaknesses and strengths of current Vietnamese monolingual models using the downstream task of Machine Reading Comprehension. Based on the analysis results, we suggest new directions for developing Vietnamese language models. Besides this main contribution, we also reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and highlight an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we introduce a minor but valuable modification to a previously proposed process for annotating unanswerable questions for Machine Reading Comprehension. Our proposed modification makes the resulting unanswerable questions more difficult for Machine Reading Comprehension systems to solve.
Submitted 16 March, 2023;
originally announced March 2023.
-
The Impacts of Unanswerable Questions on the Robustness of Machine Reading Comprehension Models
Authors:
Son Quoc Tran,
Phong Nguyen-Thuan Do,
Uyen Le,
Matt Kretchmar
Abstract:
Pretrained language models have achieved superhuman performance on many Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative inability to defend against adversarial attacks has raised skepticism about their natural language understanding. In this paper, we ask whether training with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC models against adversarial attacks. To explore this question, we fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks. Our experiments reveal that current models fine-tuned on SQuAD 2.0 do not initially appear to be any more robust than those fine-tuned on SQuAD 1.1, yet they exhibit a measure of hidden robustness that can be leveraged to realize actual performance gains. Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets. Finally, we introduce a new adversarial attack that reveals artifacts of SQuAD 2.0 that current MRC models are learning.
Submitted 31 January, 2023;
originally announced February 2023.
-
A Deep Reinforcement Learning-based Adaptive Charging Policy for WRSNs
Authors:
Ngoc Bui,
Phi Le Nguyen,
Viet Anh Nguyen,
Phan Thuan Do
Abstract:
Wireless sensor networks consist of randomly distributed sensor nodes for monitoring targets or areas of interest. Maintaining the network for continuous surveillance is challenging due to the limited battery capacity of each sensor. Wireless power transfer technology is emerging as a reliable solution for energizing the sensors by deploying a mobile charger (MC) to recharge them. However, designing an optimal charging path for the MC is difficult because of uncertainties arising in the network. The energy consumption rate of the sensors may fluctuate significantly due to unpredictable changes in the network topology, such as node failures. These changes also shift the importance of each sensor, which existing works often assume to be identical. We address these challenges by proposing a novel adaptive charging scheme using a deep reinforcement learning (DRL) approach. Specifically, we endow the MC with a charging policy that determines the next sensor to charge conditioned on the current state of the network. We then use a deep neural network to parametrize this charging policy, which is trained using reinforcement learning techniques. Our model can adapt to spontaneous changes in the network topology. The empirical results show that the proposed algorithm outperforms existing on-demand algorithms by a significant margin.
Submitted 16 August, 2022;
originally announced August 2022.
-
XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source
Authors:
Kiet Van Nguyen,
Phong Nguyen-Thuan Do,
Nhat Duy Nguyen,
Tin Van Huynh,
Anh Gia-Tuan Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction that has attracted much attention from the computational linguistics and artificial intelligence research communities in recent years because of the strong development of machine reading comprehension-based models. A reader-based QA system is a high-level search engine that can find correct answers to queries or questions in open-domain or domain-specific texts using machine reading comprehension (MRC) techniques. Most advances in data resources and machine-learning approaches for MRC and QA systems have been made in two resource-rich languages, English and Chinese. A low-resource language like Vietnamese has seen little research on QA systems. This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on a Wikipedia-based textual knowledge source (the UIT-ViQuAD corpus), which outperforms two robust QA systems based on deep neural network models, DrQA and BERTserini, by 24.46% and 6.28%, respectively. From the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.
Submitted 13 August, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Efficient algorithms for maximum induced matching problem in permutation and trapezoid graphs
Authors:
Viet Dung Nguyen,
Ba Thai Pham,
Phan Thuan Do
Abstract:
We first design an $\mathcal{O}(n^2)$ solution for finding a maximum induced matching (MIM) in permutation graphs given their permutation models, based on a dynamic programming algorithm aided by the sweep line technique. With the support of the disjoint-set data structure, we improve the complexity to $\mathcal{O}(m + n)$. We then extend this result to give an $\mathcal{O}(m + n)$ algorithm for the same problem in trapezoid graphs. By combining our algorithms with the current best graph identification algorithms, we can solve the MIM problem in permutation and trapezoid graphs in linear and $\mathcal{O}(n^2)$ time, respectively. These results substantially improve on the best known $\mathcal{O}(mn)$ algorithm for the maximum induced matching problem in both graph classes, proposed by Habib et al.
Submitted 4 November, 2021; v1 submitted 18 July, 2021;
originally announced July 2021.
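When implementing fast MIM algorithms like those in the abstract above, an exhaustive reference solver is a convenient correctness oracle. The sketch below builds the permutation graph from its permutation model (positions i < j are adjacent iff π(i) > π(j)) and finds a maximum induced matching by brute force. It is exponential-time, suitable only for small n, and is not the authors' dynamic-programming algorithm.

```python
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def permutation_graph(perm: List[int]) -> Set[Edge]:
    """Edges (i, j), i < j, of the permutation graph: adjacent iff perm[i] > perm[j]."""
    n = len(perm)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]}


def is_induced_matching(matching: Tuple[Edge, ...], edges: Set[Edge]) -> bool:
    """True if the matched vertices induce exactly the matching's edges."""
    vertices = [v for e in matching for v in e]
    if len(set(vertices)) != len(vertices):
        return False  # two chosen edges share a vertex
    chosen = set(matching)
    for u, v in combinations(sorted(vertices), 2):
        if (u, v) in edges and (u, v) not in chosen:
            return False  # an extra graph edge joins two matched edges
    return True


def max_induced_matching(perm: List[int]) -> int:
    """Size of a maximum induced matching, by exhaustive search (small n only)."""
    edges = permutation_graph(perm)
    edge_list = sorted(edges)
    for size in range(len(perm) // 2, 0, -1):
        for candidate in combinations(edge_list, size):
            if is_induced_matching(candidate, edges):
                return size
    return 0
```

Comparing a fast implementation against this oracle on many random permutations of length 8 or so is a cheap way to catch off-by-one errors in a sweep-line DP.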
-
Sentence Extraction-Based Machine Reading Comprehension for Vietnamese
Authors:
Phong Nguyen-Thuan Do,
Nhat Duy Nguyen,
Tin Van Huynh,
Kiet Van Nguyen,
Anh Gia-Tuan Nguyen,
Ngan Luu-Thuy Nguyen
Abstract:
The development of natural language processing (NLP) in general, and machine reading comprehension in particular, has attracted great attention from the research community. In recent years, a few large datasets for machine reading comprehension in Vietnamese have been released, such as UIT-ViQuAD and UIT-ViNewsQA. However, these datasets lack answer diversity to serve research. In this paper, we introduce UIT-ViWikiQA, the first dataset for evaluating sentence extraction-based machine reading comprehension in Vietnamese. The UIT-ViWikiQA dataset is converted from the UIT-ViQuAD dataset and comprises 23,074 question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles. We propose a conversion algorithm to create the dataset for sentence extraction-based machine reading comprehension, as well as three types of approaches for sentence extraction-based machine reading comprehension in Vietnamese. Our experiments show that the best machine model is XLM-R_Large, which achieves an exact match (EM) of 85.97% and an F1-score of 88.77% on our dataset. In addition, we analyze the experimental results in terms of question type in Vietnamese and the effect of context on the performance of the MRC models, thereby showing the challenges that the proposed UIT-ViWikiQA dataset poses to the language processing community.
Submitted 11 June, 2021; v1 submitted 19 May, 2021;
originally announced May 2021.
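The UIT-ViWikiQA results above are reported as exact match (EM) and F1. A minimal sketch of these metrics in the common SQuAD style (token-overlap F1) follows. The normalization used here, lowercasing and whitespace splitting, is an assumption; the paper's exact handling of punctuation and Vietnamese text is not given in the abstract.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (simplified, assumed normalization)."""
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Note that whitespace splitting of Vietnamese yields syllables rather than words, so F1 here measures syllable overlap; a word segmenter would change the scores.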
-
The equidistribution of some Mahonian statistics over permutations avoiding a pattern of length three
Authors:
Phan Thuan Do,
Thi Thu Huong Tran,
Vincent Vajnovszki
Abstract:
We prove the equidistribution of several multistatistics over some classes of permutations avoiding a pattern of length $3$. We deduce the equidistribution, on the one hand, of the inv and foze'' statistics and, on the other hand, of the maj and makl statistics, over these classes of pattern-avoiding permutations. Here inv and maj are the celebrated Mahonian statistics, foze'' is one of the statistics defined in terms of generalized patterns in the pioneering 2000 paper of Babson and Steingrímsson, and makl is one of the statistics defined by Clarke, Steingrímsson and Zeng in 1997. These results settle several conjectures posed by Amini in 2018.
Submitted 11 August, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
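To make the two classical statistics in the abstract above concrete: inv counts pairs i < j with π(i) > π(j), and maj sums the (1-indexed) positions i where π(i) > π(i+1). The sketch below computes both and tallies their distributions over permutations avoiding a given pattern, for small n. It only illustrates the definitions; it does not implement the two generalized-pattern statistics the abstract also mentions, and it makes no claim about which pattern classes the theorems cover.

```python
from collections import Counter
from itertools import combinations, permutations
from typing import Callable, Sequence, Tuple


def inv(p: Sequence[int]) -> int:
    """Number of inversions: pairs i < j with p[i] > p[j]."""
    return sum(1 for i, j in combinations(range(len(p)), 2) if p[i] > p[j])


def maj(p: Sequence[int]) -> int:
    """Major index: sum of 1-indexed descent positions i with p[i] > p[i+1]."""
    return sum(i + 1 for i in range(len(p) - 1) if p[i] > p[i + 1])


def contains_pattern(p: Sequence[int], pattern: Tuple[int, ...]) -> bool:
    """True if some subsequence of p is order-isomorphic to pattern."""
    k = len(pattern)
    for idx in combinations(range(len(p)), k):
        sub = [p[i] for i in idx]
        # Order-isomorphic: every pair compares the same way as in the pattern.
        if all((sub[a] < sub[b]) == (pattern[a] < pattern[b])
               for a, b in combinations(range(k), 2)):
            return True
    return False


def distribution(stat: Callable[[Sequence[int]], int], n: int,
                 pattern: Tuple[int, ...]) -> Counter:
    """Distribution of stat over permutations of 1..n avoiding pattern."""
    return Counter(stat(p) for p in permutations(range(1, n + 1))
                   if not contains_pattern(p, pattern))
```

As a sanity check, a pattern longer than n is avoided by every permutation, so the inv and maj distributions then agree by MacMahon's classical theorem; for any length-3 pattern the number of avoiders of length n is the Catalan number (14 for n = 4).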
-
A Virtual Network Customization Framework for Multicast Services in NFV-enabled Core Networks
Authors:
Omar Alhussein,
Phu Thinh Do,
Qiang Ye,
Junling Li,
Weisen Shi,
Weihua Zhuang,
Xuemin Shen,
Xu Li,
Jaya Rao
Abstract:
The paradigm of network function virtualization (NFV) with the support of software-defined networking (SDN) is emerging as a promising approach for customizing network services in fifth generation (5G) networks. In this paper, a multicast service orchestration framework is presented, where joint traffic routing and virtual network function (NF) placement are studied for accommodating multicast services over an NFV-enabled physical substrate network. First, we investigate a joint routing and NF placement problem for a single multicast request accommodated over a physical substrate network, with both single-path and multipath traffic routing. The joint problem is formulated as a mixed integer linear programming (MILP) problem to minimize the function and link provisioning costs, under physical network resource constraints, flow conservation constraints, and NF placement rules. Second, we develop an MILP formulation that jointly handles the static embedding of multiple service requests over the physical substrate network, where we determine the optimal combination of multiple services for embedding and their joint routing and placement configurations, such that the aggregate throughput of the physical substrate is maximized while the function and link provisioning costs are minimized. Since the formulated problems are NP-hard, low-complexity heuristic algorithms are proposed to find efficient solutions for both single-path and multipath routing scenarios. Simulation results are presented to demonstrate the effectiveness and accuracy of the proposed heuristic algorithms.
Submitted 9 February, 2020;
originally announced February 2020.
-
An SDN-Based Transmission Protocol with In-Path Packet Caching and Retransmission
Authors:
Jiayin Chen,
Si Yan,
Qiang Ye,
Wei Quan,
Phu Thinh Do,
Weihua Zhuang,
Xuemin Shen,
Xu Li,
Jaya Rao
Abstract:
In this paper, a comprehensive software-defined networking (SDN) based transmission protocol (SDTP) is presented for fifth generation (5G) communication networks, where an SDN controller gathers network state information from the physical network to improve data transmission efficiency between end hosts, with in-path packet retransmission. In the SDTP, we first develop a new two-way handshake mechanism for connection establishment between a pair of end hosts. With the aid of the SDN control module, signaling exchanges for establishing end-to-end (E2E) connections are migrated to the control plane to improve resource utilization in the data plane. A new SDTP packet header format is designed to support efficient data transmission with in-path packet caching and packet retransmission. Based on the new data packet format, a novel in-path receiver-based packet loss detection and caching-based packet retransmission scheme is proposed to achieve fast in-path recovery of lost packets. Extensive simulation results validate the effectiveness of the proposed protocol in terms of low connection establishment delay and low end-to-end packet transmission delay.
Submitted 22 February, 2019;
originally announced February 2019.
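The SDTP abstract above pairs receiver-based loss detection with retransmission from an in-path cache. As a generic illustration only, the sketch below detects losses from gaps in per-flow sequence numbers and serves retransmissions from a bounded FIFO cache. SDTP's actual header fields and detection rules are not given in the abstract, so the class names, the eviction policy, and the gap rule are all assumptions.

```python
from collections import OrderedDict
from typing import List, Optional


class PacketCache:
    """Hypothetical bounded FIFO cache of recently forwarded packets, keyed by sequence number."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._store: "OrderedDict[int, bytes]" = OrderedDict()

    def put(self, seq: int, payload: bytes) -> None:
        self._store[seq] = payload
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the oldest cached packet

    def get(self, seq: int) -> Optional[bytes]:
        """Return the cached payload for seq, or None if it was never cached or was evicted."""
        return self._store.get(seq)


class GapDetector:
    """Hypothetical receiver-side loss detection: report sequence numbers that were skipped."""

    def __init__(self) -> None:
        self.expected = 0  # next in-order sequence number

    def on_packet(self, seq: int) -> List[int]:
        missing: List[int] = []
        if seq > self.expected:
            missing = list(range(self.expected, seq))  # gap => presumed lost
        self.expected = max(self.expected, seq + 1)
        return missing
```

An in-path node holding a `PacketCache` could answer a loss report for a missing sequence number directly when the packet is still cached, avoiding an end-to-end retransmission. A real protocol would likely tolerate some reordering before declaring a loss, which this sketch does not.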
-
Accuracy, Uncertainty, and Adaptability of Automatic Myocardial ASL Segmentation using Deep CNN
Authors:
Hung P. Do,
Yi Guo,
Andrew J. Yoon,
Krishna S. Nayak
Abstract:
PURPOSE: To apply deep CNN to the segmentation task in myocardial arterial spin labeled (ASL) perfusion imaging and to develop methods that measure uncertainty and that adapt the CNN model to a specific false positive vs. false negative tradeoff.
METHODS: The Monte Carlo dropout (MCD) U-Net was trained on data from 22 subjects and tested on data from 6 heart transplant recipients. Manual segmentation and regional myocardial blood flow (MBF) were available for comparison. We consider two global uncertainty measures, named Dice Uncertainty and MCD Uncertainty, which were calculated with and without the use of manual segmentation, respectively. Tversky loss function with a hyperparameter $β$ was used to adapt the model to a specific false positive vs. false negative tradeoff.
RESULTS: The MCD U-Net achieved a Dice coefficient of 0.91 ± 0.04 (mean ± std) on the test set. MBF measured using automatic segmentation was highly correlated with that measured using manual segmentation ($R^2$ = 0.96). Dice Uncertainty and MCD Uncertainty were in good agreement ($R^2$ = 0.64). As $\beta$ increased, the false positive rate systematically decreased and the false negative rate systematically increased.
CONCLUSION: We demonstrate the feasibility of deep CNN for automatic segmentation of myocardial ASL, with good accuracy. We also introduce two simple methods for assessing model uncertainty. Finally, we demonstrate the ability to adapt the CNN model to a specific false positive vs. false negative tradeoff. These findings are directly relevant to automatic segmentation in quantitative cardiac MRI and are broadly applicable to automatic segmentation problems in diagnostic imaging.
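The abstract relies on the Dice coefficient for accuracy and on a Tversky loss with hyperparameter $\beta$ to shift the false positive vs. false negative tradeoff. The minimal Python sketch below shows both on flattened pixel masks. The exact parameterization is an assumption: here the loss is $1 - \mathrm{TP}/(\mathrm{TP} + \beta\,\mathrm{FP} + (1-\beta)\,\mathrm{FN})$, chosen so that a larger $\beta$ penalizes false positives more, which matches the trend reported in RESULTS, and so that $\beta = 0.5$ reduces to a soft Dice loss. The function names are illustrative, and the MCD uncertainty measures are not reproduced.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN for binary masks given as flat sequences of 0/1."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def dice(pred, truth):
    """Dice coefficient 2TP / (2TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def tversky_loss(prob, truth, beta, eps=1e-7):
    """Soft Tversky loss; beta weights false positives, (1 - beta) weights false negatives.

    Assumed parameterization: a larger beta discourages false positives.
    `prob` holds predicted foreground probabilities in [0, 1].
    """
    tp = sum(p * t for p, t in zip(prob, truth))
    fp = sum(p * (1 - t) for p, t in zip(prob, truth))
    fn = sum((1 - p) * t for p, t in zip(prob, truth))
    return 1 - (tp + eps) / (tp + beta * fp + (1 - beta) * fn + eps)
```

As a usage check, a prediction with many false positives incurs a larger loss at $\beta = 0.7$ than at $\beta = 0.3$, so training with a larger $\beta$ pushes the model toward fewer false positives at the cost of more false negatives.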
Submitted 4 November, 2019; v1 submitted 10 December, 2018;
originally announced December 2018.
-
Machine Translation between Vietnamese and English: an Empirical Study
Authors:
Hong-Hai Phan-Vu,
Viet-Trung Tran,
Van-Nam Nguyen,
Hoang-Vu Dang,
Phan-Thuan Do
Abstract:
Machine translation is shifting to an end-to-end approach based on deep neural networks. The state of the art achieves impressive results for popular language pairs such as English-French or English-Chinese. However, for English-Vietnamese, the shortage of parallel corpora and the expensive hyperparameter search present practical challenges to neural approaches. This paper highlights our efforts on improving English-Vietnamese translation in two directions: (1) building the largest open Vietnamese-English corpus to date, and (2) extensive experiments with the latest neural models to achieve the highest BLEU scores. Our experiments provide practical examples of effectively employing different neural machine translation models with low-resource language pairs.
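Since the results are reported as BLEU scores, a short sketch of the standard BLEU formula may help: clipped (modified) n-gram precisions up to 4-grams, combined by a uniform geometric mean and multiplied by a brevity penalty. This is a simplified, unsmoothed, single-reference, sentence-level version written for illustration; reported scores are normally corpus-level and computed with an established tool, and the abstract does not say which implementation the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU for one tokenized sentence pair."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision += math.log(clipped / total) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c).
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)
```

For instance, the candidate "the cat sat on the" against the reference "the cat sat on the mat" has every n-gram matched, so its score is just the brevity penalty $e^{1 - 6/5} = e^{-0.2}$.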
Submitted 30 October, 2018;
originally announced October 2018.
-
Exhaustive generation for permutations avoiding a (colored) regular sets of patterns
Authors:
Phan Thuan Do,
Thi Thu Huong Tran,
Vincent Vajnovszki
Abstract:
Despite the fact that the field of pattern avoiding permutations has been skyrocketing over the last two decades, there are very few exhaustive generating algorithms for such classes of permutations. In this paper we introduce the notions of regular and colored regular sets of forbidden patterns, which are particular cases of right-justified sets of forbidden patterns. We show the (colored) regularity of several sets of forbidden patterns (some of them involving variable-length patterns) and derive a general framework for the efficient generation of permutations avoiding them. The obtained generating algorithms are based on succession functions, a notion that is a byproduct of the ECO method introduced in the context of enumeration and random generation of combinatorial objects by Barcucci et al. in 1999, and developed later by Bacchelli et al. in 2004, for instance. For some classes of permutations falling under our general framework, the corresponding counting sequences are classical in combinatorics, such as the Pell, Fibonacci, Catalan, and Schröder sequences and the binomial transform of the Padovan sequence.
Submitted 15 September, 2018; v1 submitted 3 September, 2018;
originally announced September 2018.
-
Legal Question Answering using Ranking SVM and Deep Convolutional Neural Network
Authors:
Phong-Khac Do,
Huy-Tien Nguyen,
Chien-Xuan Tran,
Minh-Tien Nguyen,
Minh-Le Nguyen
Abstract:
This paper presents a study of employing Ranking SVM and a Convolutional Neural Network for two tasks: legal information retrieval and question answering in the Competition on Legal Information Extraction/Entailment. For the first task, our proposed model uses a triple of features (LSI, Manhattan, Jaccard) and operates at the paragraph level instead of the article level as in previous studies. In fact, each single-paragraph article corresponds to a particular paragraph in a huge multiple-paragraph article. For the legal question answering task, additional statistical features from the information retrieval task, integrated into the Convolutional Neural Network, contribute to higher accuracy.
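As an illustration of the retrieval setup, the Python sketch below computes two of the three named features between a query and a paragraph (Jaccard similarity of token sets and Manhattan distance between term-count vectors) and applies the standard pairwise transformation that turns ranking into binary classification for a Ranking SVM. Whitespace tokenization, raw term counts, and all function names are assumptions for illustration; the LSI feature and the SVM training step are omitted, and this is not the authors' pipeline.

```python
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard similarity between the token sets of two texts."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def manhattan(a, b):
    """L1 distance between the raw term-count vectors of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    return sum(abs(ca[t] - cb[t]) for t in set(ca) | set(cb))


def pairwise_differences(features, relevance):
    """Ranking SVM reduction: for each pair with different relevance, emit (x_i - x_j, +1 or -1).

    A linear binary SVM trained on the returned (X, y) learns a weight vector
    whose scores w . x rank more relevant paragraphs above less relevant ones.
    """
    X, y = [], []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if relevance[i] != relevance[j]:
                X.append([fi - fj for fi, fj in zip(features[i], features[j])])
                y.append(1 if relevance[i] > relevance[j] else -1)
    return X, y
```

Building one feature vector per (query, paragraph) pair, for example `[jaccard(q, p), manhattan(q, p)]`, and passing the vectors together with relevance labels to `pairwise_differences` gives the training set for any linear SVM implementation.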
Submitted 15 March, 2017;
originally announced March 2017.