Search | arXiv e-print repository

doi 10.1109/IEEECONF59524.2023.10476848

Joint Optimization of Switching Point and Power Control in Dynamic TDD Cell-Free Massive MIMO

Authors: Martin Andersson, Tung T. Vu, Pål Frenger, Erik G. Larsson

Abstract: We consider a cell-free massive multiple-input multiple-output (CFmMIMO) network operating in dynamic time division duplex (DTDD). The switching point between the uplink (UL) and downlink (DL) data transmission phases can be adapted dynamically to the instantaneous quality-of-service (QoS) requirements in order to improve energy efficiency (EE). To this end, we formulate a problem of optimizing the DTDD switching point jointly with the UL and DL power control coefficients, and the large-scale fading decoding (LSFD) weights for EE maximization. Then, we propose an iterative algorithm based on successive convex approximation that solves this challenging problem and obtains an approximate stationary solution. Simulation results show that optimizing switching points remarkably improves EE compared with baseline schemes that adjust switching points heuristically.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Presented at the Asilomar Conference on Signals, Systems, and Computers 2023
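Illustrative sketch (not from the paper): the abstract above relies on successive convex approximation (SCA). The snippet shows only the generic idea, on a hypothetical scalar objective f(x) = x^4 - 2x^2 written as a convex term minus a convex term, rather than on the paper's EE problem. At each iteration the subtracted term is replaced by its tangent, which gives a convex surrogate that upper-bounds f and touches it at the current point; minimizing that surrogate yields the closed-form update x_{k+1} = cbrt(x_k), so the objective never increases.

```python
import math


def f(x: float) -> float:
    """Hypothetical toy nonconvex objective: convex x**4 minus convex 2*x**2."""
    return x**4 - 2.0 * x**2


def sca_minimize(x0: float, tol: float = 1e-10, max_iter: int = 200):
    """Successive convex approximation (convex-concave procedure) on f.

    At iterate x_k the subtracted term 2x^2 is replaced by its tangent
    2x_k^2 + 4x_k(x - x_k). The surrogate x^4 - 4x_k*x + const upper-bounds f
    and equals it at x_k; its minimizer solves 4x^3 = 4x_k, i.e. x = cbrt(x_k).
    """
    x = x0
    history = [f(x)]
    for _ in range(max_iter):
        x_new = math.copysign(abs(x) ** (1.0 / 3.0), x)  # real cube root
        history.append(f(x_new))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history


x_star, obj = sca_minimize(0.2)
print(x_star, obj[-1])  # x approaches 1 and f approaches -1
```

In a real SCA scheme each surrogate would be a convex program passed to a solver; the toy is chosen so that step has a closed form and the monotone decrease is easy to check.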

arXiv:2406.06406

Controlling Emotion in Text-to-Speech with Natural Language Prompts

Authors: Thomas Bott, Florian Lux, Ngoc Thang Vu

Abstract: In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived from an emotionally rich text that serves as prompt. Thereby, a joint representation of speaker and prompt embeddings is integrated at several points within a transformer-based architecture. Our approach is trained on merged emotional speech and text datasets and varies prompts in each training iteration to increase the generalization capabilities of the model. Objective and subjective evaluation results demonstrate the ability of the conditioned synthesis system to accurately transfer the emotions present in a prompt to speech. At the same time, precise tractability of speaker identities as well as overall high speech quality and intelligibility are maintained.

Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: accepted at Interspeech 2024

arXiv:2406.06403

Meta Learning Text-to-Speech Synthesis in over 7000 Languages

Authors: Florian Lux, Sarina Meyer, Lyonel Behringer, Frank Zalkow, Phat Do, Matt Coler, Emanuël A. P. Habets, Ngoc Thang Vu

Abstract: In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.

Submitted 10 June, 2024; originally announced June 2024.

Comments: accepted at Interspeech 2024

arXiv:2404.10922

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

Authors: Pavel Denisov, Ngoc Thang Vu

Abstract: Recent advancements in language modeling have led to the emergence of Large Language Models (LLMs) capable of various natural language processing tasks. Despite their success in text-based tasks, applying LLMs to the speech domain remains limited and challenging. This paper presents BLOOMZMMS, a novel model that integrates a multilingual LLM with a multilingual speech encoder, aiming to harness the capabilities of LLMs for speech recognition and beyond. Utilizing a multi-instructional training approach, we demonstrate the transferability of linguistic knowledge from the text to the speech modality. Our experiments, conducted on 1900 hours of transcribed data from 139 languages, establish that a multilingual speech representation can be effectively learned and aligned with a multilingual LLM. While this learned representation initially shows limitations in task generalization, we address this issue by generating synthetic targets in a multi-instructional style. Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.

Submitted 16 April, 2024; originally announced April 2024.

Comments: NAACL Findings 2024

arXiv:2403.17913

Enhancing Indoor and Outdoor THz Communications with Beyond Diagonal-IRS: Optimization and Performance Analysis

Authors: Asad Mahmood, Thang X. Vu, Symeon Chatzinotas, Björn Ottersten

Abstract: This work investigates the application of a Beyond Diagonal Intelligent Reflective Surface (BD-IRS), operating in a hybrid reflective and transmissive mode, to enhance THz downlink communication systems by simultaneously serving indoor and outdoor users. We propose an optimization framework that jointly optimizes the beamforming vectors and phase shifts in the hybrid reflective/transmissive mode, aiming to maximize the system sum rate. To tackle the challenges in solving the joint design problem, we employ the conjugate gradient method and propose an iterative algorithm that successively optimizes the hybrid beamforming vectors and the phase shifts. Comprehensive numerical simulations demonstrate a significant rate improvement over existing benchmark schemes, including time- and frequency-divided approaches, of approximately 30.5% and 69.9%, respectively; the proposed scheme even outperforms the STAR-IRS system by 76.99%. This underscores the significant influence of IRS elements on system performance relative to that of base station antennas, highlighting their pivotal role in advancing communication system efficacy.

Submitted 9 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.
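Illustrative sketch (not from the paper): the abstract names the conjugate gradient method but not the exact variant used, which may well operate on a non-Euclidean parameter set. As a generic reference point only, the snippet shows the classic linear conjugate gradient iteration for a symmetric positive-definite system A x = b, with a hypothetical 3 x 3 matrix.

```python
import numpy as np


def conjugate_gradient(A: np.ndarray, b: np.ndarray, tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Solve A x = b for a symmetric positive-definite matrix A (linear CG)."""
    x = np.zeros(b.shape[0])
    r = b - A @ x                 # residual
    p = r.copy()                  # first search direction = steepest descent
    rs_old = float(r @ r)
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / float(p @ Ap)   # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs_old) * p    # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x


# Hypothetical SPD matrix (strictly diagonally dominant with positive diagonal).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(A, b)
print(x, np.allclose(A @ x, b))
```

In exact arithmetic CG terminates in at most n steps for an n x n system; the `tol` check simply stops once the residual is negligible.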

arXiv:2403.01102

doi 10.1016/j.epsr.2024.110191

Real-time hybrid controls of energy storage and load shedding for integrated power and energy systems of ships

Authors: Linh Vu, Thai-Thanh Nguyen, Bang Le-Huy Nguyen, Md Isfakul Anam, Tuyen Vu

Abstract: This paper presents an original energy management methodology to enhance the resilience of ship power systems. The integration of various energy storage systems (ESS), including battery energy storage systems (BESS) and super-capacitor energy storage systems (SCESS), in modern ship power systems poses challenges in designing an efficient energy management system (EMS). The EMS proposed in this paper aims to achieve multiple objectives. The primary objective is to minimize shed loads, while the secondary objective is to effectively manage different types of ESS. Considering the diverse ramp-rate characteristics of generators, SCESS, and BESS, the proposed EMS exploits these differences to determine an optimal long-term schedule for minimizing shed loads. Furthermore, the proposed EMS balances the state-of-charge (SoC) of ESS and prioritizes the SCESS's SoC levels to ensure the efficient operation of BESS and SCESS. For better computational efficiency, we introduce the receding horizon optimization method, enabling real-time EMS implementation. A comparison with the fixed horizon optimization (FHO) validates its effectiveness. Simulation studies and results demonstrate that the proposed EMS efficiently manages generators, BESS, and SCESS, ensuring system resilience under generation shortages. Additionally, the proposed methodology significantly reduces the computational burden compared to the FHO technique while maintaining acceptable resilience performance.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 15 pages, 17 figures

Journal ref: Electric Power Systems Research, volume 229, pages 110191, year 2024
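Illustrative sketch (not the paper's EMS): the receding horizon idea mentioned above is to optimize over a short look-ahead window, apply only the first decision, then re-plan from the new state. The toy below assumes a single battery with integer energy units, a known per-step generation deficit, a squared shed-load penalty (chosen here so that foresight matters), and a brute-force planner; generators, SCESS, and ramp-rate limits from the paper are deliberately omitted.

```python
from itertools import product


def plan_cost(soc: int, deficits: list[int], actions: tuple[int, ...]) -> float:
    """Squared shed load of a candidate discharge plan; inf if the battery runs dry."""
    cost = 0.0
    for deficit, u in zip(deficits, actions):
        if u > soc:
            return float("inf")
        soc -= u
        cost += max(0, deficit - u) ** 2
    return cost


def receding_horizon(deficits: list[int], soc: int, horizon: int, u_max: int) -> list[int]:
    """At every step, plan over the next `horizon` steps but apply only the first action."""
    applied = []
    for t in range(len(deficits)):
        window = deficits[t:t + horizon]             # forecast window (shrinks at the end)
        candidates = product(range(u_max + 1), repeat=len(window))
        best = min(candidates, key=lambda plan: plan_cost(soc, window, plan))
        applied.append(best[0])                      # commit the first move only
        soc -= best[0]                               # then re-plan from the new state
    return applied


deficits = [2, 4]  # hypothetical generation shortfall per step, in battery energy units
print(receding_horizon(deficits, soc=4, horizon=2, u_max=4))  # [1, 3]
print(receding_horizon(deficits, soc=4, horizon=1, u_max=4))  # [2, 2]
```

With a two-step window the controller holds energy back for the larger deficit and sheds one unit in each step (cost 2), whereas the one-step greedy plan sheds two units at the end (cost 4). Solving only short windows repeatedly is what keeps the per-step computation small compared with optimizing the whole horizon at once.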

arXiv:2403.00674

doi 10.1109/OJCOMS.2024.3373170

Cell-Free Massive MIMO with Multi-Antenna Users and Phase Misalignments: A Novel Partially Coherent Transmission Framework

Authors: Unnikrishnan Kunnath Ganesan, Tung Thanh Vu, Erik G. Larsson

Abstract: Cell-free massive multiple-input multiple-output (MIMO) is a promising technology for next-generation communication systems. This work proposes a novel partially coherent (PC) transmission framework to cope with the challenge of phase misalignment among the access points (APs), which is important for unlocking the full potential of cell-free massive MIMO technology. With the PC operation, the APs are only required to be phase-aligned within clusters. Each cluster transmits the same data stream towards each user equipment (UE), while different clusters send different data streams. We first propose a novel algorithm to group APs into clusters such that the distance between two APs is always smaller than a reference distance ensuring the phase alignment of these APs. Then, we propose new algorithms that optimize the combining at UEs and precoding at APs to maximize the downlink sum data rates. We also propose a novel algorithm for data stream allocation to further improve the sum data rate of the PC operation. Numerical results show that the PC operation using the proposed framework with a sufficiently small reference distance can offer a sum rate close to the sum rate of the ideal fully coherent (FC) operation that requires network-wide phase alignment. This demonstrates the potential of PC operation in practical deployments of cell-free massive MIMO networks.

Submitted 3 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: 17 pages, 10 figures. Published in IEEE Open Journal of the Communications Society
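Illustrative sketch (not the paper's algorithm): the clustering constraint described above, that every pair of APs in a cluster is closer than a reference distance, can be met by a simple first-fit greedy rule. The function name, 2-D AP coordinates, and reference distance below are hypothetical; the greedy rule guarantees the pairwise constraint but does not minimize the number of clusters.

```python
import math

Point = tuple[float, float]


def greedy_ap_clustering(aps: list[Point], d_ref: float) -> list[list[int]]:
    """First-fit grouping: every pair of APs inside a cluster is closer than d_ref."""
    clusters: list[list[int]] = []
    for i, ap in enumerate(aps):
        for cluster in clusters:
            if all(math.dist(ap, aps[j]) < d_ref for j in cluster):
                cluster.append(i)      # compatible with every current member
                break
        else:
            clusters.append([i])       # no compatible cluster: open a new one
    return clusters


aps = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0), (100.0, 100.0), (105.0, 100.0)]
print(greedy_ap_clustering(aps, d_ref=20.0))  # [[0, 1, 2], [3, 4]]
```

Shrinking `d_ref` yields more, smaller clusters, which mirrors the trade-off the abstract describes between the reference distance and how close PC operation gets to fully coherent operation.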

arXiv:2401.05425

An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection

Authors: Abdul Aziz, Nhat Pham, Neel Vora, Cody Reynolds, Jaime Lehnen, Pooja Venkatesh, Zhuoran Yao, Jay Harvey, Tam Vu, Kan Ding, Phuc Nguyen

Abstract: Epilepsy is one of the most common neurological diseases, affecting around 50 million people worldwide. Fortunately, up to 70 percent of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who constantly face the fear of random seizure attacks. The scalp-based EEG test, despite being the gold standard for diagnosing epilepsy, is costly, necessitates hospitalization, demands skilled professionals for operation, and is uncomfortable for users. In this paper, we propose EarSD, a novel lightweight, unobtrusive, and socially acceptable ear-worn system to detect epileptic seizure onsets by measuring the physiological signals from behind the user's ears. EarSD includes an integrated custom-built sensing, computing, and communication PCB to collect and amplify the signals of interest, remove noise caused by motion artifacts and environmental impacts, and stream the data wirelessly to a nearby computer or mobile phone, from which the data are uploaded to the host computer for further processing. We conducted both in-lab and in-hospital experiments with epileptic seizure patients who were hospitalized for seizure studies. The preliminary results confirm that EarSD can detect seizures with up to 95.3 percent accuracy using only classical machine learning algorithms.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.02701

Joint User Association and Power Control for Cell-Free Massive MIMO

Authors: Chongzheng Hao, Tung Thanh Vu, Hien Quoc Ngo, Minh N. Dao, Xiaoyu Dang, Chenghua Wang, Michail Matthaiou

Abstract: This work proposes novel approaches that jointly design user equipment (UE) association and power control (PC) in a downlink user-centric cell-free massive multiple-input multiple-output (CFmMIMO) network, where each UE is only served by a set of access points (APs) for reducing the fronthaul signalling and computational complexity. In order to maximize the sum spectral efficiency (SE) of the UEs, we formulate a mixed-integer nonconvex optimization problem under constraints on the per-AP transmit power, quality-of-service rate requirements, maximum fronthaul signalling load, and maximum number of UEs served by each AP. In order to solve the formulated problem efficiently, we propose two different schemes according to the different sizes of the CFmMIMO systems. For small-scale CFmMIMO systems, we present a successive convex approximation (SCA) method to obtain a stationary solution and also develop a learning-based method (JointCFNet) to reduce the computational complexity. For large-scale CFmMIMO systems, we propose a low-complexity suboptimal algorithm using accelerated projected gradient (APG) techniques. Numerical results show that our JointCFNet can yield similar performance and significantly decrease the run time compared with the SCA algorithm in small-scale systems. The presented APG approach is confirmed to run much faster than the SCA algorithm in the large-scale system while obtaining an SE performance close to that of the SCA approach. Moreover, the median sum SE of the APG method is up to about 2.8 fold higher than that of the heuristic baseline scheme.

Submitted 20 May, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: minor revision of the previous version
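Illustrative sketch (not the paper's algorithm): accelerated projected gradient (APG) alternates a gradient step, a projection onto the feasible set, and a Nesterov-type extrapolation. The snippet applies a FISTA-style version to a hypothetical box-constrained quadratic, where the box stands in loosely for simple per-variable power limits; the paper's SE objective and coupled constraints are not modelled.

```python
import numpy as np


def apg_box_qp(A, b, lo, hi, iters=500):
    """Accelerated projected gradient for min 0.5*x^T A x - b^T x s.t. lo <= x <= hi."""
    L = np.linalg.eigvalsh(A).max()          # Lipschitz constant of the gradient A x - b
    x = np.clip(np.zeros_like(b, dtype=float), lo, hi)
    y, t = x.copy(), 1.0
    for _ in range(iters):
        grad = A @ y - b
        x_new = np.clip(y - grad / L, lo, hi)          # gradient step + projection onto the box
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # Nesterov-type extrapolation
        x, t = x_new, t_new
    return x


A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x = apg_box_qp(A, b, lo=0.0, hi=1.0)
print(x)  # approx [0.5, 0.0]
```

The expected answer can be checked by hand: at (0.5, 0) the gradient is (0, 1.5), so the first coordinate is stationary and the second sits at its lower bound with a positive gradient, which satisfies the KKT conditions. Projection onto a box is just clipping, which is what makes APG cheap per iteration for simple constraint sets.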

arXiv:2312.17738

Physics-informed Graphical Neural Network for Power System State Estimation

Authors: Quang-Ha Ngo, Bang L. H. Nguyen, Tuyen V. Vu, Jianhua Zhang, Tuan Ngo

Abstract: State estimation is critical for accurately observing the dynamic behavior of power grids and minimizing risks from cyber threats. However, existing state estimation methods encounter challenges in accurately capturing power system dynamics, primarily because of limitations in encoding the grid topology and sparse measurements. This paper proposes a physics-informed graphical learning state estimation method to address these limitations by leveraging both domain physical knowledge and a graph neural network (GNN). We employ a GNN architecture that can handle the graph-structured data of power systems more effectively than traditional data-driven methods. The physics-based knowledge is constructed from the branch current formulation, making the approach adaptable to both transmission and distribution systems. Validation on three IEEE test systems shows that the proposed method achieves a mean square error more than 20% lower than that of conventional methods.

Submitted 29 December, 2023; originally announced December 2023.

Comments: 11 pages, 17 figures, journal accepted
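Illustrative sketch (not the paper's architecture): a GNN layer lets each bus aggregate features from electrically adjacent buses. The snippet shows one standard graph-convolution layer with symmetric normalization on a hypothetical 4-bus radial feeder where only one bus is measured; the physics-informed part (the branch current formulation) is not reproduced.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])                        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(norm @ h @ w, 0.0)                      # aggregate neighbours, then ReLU


# Hypothetical 4-bus radial feeder 0-1-2-3 (branches as undirected edges).
adj = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

# Sparse measurements: only bus 0 reports its two features; other rows are zero.
h = np.zeros((4, 2))
h[0] = [1.0, 0.5]
w = np.eye(2)  # identity weights so the propagation itself is visible
out = gcn_layer(adj, h, w)
print(out)  # only buses 0 and 1 receive information after one layer
```

Because one layer only mixes one-hop neighbours, information from a measured bus needs several stacked layers to reach distant buses, which is one reason topology encoding matters when measurements are sparse.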

arXiv:2312.11127

User-centric Flexible Resource Management Framework for LEO Satellites with Fully Regenerative Payload

Authors: Sovit Bhandari, Thang X. Vu, Symeon Chatzinotas

Abstract: The regenerative capabilities of next-generation satellite systems offer a novel approach to designing low earth orbit (LEO) satellite communication systems, enabling full flexibility in bandwidth and spot beam management, power control, and onboard data processing. These advancements allow the implementation of intelligent spatial multiplexing techniques, addressing the ever-increasing demand for future broadband data traffic. Existing satellite resource management solutions, however, do not fully exploit these capabilities. To address this issue, a novel framework called flexible resource management algorithm for LEO satellites (FLARE-LEO) is proposed to jointly design bandwidth, power, and spot beam coverage optimized for the geographic distribution of users. It incorporates multi-spot beam multicasting, spatial multiplexing, caching, and handover (HO). In particular, the spot beam coverage is optimized by applying the unsupervised K-means algorithm to realistic geographical user demand, followed by a proposed successive convex approximation (SCA)-based iterative algorithm for optimizing the radio resources. Furthermore, we propose two joint transmission architectures during the HO period, which jointly estimate the downlink channel state information (CSI) using deep learning and optimize the transmit power of the LEOs involved in the HO process to improve the overall system throughput. Simulations show that the proposed algorithm outperforms existing solutions in terms of delivery time reduction.

Submitted 18 December, 2023; originally announced December 2023.

Comments: To appear in IEEE JSAC
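Illustrative sketch (not the FLARE-LEO pipeline): the spot-beam step above applies K-means to user locations. The snippet is plain Lloyd's k-means on hypothetical 2-D user positions; each resulting centroid can be read as a candidate beam centre. The deterministic farthest-point seeding is our choice for reproducibility (instead of random or k-means++ seeding), and demand weighting is omitted.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, iters: int = 100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # next seed: the user farthest from all seeds chosen so far
        d = np.linalg.norm(points[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)

    def assign(c):
        return np.linalg.norm(points[:, None, :] - c[None, :, :], axis=2).argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)                                 # assignment step
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])                                                       # update step
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(centers)


# Hypothetical user positions (km) forming two hotspots.
users = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(users, k=2)
print(centers)  # one beam centre per hotspot
print(labels)   # [0 0 0 1 1 1]
```

Weighting each user by its traffic demand (for example, a weighted mean in the update step) would be a natural extension, but that is an assumption about how demand might enter, not something stated in the abstract.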

arXiv:2311.07199

Joint Computation and Communication Resource Optimization for Beyond Diagonal UAV-IRS Empowered MEC Networks

Authors: Asad Mahmood, Thang X. Vu, Wali Ullah Khan, Symeon Chatzinotas, Björn Ottersten

Abstract: Recent advancements in 6G systems signal a leap towards universal connectivity and ultra-reliable, low-latency communications for real-time data devices. Yet, these advancements encounter obstacles such as limited device battery life and computational power, along with urban signal blockages. To counter these, Intelligent Reconfigurable Surfaces (IRS) within Mobile Edge Cloud (MEC) infrastructures offer enhanced computing to overcome device limitations and create alternative communication paths. Despite these improvements, connectivity issues remain for remote areas. Our paper presents the Beyond Diagonal IRS (BD-IRS or IRS 2.0), integrated with UAVs in MEC networks (BD-IRS-UAV), providing on-demand links for remote users to offload tasks, tackling resource and battery limitations. We propose a joint optimization strategy to reduce the system's worst-case latency and the UAV hovering time by optimizing BD-IRS-UAV deployment and resource allocation. This challenge is approached by dividing it into two sub-problems, BD-IRS-UAV Placement and Computational Resource Optimization, and Communication Resource Optimization, each solved iteratively. This design significantly enhances system performance, showing a 17.75% increase over traditional diagonal IRS and a 25.43% improvement over IRS on buildings, with a 13.44% enhancement in worst-case latency compared to binary offloading schemes.

Submitted 15 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.05049

Constrained Independent Vector Analysis with Reference for Multi-Subject fMRI Analysis

Authors: Trung Vu, Francisco Laport, Hanlu Yang, Vince D. Calhoun, Tulay Adali

Abstract: Independent component analysis (ICA) is now a widely used solution for the analysis of multi-subject functional magnetic resonance imaging (fMRI) data. Independent vector analysis (IVA) generalizes ICA to multiple datasets, i.e., to multi-subject data, and in addition to higher-order statistical information in ICA, it leverages the statistical dependence across the datasets as an additional type of statistical diversity. As such, it preserves variability in the estimation of single-subject maps but its performance might suffer when the number of datasets increases. Constrained IVA is an effective way to bypass computational issues and improve the quality of separation by incorporating available prior information. Existing constrained IVA approaches often rely on user-defined threshold values to define the constraints. However, an improperly selected threshold can have a negative impact on the final results. This paper proposes two novel methods for constrained IVA: one using an adaptive-reverse scheme to select variable thresholds for the constraints and a second one based on a threshold-free formulation by leveraging the unique structure of IVA. We demonstrate, both through simulations and through analysis of resting-state fMRI data collected from 98 subjects (the highest number of subjects ever used by IVA algorithms), that our methods provide an attractive solution to multi-subject fMRI analysis. Our results show that both proposed approaches obtain significantly better separation quality and model match while providing computationally efficient and highly reproducible solutions.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 11 pages

arXiv:2310.17502

doi 10.21437/Interspeech.2023-858

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

Authors: Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu

Abstract: Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial and controllable embeddings can be fed to a speech synthesis system, conditioned on embeddings of real humans during training, without sacrificing privacy during inference.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Published at ISCA Interspeech 2023 https://www.isca-speech.org/archive/interspeech_2023/lux23_interspeech.html

arXiv:2310.17499

The IMS Toucan System for the Blizzard Challenge 2023

Authors: Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu

Abstract: For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021. Our approach entails a rule-based text-to-phoneme processing system that includes rule-based disambiguation of homographs in the French language. It then transforms the phonemes to spectrograms as intermediate representations using a fast and efficient non-autoregressive synthesis architecture based on Conformer and Glow. A GAN-based neural vocoder that combines recent state-of-the-art approaches converts the spectrogram into the final waveform. We carefully designed the data processing, training, and inference procedures for the challenge data. Our system identifier is G. Open source code and demo are available.

Submitted 26 October, 2023; originally announced October 2023.

Comments: Published at the Blizzard Challenge Workshop 2023, colocated with the Speech Synthesis Workshop 2023, a satellite event of Interspeech 2023

arXiv:2310.12574

A reproducible 3D convolutional neural network with dual attention module (3D-DAM) for Alzheimer's disease classification

Authors: Thanh Phuong Vu, Tien Nhat Nguyen, N. Minh Nhat Hoang, Gia Minh Hoang

Abstract: Alzheimer's disease is one of the most common types of neurodegenerative disease, characterized by the accumulation of amyloid-beta plaque and tau tangles. Recently, deep learning approaches have shown promise in Alzheimer's disease diagnosis. In this study, we propose a reproducible model that utilizes a 3D convolutional neural network with a dual attention module for Alzheimer's disease classification. We trained the model on the ADNI database and verified the generalizability of our method on two independent datasets (AIBL and OASIS1). Our method achieved state-of-the-art classification performance, with an accuracy of 91.94% for MCI progression classification and 96.30% for Alzheimer's disease classification on the ADNI dataset. Furthermore, the model demonstrated good generalizability, achieving an accuracy of 86.37% on the AIBL dataset and 83.42% on the OASIS1 dataset. These results indicate that our proposed approach has competitive performance and generalizability when compared to recent studies in the field.

Submitted 4 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.10549

Applications of Distributed Machine Learning for the Internet-of-Things: A Comprehensive Survey

Authors: Mai Le, Thien Huynh-The, Tan Do-Duy, Thai-Hoc Vu, Won-Joo Hwang, Quoc-Viet Pham

Abstract: The emergence of new services and applications in emerging wireless networks (e.g., beyond 5G and 6G) has shown a growing demand for the usage of artificial intelligence (AI) in the Internet of Things (IoT). However, the proliferation of massive IoT connections and the availability of computing resources distributed across future IoT systems have created a strong need for distributed AI to deliver better IoT services and applications. Therefore, existing AI-enabled IoT systems can be enhanced by implementing distributed machine learning (aka distributed learning) approaches. This work aims to provide a comprehensive survey on distributed learning for IoT services and applications in emerging networks. In particular, we first provide a background of machine learning and present a preliminary to typical distributed learning approaches, such as federated learning, multi-agent reinforcement learning, and distributed inference. Then, we provide an extensive review of distributed learning for critical IoT services (e.g., data sharing and computation offloading, localization, mobile crowdsensing, and security and privacy) and IoT applications (e.g., smart healthcare, smart grid, autonomous vehicle, aerial IoT networks, and smart industry). From the reviewed literature, we also present critical challenges of distributed learning for IoT and propose several promising solutions and research directions in this emerging area.

Submitted 16 October, 2023; originally announced October 2023.
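The survey above names federated learning as a typical distributed learning approach. As a hedged illustration that is not taken from the survey, the server-side aggregation step of federated averaging (FedAvg, McMahan et al.) can be sketched as follows. Local training on each device is omitted, and the client updates and dataset sizes are hypothetical.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Server-side FedAvg aggregation.

    client_weights: one flat parameter vector per client (after local training).
    client_sizes: number of local training samples held by each client.
    Returns the sample-size-weighted average of the client parameters.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()                 # n_k / n for each client k
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    # Contract over the client axis: sum_k coeffs[k] * stacked[k]
    return np.tensordot(coeffs, stacked, axes=1)

# Hypothetical round with three IoT devices holding 100, 300 and 600 samples.
updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_model = fed_avg(updates, [100, 300, 600])
print(global_model)  # approximately [0.7, 0.9]
```

Weighting by local sample count means devices with more data pull the global model harder; only model parameters, never raw IoT data, reach the server.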

arXiv:2310.06103 [pdf, other]

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

Authors: Pavel Denisov, Ngoc Thang Vu

Abstract: A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models; however, their evaluation often lacks a multilingual setup and tasks that require prediction of lexical fillers, such as slot filling. In this work, we propose a unified method that integrates multilingual pretrained speech and text models and performs E2E-SLU on six datasets in four languages in a generative manner, including the prediction of lexical fillers. We investigate how the proposed method can be improved by pretraining on widely available speech recognition data using several training objectives. Pretraining on 7000 hours of multilingual data allows us to outperform the state of the art outright on two SLU datasets and partly on two more SLU datasets. Finally, we examine the cross-lingual capabilities of the proposed model and improve on the best known result on the PortMEDIA-Language dataset by almost half, achieving a Concept/Value Error Rate of 23.65%.

Submitted 9 October, 2023; originally announced October 2023.

Comments: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2023

arXiv:2310.01024 [pdf, other]

Joint Source-Channel Coding System for 6G Communication: Design, Prototype and Future Directions

Authors: Xinchao Zhong, Sean Longyu Ma, Hong-fu Chou, Arsham Mostaani, Thang X. Vu, Symeon Chatzinotas

Abstract: The goal of semantic communication is to surpass Shannon's optimality criterion with respect to a notable problem for future communication, which lies in integrating collaborative efforts between the intelligence of the transmission source and the joint design of source coding and channel coding. The convergence of scholarly investigation and applicable products in the field of semantic communication is facilitated by flexible structural hardware design, which is constrained by the computational capabilities of edge devices. This characteristic represents a significant benefit of joint source-channel coding (JSCC), as it enables the generation of source alphabets with diverse lengths and achieves a code rate of unity. Moreover, JSCC exhibits near-capacity performance while maintaining low complexity. Therefore, we leverage not only quasi-cyclic (QC) characteristics to propose a QC-LDPC code-based JSCC scheme but also Unequal Error Protection (UEP) to ensure the recovery of semantic importance. In this study, the feasibility of using a UEP-aware semantic encoder/decoder is explored based on the existing JSCC system. This approach is aimed at protecting the significance of semantic task-oriented information. Additionally, the deployment of a JSCC system can be facilitated by employing Low-Density Parity-Check (LDPC) codes on a reconfigurable device. This is achieved by reconstructing the LDPC codes as QC-LDPC codes. The QC-LDPC layered decoding technique, which has been specifically optimized for hardware parallelism and tailored for channel decoding applications, can be suitably adapted to accommodate the JSCC system. The performance of the proposed system is evaluated by conducting bit error rate (BER) measurements using both floating-point and 6-bit quantization.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 14 pages, 9 figures, Journal
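The JSCC abstract above relies on reconstructing LDPC codes as QC-LDPC codes so the decoder can exploit hardware parallelism. A QC-LDPC parity-check matrix is commonly built by "lifting" a small exponent (base) matrix: each entry is replaced by a Z x Z zero block (entry -1) or by an identity matrix cyclically shifted by the entry's value. The sketch below shows that generic construction only; the exponent matrix, the lifting size Z = 3 and the shift-direction convention are assumptions (standards differ), not values from the paper.

```python
import numpy as np

def lift_qc_ldpc(exponents, Z):
    """Expand an exponent (base) matrix into a binary QC-LDPC parity-check matrix H.

    Entry -1     -> Z x Z all-zero block.
    Entry s >= 0 -> Z x Z identity with its columns cyclically shifted by s.
    """
    E = np.asarray(exponents, dtype=int)
    mb, nb = E.shape
    H = np.zeros((mb * Z, nb * Z), dtype=np.uint8)
    identity = np.eye(Z, dtype=np.uint8)
    for r in range(mb):
        for c in range(nb):
            s = E[r, c]
            if s >= 0:
                # Row i of this block gets its single 1 in column (i + s) mod Z.
                H[r * Z:(r + 1) * Z, c * Z:(c + 1) * Z] = np.roll(identity, s % Z, axis=1)
    return H

def syndrome(H, word):
    """Parity checks over GF(2); an all-zero syndrome means every check is satisfied."""
    return (H.astype(int) @ np.asarray(word, dtype=int)) % 2

# Hypothetical 2 x 4 exponent matrix with lifting size Z = 3 (not from the paper).
E = [[0, 1, -1, 2],
     [2, -1, 0, 1]]
H = lift_qc_ldpc(E, Z=3)
print(H.shape)                     # (6, 12)
print(syndrome(H, np.zeros(12)))   # the all-zero word satisfies every check
```

Because every block is either zero or a circulant permutation, a hardware decoder can process Z rows of a layer in parallel with simple cyclic shifters, which is the property layered QC-LDPC decoders exploit.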

arXiv:2309.08049 [pdf, other]

doi 10.1109/OJSP.2023.3344375

VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research

Authors: Sarina Meyer, Xiaoxiao Miao, Ngoc Thang Vu

Abstract: Speaker anonymization is the task of modifying a speech recording such that the original speaker can no longer be identified. Since the first Voice Privacy Challenge in 2020, along with the release of a framework, the popularity of this research topic has been increasing steadily. However, the comparison and combination of different anonymization approaches remains challenging due to the complexity of evaluation and the absence of user-friendly research frameworks. We therefore propose an efficient speaker anonymization and evaluation framework based on a modular and easily extendable structure, almost fully in Python. The framework facilitates the orchestration of several anonymization approaches in parallel and allows for interfacing between different techniques. Furthermore, we propose modifications to common evaluation methods which improve the quality of the evaluation and reduce its computation time by 65 to 95%, depending on the metric. Our code is fully open source.

Submitted 21 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: Accepted by OJSP-ICASSP 2024 https://ieeexplore.ieee.org/document/10365329

arXiv:2306.14380 [pdf, ps, other]

A New Optimal Subpattern Assignment (OSPA) Metric for Multi-target Filtering

Authors: Tuyet Vu

Abstract: This paper proposes and evaluates a new metric that overcomes a limitation of the Optimal Subpattern Assignment (OSPA) metric noted by Schuhmacher et al.: the OSPA distance between two sets of points is insensitive to the case where one set is empty. The proposed metric, called Complete OSPA (COSPA), retains all the advantages of the OSPA metric for evaluating the performance of multiple target filtering algorithms while also allowing separate control over the thresholds for physical distance errors and cardinality errors.

Submitted 25 June, 2023; originally announced June 2023.
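The limitation that motivates COSPA in the abstract above can be reproduced with the standard OSPA definition of Schuhmacher et al. For |X| = m <= |Y| = n, cutoff c and order p, OSPA is ((min over assignments of the sum of min(c, d)^p + c^p (n - m)) / n)^(1/p). When one set is empty this reduces to c regardless of how many points the other set holds. The sketch below implements plain OSPA only (not COSPA, whose definition is not given here), assumes Euclidean base distance, and uses brute-force assignment, so it is meant for small sets.

```python
import itertools
import math

def ospa(X, Y, c=10.0, p=2.0):
    """OSPA distance between two finite sets of points (tuples of coordinates).

    c is the cutoff, p the order. The optimal assignment is found by brute force,
    so this sketch is only meant for small sets.
    """
    if not X and not Y:
        return 0.0                      # convention: two empty sets are at distance 0
    if len(X) > len(Y):
        X, Y = Y, X                     # OSPA is symmetric; make |X| <= |Y|
    m, n = len(X), len(Y)

    def cut(a, b):
        # Euclidean distance saturated at the cutoff c
        return min(c, math.dist(a, b))

    # Localisation term: best assignment of the m points of X to distinct points of Y
    best = min(
        sum(cut(X[i], Y[j]) ** p for i, j in enumerate(assign))
        for assign in itertools.permutations(range(n), m)
    )
    # Cardinality term: each of the n - m unassigned points of Y costs c**p
    return ((best + c ** p * (n - m)) / n) ** (1.0 / p)

truth = [(0.0, 0.0), (50.0, 50.0), (80.0, 10.0)]
print(ospa([], truth[:1]))   # 10.0
print(ospa([], truth))       # 10.0: missing one target or three gives the same value
print(ospa(truth, truth))    # 0.0
```

A filter that misses every target therefore scores the same OSPA value whether the scene contains one target or a hundred, which is the insensitivity the COSPA paper addresses.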

arXiv:2306.14018 [pdf, other]

doi 10.1109/TSG.2023.3310893

Multi-agent Deep Reinforcement Learning for Distributed Load Restoration

Authors: Linh Vu, Tuyen Vu, Thanh-Long Vu, Anurag Srivastava

Abstract: This paper addresses the load restoration problem after power outage events. Our proposed methodology uses multi-agent deep reinforcement learning to optimize the load restoration process in distribution systems, modeled as networked microgrids, by determining the optimal operational sequence of circuit breakers (switches). An innovative invalid action masking technique is incorporated into the multi-agent method to handle both the physical constraints of the restoration process and the curse of dimensionality, as the action space of operational decisions grows exponentially with the number of circuit breakers. The features of our proposed method include centralized training of the agents to overcome non-stationary environment problems, decentralized execution to ease deployment, and zero constraint violations to prevent harmful actions. Our simulations are performed in OpenDSS and Python environments to demonstrate the effectiveness of the proposed approach on the IEEE 13, 123, and 8500-node distribution test feeders. The results show that the proposed algorithm achieves a significantly better learning curve and stability than conventional methods.

Submitted 24 June, 2023; originally announced June 2023.

Comments: 12 pages, 19 figures, journal under review
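The load restoration abstract above incorporates invalid action masking. The paper's exact masking rule is not given here, so the sketch below shows only the generic form of the technique: logits of actions that would violate a constraint are set to negative infinity before the softmax, so those actions receive exactly zero probability and can never be chosen. The four-breaker example and its validity mask are hypothetical.

```python
import numpy as np

def masked_softmax(logits, valid_mask):
    """Turn policy logits into action probabilities, giving invalid actions zero mass."""
    logits = np.asarray(logits, dtype=float)
    valid_mask = np.asarray(valid_mask, dtype=bool)
    if not valid_mask.any():
        raise ValueError("at least one action must be valid")
    # Replace logits of invalid actions with -inf so exp() maps them to exactly 0.
    masked = np.where(valid_mask, logits, -np.inf)
    # Subtract the max over valid actions for numerical stability.
    z = masked - masked[valid_mask].max()
    probs = np.exp(z)
    return probs / probs.sum()

# Hypothetical example: 4 circuit-breaker actions, where operating breaker 2
# would violate a physical constraint (e.g. overloading a feeder section).
logits = [2.0, 0.5, 3.0, -1.0]
valid = [True, True, False, True]
probs = masked_softmax(logits, valid)
action = int(np.argmax(probs))   # without the mask, argmax would pick breaker 2
print(probs, action)
```

Because masked actions get zero probability, their gradients also vanish during policy-gradient training, so the agent never has to learn, by trial and penalty, which switching actions are physically infeasible.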

arXiv:2306.14017 [pdf, other]

doi 10.1109/TIA.2023.3311429

A Cyber-HIL for Investigating Control Systems in Ship Cyber Physical Systems under Communication Issues and Cyber Attacks

Authors: Linh Vu, Lam Nguyen, Mahmoud Abdelaal, Tuyen Vu, Osama Mohammed

Abstract: This paper presents a novel Cyber-Hardware-in-the-Loop (Cyber-HIL) platform for assessing control operation in ship cyber-physical systems. The proposed platform employs cutting-edge technologies, including Docker containers, the real-time simulator OPAL-RT, and the network emulator ns3, to create a secure and controlled testing and deployment environment for investigating the potential impact of cyber attack threats on ship control systems. Real-time experiments were conducted using an advanced load-shedding controller as a control object in both synchronous and asynchronous modes, showcasing the platform's versatility and effectiveness in identifying vulnerabilities and improving overall Ship Cyber Physical System (SCPS) security. Furthermore, the performance of the load-shedding controller under cyber attacks was evaluated by conducting tests with man-in-the-middle (MITM) and denial-of-service (DoS) attacks. These attacks were implemented on the communication channels between the controller and the simulated ship system, emulating real-world scenarios. The proposed Cyber-HIL platform provides a comprehensive and effective approach to test and validate the security of ship control systems in the face of cyber threats.

Submitted 25 August, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

Comments: 10 pages, 16 figures, journal under review

Journal ref: IEEE Transactions on Industry Applications, vol. 60, no. 2, pp. 2142-2152, March-April 2024

arXiv:2305.04519 [pdf, other]

Joint Optimization of 3D Placement and Radio Resource Allocation for per-UAV Sum Rate Maximization

Authors: Asad Mahmood, Thang X. Vu, Symeon Chatzinotas, Björn Ottersten

Abstract: Unmanned aerial vehicles (UAVs) have emerged as a practical solution that provides on-demand services to users in areas where the terrestrial network is non-existent or temporarily unavailable, e.g., due to natural disasters or network congestion. The user-serving capacity of UAVs is typically constrained by their limited battery life and finite communication resources, which strongly impact their performance. This work considers orthogonal frequency division multiple access (OFDMA) enabled multiple unmanned aerial vehicle (multi-UAV) communication systems that provide on-demand services. The main aim of this work is to derive an efficient technique for the allocation of radio resources, 3D placement of UAVs, and user association matrices. To achieve the desired objectives, we decoupled the original joint optimization problem into two sub-problems: (i) 3D placement and user association and (ii) sum-rate maximization for optimal radio resource allocation, which are solved iteratively. Numerical results show that the proposed iterative algorithm converges quickly, in fewer than 10 iterations. The benefits of the proposed design are demonstrated via superior sum-rate performance compared to existing reference designs. Moreover, the results show that optimal power and sub-carrier allocation helps mitigate the inter-cell interference that directly impacts the system's performance.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.00769 [pdf, other]

Multi-scale Transformer-based Network for Emotion Recognition from Multi Physiological Signals

Authors: Tu Vu, Van Thong Huynh, Soo-Hyung Kim

Abstract: This paper presents an efficient Multi-scale Transformer-based approach for emotion recognition from physiological data, a task that has gained widespread attention in the research community due to the vast amount of information that can be extracted from these signals using modern sensors and machine learning techniques. Our approach applies a multi-modal technique combined with data scaling to establish the relationship between internal body signals and human emotions. Additionally, we use Transformer and Gaussian Transformation techniques to improve signal encoding effectiveness and overall performance. Our model achieves decent results on the CASE dataset of the EPiC competition, with an RMSE score of 1.45.

Submitted 7 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

arXiv:2304.13008 [pdf, other]

Artificial Intelligence for Satellite Communication and Non-Terrestrial Networks: A Survey

Authors: G. Fontanesi, F. Ortíz, E. Lagunas, V. Monzon Baeza, M. Á. Vázquez, J. A. Vásquez-Peralvo, M. Minardi, H. N. Vu, P. J. Honnaiah, C. Lacoste, Y. Drif, T. S. Abdu, G. Eappen, J. Rehman, L. M. Garcés-Socorrás, W. A. Martins, P. Henarejos, H. Al-Hraishawi, J. C. Merlano Duncan, T. X. Vu, S. Chatzinotas

Abstract: This paper surveys the application and development of Artificial Intelligence (AI) in Satellite Communication (SatCom) and Non-Terrestrial Networks (NTN). We first present a comprehensive list of use cases, the related challenges and the main AI tools capable of addressing those challenges. For each use case, we present the main motivation, a system description, the available non-AI solutions and the potential benefits and available works using AI. We also discuss the pros and cons of on-board and on-ground AI-based architectures, and we review the current commercial and research activities relevant to this topic. Next, we describe the state-of-the-art hardware solutions for developing ML in real satellite systems. Finally, we discuss the long-term developments of AI in the SatCom and NTN sectors and potential research directions. This paper provides a comprehensive and up-to-date overview of the opportunities and challenges offered by AI to improve the performance and efficiency of NTNs.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.12798 [pdf, other]

Multi-Objective Optimization for 3D Placement and Resource Allocation in OFDMA-based Multi-UAV Networks

Authors: Asad Mahmood, Thang X. Vu, Shree Krishna Sharma, Symeon Chatzinotas, Björn Ottersten

Abstract: This work considers the orthogonal frequency division multiple access (OFDMA) technology that enables multiple unmanned aerial vehicle (multi-UAV) communication systems to provide on-demand services. The main aim of this work is to derive the optimal allocation of radio resources, 3D placement of UAVs, and user association matrices. To achieve the desired objectives, we decoupled the original joint optimization problem into two sub-problems: i) 3D placement and user association and ii) sum-rate maximization for optimal radio resource allocation, which are solved iteratively. Numerical results show that the proposed iterative algorithm converges quickly, in fewer than 10 iterations. The benefits of the proposed design are demonstrated via superior sum-rate performance compared to existing reference designs. Moreover, the results show that optimal power and sub-carrier allocation helps mitigate the co-cell interference that directly impacts the system's performance.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.04478 [pdf, other]

Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions

Authors: Daniel Ortega, Chia-Yu Li, Ngoc Thang Vu

Abstract: This paper presents our latest investigation on modeling backchannels in conversations. Motivated by a proactive backchanneling theory, we aim at developing a system that acts as a proactive listener by inserting backchannels, such as continuers and assessments, to influence speakers. Our model takes into account not only lexical and acoustic cues, but also introduces the simple and novel idea of using listener embeddings to mimic different backchanneling behaviours. Our experimental results on the Switchboard benchmark dataset reveal that acoustic cues are more important than lexical cues in this task and that their combination with listener embeddings works best on both manual and automatically generated transcriptions.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Published in ICASSP 2020

arXiv:2302.09597

doi 10.1049/enc2.12107

Solving Differential-Algebraic Equations in Power System Dynamic Analysis with Quantum Computing

Authors: Huynh Trung Thanh Tran, Hieu T. Nguyen, Long T. Vu, Samuel T. Ojetola

Abstract: Power system dynamics are generally modeled by high-dimensional nonlinear differential-algebraic equations (DAEs), given the large number of components forming the network. The complexity of these DAEs can grow exponentially due to the increasing penetration of distributed energy resources, while their computation time becomes critical due to the increasing interconnection of the power grid with other energy systems. This paper demonstrates the use of quantum computing algorithms to solve DAEs for power system dynamic analysis. We leverage a symbolic programming framework to equivalently convert the power system's DAEs into ordinary differential equations (ODEs) using index reduction methods and then encode their data into qubits using amplitude encoding. The system nonlinearity is captured by Hamiltonian simulation with truncated Taylor expansion so that state variables can be updated by a quantum linear equation solver. Our results show that quantum computing can solve the power system's DAEs accurately with a computational complexity polynomial in the logarithm of the system dimension. We also illustrate the use of recent advanced tools in scientific machine learning for implementing complex computing concepts, namely Taylor expansion, DAE/ODE transformation, and a quantum computing solver with abstract representation, for power engineering applications.

Submitted 1 March, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: This version was uploaded as an incorrect replacement, and was intended as a replacement of arXiv:2306.01961. I need to withdraw this paper to upload it as a replacement of the correct paper

Journal ref: Energy Conversion and Economics, Volume 5, Issue 1, Feb 2024, pages 40-53
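The quantum DAE abstract above encodes data into qubits by amplitude encoding, which is where the logarithmic dependence on system dimension comes from: an N-dimensional vector needs only ceil(log2 N) qubits. The sketch below is a classical illustration of the target state only (normalise and zero-pad to a power of two); it does not build a state-preparation circuit, and the 6-dimensional input vector is hypothetical.

```python
import numpy as np

def amplitude_encode(x):
    """Return (num_qubits, state) for amplitude encoding of a real or complex vector x.

    The state has 2**num_qubits amplitudes proportional to x (zero-padded)
    and unit Euclidean norm, as required of a quantum state.
    """
    x = np.asarray(x, dtype=complex)
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    num_qubits = max(1, int(np.ceil(np.log2(len(x)))))
    state = np.zeros(2 ** num_qubits, dtype=complex)
    state[: len(x)] = x / norm
    return num_qubits, state

# Hypothetical 6-dimensional state vector (e.g. generator angles and speeds).
n, psi = amplitude_encode([1.0, 0.5, -0.2, 0.0, 0.3, 2.0])
print(n, np.isclose(np.vdot(psi, psi).real, 1.0))   # 3 True
```

Doubling the number of state variables adds only one qubit, but preparing such a state on hardware and reading the solution back out carry their own costs, which this classical sketch does not model.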

arXiv:2302.02711 [pdf, other]

Network-Aided Intelligent Traffic Steering in 6G O-RAN: A Multi-Layer Optimization Framework

Authors: Van-Dinh Nguyen, Thang X. Vu, Nhan Thanh Nguyen, Dinh C. Nguyen, Markku Juntti, Nguyen Cong Luong, Dinh Thai Hoang, Diep N. Nguyen, Symeon Chatzinotas

Abstract: To enable an intelligent, programmable and multi-vendor radio access network (RAN) for 6G networks, considerable efforts have been made in the standardization and development of open RAN (O-RAN). So far, however, the applicability of O-RAN in controlling and optimizing RAN functions has not been widely investigated. In this paper, we jointly optimize the flow-split distribution, congestion control and scheduling (JFCS) to enable an intelligent traffic steering application in O-RAN. Combining tools from network utility maximization and stochastic optimization, we introduce a multi-layer optimization framework that provides fast convergence, long-term utility-optimality and significant delay reduction compared to state-of-the-art and baseline RAN approaches. Our main contributions are three-fold: i) we propose the novel JFCS framework to efficiently and adaptively direct traffic to appropriate radio units; ii) we develop low-complexity algorithms based on reinforcement learning, inner approximation and bisection search methods to effectively solve the JFCS problem on different time scales; and iii) we analyze rigorous theoretical performance results showing that there exists a scaling factor that improves the tradeoff between delay and utility-optimality. Collectively, the insights in this work will open the door towards fully automated networks with enhanced control and flexibility. Numerical results are provided to demonstrate the effectiveness of the proposed algorithms in terms of convergence rate, long-term utility-optimality and delay reduction.

Submitted 29 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: 15 pages, 10 figures. A short version will be submitted to IEEE GLOBECOM 2023

arXiv:2301.01628 [pdf, other]

Task-Effective Compression of Observations for the Centralized Control of a Multi-agent System Over Bit-Budgeted Channels

Authors: Arsham Mostaani, Thang X. Vu, Symeon Chatzinotas, Bjorn Ottersten

Abstract: We consider a task-effective quantization problem that arises when multiple agents are controlled via a centralized controller (CC). While agents have to communicate their observations to the CC for decision-making, the bit-budgeted communications of agent-CC links may limit the task-effectiveness of the system, which is measured by the system's average sum of stage costs/rewards. As a result, each agent should compress/quantize its observation such that the average sum of stage costs/rewards of the control task is minimally impacted. We address the problem of maximizing the average sum of stage rewards by proposing two different Action-Based State Aggregation (ABSA) algorithms that carry out the indirect and joint design of control and communication policies in the multi-agent system. While the applicability of ABSA-1 is limited to single-agent systems, it provides an analytical framework that acts as a stepping stone to the design of ABSA-2. ABSA-2 carries out the joint design of control and communication for a multi-agent system. We evaluate the algorithms, with average return as the performance metric, using numerical experiments on a multi-agent geometric consensus problem. We conclude the numerical results by introducing a new metric that measures the effectiveness of communications in a multi-agent system.

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2211.02930 [pdf]

1-D Convolutional Graph Convolutional Networks for Fault Detection in Distributed Energy Systems

Authors: Bang L. H. Nguyen, Tuyen Vu, Thai-Thanh Nguyen, Mayank Panwar, Rob Hovsapian

Abstract: This paper presents a 1-D convolutional graph neural network for fault detection in microgrids. The combination of 1-D convolutional neural networks (1D-CNN) and graph convolutional networks (GCN) helps extract both spatial and temporal correlations from the voltage measurements in microgrids. The fault detection scheme includes fault event detection, fault type and phase classification, and fault location. Five neural network models are trained to handle these tasks. Transfer learning and fine-tuning are applied to reduce the training effort. The combined recurrent graph convolutional neural network (1D-CGCN) is compared with the traditional ANN structure on the Potsdam 13-bus microgrid dataset. It achieves accuracies of 99.27%, 98.1%, 98.75%, and 95.6% for fault detection, fault type classification, fault phase identification, and fault location, respectively.

Submitted 5 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2210.15177
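The fault detection abstract above combines 1-D convolutions over time with graph convolutions over the network topology. As a hedged sketch of those two generic building blocks (not the authors' 1D-CGCN architecture, layer sizes or training setup), the code below applies a moving-average 1-D convolution to each bus's voltage samples and then one graph convolution with the Kipf and Welling propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The 4-bus feeder, kernel and untrained random weights are all hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric GCN normalisation D^-1/2 (A + I) D^-1/2 (Kipf and Welling)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def temporal_conv1d(X, kernel):
    """'Valid' 1-D convolution along time for every bus. X has shape (buses, time)."""
    return np.stack([np.convolve(x, kernel, mode="valid") for x in X])

def gcn_layer(A_norm, H, W):
    """One graph convolution: aggregate neighbouring buses, project, apply ReLU."""
    return np.maximum(0.0, A_norm @ H @ W)

rng = np.random.default_rng(0)
# Hypothetical 4-bus radial feeder 0-1-2-3 with 16 voltage samples (p.u.) per bus.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
V = rng.normal(1.0, 0.01, size=(4, 16))

temporal = temporal_conv1d(V, kernel=np.ones(3) / 3)   # (4, 14) temporal features
W = rng.normal(size=(temporal.shape[1], 8))            # untrained, random weights
node_features = gcn_layer(normalized_adjacency(A), temporal, W)
print(node_features.shape)   # (4, 8)
```

The temporal convolution summarises each bus's waveform, and the graph convolution then mixes each bus's summary with those of its electrical neighbours; a classifier head on top of such features would produce the detection, type, phase and location outputs.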

arXiv:2211.02928 [pdf]

Hierarchical Control of Grid-Connected Hydrogen Electrolyzer Providing Grid Services

Authors: Bang L. H. Nguyen, Mayank Panwar, Rob Hovsapian, Yashodhan Agalgaokar, Tuyen Vu

Abstract: This paper presents the operation modes and control architecture of grid-connected hydrogen electrolyzer systems for the provision of frequency and voltage support. The analysis focuses on the primary and secondary loops in the hierarchical control scheme. At the power converter inner control loop, the voltage- and current-control modes are analyzed. At the primary level, the droop and opposite droop control strategies that provide voltage and frequency support are described. Coordination between the primary control and the secondary and tertiary reserves is discussed. Case studies and real-time simulation results obtained with Typhoon HIL are provided to back the theoretical investigation.

Submitted 5 November, 2022; originally announced November 2022.

arXiv:2211.02592 [pdf]

A Large-Scale Study of a Sleep Tracking and Improving Device with Closed-loop and Personalized Real-time Acoustic Stimulation

Authors: Anh Nguyen, Galen Pogoncheff, Ban Xuan Dong, Nam Bui, Hoang Truong, Nhat Pham, Linh Nguyen, Hoang Huu Nguyen, Sy Duong-Quy, Sangtae Ha, Tam Vu

Abstract: Various intervention therapies, ranging from pharmaceutical to high-tech tailored solutions, are available to treat difficulty falling asleep, which is commonly caused by insomnia in modern life. However, current techniques remain largely ill-suited, ineffective, and unreliable because they lack precise real-time sleep tracking, timely feedback on the therapies, the ability to keep people asleep during the night, and large-scale effectiveness evaluation. Here, we introduce a novel sleep aid system, called Earable, that continuously senses multiple head-based physiological signals and simultaneously enables closed-loop auditory stimulation to entrain brain activity in time for effective sleep promotion. We developed the system as a lightweight, comfortable, and user-friendly headband with a comprehensive set of algorithms and dedicated, self-designed audio stimuli. We conducted multiple protocols from 883 sleep studies on 377 subjects (241 women, 119 men) wearing either a gold-standard device (PSG), Earable, or both concurrently. We demonstrate that our system achieves (1) a strong correlation (0.89 +/- 0.03) between the physiological signals acquired by Earable and those from the gold-standard PSG, (2) an 87.8 +/- 5.3% agreement on sleep scoring between our automatic real-time sleep staging algorithm and the consensus scored by three sleep technicians, and (3) a successful non-pharmacological stimulation alternative that effectively shortens the time taken to fall asleep by 24.1 +/- 0.1 minutes.
These results show that the efficacy of Earable exceeds that of existing techniques in promoting fast sleep onset, tracking sleep state accurately, and achieving high social acceptance for real-time, closed-loop, personalized neuromodulation-based home sleep care.
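The abstract reports agreement between the automatic stager and a three-technician consensus without defining either quantity. As a hypothetical illustration only, the sketch below assumes the consensus is a per-epoch majority vote and that agreement is the percentage of epochs carrying identical stage labels; the function names and stage labels are made up for the example and are not taken from the paper.

```python
from collections import Counter


def majority_consensus(*scorers):
    """Per-epoch majority vote over several hypnograms (lists of stage labels).

    Ties go to the label encountered first (the earliest scorer's label),
    because Counter.most_common orders equal counts by first occurrence.
    """
    if not scorers or len({len(s) for s in scorers}) != 1:
        raise ValueError("need at least one scorer and equal-length hypnograms")
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*scorers)]


def epoch_agreement(predicted, reference):
    """Percentage of epochs where two hypnograms assign the same stage."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("hypnograms must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


# Hypothetical example with five epochs scored by three technicians.
tech1 = ["W", "N1", "N2", "N2", "R"]
tech2 = ["W", "N2", "N2", "N3", "R"]
tech3 = ["W", "N1", "N2", "N2", "N2"]
consensus = majority_consensus(tech1, tech2, tech3)   # ["W", "N1", "N2", "N2", "R"]
automatic = ["W", "N2", "N2", "N2", "R"]
print(epoch_agreement(automatic, consensus))          # 80.0
```

Cohen's kappa is a common chance-corrected alternative to raw percent agreement; which measure the authors used is not stated in the abstract.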

Submitted 4 November, 2022; originally announced November 2022.

Comments: 33 pages, 8 figures

arXiv:2210.15177 [pdf]

Spatial-Temporal Recurrent Graph Neural Networks for Fault Diagnostics in Power Distribution Systems

Authors: Bang Nguyen, Tuyen Vu, Thai-Thanh Nguyen, Mayank Panwar, Rob Hovsapian

Abstract: Fault diagnostics are extremely important for deciding on proper actions toward fault isolation and system restoration. The growing integration of inverter-based distributed energy resources strongly affects fault detection using traditional overcurrent relays. This paper uses emerging graph learning techniques to build new temporal recurrent graph neural network models for fault diagnostics. The temporal recurrent graph neural network structures can extract spatial-temporal features from the data of voltage measurement units installed at critical buses. From these features, fault event detection, fault type/phase classification, and fault location are performed. Compared with previous works, the proposed temporal recurrent graph neural networks provide better generalization for fault diagnostics. Moreover, the proposed scheme relies on voltage signals instead of current signals, so there is no need to install relays on all lines of the distribution system. Therefore, the proposed scheme is generalizable and not limited by the number of installed relays. The effectiveness of the proposed method is comprehensively evaluated on the Potsdam microgrid and the IEEE 123-node system in comparison with other neural network structures.

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2210.12223 [pdf, other]

Low-Resource Multilingual and Zero-Shot Multispeaker TTS

Authors: Florian Lux, Julia Koch, Ngoc Thang Vu

Abstract: While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data needed for those approaches is generally not feasible for the vast majority of the world's more than 6,000 spoken languages. In this work, we bring together the tasks of zero-shot voice cloning and multilingual low-resource TTS. Using the language-agnostic meta learning (LAML) procedure and modifications to a TTS encoder, we show that it is possible for a system to learn to speak a new language using just 5 minutes of training data while retaining the ability to infer the voice of even unseen speakers in the newly learned language. We show the success of our proposed approach in terms of intelligibility, naturalness, and similarity to the target speaker using objective metrics as well as human studies, and we release our code and trained models as open source.

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted to AACL 2022

arXiv:2210.11642 [pdf, other]

Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses

Authors: Chia-Yu Li, Ngoc Thang Vu

Abstract: We propose a novel method that combines CycleGAN and inter-domain losses for semi-supervised end-to-end automatic speech recognition. The inter-domain loss targets the extraction of an intermediate shared representation of speech and text inputs using a shared network. CycleGAN uses a cycle-consistency loss and an identity mapping loss to preserve relevant characteristics of the input features after converting from one domain to another. As such, both approaches are suitable for training end-to-end models on unpaired speech-text inputs. In this paper, we exploit the advantages of both the inter-domain loss and CycleGAN to achieve a better shared representation of unpaired speech and text inputs and thus improve the speech-to-text mapping. Our experimental results on WSJ eval92 and Voxforge (non-English) show an 8 to 8.5% character error rate reduction over the baseline, and the results on LibriSpeech test_clean also show noticeable improvement.

Submitted 20 October, 2022; originally announced October 2022.

Comments: 6 pages + 2 references, 6 figures, accepted by SLT2022

arXiv:2210.08829 [pdf, other]

Intelligent Traffic Steering in Beyond 5G Open RAN based on LSTM Traffic Prediction

Authors: Fatemeh Kavehmadavani, Van-Dinh Nguyen, Thang X. Vu, Symeon Chatzinotas

Abstract: The Open Radio Access Network (ORAN) Alliance offers disaggregated RAN functionality built using open interface specifications between blocks. To efficiently support various competing services, namely enhanced mobile broadband (eMBB) and ultra-reliable and low-latency communications (uRLLC), the ORAN Alliance has introduced a standard approach toward more virtualized, open, and intelligent networks. To realize the benefits of ORAN in optimizing resource utilization, this paper studies an intelligent traffic steering (TS) scheme within the proposed disaggregated ORAN architecture. For this purpose, we propose a joint intelligent traffic prediction, flow-split distribution, dynamic user association, and radio resource management (JIFDR) framework in the presence of unknown dynamic traffic demands. To adapt to dynamic environments on different time scales, we decompose the formulated optimization problem into long-term and short-term subproblems, where the optimality of the latter strongly depends on the optimal dynamic traffic demand. We then apply a long short-term memory (LSTM) model to effectively solve the long-term subproblem, aiming to predict dynamic traffic demands, RAN slicing, and flow-split decisions. The resulting non-convex short-term subproblem is converted into a more computationally tractable form by exploiting successive convex approximation. Finally, simulation results demonstrate the effectiveness of the proposed algorithms compared with several well-known benchmark schemes.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.07002 [pdf, other]

Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy

Authors: Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu

Abstract: To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings. This typically comes with a privacy-utility trade-off between the protection of individuals and the usability of the data for downstream applications. One of the challenges in this context is to create non-existent voices that sound as natural as possible. In this work, we propose to tackle this issue by generating speaker embeddings using a generative adversarial network with the Wasserstein distance as the cost function. By incorporating these artificial embeddings into a speech-to-text-to-speech pipeline, we outperform previous approaches in terms of privacy and utility. According to standard objective metrics and human evaluation, our approach generates intelligible and content-preserving yet privacy-protecting versions of the original recordings.

Submitted 20 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: IEEE Spoken Language Technology Workshop 2022

arXiv:2210.04041 [pdf, ps, other]

Almost-lossless compression of a low-rank random tensor

Authors: Minh Thanh Vu

Abstract: In this work, we establish an asymptotic limit of almost-lossless compression of a random, finite alphabet tensor which admits a low-rank canonical polyadic decomposition.
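As background for the term "low-rank canonical polyadic decomposition" (not the paper's coding scheme), a rank-R CP tensor is a sum of R outer products, T[i][j][k] = sum over r of a[i][r]·b[j][r]·c[k][r], so it is fully described by its factor matrices. The sketch below builds such a tensor and compares symbol counts; it assumes ordinary integer arithmetic, since the abstract does not say how arithmetic over the finite alphabet is defined, and the function names are illustrative.

```python
def cp_tensor(a, b, c):
    """Return the 3-way tensor T[i][j][k] = sum_r a[i][r] * b[j][r] * c[k][r].

    a, b, c are factor matrices (lists of rows) with I, J, K rows and R columns.
    """
    rank = len(a[0])
    if any(len(row) != rank for m in (a, b, c) for row in m):
        raise ValueError("all factor matrices must have R columns")
    return [[[sum(a[i][r] * b[j][r] * c[k][r] for r in range(rank))
              for k in range(len(c))]
             for j in range(len(b))]
            for i in range(len(a))]


def symbol_counts(dims, rank):
    """Entries in the full tensor vs. entries in its CP factor matrices."""
    full = 1
    for d in dims:
        full *= d
    return full, rank * sum(dims)


# A rank-1 example: T is the outer product of [1, 0], [1, 1], and [1, 2].
print(cp_tensor([[1], [0]], [[1], [1]], [[1], [2]]))  # [[[1, 2], [1, 2]], [[0, 0], [0, 0]]]
print(symbol_counts((100, 100, 100), 5))              # (1000000, 1500)
```

The gap between the two counts is the intuition for why a low-rank tensor should compress well; the precise asymptotic limit is the subject of the paper.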

Submitted 23 October, 2022; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: This version fixes typos and adds some remarks

MSC Class: 68P30; 15A69

arXiv:2209.07385 [pdf, other]

Resilient Communication Scheme for Distributed Decision of Interconnecting Networks of Microgrids

Authors: Thanh Long Vu, Sayak Mukherjee, Veronica Adetola

Abstract: Networking of microgrids can provide the operational flexibility needed for the increasing number of DERs deployed at the distribution level and can support end-use demand when the bulk power system is lost. However, networked microgrids are vulnerable to cyber-physical attacks and faults due to their complex interconnections. As such, it is necessary to design resilient control systems that support the operation of networked microgrids in response to cyber-physical attacks and faults. This paper introduces a resilient communication scheme for interconnecting multiple microgrids to support critical demand, in which the interconnection decision can be made in a distributed manner by each microgrid controller, even in the presence of cyberattacks on some communication links or microgrid controllers. This scheme blends a randomized peer-to-peer communication network for exchanging information among controllers with resilient consensus algorithms for reaching a reliable interconnection agreement. A network of 6 microgrids partitioned from a modified 123-node test distribution feeder is used to demonstrate the effectiveness of the proposed resilient communication scheme.
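The abstract does not name its resilient consensus algorithm. One well-known family is W-MSR (weighted mean-subsequence-reduced): each controller discards up to f received values above its own and up to f below, then averages what remains. The sketch below implements that update with equal weights on a hypothetical topology (a complete graph of honest controllers plus one attacker that always broadcasts a fixed value); it illustrates the technique, not necessarily the paper's scheme.

```python
def wmsr_step(own, neighbor_values, f):
    """One W-MSR update with equal weights.

    Discards up to f received values strictly larger than `own` (the largest
    ones) and up to f strictly smaller (the smallest ones), then averages the
    remaining values together with `own`.
    """
    larger = sorted(v for v in neighbor_values if v > own)
    smaller = sorted(v for v in neighbor_values if v < own)
    equal = [v for v in neighbor_values if v == own]
    kept = smaller[f:] + equal + larger[:max(len(larger) - f, 0)] + [own]
    return sum(kept) / len(kept)


def run_consensus(initial, adversary_value, f, rounds):
    """Synchronous W-MSR on a complete graph of normal nodes, plus one
    adversary (hypothetical) that sends `adversary_value` to every node."""
    values = list(initial)
    for _ in range(rounds):
        values = [
            wmsr_step(own,
                      [v for j, v in enumerate(values) if j != i] + [adversary_value],
                      f)
            for i, own in enumerate(values)
        ]
    return values


final = run_consensus([0.0, 0.25, 0.5, 0.75, 1.0], adversary_value=100.0, f=1, rounds=20)
print(final)  # all honest values agree and stay inside [0, 1] despite the attacker
```

Because every node trims the extreme values it receives, the attacker's value is discarded and the honest nodes agree on a value inside the range of their own initial states; how such a value would map to a binary interconnection decision is left open here.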

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.05969 [pdf]

Integrated Multiport Bidirectional DC-DC Converter for HEV/FCV Applications

Authors: Bang Le-Huy Nguyen, Honnyong Cha, Tuyen Vu, Thai-Thanh Nguyen

Abstract: This paper proposes a novel integrated multiport bidirectional dc-dc converter to interface the battery, the ultra-capacitor, the fuel cell, or other energy sources with the dc-link capacitor of hybrid energy systems, such as hybrid electric vehicle (HEV) and fuel cell vehicle (FCV) applications. The proposed converter can also be applied to distributed generation systems that include local energy sources, storage, and loads. It can perform both buck and boost functions with fewer switches. In addition, it is extendable when more inputs and/or outputs are required. The operating principle and control strategy of the proposed converter are analyzed in detail. For verification, simulation and experimental results for the four utilized operating modes of an HEV/FCV are provided.
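For reference, "buck and boost functions" refer to the textbook steady-state relations of ideal converters in continuous conduction: V_out = D·V_in for buck and V_out = V_in / (1 − D) for boost, where D is the duty ratio. The sketch below encodes only these generic, lossless relations, not the proposed multiport topology; the 200 V battery and 400 V dc link in the example are hypothetical values.

```python
def buck_output(v_in, duty):
    """Ideal buck converter in continuous conduction: V_out = D * V_in."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty ratio must lie in [0, 1]")
    return duty * v_in


def boost_output(v_in, duty):
    """Ideal boost converter in continuous conduction: V_out = V_in / (1 - D)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty ratio must lie in [0, 1)")
    return v_in / (1.0 - duty)


def boost_duty(v_in, v_out):
    """Duty ratio that lets an ideal boost stage raise v_in to v_out."""
    if not 0.0 < v_in <= v_out:
        raise ValueError("boost requires 0 < v_in <= v_out")
    return 1.0 - v_in / v_out


# Hypothetical: a 200 V battery feeding a 400 V dc link, and the reverse path.
print(boost_duty(200.0, 400.0))   # 0.5
print(buck_output(400.0, 0.5))    # 200.0
```

Real converters deviate from these ratios because of conduction and switching losses, especially at high duty ratios.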

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2209.05967 [pdf]

Power Converter Topologies for Electrolyzer Applications to Enable Electric Grid Services

Authors: Bang L. H. Nguyen, Mayank Panwar, Rob Hovsapian, Kazunori Nagasawa, Tuyen V. Vu

Abstract: Hydrogen electrolyzers, with their operational flexibility, can be configured as smart dynamic loads that provide grid services and facilitate the integration of more renewable energy sources into the electrical grid. However, to enable this ability, the electrolyzer system should be able to control both active and reactive power in coordination with the low-level controller of the electrolyzer via the power electronics interface between the utility grid and the electrolyzer. This paper discusses power converter topologies and the control scheme of this power electronics interface for electrolyzer applications to enable electricity grid services. For consistency, we consider a power converter system that interfaces the utility grid at a line-to-line root-mean-square (RMS) voltage of 480 VAC, 60 Hz, and supplies a 3500 A, 750 kW PEM electrolyzer stack.
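The ratings in the abstract imply a few derived quantities that help explain why the converter topology matters. The arithmetic below assumes that the 750 kW and 3500 A ratings describe the same DC operating point of the stack, and that the grid-side figure is for a lossless converter at unity power factor; both are simplifying assumptions, not values from the paper.

```python
import math

V_LL_RMS = 480.0      # grid line-to-line RMS voltage (V), from the abstract
P_STACK = 750e3       # electrolyzer stack power (W), from the abstract
I_STACK = 3500.0      # electrolyzer stack current (A), from the abstract

v_phase_rms = V_LL_RMS / math.sqrt(3)   # line-to-neutral RMS, about 277.1 V
v_ll_peak = V_LL_RMS * math.sqrt(2)     # line-to-line peak, about 678.8 V
v_stack = P_STACK / I_STACK             # implied DC stack voltage, about 214.3 V
# Grid-side line current, assuming a lossless converter at unity power factor.
i_line_rms = P_STACK / (math.sqrt(3) * V_LL_RMS)   # about 902.1 A

print(f"{v_phase_rms:.1f} V, {v_ll_peak:.1f} V, {v_stack:.1f} V, {i_line_rms:.1f} A")
```

Under these assumptions the stack needs a few hundred volts at several kiloamps, well below the roughly 679 V peak of the grid voltage, which suggests the interface must both step the voltage down and handle a large DC current.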

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2209.05962 [pdf]

Integrated Multiport Back-to-Back Power Converter for Type-4 Wind Turbine Generator with Hybrid Energy Storage System

Authors: Bang Le-Huy Nguyen, Thai-Thanh Nguyen, Van-Long Pham, Tuyen Vu, Mayank Panwar, Rob Hovsapian

Abstract: This paper proposes a novel integrated multiport bidirectional back-to-back power converter for a type-4 wind turbine that accommodates a battery and a supercapacitor for energy storage. The circuit topology uses 4 fewer switches than the traditional configuration. Moreover, owing to the dual-buck structure embedded in the phase leg, the circuit has no short-circuit path, so it withstands short-circuit events for much longer than a conventional phase leg and prevents reverse current during turn-off recovery. The hybrid energy storage system with battery and supercapacitor helps smooth the power output under wind gusts and stabilizes the DC-link voltage under grid fault conditions. Case studies are carried out on a 1.5 MW wind turbine system, and simulation results are provided for theoretical validation.
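A common way to share a fluctuating power profile between a battery and a supercapacitor is a low-pass split: the battery follows the slow component and the supercapacitor absorbs the fast residual. The abstract does not describe the authors' power-sharing control, so the first-order filter below (an exponential moving average with a hypothetical smoothing factor `alpha`) is only an illustrative stand-in.

```python
def split_power(p_mismatch, alpha):
    """Split a fluctuating power profile between battery and supercapacitor.

    A first-order low-pass filter (exponential moving average with smoothing
    factor alpha in [0, 1)) gives the slow battery share; the supercapacitor
    takes the fast residual, so battery + supercap equals p_mismatch per step.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not p_mismatch:
        return [], []
    battery, supercap = [], []
    p_slow = p_mismatch[0]          # initialise the filter at the first sample
    for p in p_mismatch:
        p_slow = alpha * p_slow + (1.0 - alpha) * p
        battery.append(p_slow)
        supercap.append(p - p_slow)
    return battery, supercap


# A step in the power to be buffered (e.g. from a gust), in per-unit.
battery, supercap = split_power([0, 0, 1, 1, 1], alpha=0.5)
print(battery)    # [0.0, 0.0, 0.5, 0.75, 0.875]
print(supercap)   # [0.0, 0.0, 0.5, 0.25, 0.125]
```

A larger `alpha` makes the battery share smoother and shifts more of the transient onto the supercapacitor, which suits its high power density and cycle life.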

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2208.00097 [pdf, other]

doi 10.1109/TGRS.2021.3105694

Robust Rayleigh Regression Method for SAR Image Processing in Presence of Outliers

Authors: B. G. Palm, F. M. Bayer, R. Machado, M. I. Pettersson, V. T. Vu, R. J. Cintra

Abstract: The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and the misspecification of statistical image models may result in inaccurate inferences. To avoid such issues, a Rayleigh regression model based on a robust estimation process is proposed as a more realistic approach to modeling this type of data. This paper aims to obtain Rayleigh regression model parameter estimators that are robust to the presence of outliers. The proposed approach is based on the weighted maximum likelihood method and was assessed in numerical experiments using simulated and measured SAR images. Monte Carlo simulations were employed for the numerical assessment of the proposed robust estimator performance in finite signal lengths, its sensitivity to outliers, and its breakdown point. For instance, the non-robust estimators show a relative bias 65-fold larger than that of the robust approach in corrupted signals. In terms of sensitivity analysis and breakdown point, the robust scheme resulted in reductions of about 96% and 10%, respectively, in the mean absolute value of both measures compared with the non-robust estimators. Moreover, two SAR data sets were used to compare the ground type and anomaly detection results of the proposed robust scheme with competing methods in the literature.
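To make the weighted maximum likelihood idea concrete, the sketch below uses the simplest case: an intercept-only Rayleigh model (a single scale σ, no covariates), so it is not the paper's regression model. For the Rayleigh density f(x; σ) = (x/σ²)·exp(−x²/(2σ²)), maximizing Σ wᵢ·log f(xᵢ; σ) gives σ² = Σ wᵢxᵢ² / (2 Σ wᵢ). The Huber-type weights, the cutoff `c = 3`, and the simulated data are hypothetical choices for illustration.

```python
import math
import random
import statistics


def rayleigh_mle(x):
    """Plain maximum-likelihood estimate of the Rayleigh scale sigma."""
    return math.sqrt(sum(v * v for v in x) / (2 * len(x)))


def rayleigh_weighted_mle(x, c=3.0, iters=50):
    """Weighted ML estimate of sigma with hypothetical Huber-type weights.

    Maximising sum_i w_i * log f(x_i; sigma) gives
    sigma^2 = sum_i w_i x_i^2 / (2 sum_i w_i). Observations whose standardised
    value u_i = x_i^2 / (2 sigma^2) (Exp(1) under the model) exceeds c are
    down-weighted by c / u_i, and the weights are re-estimated iteratively.
    """
    # Robust starting value: the Rayleigh median is sigma * sqrt(2 ln 2).
    sigma = statistics.median(x) / math.sqrt(2 * math.log(2))
    for _ in range(iters):
        weights = []
        for v in x:
            u = v * v / (2 * sigma * sigma)
            weights.append(1.0 if u <= c else c / u)
        sigma = math.sqrt(sum(w * v * v for w, v in zip(weights, x))
                          / (2 * sum(weights)))
    return sigma


def simulate(n, sigma, n_outliers, outlier_value, seed=0):
    """Rayleigh samples via inverse-CDF sampling, plus constant outliers."""
    rng = random.Random(seed)
    clean = [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
             for _ in range(n)]
    return clean + [outlier_value] * n_outliers


data = simulate(2000, sigma=1.0, n_outliers=20, outlier_value=50.0)
print(rayleigh_mle(data), rayleigh_weighted_mle(data))  # plain MLE inflated; weighted stays near 1
```

With 1% gross outliers the plain estimate is pulled far above the true σ = 1, while the down-weighted estimate stays close to it, mirroring in miniature the bias reduction the abstract reports.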

Submitted 29 July, 2022; originally announced August 2022.

Comments: 17 pages, 5 figures, 4 tables

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, v. 60, 2021

arXiv:2207.11400 [pdf, other]

doi 10.3390/s20072008

Wavelength-Resolution SAR Ground Scene Prediction Based on Image Stack

Authors: B. G. Palm, D. I. Alves, M. I. Pettersson, V. T. Vu, R. Machado, R. J. Cintra, F. M. Bayer, P. Dammert, H. Hellsten

Abstract: This paper presents five different statistical methods for ground scene prediction (GSP) in wavelength-resolution synthetic aperture radar (SAR) images. The GSP image can be used as a reference image in a change detection algorithm, yielding a high probability of detection and a low false alarm rate. The predictions are based on image stacks, which are composed of images of the same scene acquired at different instants with the same flight geometry. The considered methods for obtaining the ground scene prediction include (i) autoregressive models; (ii) trimmed mean; (iii) median; (iv) intensity mean; and (v) mean. The predicted image is expected to present the true ground scene without changes and to preserve the ground backscattering pattern. The study indicates that the median method provides the most accurate representation of the true ground. To show the applicability of GSP, a change detection algorithm was considered using the median ground scene as the reference image. As a result, the median method achieved a probability of detection of 97% and a false alarm rate of 0.11/km² when considering military vehicles concealed in a forest.
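Three of the listed predictors (mean, median, trimmed mean) are simple pixelwise statistics over a co-registered image stack. The sketch below computes them on magnitude images stored as nested lists. The autoregressive and intensity-mean variants are omitted because the abstract does not define them, and the absolute-difference threshold detector is a hypothetical stand-in, not the paper's change detection algorithm.

```python
import statistics


def _pixelwise(stack, reducer):
    """Apply `reducer` to the values of each pixel across all stack images."""
    rows, cols = len(stack[0]), len(stack[0][0])
    if any(len(img) != rows or any(len(r) != cols for r in img) for img in stack):
        raise ValueError("all images in the stack must share one shape")
    return [[reducer([img[i][j] for img in stack]) for j in range(cols)]
            for i in range(rows)]


def trimmed_mean(values, trim):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    ordered = sorted(values)
    if len(ordered) <= 2 * trim:
        raise ValueError("stack too short for this trim")
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


def predict_ground_scene(stack, method="median", trim=1):
    """Pixelwise ground scene prediction from a co-registered image stack."""
    reducers = {
        "mean": statistics.mean,
        "median": statistics.median,
        "trimmed_mean": lambda v: trimmed_mean(v, trim),
    }
    return _pixelwise(stack, reducers[method])


def detect_changes(image, reference, threshold):
    """Hypothetical detector: flag pixels deviating from the reference."""
    return [[abs(p - q) > threshold for p, q in zip(row, ref_row)]
            for row, ref_row in zip(image, reference)]


# Toy 2x2 stack; a transient bright target appears once at pixel (0, 0).
stack = [[[1, 2], [3, 4]], [[1, 2], [3, 4]], [[9, 2], [3, 4]]]
reference = predict_ground_scene(stack)                  # [[1, 2], [3, 4]]
print(detect_changes([[1, 2], [3, 10]], reference, 2))   # [[False, False], [False, True]]
```

The toy stack shows why the median is attractive as a reference: the transient target in one acquisition does not leak into the predicted ground scene, whereas the plain mean would be biased by it.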

Submitted 22 July, 2022; originally announced July 2022.

Comments: 15 pages, 8 figures, 3 tables

Journal ref: Sensors 2020, 20(7)

arXiv:2207.05549 [pdf, other]

PoeticTTS -- Controllable Poetry Reading for Literary Studies

Authors: Julia Koch, Florian Lux, Nadja Schauffler, Toni Bernhart, Felix Dieterle, Jonas Kuhn, Sandra Richter, Gabriel Viehhauser, Ngoc Thang Vu

Abstract: Speech synthesis for poetry is challenging due to specific intonation patterns inherent to poetic speech. In this work, we propose an approach to synthesise poems with almost human-like naturalness in order to enable literary scholars to systematically examine hypotheses on the interplay between text, spoken realisation, and the listener's perception of poems. To meet these special requirements for literary studies, we resynthesise poems by cloning prosodic values from a human reference recitation, and afterwards use fine-grained prosody control to manipulate the synthetic speech in a human-in-the-loop setting, altering the recitation with respect to specific phenomena. We find that finetuning our TTS model on poetry captures poetic intonation patterns to a large extent, which is beneficial for prosody cloning and manipulation, and we verify the success of our approach both in an objective evaluation and in human studies.

Submitted 18 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: Presented at Interspeech 2022

arXiv:2207.04834 [pdf, other]

Speaker Anonymization with Phonetic Intermediate Representations

Authors: Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu

Abstract: In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings. Using phones as the intermediate representation ensures near-complete elimination of speaker identity information from the input while preserving the original phonetic content as much as possible. Our experimental results on the LibriSpeech and VCTK corpora reveal two key findings: 1) although automatic speech recognition produces imperfect transcriptions, our neural speech synthesis system can handle such errors, making our system feasible and robust, and 2) combining speaker embeddings from different sources is beneficial, and their appropriate normalization is crucial. Overall, our final best system significantly outperforms the baselines provided in the Voice Privacy Challenge 2020 in terms of privacy robustness against a lazy-informed attacker while maintaining high intelligibility and naturalness of the anonymized speech.

Submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted at Interspeech 2022

arXiv:2206.12229 [pdf, other]

Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech

Authors: Florian Lux, Julia Koch, Ngoc Thang Vu

Abstract: The cloning of a speaker's voice using an untranscribed reference sample is one of the great advances of modern neural text-to-speech (TTS) methods. Approaches for mimicking the prosody of a transcribed reference audio have also been proposed recently. In this work, we bring these two tasks together for the first time through utterance-level normalization in conjunction with an utterance-level speaker embedding. We further introduce a lightweight aligner for extracting fine-grained prosodic features, which can be finetuned on individual samples within seconds. As our objective evaluation and human study show, it is possible to clone the voice of a speaker and the prosody of a spoken reference independently, without any degradation in quality and with high similarity to both the original voice and prosody. All of our code and trained models are available, alongside static and interactive demos.

Submitted 21 October, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

Comments: Accepted to IEEE SLT 2022

arXiv:2206.10832 [pdf, ps, other]

On Local Linear Convergence of Projected Gradient Descent for Unit-Modulus Least Squares

Authors: Trung Vu, Raviv Raich, Xiao Fu

Abstract: The unit-modulus least squares (UMLS) problem has a wide spectrum of applications in signal processing, e.g., phase-only beamforming, phase retrieval, radar code design, and sensor network localization. Scalable first-order methods such as projected gradient descent (PGD) have recently been studied as a simple yet efficient approach to solving the UMLS problem. Existing results on the convergence of PGD for UMLS often focus on global convergence to stationary points. As the problem is non-convex, only a sublinear convergence rate has been established. However, these results do not explain the fast convergence of PGD frequently observed in practice. This manuscript presents a novel analysis of the convergence of PGD for UMLS, justifying the linear convergence behavior of the algorithm near the solution. By exploiting the local structure of the objective function and the constraint set, we establish an exact expression for the convergence rate and characterize the conditions for linear convergence. Numerical simulations corroborate our theoretical analysis. Furthermore, variants of PGD with adaptive step sizes are proposed based on the new insight revealed in our convergence analysis. The variants show substantial acceleration in practice.
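The basic PGD iteration for UMLS alternates a gradient step on ½‖Ax − b‖² with an entrywise projection onto the unit circle, x ← P(x − μ·Aᴴ(Ax − b)) with P(z)ᵢ = zᵢ/|zᵢ|. The pure-Python sketch below assumes a fixed step μ = 1/‖A‖_F² (a safe bound, since it never exceeds 1/λ_max(AᴴA)), a noiseless synthetic problem, and a starting point near the solution, matching the local setting the abstract analyzes. The adaptive-step variants proposed in the paper are not shown, and all helper names are illustrative.

```python
import cmath
import random


def matvec(A, x):
    """Compute A x for a dense complex matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def adjoint_matvec(A, r):
    """Compute A^H r (conjugate transpose of A times r)."""
    return [sum(A[m][j].conjugate() * r[m] for m in range(len(A)))
            for j in range(len(A[0]))]


def project_unit_modulus(z):
    """Entrywise projection onto {x : |x_i| = 1}; zeros map to 1."""
    return [v / abs(v) if v != 0 else 1 + 0j for v in z]


def pgd_umls(A, b, x0, iters=2000, step=None):
    """Projected gradient descent for min 0.5*||Ax - b||^2 s.t. |x_i| = 1."""
    if step is None:
        # 1 / ||A||_F^2 never exceeds 1 / lambda_max(A^H A): a safe fixed step.
        step = 1.0 / sum(abs(a) ** 2 for row in A for a in row)
    x = project_unit_modulus(x0)
    for _ in range(iters):
        residual = [ax - bm for ax, bm in zip(matvec(A, x), b)]
        grad = adjoint_matvec(A, residual)
        x = project_unit_modulus([xi - step * g for xi, g in zip(x, grad)])
    return x


def make_noiseless_problem(m, n, seed=0):
    """Random complex Gaussian A, unit-modulus x_true, and b = A x_true."""
    rng = random.Random(seed)
    A = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
         for _ in range(m)]
    x_true = [cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for _ in range(n)]
    return A, x_true, matvec(A, x_true)


A, x_true, b = make_noiseless_problem(20, 3, seed=1)
x0 = [t * cmath.exp(1j * d) for t, d in zip(x_true, [0.3, -0.2, 0.25])]
x = pgd_umls(A, b, x0)
print(max(abs(u - t) for u, t in zip(x, x_true)))  # tiny: PGD recovers x_true locally
```

With μ no larger than 1/λ_max(AᴴA), each projected step cannot increase the objective, so starting close to the solution the iterates settle into the geometric (linear) decay of the phase error that the manuscript characterizes.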

Submitted 1 July, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: 16 pages
