-
Joint Optimization of Switching Point and Power Control in Dynamic TDD Cell-Free Massive MIMO
Authors:
Martin Andersson,
Tung T. Vu,
Pål Frenger,
Erik G. Larsson
Abstract:
We consider a cell-free massive multiple-input multiple-output (CFmMIMO) network operating in dynamic time division duplex (DTDD). The switching point between the uplink (UL) and downlink (DL) data transmission phases can be adapted dynamically to the instantaneous quality-of-service (QoS) requirements in order to improve energy efficiency (EE). To this end, we formulate a problem of optimizing the DTDD switching point jointly with the UL and DL power control coefficients, and the large-scale fading decoding (LSFD) weights for EE maximization. Then, we propose an iterative algorithm to solve the formulated challenging problem using successive convex approximation with an approximate stationary solution. Simulation results show that optimizing switching points remarkably improves EE compared with baseline schemes that adjust switching points heuristically.
Submitted 20 June, 2024;
originally announced June 2024.
-
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Authors:
Thomas Bott,
Florian Lux,
Ngoc Thang Vu
Abstract:
In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language. In this work, we propose a system conditioned on embeddings derived from an emotionally rich text that serves as prompt. Thereby, a joint representation of speaker and prompt embeddings is integrated at several points within a transformer-based architecture. Our approach is trained on merged emotional speech and text datasets and varies prompts in each training iteration to increase the generalization capabilities of the model. Objective and subjective evaluation results demonstrate the ability of the conditioned synthesis system to accurately transfer the emotions present in a prompt to speech. At the same time, precise tractability of speaker identities as well as overall high speech quality and intelligibility are maintained.
Submitted 11 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Authors:
Florian Lux,
Sarina Meyer,
Lyonel Behringer,
Frank Zalkow,
Phat Do,
Matt Coler,
Emanuël A. P. Habets,
Ngoc Thang Vu
Abstract:
In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
Submitted 10 June, 2024;
originally announced June 2024.
-
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
Authors:
Pavel Denisov,
Ngoc Thang Vu
Abstract:
Recent advancements in language modeling have led to the emergence of Large Language Models (LLMs) capable of various natural language processing tasks. Despite their success in text-based tasks, applying LLMs to the speech domain remains limited and challenging. This paper presents BLOOMZMMS, a novel model that integrates a multilingual LLM with a multilingual speech encoder, aiming to harness the capabilities of LLMs for speech recognition and beyond. Utilizing a multi-instructional training approach, we demonstrate the transferability of linguistic knowledge from the text to the speech modality. Our experiments, conducted on 1900 hours of transcribed data from 139 languages, establish that a multilingual speech representation can be effectively learned and aligned with a multilingual LLM. While this learned representation initially shows limitations in task generalization, we address this issue by generating synthetic targets in a multi-instructional style. Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.
Submitted 16 April, 2024;
originally announced April 2024.
-
Enhancing Indoor and Outdoor THz Communications with Beyond Diagonal-IRS: Optimization and Performance Analysis
Authors:
Asad Mahmood,
Thang X. Vu,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
This work investigates the application of Beyond Diagonal Intelligent Reflective Surface (BD-IRS) to enhance THz downlink communication systems, operating in a hybrid reflective and transmissive mode, to simultaneously provide services to indoor and outdoor users. We propose an optimization framework that jointly optimizes the beamforming vectors and phase shifts in the hybrid reflective/transmissive mode, aiming to maximize the system sum rate. To tackle the challenges in solving the joint design problem, we employ the conjugate gradient method and propose an iterative algorithm that successively optimizes the hybrid beamforming vectors and the phase shifts. Comprehensive numerical simulations demonstrate a significant rate improvement over existing benchmark schemes, including time- and frequency-divided approaches, of approximately $30.5\%$ and $69.9\%$, respectively; the proposed design even outperforms the STAR-IRS system by $76.99\%$. This underscores the significant influence of IRS elements on system performance relative to that of base station antennas, highlighting their pivotal role in advancing the communication system efficacy.
Submitted 9 May, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Real-time hybrid controls of energy storage and load shedding for integrated power and energy systems of ships
Authors:
Linh Vu,
Thai-Thanh Nguyen,
Bang Le-Huy Nguyen,
Md Isfakul Anam,
Tuyen Vu
Abstract:
This paper presents an original energy management methodology to enhance the resilience of ship power systems. The integration of various energy storage systems (ESS), including battery energy storage systems (BESS) and super-capacitor energy storage systems (SCESS), in modern ship power systems poses challenges in designing an efficient energy management system (EMS). The EMS proposed in this paper aims to achieve multiple objectives. The primary objective is to minimize shed loads, while the secondary objective is to effectively manage different types of ESS. Considering the diverse ramp-rate characteristics of generators, SCESS, and BESS, the proposed EMS exploits these differences to determine an optimal long-term schedule for minimizing shed loads. Furthermore, the proposed EMS balances the state-of-charge (SoC) of ESS and prioritizes the SCESS's SoC levels to ensure the efficient operation of BESS and SCESS. For better computational efficiency, we introduce the receding horizon optimization method, enabling real-time EMS implementation. A comparison with the fixed horizon optimization (FHO) validates its effectiveness. Simulation studies and results demonstrate that the proposed EMS efficiently manages generators, BESS, and SCESS, ensuring system resilience under generation shortages. Additionally, the proposed methodology significantly reduces the computational burden compared to the FHO technique while maintaining acceptable resilience performance.
Submitted 2 March, 2024;
originally announced March 2024.
-
Cell-Free Massive MIMO with Multi-Antenna Users and Phase Misalignments: A Novel Partially Coherent Transmission Framework
Authors:
Unnikrishnan Kunnath Ganesan,
Tung Thanh Vu,
Erik G. Larsson
Abstract:
Cell-free massive multiple-input multiple-output (MIMO) is a promising technology for next-generation communication systems. This work proposes a novel partially coherent (PC) transmission framework to cope with the challenge of phase misalignment among the access points (APs), which is important for unlocking the full potential of cell-free massive MIMO technology. With the PC operation, the APs are only required to be phase-aligned within clusters. Each cluster transmits the same data stream towards each user equipment (UE), while different clusters send different data streams. We first propose a novel algorithm to group APs into clusters such that the distance between two APs is always smaller than a reference distance ensuring the phase alignment of these APs. Then, we propose new algorithms that optimize the combining at UEs and precoding at APs to maximize the downlink sum data rates. We also propose a novel algorithm for data stream allocation to further improve the sum data rate of the PC operation. Numerical results show that the PC operation using the proposed framework with a sufficiently small reference distance can offer a sum rate close to the sum rate of the ideal fully coherent (FC) operation that requires network-wide phase alignment. This demonstrates the potential of PC operation in practical deployments of cell-free massive MIMO networks.
Submitted 3 April, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
An Unobtrusive and Lightweight Ear-worn System for Continuous Epileptic Seizure Detection
Authors:
Abdul Aziz,
Nhat Pham,
Neel Vora,
Cody Reynolds,
Jaime Lehnen,
Pooja Venkatesh,
Zhuoran Yao,
Jay Harvey,
Tam Vu,
Kan Ding,
Phuc Nguyen
Abstract:
Epilepsy is one of the most common neurological diseases globally, affecting around 50 million people worldwide. Fortunately, up to 70 percent of people with epilepsy could live seizure-free if properly diagnosed and treated, and a reliable technique to monitor the onset of seizures could improve the quality of life of patients who are constantly facing the fear of random seizure attacks. The scalp-based EEG test, despite being the gold standard for diagnosing epilepsy, is costly, necessitates hospitalization, demands skilled professionals for operation, and is discomforting for users. In this paper, we propose EarSD, a novel lightweight, unobtrusive, and socially acceptable ear-worn system to detect epileptic seizure onsets by measuring the physiological signals from behind the user's ears. EarSD includes an integrated custom-built sensing, computing, and communication PCB to collect and amplify the signals of interest, remove the noises caused by motion artifacts and environmental impacts, and stream the data wirelessly to the computer or mobile phone nearby, where data are uploaded to the host computer for further processing. We conducted both in-lab and in-hospital experiments with epileptic seizure patients who were hospitalized for seizure studies. The preliminary results confirm that EarSD can detect seizures with up to 95.3 percent accuracy by just using classical machine learning algorithms.
Submitted 1 January, 2024;
originally announced January 2024.
-
Joint User Association and Power Control for Cell-Free Massive MIMO
Authors:
Chongzheng Hao,
Tung Thanh Vu,
Hien Quoc Ngo,
Minh N. Dao,
Xiaoyu Dang,
Chenghua Wang,
Michail Matthaiou
Abstract:
This work proposes novel approaches that jointly design user equipment (UE) association and power control (PC) in a downlink user-centric cell-free massive multiple-input multiple-output (CFmMIMO) network, where each UE is only served by a set of access points (APs) for reducing the fronthaul signalling and computational complexity. In order to maximize the sum spectral efficiency (SE) of the UEs, we formulate a mixed-integer nonconvex optimization problem under constraints on the per-AP transmit power, quality-of-service rate requirements, maximum fronthaul signalling load, and maximum number of UEs served by each AP. In order to solve the formulated problem efficiently, we propose two different schemes according to the different sizes of the CFmMIMO systems. For small-scale CFmMIMO systems, we present a successive convex approximation (SCA) method to obtain a stationary solution and also develop a learning-based method (JointCFNet) to reduce the computational complexity. For large-scale CFmMIMO systems, we propose a low-complexity suboptimal algorithm using accelerated projected gradient (APG) techniques. Numerical results show that our JointCFNet can yield similar performance and significantly decrease the run time compared with the SCA algorithm in small-scale systems. The presented APG approach is confirmed to run much faster than the SCA algorithm in the large-scale system while obtaining an SE performance close to that of the SCA approach. Moreover, the median sum SE of the APG method is up to about 2.8 fold higher than that of the heuristic baseline scheme.
Submitted 20 May, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Physics-informed Graphical Neural Network for Power System State Estimation
Authors:
Quang-Ha Ngo,
Bang L. H. Nguyen,
Tuyen V. Vu,
Jianhua Zhang,
Tuan Ngo
Abstract:
State estimation is highly critical for accurately observing the dynamic behavior of the power grids and minimizing risks from cyber threats. However, existing state estimation methods encounter challenges in accurately capturing power system dynamics, primarily because of limitations in encoding the grid topology and sparse measurements. This paper proposes a physics-informed graphical learning state estimation method to address these limitations by leveraging both domain physical knowledge and a graph neural network (GNN). We employ a GNN architecture that can handle the graph-structured data of power systems more effectively than traditional data-driven methods. The physics-based knowledge is constructed from the branch current formulation, making the approach adaptable to both transmission and distribution systems. Validation results on three IEEE test systems show that the proposed method achieves a mean square error more than 20% lower than that of conventional methods.
Submitted 29 December, 2023;
originally announced December 2023.
-
User-centric Flexible Resource Management Framework for LEO Satellites with Fully Regenerative Payload
Authors:
Sovit Bhandari,
Thang X. Vu,
Symeon Chatzinotas
Abstract:
The regenerative capabilities of next-generation satellite systems offer a novel approach to design low earth orbit (LEO) satellite communication systems, enabling full flexibility in bandwidth and spot beam management, power control, and onboard data processing. These advancements allow the implementation of intelligent spatial multiplexing techniques, addressing the ever-increasing demand for future broadband data traffic. Existing satellite resource management solutions, however, do not fully exploit these capabilities. To address this issue, a novel framework called flexible resource management algorithm for LEO satellites (FLARE-LEO) is proposed to jointly design bandwidth, power, and spot beam coverage optimized for the geographic distribution of users. It incorporates multi-spot beam multicasting, spatial multiplexing, caching, and handover (HO). In particular, the spot beam coverage is optimized by using the unsupervised K-means algorithm applied to the realistic geographical user demands, followed by a proposed successive convex approximation (SCA)-based iterative algorithm for optimizing the radio resources. Furthermore, we propose two joint transmission architectures during the HO period, which jointly estimate the downlink channel state information (CSI) using deep learning and optimize the transmit power of the LEOs involved in the HO process to improve the overall system throughput. Simulations demonstrate superior performance in terms of delivery time reduction of the proposed algorithm over the existing solutions.
Submitted 18 December, 2023;
originally announced December 2023.
-
Joint Computation and Communication Resource Optimization for Beyond Diagonal UAV-IRS Empowered MEC Networks
Authors:
Asad Mahmood,
Thang X. Vu,
Wali Ullah Khan,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
Recent advancements in 6G systems signal a leap towards universal connectivity and ultra-reliable, low-latency communications for real-time data devices. Yet, these advancements encounter obstacles such as limited device battery life and computational power, along with urban signal blockages. To counter these, Intelligent Reconfigurable Surfaces (IRS) within Mobile Edge Cloud (MEC) infrastructures offer enhanced computing to overcome device limitations and create alternative communication paths. Despite these improvements, connectivity issues remain for remote areas. Our paper presents the Beyond Diagonal IRS (BD-IRS or IRS 2.0), integrated with UAVs in MEC networks (BD-IRS-UAV), providing on-demand links for remote users to offload tasks, tackling resource and battery limitations. We propose a joint optimization strategy to reduce the system's worst-case latency and UAV hovering time by optimizing BD-IRS-UAV deployment and resource allocation. The problem is divided into two sub-problems, BD-IRS-UAV Placement and Computational Resource Optimization, and Communication Resource Optimization, which are solved iteratively. This design significantly enhances system performance, showing a $17.75\%$ increase over traditional diagonal IRS and a $25.43\%$ improvement over IRS on buildings, with a $13.44\%$ enhancement in worst-case latency compared to binary offloading schemes.
Submitted 15 March, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Constrained Independent Vector Analysis with Reference for Multi-Subject fMRI Analysis
Authors:
Trung Vu,
Francisco Laport,
Hanlu Yang,
Vince D. Calhoun,
Tulay Adali
Abstract:
Independent component analysis (ICA) is now a widely used solution for the analysis of multi-subject functional magnetic resonance imaging (fMRI) data. Independent vector analysis (IVA) generalizes ICA to multiple datasets, i.e., to multi-subject data, and in addition to higher-order statistical information in ICA, it leverages the statistical dependence across the datasets as an additional type of statistical diversity. As such, it preserves variability in the estimation of single-subject maps but its performance might suffer when the number of datasets increases. Constrained IVA is an effective way to bypass computational issues and improve the quality of separation by incorporating available prior information. Existing constrained IVA approaches often rely on user-defined threshold values to define the constraints. However, an improperly selected threshold can have a negative impact on the final results. This paper proposes two novel methods for constrained IVA: one using an adaptive-reverse scheme to select variable thresholds for the constraints and a second one based on a threshold-free formulation by leveraging the unique structure of IVA. We demonstrate that our solutions provide an attractive solution to multi-subject fMRI analysis both by simulations and through analysis of resting state fMRI data collected from 98 subjects -- the highest number of subjects ever used by IVA algorithms. Our results show that both proposed approaches obtain significantly better separation quality and model match while providing computationally efficient and highly reproducible solutions.
Submitted 8 November, 2023;
originally announced November 2023.
-
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions
Authors:
Florian Lux,
Pascal Tilli,
Sarina Meyer,
Ngoc Thang Vu
Abstract:
Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial and controllable embeddings can be fed to a speech synthesis system, conditioned on embeddings of real humans during training, without sacrificing privacy during inference.
Submitted 26 October, 2023;
originally announced October 2023.
-
The IMS Toucan System for the Blizzard Challenge 2023
Authors:
Florian Lux,
Julia Koch,
Sarina Meyer,
Thomas Bott,
Nadja Schauffler,
Pavel Denisov,
Antje Schweitzer,
Ngoc Thang Vu
Abstract:
For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021. Our approach entails a rule-based text-to-phoneme processing system that includes rule-based disambiguation of homographs in the French language. It then transforms the phonemes to spectrograms as intermediate representations using a fast and efficient non-autoregressive synthesis architecture based on Conformer and Glow. A GAN based neural vocoder that combines recent state-of-the-art approaches converts the spectrogram to the final wave. We carefully designed the data processing, training, and inference procedures for the challenge data. Our system identifier is G. Open source code and demo are available.
Submitted 26 October, 2023;
originally announced October 2023.
-
A reproducible 3D convolutional neural network with dual attention module (3D-DAM) for Alzheimer's disease classification
Authors:
Thanh Phuong Vu,
Tien Nhat Nguyen,
N. Minh Nhat Hoang,
Gia Minh Hoang
Abstract:
Alzheimer's disease is one of the most common types of neurodegenerative disease, characterized by the accumulation of amyloid-beta plaque and tau tangles. Recently, deep learning approaches have shown promise in Alzheimer's disease diagnosis. In this study, we propose a reproducible model that utilizes a 3D convolutional neural network with a dual attention module for Alzheimer's disease classification. We trained the model in the ADNI database and verified the generalizability of our method in two independent datasets (AIBL and OASIS1). Our method achieved state-of-the-art classification performance, with an accuracy of 91.94% for MCI progression classification and 96.30% for Alzheimer's disease classification on the ADNI dataset. Furthermore, the model demonstrated good generalizability, achieving an accuracy of 86.37% on the AIBL dataset and 83.42% on the OASIS1 dataset. These results indicate that our proposed approach has competitive performance and generalizability when compared to recent studies in the field.
Submitted 4 March, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Applications of Distributed Machine Learning for the Internet-of-Things: A Comprehensive Survey
Authors:
Mai Le,
Thien Huynh-The,
Tan Do-Duy,
Thai-Hoc Vu,
Won-Joo Hwang,
Quoc-Viet Pham
Abstract:
The emergence of new services and applications in emerging wireless networks (e.g., beyond 5G and 6G) has shown a growing demand for the usage of artificial intelligence (AI) in the Internet of Things (IoT). However, the proliferation of massive IoT connections and the availability of computing resources distributed across future IoT systems have strongly demanded the development of distributed AI for better IoT services and applications. Therefore, existing AI-enabled IoT systems can be enhanced by implementing distributed machine learning (aka distributed learning) approaches. This work aims to provide a comprehensive survey on distributed learning for IoT services and applications in emerging networks. In particular, we first provide a background of machine learning and present a preliminary to typical distributed learning approaches, such as federated learning, multi-agent reinforcement learning, and distributed inference. Then, we provide an extensive review of distributed learning for critical IoT services (e.g., data sharing and computation offloading, localization, mobile crowdsensing, and security and privacy) and IoT applications (e.g., smart healthcare, smart grid, autonomous vehicle, aerial IoT networks, and smart industry). From the reviewed literature, we also present critical challenges of distributed learning for IoT and propose several promising solutions and research directions in this emerging area.
Submitted 16 October, 2023;
originally announced October 2023.
-
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding
Authors:
Pavel Denisov,
Ngoc Thang Vu
Abstract:
A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models, however their evaluation often lacks multilingual setup and tasks that require prediction of lexical fillers, such as slot filling. In this work, we propose a unified method that integrates multilingual pretrained speech and text models and performs E2E-SLU on six datasets in four languages in a generative manner, including the prediction of lexical fillers. We investigate how the proposed method can be improved by pretraining on widely available speech recognition data using several training objectives. Pretraining on 7000 hours of multilingual data allows us to outperform the state-of-the-art ultimately on two SLU datasets and partly on two more SLU datasets. Finally, we examine the cross-lingual capabilities of the proposed model and improve on the best known result on the PortMEDIA-Language dataset by almost half, achieving a Concept/Value Error Rate of 23.65%.
Submitted 9 October, 2023;
originally announced October 2023.
-
Joint Source-Channel Coding System for 6G Communication: Design, Prototype and Future Directions
Authors:
Xinchao Zhong,
Sean Longyu Ma,
Hong-fu Chou,
Arsham Mostaani,
Thang X. Vu,
Symeon Chatzinotas
Abstract:
The goal of semantic communication is to surpass optimal Shannon's criterion regarding a notable problem for future communication which lies in the integration of collaborative efforts between the intelligence of the transmission source and the joint design of source coding and channel coding. The convergence of scholarly investigation and applicable products in the field of semantic communication is facilitated by the utilization of flexible structural hardware design, which is constrained by the computational capabilities of edge devices. This characteristic represents a significant benefit of joint source-channel coding (JSCC), as it enables the generation of source alphabets with diverse lengths and achieves a code rate of unity. Moreover, JSCC exhibits near-capacity performance while maintaining low complexity. Therefore, we leverage not only quasi-cyclic (QC) characteristics to propose a QC-LDPC code-based JSCC scheme but also Unequal Error Protection (UEP) to ensure the recovery of semantic importance. In this study, the feasibility for using a semantic encoder/decoder that is aware of UEP can be explored based on the existing JSCC system. This approach is aimed at protecting the significance of semantic task-oriented information. Additionally, the deployment of a JSCC system can be facilitated by employing Low-Density Parity-Check (LDPC) codes on a reconfigurable device. This is achieved by reconstructing the LDPC codes as QC-LDPC codes. The QC-LDPC layered decoding technique, which has been specifically optimized for hardware parallelism and tailored for channel decoding applications, can be suitably adapted to accommodate the JSCC system. The performance of the proposed system is evaluated by conducting BER measurements using both floating-point and 6-bit quantization.
Submitted 2 October, 2023;
originally announced October 2023.
-
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research
Authors:
Sarina Meyer,
Xiaoxiao Miao,
Ngoc Thang Vu
Abstract:
Speaker anonymization is the task of modifying a speech recording such that the original speaker cannot be identified anymore. Since the first Voice Privacy Challenge in 2020, along with the release of a framework, the popularity of this research topic is continually increasing. However, the comparison and combination of different anonymization approaches remains challenging due to the complexity of evaluation and the absence of user-friendly research frameworks. We therefore propose an efficient speaker anonymization and evaluation framework based on a modular and easily extendable structure, almost fully in Python. The framework facilitates the orchestration of several anonymization approaches in parallel and allows for interfacing between different techniques. Furthermore, we propose modifications to common evaluation methods which improve the quality of the evaluation and reduce their computation time by 65 to 95%, depending on the metric. Our code is fully open source.
Submitted 21 December, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
A New Optimal Subpattern Assignment (OSPA) Metric for Multi-target Filtering
Authors:
Tuyet Vu
Abstract:
This paper proposes and evaluates a new metric that overcomes a limitation of the Optimal Subpattern Assignment (OSPA) metric mentioned by Schuhmacher et al.: the OSPA distance between two sets of points is insensitive to the case where one is empty. The proposed metric, called Complete OSPA (COSPA), retains all the advantages of the OSPA metric for evaluating the performance of multiple target filtering algorithms while also allowing separate control over the threshold of physical distance errors and cardinality errors.
Submitted 25 June, 2023;
originally announced June 2023.
-
Multi-agent Deep Reinforcement Learning for Distributed Load Restoration
Authors:
Linh Vu,
Tuyen Vu,
Thanh-Long Vu,
Anurag Srivastava
Abstract:
This paper addresses the load restoration problem after power outage events. Our primary proposed methodology is using multi-agent deep reinforcement learning to optimize the load restoration process in distribution systems, modeled as networked microgrids, via determining the optimal operational sequence of circuit breakers (switches). An innovative invalid action masking technique is incorporated into the multi-agent method to handle both the physical constraints in the restoration process and the curse of dimensionality as the action space of operational decisions grows exponentially with the number of circuit breakers. The features of our proposed method include centralized training for multi-agents to overcome non-stationary environment problems, decentralized execution to ease the deployment, and zero constraint violations to prevent harmful actions. Our simulations are performed in OpenDSS and Python environments to demonstrate the effectiveness of the proposed approach using the IEEE 13, 123, and 8500-node distribution test feeders. The results show that the proposed algorithm can achieve a significantly better learning curve and stability than the conventional methods.
Submitted 24 June, 2023;
originally announced June 2023.
-
A Cyber-HIL for Investigating Control Systems in Ship Cyber Physical Systems under Communication Issues and Cyber Attacks
Authors:
Linh Vu,
Lam Nguyen,
Mahmoud Abdelaal,
Tuyen Vu,
Osama Mohammed
Abstract:
This paper presents a novel Cyber-Hardware-in-the-Loop (Cyber-HIL) platform for assessing control operation in ship cyber-physical systems. The proposed platform employs cutting-edge technologies, including Docker containers, real-time simulator OPAL-RT, and network emulator ns3, to create a secure and controlled testing and deployment environment for investigating the potential impact of cyber attack threats on ship control systems. Real-time experiments were conducted using an advanced load-shedding controller as a control object in both synchronous and asynchronous manners, showcasing the platform's versatility and effectiveness in identifying vulnerabilities and improving overall Ship Cyber Physical System (SCPS) security. Furthermore, the performance of the load-shedding controller under cyber attacks was evaluated by conducting tests with man-in-the-middle (MITM) and denial-of-service (DoS) attacks. These attacks were implemented on the communication channels between the controller and the simulated ship system, emulating real-world scenarios. The proposed Cyber-HIL platform provides a comprehensive and effective approach to test and validate the security of ship control systems in the face of cyber threats.
Submitted 25 August, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
-
Joint Optimization of 3D Placement and Radio Resource Allocation for per-UAV Sum Rate Maximization
Authors:
Asad Mahmood,
Thang X. Vu,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
Unmanned aerial vehicles (UAVs) have emerged as a practical solution that provides on-demand services to users in areas where the terrestrial network is non-existent or temporarily unavailable, e.g., due to natural disasters or network congestion. In general, UAVs' user-serving capacity is constrained by their limited battery life and the finite communication resources that highly impact their performance. This work considers the orthogonal frequency division multiple access (OFDMA) enabled multiple unmanned aerial vehicles (multi-UAV) communication systems to provide on-demand services. The main aim of this work is to derive an efficient technique for the allocation of radio resources, 3D placement of UAVs, and user association matrices. To achieve the desired objectives, we decoupled the original joint optimization problem into two sub-problems: (i) 3D placement and user association and (ii) sum-rate maximization for optimal radio resource allocation, which are solved iteratively. The proposed iterative algorithm is shown via numerical results to converge quickly, in fewer than 10 iterations. The benefits of the proposed design are demonstrated via superior sum-rate performance compared to existing reference designs. Moreover, results showed that the optimal power and sub-carrier allocation help to mitigate the inter-cell interference that directly impacts the system's performance.
Submitted 8 May, 2023;
originally announced May 2023.
-
Multi-scale Transformer-based Network for Emotion Recognition from Multi Physiological Signals
Authors:
Tu Vu,
Van Thong Huynh,
Soo-Hyung Kim
Abstract:
This paper presents an efficient Multi-scale Transformer-based approach for the task of Emotion recognition from Physiological data, which has gained widespread attention in the research community due to the vast amount of information that can be extracted from these signals using modern sensors and machine learning techniques. Our approach involves applying a Multi-modal technique combined with scaling data to establish the relationship between internal body signals and human emotions. Additionally, we utilize Transformer and Gaussian Transformation techniques to improve signal encoding effectiveness and overall performance. Our model achieves decent results on the CASE dataset of the EPiC competition, with an RMSE score of 1.45.
Submitted 7 May, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Artificial Intelligence for Satellite Communication and Non-Terrestrial Networks: A Survey
Authors:
G. Fontanesi,
F. Ortíz,
E. Lagunas,
V. Monzon Baeza,
M. Á. Vázquez,
J. A. Vásquez-Peralvo,
M. Minardi,
H. N. Vu,
P. J. Honnaiah,
C. Lacoste,
Y. Drif,
T. S. Abdu,
G. Eappen,
J. Rehman,
L. M. Garcés-Socorrás,
W. A. Martins,
P. Henarejos,
H. Al-Hraishawi,
J. C. Merlano Duncan,
T. X. Vu,
S. Chatzinotas
Abstract:
This paper surveys the application and development of Artificial Intelligence (AI) in Satellite Communication (SatCom) and Non-Terrestrial Networks (NTN). We first present a comprehensive list of use cases, the relative challenges and the main AI tools capable of addressing those challenges. For each use case, we present the main motivation, a system description, the available non-AI solutions and the potential benefits and available works using AI. We also discuss the pros and cons of an on-board and on-ground AI-based architecture, and we revise the current commercial and research activities relevant to this topic. Next, we describe the state-of-the-art hardware solutions for developing ML in real satellite systems. Finally, we discuss the long-term developments of AI in the SatCom and NTN sectors and potential research directions. This paper provides a comprehensive and up-to-date overview of the opportunities and challenges offered by AI to improve the performance and efficiency of NTNs.
Submitted 25 April, 2023;
originally announced April 2023.
-
Multi-Objective Optimization for 3D Placement and Resource Allocation in OFDMA-based Multi-UAV Networks
Authors:
Asad Mahmood,
Thang X. Vu,
Shree Krishna Sharma,
Symeon Chatzinotas,
Björn Ottersten
Abstract:
This work considers the orthogonal frequency division multiple access (OFDMA) technology that enables multiple unmanned aerial vehicles (multi-UAV) communication systems to provide on-demand services. The main aim of this work is to derive the optimal allocation of radio resources, 3D placement of UAVs, and user association matrices. To achieve the desired objectives, we decoupled the original joint optimization problem into two sub-problems: i) 3D placement and user association and ii) sum-rate maximization for optimal radio resource allocation, which are solved iteratively. The proposed iterative algorithm is shown via numerical results to converge quickly, in fewer than 10 iterations. The benefits of the proposed design are demonstrated via superior sum-rate performance compared to existing reference designs. Moreover, the results show that the optimal power and sub-carrier allocation helps mitigate the co-cell interference that directly impacts the system's performance.
Submitted 25 April, 2023;
originally announced April 2023.
-
Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions
Authors:
Daniel Ortega,
Chia-Yu Li,
Ngoc Thang Vu
Abstract:
This paper presents our latest investigation on modeling backchannel in conversations. Motivated by a proactive backchanneling theory, we aim at developing a system which acts as a proactive listener by inserting backchannels, such as continuers and assessment, to influence speakers. Our model takes into account not only lexical and acoustic cues, but also introduces the simple and novel idea of using listener embeddings to mimic different backchanneling behaviours. Our experimental results on the Switchboard benchmark dataset reveal that acoustic cues are more important than lexical cues in this task and their combination with listener embeddings works best on both manual transcriptions and automatically generated transcriptions.
Submitted 10 April, 2023;
originally announced April 2023.
-
Solving Differential-Algebraic Equations in Power System Dynamic Analysis with Quantum Computing
Authors:
Huynh Trung Thanh Tran,
Hieu T. Nguyen,
Long T. Vu,
Samuel T. Ojetola
Abstract:
Power system dynamics are generally modeled by high dimensional nonlinear differential-algebraic equations (DAEs) given a large number of components forming the network. These DAEs' complexity can grow exponentially due to the increasing penetration of distributed energy resources, whereas their computation time becomes sensitive due to the increasing interconnection of the power grid with other energy systems. This paper demonstrates the use of quantum computing algorithms to solve DAEs for power system dynamic analysis. We leverage a symbolic programming framework to equivalently convert the power system's DAEs into ordinary differential equations (ODEs) using index reduction methods and then encode their data into qubits using amplitude encoding. The system nonlinearity is captured by Hamiltonian simulation with truncated Taylor expansion so that state variables can be updated by a quantum linear equation solver. Our results show that quantum computing can solve the power system's DAEs accurately with a computational complexity polynomial in the logarithm of the system dimension. We also illustrate the use of recent advanced tools in scientific machine learning for implementing complex computing concepts, i.e. Taylor expansion, DAEs/ODEs transformation, and quantum computing solver with abstract representation for power engineering applications.
Submitted 1 March, 2024; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Network-Aided Intelligent Traffic Steering in 6G O-RAN: A Multi-Layer Optimization Framework
Authors:
Van-Dinh Nguyen,
Thang X. Vu,
Nhan Thanh Nguyen,
Dinh C. Nguyen,
Markku Juntti,
Nguyen Cong Luong,
Dinh Thai Hoang,
Diep N. Nguyen,
Symeon Chatzinotas
Abstract:
To enable an intelligent, programmable and multi-vendor radio access network (RAN) for 6G networks, considerable efforts have been made in standardization and development of open RAN (O-RAN). So far, however, the applicability of O-RAN in controlling and optimizing RAN functions has not been widely investigated. In this paper, we jointly optimize the flow-split distribution, congestion control and scheduling (JFCS) to enable an intelligent traffic steering application in O-RAN. Combining tools from network utility maximization and stochastic optimization, we introduce a multi-layer optimization framework that provides fast convergence, long-term utility-optimality and significant delay reduction compared to the state-of-the-art and baseline RAN approaches. Our main contributions are three-fold: i) we propose the novel JFCS framework to efficiently and adaptively direct traffic to appropriate radio units; ii) we develop low-complexity algorithms based on the reinforcement learning, inner approximation and bisection search methods to effectively solve the JFCS problem in different time scales; and iii) the rigorous theoretical performance results are analyzed to show that there exists a scaling factor to improve the tradeoff between delay and utility-optimization. Collectively, the insights in this work will open the door towards fully automated networks with enhanced control and flexibility. Numerical results are provided to demonstrate the effectiveness of the proposed algorithms in terms of the convergence rate, long-term utility-optimality and delay reduction.
Submitted 29 May, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Task-Effective Compression of Observations for the Centralized Control of a Multi-agent System Over Bit-Budgeted Channels
Authors:
Arsham Mostaani,
Thang X. Vu,
Symeon Chatzinotas,
Bjorn Ottersten
Abstract:
We consider a task-effective quantization problem that arises when multiple agents are controlled via a centralized controller (CC). While agents have to communicate their observations to the CC for decision-making, the bit-budgeted communications of agent-CC links may limit the task-effectiveness of the system which is measured by the system's average sum of stage costs/rewards. As a result, each agent should compress/quantize its observation such that the average sum of stage costs/rewards of the control task is minimally impacted. We address the problem of maximizing the average sum of stage rewards by proposing two different Action-Based State Aggregation (ABSA) algorithms that carry out the indirect and joint design of control and communication policies in the multi-agent system. While the applicability of ABSA-1 is limited to single-agent systems, it provides an analytical framework that acts as a stepping stone to the design of ABSA-2. ABSA-2 carries out the joint design of control and communication for a multi-agent system. We evaluate the algorithms - with average return as the performance metric - using numerical experiments performed to solve a multi-agent geometric consensus problem. The numerical results are concluded by introducing a new metric that measures the effectiveness of communications in a multi-agent system.
Submitted 4 January, 2023;
originally announced January 2023.
-
1-D Convolutional Graph Convolutional Networks for Fault Detection in Distributed Energy Systems
Authors:
Bang L. H. Nguyen,
Tuyen Vu,
Thai-Thanh Nguyen,
Mayank Panwar,
Rob Hovsapian
Abstract:
This paper presents a 1-D convolutional graph neural network for fault detection in microgrids. The combination of 1-D convolutional neural networks (1D-CNN) and graph convolutional networks (GCN) helps extract spatial-temporal correlations from the voltage measurements in microgrids. The fault detection scheme includes fault event detection, fault type and phase classification, and fault location. Five neural network models are trained to handle these tasks. Transfer learning and fine-tuning are applied to reduce training efforts. The combined recurrent graph convolutional neural network (1D-CGCN) is compared with the traditional ANN structure on the Potsdam 13-bus microgrid dataset. The achieved accuracies are 99.27%, 98.1%, 98.75%, and 95.6% for fault detection, fault type classification, fault phase identification, and fault location, respectively.
Submitted 5 November, 2022;
originally announced November 2022.
-
Hierarchical Control of Grid-Connected Hydrogen Electrolyzer Providing Grid Services
Authors:
Bang L. H. Nguyen,
Mayank Panwar,
Rob Hovsapian,
Yashodhan Agalgaokar,
Tuyen Vu
Abstract:
This paper presents the operation modes and control architecture of the grid-connected hydrogen electrolyzer systems for the provision of frequency and voltage supports. The analysis is focused on the primary and secondary loops in the hierarchical control scheme. At the power converter inner control loop, the voltage- and current-control modes are analyzed. At the primary level, the droop and opposite droop control strategies to provide voltage and frequency support are described. Coordination between primary control and secondary, tertiary reserves is discussed. The case studies and real-time simulation results are provided using Typhoon HIL to back the theoretical investigation.
Submitted 5 November, 2022;
originally announced November 2022.
-
A Large-Scale Study of a Sleep Tracking and Improving Device with Closed-loop and Personalized Real-time Acoustic Stimulation
Authors:
Anh Nguyen,
Galen Pogoncheff,
Ban Xuan Dong,
Nam Bui,
Hoang Truong,
Nhat Pham,
Linh Nguyen,
Hoang Huu Nguyen,
Sy Duong-Quy,
Sangtae Ha,
Tam Vu
Abstract:
Various intervention therapies ranging from pharmaceutical to hi-tech tailored solutions have been available to treat difficulty in falling asleep commonly caused by insomnia in modern life. However, current techniques largely remain ill-suited, ineffective, and unreliable due to their lack of precise real-time sleep tracking, in-time feedback on the therapies, an ability to keep people asleep during the night, and a large-scale effectiveness evaluation. Here, we introduce a novel sleep aid system, called Earable, that can continuously sense multiple head-based physiological signals and simultaneously enable closed-loop auditory stimulation to entrain brain activities in time for effective sleep promotion. We develop the system in a lightweight, comfortable, and user-friendly headband with a comprehensive set of algorithms and dedicated own-designed audio stimuli. We conducted multiple protocols from 883 sleep studies on 377 subjects (241 women, 119 men) wearing either a gold-standard device (PSG), Earable, or both concurrently. We demonstrate that our system achieves (1) a strong correlation (0.89 +/- 0.03) between the physiological signals acquired by Earable and those from the gold-standard PSG, (2) an 87.8 +/- 5.3% agreement on sleep scoring using our automatic real-time sleep staging algorithm with the consensus scored by three sleep technicians, and (3) a successful non-pharmacological stimulation alternative to effectively shorten the duration of sleep falling by 24.1 +/- 0.1 minutes. These results show that the efficacy of Earable exceeds existing techniques in intentions to promote fast falling asleep, track sleep state accurately, and achieve high social acceptance for real-time closed-loop personalized neuromodulation-based home sleep care.
Submitted 4 November, 2022;
originally announced November 2022.
-
Spatial-Temporal Recurrent Graph Neural Networks for Fault Diagnostics in Power Distribution Systems
Authors:
Bang Nguyen,
Tuyen Vu,
Thai-Thanh Nguyen,
Mayank Panwar,
Rob Hovsapian
Abstract:
Fault diagnostics are extremely important to decide proper actions toward fault isolation and system restoration. The growing integration of inverter-based distributed energy resources imposes strong influences on fault detection using traditional overcurrent relays. This paper utilizes emerging graph learning techniques to build new temporal recurrent graph neural network models for fault diagnostics. The temporal recurrent graph neural network structures can extract the spatial-temporal features from data of voltage measurement units installed at the critical buses. From these features, fault event detection, fault type/phase classification, and fault location are performed. Compared with previous works, the proposed temporal recurrent graph neural networks provide a better generalization for fault diagnostics. Moreover, the proposed scheme retrieves the voltage signals instead of current signals so that there is no need to install relays at all lines of the distribution system. Therefore, the proposed scheme is generalizable and not limited by the number of relays installed. The effectiveness of the proposed method is comprehensively evaluated on the Potsdam microgrid and IEEE 123-node system in comparison with other neural network structures.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
Low-Resource Multilingual and Zero-Shot Multispeaker TTS
Authors:
Florian Lux,
Julia Koch,
Ngoc Thang Vu
Abstract:
While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data needed for those approaches is generally not feasible for the vast majority of the world's over 6,000 spoken languages. In this work, we bring together the tasks of zero-shot voice cloning and multilingual low-resource TTS. Using the language-agnostic meta learning (LAML) procedure and modifications to a TTS encoder, we show that it is possible for a system to learn to speak a new language using just 5 minutes of training data while retaining the ability to infer the voice of even unseen speakers in the newly learned language. We show the success of our proposed approach in terms of intelligibility, naturalness and similarity to target speaker using objective metrics as well as human studies and provide our code and trained models open source.
Submitted 21 October, 2022;
originally announced October 2022.
-
Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses
Authors:
Chia-Yu Li,
Ngoc Thang Vu
Abstract:
We propose a novel method that combines CycleGAN and inter-domain losses for semi-supervised end-to-end automatic speech recognition. Inter-domain loss targets the extraction of an intermediate shared representation of speech and text inputs using a shared network. CycleGAN uses cycle-consistent loss and the identity mapping loss to preserve relevant characteristics of the input feature after converting from one domain to another. As such, both approaches are suitable to train end-to-end models on unpaired speech-text inputs. In this paper, we exploit the advantages from both inter-domain loss and CycleGAN to achieve better shared representation of unpaired speech and text inputs and thus improve the speech-to-text mapping. Our experimental results on the WSJ eval92 and Voxforge (non-English) show 8~8.5% character error rate reduction over the baseline, and the results on LibriSpeech test_clean also show noticeable improvement.
Submitted 20 October, 2022;
originally announced October 2022.
-
Intelligent Traffic Steering in Beyond 5G Open RAN based on LSTM Traffic Prediction
Authors:
Fatemeh Kavehmadavani,
Van-Dinh Nguyen,
Thang X. Vu,
Symeon Chatzinotas
Abstract:
The Open Radio Access Network (ORAN) Alliance offers a disaggregated RAN functionality built using open interface specifications between blocks. To efficiently support various competing services, namely enhanced mobile broadband (eMBB) and ultra-reliable and low-latency communications (uRLLC), the ORAN Alliance has introduced a standard approach toward more virtualized, open and intelligent networks. To realize the benefits of ORAN in optimizing resource utilization, this paper studies an intelligent traffic steering (TS) scheme within the proposed disaggregated ORAN architecture. For this purpose, we propose a joint intelligent traffic prediction, flow-split distribution, dynamic user association and radio resource management (JIFDR) framework in the presence of unknown dynamic traffic demands. To adapt to dynamic environments on different time scales, we decompose the formulated optimization problem into long-term and short-term subproblems, where the optimality of the latter is strongly dependent on the optimal dynamic traffic demand. We then apply a long short-term memory (LSTM) model to effectively solve the long-term subproblem, aiming to predict dynamic traffic demands, RAN slicing, and flow-split decisions. The resulting non-convex short-term subproblem is converted to a more computationally tractable form by exploiting successive convex approximations. Finally, simulation results are provided to demonstrate the effectiveness of the proposed algorithms compared to several well-known benchmark schemes.
Submitted 17 October, 2022;
originally announced October 2022.
-
Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy
Authors:
Sarina Meyer,
Pascal Tilli,
Pavel Denisov,
Florian Lux,
Julia Koch,
Ngoc Thang Vu
Abstract:
In order to protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings. This typically comes with a privacy-utility trade-off between protection of individuals and usability of the data for downstream applications. One of the challenges in this context is to create non-existent voices that sound as natural as possible.
In this work, we propose to tackle this issue by generating speaker embeddings using a generative adversarial network with Wasserstein distance as cost function. By incorporating these artificial embeddings into a speech-to-text-to-speech pipeline, we outperform previous approaches in terms of privacy and utility. According to standard objective metrics and human evaluation, our approach generates intelligible and content-preserving yet privacy-protecting versions of the original recordings.
Submitted 20 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Almost-lossless compression of a low-rank random tensor
Authors:
Minh Thanh Vu
Abstract:
In this work, we establish an asymptotic limit of almost-lossless compression of a random, finite alphabet tensor which admits a low-rank canonical polyadic decomposition.
Submitted 23 October, 2022; v1 submitted 8 October, 2022;
originally announced October 2022.
-
Resilient Communication Scheme for Distributed Decision of Interconnecting Networks of Microgrids
Authors:
Thanh Long Vu,
Sayak Mukherjee,
Veronica Adetola
Abstract:
Networking of microgrids can provide the operational flexibility needed for the increasing number of DERs deployed at the distribution level and support end-use demand when there is loss of the bulk power system. However, networked microgrids are vulnerable to cyber-physical attacks and faults due to the complex interconnections. As such, it is necessary to design resilient control systems to support the operations of networked microgrids in response to cyber-physical attacks and faults. This paper introduces a resilient communication scheme for interconnecting multiple microgrids to support critical demand, in which the interconnection decision can be made in a distributed manner by each microgrid controller even in the presence of cyberattacks on some communication links or microgrid controllers. This scheme blends a randomized peer-to-peer communication network for exchanging information among controllers and resilient consensus algorithms for achieving reliable interconnection agreement. A network of 6 microgrids divided from a modified 123-node test distribution feeder is used to demonstrate the effectiveness of the proposed resilient communication scheme.
Submitted 15 September, 2022;
originally announced September 2022.
-
Integrated Multiport Bidirectional DC-DC Converter for HEV/FCV Applications
Authors:
Bang Le-Huy Nguyen,
Honnyong Cha,
Tuyen Vu,
Thai-Thanh Nguyen
Abstract:
This paper proposes a novel integrated multiport bidirectional dc-dc converter to interface the battery, the ultra-capacitor, the fuel cell, or other energy sources with the dc-link capacitor of hybrid energy systems such as hybrid electric vehicle (HEV) and fuel cell vehicle (FCV) applications. The proposed converter can be applied to distributed generation systems that include local energy sources, storage, and loads. It can perform both buck and boost functions with fewer switches. In addition, it is extendable when more inputs and/or outputs are required. The operating principle and control strategy of the proposed converter are analyzed in detail. For verification, simulation and experimental results for the four utilized operating modes of an HEV/FCV are provided.
Submitted 13 September, 2022;
originally announced September 2022.
-
Power Converter Topologies for Electrolyzer Applications to Enable Electric Grid Services
Authors:
Bang L. H. Nguyen,
Mayank Panwar,
Rob Hovsapian,
Kazunori Nagasawa,
Tuyen V. Vu
Abstract:
Hydrogen electrolyzers, with their operational flexibility, can be configured as smart dynamic loads which can provide grid services and facilitate the integration of more renewable energy sources into the electrical grid. However, to enable this ability, the electrolyzer system should be able to control both active and reactive power in coordination with the low-level controller of the electrolyzer via the power electronics system interface between the utility grid and electrolyzer. This paper discusses power converter topologies and the control scheme of this power electronics interface for electrolyzer applications to enable electricity grid services. For the sake of uniformity, in this paper, we consider the power converter system interfacing the utility grid at a line-to-line root mean square (RMS) value of 480 VAC, 60 Hz, and supplying a 3500 A, 750 kW PEM electrolyzer stack.
Submitted 13 September, 2022;
originally announced September 2022.
-
Integrated Multiport Back-to-Back Power Converter for Type-4 Wind Turbine Generator with Hybrid Energy Storage System
Authors:
Bang Le-Huy Nguyen,
Thai-Thanh Nguyen,
Van-Long Pham,
Tuyen Vu,
Mayank Panwar,
Rob Hovsapian
Abstract:
This paper proposes a novel integrated multiport bidirectional back-to-back power converter for a type-4 wind turbine that accommodates a battery and supercapacitor for energy storage. The circuit topology uses 4 fewer switches than the traditional configuration. Moreover, owing to the dual-buck structure embedded in the phase leg, the circuitry has no short-circuit path; therefore, it withstands short-circuit events for a much longer time than the normal phase leg and prevents the reverse current in turn-off recovery. The use of a hybrid energy storage system with battery and supercapacitor helps smooth out the power output under wind gusts and stabilizes the DC-link voltage under grid fault conditions. The case studies are carried out with a 1.5 MW wind turbine system. Simulation results are provided for the theoretical validation.
Submitted 13 September, 2022;
originally announced September 2022.
-
Robust Rayleigh Regression Method for SAR Image Processing in Presence of Outliers
Authors:
B. G. Palm,
F. M. Bayer,
R. Machado,
M. I. Pettersson,
V. T. Vu,
R. J. Cintra
Abstract:
The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and the misspecification in statistical image models may result in inaccurate inferences. To avoid such issues, the Rayleigh regression model based on a robust estimation process is proposed as a more realistic approach to model this type of data. This paper aims at obtaining Rayleigh regression model parameter estimators robust to the presence of outliers. The proposed approach uses the weighted maximum likelihood method and was evaluated in numerical experiments using simulated and measured SAR images. Monte Carlo simulations were employed for the numerical assessment of the proposed robust estimator performance in finite signal lengths, their sensitivity to outliers, and the breakdown point. For instance, the non-robust estimators show a relative bias value $65$-fold larger than the results provided by the robust approach in corrupted signals. In terms of sensitivity analysis and breakdown point, the robust scheme resulted in a reduction of about $96\%$ and $10\%$, respectively, in the mean absolute value of both measures, in comparison to the non-robust estimators. Moreover, two SAR data sets were used to compare the ground type and anomaly detection results of the proposed robust scheme with competing methods in the literature.
Submitted 29 July, 2022;
originally announced August 2022.
-
Wavelength-Resolution SAR Ground Scene Prediction Based on Image Stack
Authors:
B. G. Palm,
D. I. Alves,
M. I. Pettersson,
V. T. Vu,
R. Machado,
R. J. Cintra,
F. M. Bayer,
P. Dammert,
H. Hellsten
Abstract:
This paper presents five different statistical methods for ground scene prediction (GSP) in wavelength-resolution synthetic aperture radar (SAR) images. The GSP image can be used as a reference image in a change detection algorithm yielding a high probability of detection and low false alarm rate. The predictions are based on image stacks, which are composed of images from the same scene acquired at different instants with the same flight geometry. The considered methods for obtaining the ground scene prediction include (i) autoregressive models; (ii) trimmed mean; (iii) median; (iv) intensity mean; and (v) mean. It is expected that the predicted image presents the true ground scene without change and preserves the ground backscattering pattern. The study indicates that the median method provided the most accurate representation of the true ground. To show the applicability of the GSP, a change detection algorithm was considered using the median ground scene as a reference image. As a result, the median method achieved a probability of detection of $97\%$ and a false alarm rate of 0.11/km$^2$, when considering military vehicles concealed in a forest.
Submitted 22 July, 2022;
originally announced July 2022.
-
PoeticTTS -- Controllable Poetry Reading for Literary Studies
Authors:
Julia Koch,
Florian Lux,
Nadja Schauffler,
Toni Bernhart,
Felix Dieterle,
Jonas Kuhn,
Sandra Richter,
Gabriel Viehhauser,
Ngoc Thang Vu
Abstract:
Speech synthesis for poetry is challenging due to specific intonation patterns inherent to poetic speech. In this work, we propose an approach to synthesise poems with almost human-like naturalness in order to enable literary scholars to systematically examine hypotheses on the interplay between text, spoken realisation, and the listener's perception of poems. To meet these special requirements for literary studies, we resynthesise poems by cloning prosodic values from a human reference recitation, and afterwards make use of fine-grained prosody control to manipulate the synthetic speech in a human-in-the-loop setting to alter the recitation w.r.t. specific phenomena. We find that finetuning our TTS model on poetry captures poetic intonation patterns to a large extent, which is beneficial for prosody cloning and manipulation, and we verify the success of our approach both in an objective evaluation and in human studies.
Submitted 18 October, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Speaker Anonymization with Phonetic Intermediate Representations
Authors:
Sarina Meyer,
Florian Lux,
Pavel Denisov,
Julia Koch,
Pascal Tilli,
Ngoc Thang Vu
Abstract:
In this work, we propose a speaker anonymization pipeline that leverages high quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings. Using phones as the intermediate representation ensures near complete elimination of speaker identity information from the input while preserving the original phonetic content as much as possible. Our experimental results on LibriSpeech and VCTK corpora reveal two key findings: 1) although automatic speech recognition produces imperfect transcriptions, our neural speech synthesis system can handle such errors, making our system feasible and robust, and 2) combining speaker embeddings from different resources is beneficial and their appropriate normalization is crucial. Overall, our final best system significantly outperforms the baselines provided in the Voice Privacy Challenge 2020 in terms of privacy robustness against a lazy-informed attacker while maintaining high intelligibility and naturalness of the anonymized speech.
Submitted 11 July, 2022;
originally announced July 2022.
-
Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech
Authors:
Florian Lux,
Julia Koch,
Ngoc Thang Vu
Abstract:
The cloning of a speaker's voice using an untranscribed reference sample is one of the great advances of modern neural text-to-speech (TTS) methods. Approaches for mimicking the prosody of a transcribed reference audio have also been proposed recently. In this work, we bring these two tasks together for the first time through utterance level normalization in conjunction with an utterance level speaker embedding. We further introduce a lightweight aligner for extracting fine-grained prosodic features that can be finetuned on individual samples within seconds. We show that it is possible to clone the voice of a speaker as well as the prosody of a spoken reference independently, without any degradation in quality and with high similarity to both the original voice and prosody, as our objective evaluation and human study show. All of our code and trained models are available, alongside static and interactive demos.
Submitted 21 October, 2022; v1 submitted 24 June, 2022;
originally announced June 2022.
-
On Local Linear Convergence of Projected Gradient Descent for Unit-Modulus Least Squares
Authors:
Trung Vu,
Raviv Raich,
Xiao Fu
Abstract:
The unit-modulus least squares (UMLS) problem has a wide spectrum of applications in signal processing, e.g., phase-only beamforming, phase retrieval, radar code design, and sensor network localization. Scalable first-order methods such as projected gradient descent (PGD) have recently been studied as a simple yet efficient approach to solving the UMLS problem. Existing results on the convergence of PGD for UMLS often focus on global convergence to stationary points. Because the problem is non-convex, only a sublinear convergence rate has been established. However, these results do not explain the fast convergence of PGD frequently observed in practice. This manuscript presents a novel analysis of convergence of PGD for UMLS, justifying the linear convergence behavior of the algorithm near the solution. By exploiting the local structure of the objective function and the constraint set, we establish an exact expression for the convergence rate and characterize the conditions for linear convergence. Numerical simulations corroborate our theoretical analysis. Furthermore, variants of PGD with adaptive step sizes are proposed based on the new insight revealed in our convergence analysis. The variants show substantial acceleration in practice.
Submitted 1 July, 2022; v1 submitted 22 June, 2022;
originally announced June 2022.