-
Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments
Authors:
Yongming Huang,
Xiaohu You,
Hang Zhan,
Shiwen He,
Ningning Fu,
Wei Xu
Abstract:
Intelligent communications have played a pivotal role in shaping the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only a fraction of them has a significant impact on the network AI models. Therefore, real-time intelligence of communication systems heavily relies on a small but critical set of the data that profoundly influences the performance of network AI models. These challenges underscore the need for innovative architectures and solutions. In this paper, we propose a solution, termed the pervasive multi-level (PML) native AI architecture, which integrates the concept of knowledge graph (KG) into the intelligent operational manipulations of mobile networks, resulting in the establishment of a wireless data KG. Leveraging the wireless data KG, we characterize the massive and complex data collected from wireless communication networks and analyze the relationships among various data fields. The obtained graph of data field relations enables the on-demand generation of minimal and effective datasets, referred to as feature datasets, tailored to specific application requirements. Consequently, this architecture not only enhances AI training, inference, and validation processes but also significantly reduces resource wastage and overhead for communication networks. To implement this architecture, we have developed a specific solution comprising a spatio-temporal heterogeneous graph attention neural network model (STREAM) as well as a feature dataset generation algorithm. Experiments are conducted to validate the effectiveness of the proposed architecture.
Submitted 16 April, 2024;
originally announced April 2024.
-
Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach
Authors:
Weide Liu,
Huijing Zhan,
Hao Chen,
Fengmao Lv
Abstract:
Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most of the existing research efforts assume that all modalities are available during both training and testing, making their algorithms susceptible to the missing modality scenario. In this paper, we propose a novel knowledge-transfer network to translate between different modalities to reconstruct the missing audio modalities. Moreover, we develop a cross-modality attention mechanism to retain the maximal information of the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baselines and achieve comparable results to the previous methods with complete multi-modality supervision.
Submitted 18 June, 2024; v1 submitted 28 December, 2023;
originally announced January 2024.
-
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Authors:
Yi Meng,
Xiang Li,
Zhiyong Wu,
Tingtian Li,
Zixun Sun,
Xinyu Xiao,
Chi Sun,
Hui Zhan,
Helen Meng
Abstract:
To further improve the speaking styles of synthesized speech, current text-to-speech (TTS) synthesis systems commonly employ reference speeches to stylize their outputs instead of relying on the input texts alone. These reference speeches are obtained by manual selection, which is resource-consuming, or selected by semantic features. However, semantic features contain not only style-related information but also style-irrelevant information. The information in the text that is irrelevant to speaking style could interfere with the reference audio selection and result in improper speaking styles. To improve the reference selection, we propose the Contrastive Acoustic-Linguistic Module (CALM) to extract the Style-related Text Feature (STF) from the text. CALM optimizes the correlation between the speaking style embedding and the extracted STF with contrastive learning. Thus, a certain number of the most appropriate reference speeches for the input text are selected by retrieving the speeches with the top STF similarities. Then the style embeddings are combined in a weighted sum according to their STF similarities and used to stylize the synthesized speech of TTS. Experimental results demonstrate the effectiveness of our proposed approach, with both objective and subjective evaluations of the speaking styles of the synthesized speech outperforming a baseline approach with semantic-feature-based reference selection.
Submitted 30 August, 2023;
originally announced August 2023.
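The retrieval step described in the CALM abstract above (pick the references with the top STF similarities, then combine their style embeddings in a similarity-weighted sum) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the cosine similarity measure, the softmax normalization of the weights, and the names `cosine` and `select_style` are all assumptions, since the abstract does not specify them.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_style(query_stf, ref_stfs, ref_styles, k=2):
    """Return (top-k indices, weights, weighted style embedding).

    query_stf  : STF vector of the input text
    ref_stfs   : STF vectors of the candidate reference speeches
    ref_styles : style embeddings of the same candidates
    """
    sims = [cosine(query_stf, ref) for ref in ref_stfs]
    # Indices of the k references with the highest STF similarity.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Softmax over the selected similarities (hypothetical weighting choice).
    peak = max(sims[i] for i in top)
    exps = [math.exp(sims[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Similarity-weighted sum of the selected style embeddings.
    dim = len(ref_styles[0])
    style = [
        sum(w * ref_styles[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return top, weights, style
```

In this sketch the resulting `style` vector would condition the TTS model; how that conditioning is done is outside what the abstract states.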
-
Diffractive deep neural network based adaptive optics scheme for vortex beam in oceanic turbulence
Authors:
Haichao Zhan,
Le Wang,
Wennai Wang,
Shengmei Zhao
Abstract:
A vortex beam carrying orbital angular momentum (OAM) is disturbed by oceanic turbulence (OT) when propagating in an underwater wireless optical communication (UWOC) system. Adaptive optics (AO) is used to compensate for distortion and improve the performance of the UWOC system. In this work, we propose a diffractive deep neural network (DDNN) based AO scheme to compensate for the distortion caused by OT, where the DDNN is trained to obtain the mapping between the distortion intensity distribution of the vortex beam and its corresponding phase screen representing OT. The intensity pattern of the distorted vortex beam obtained in the experiment is input to the DDNN model, and the predicted phase screen can be used to compensate for the distortion in real time. The experimental results show that the proposed scheme can quickly extract the characteristics of the intensity pattern of the distorted vortex beam and accurately output the predicted phase screen. The mode purity of the compensated vortex beam is significantly improved, even with strong OT. Our scheme may provide a new avenue for AO techniques, and is expected to improve the communication quality of UWOC systems.
Submitted 6 February, 2022;
originally announced February 2022.
-
Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech
Authors:
Haoyue Zhan,
Xinyuan Yu,
Haitong Zhang,
Yang Zhang,
Yue Lin
Abstract:
In this paper, we study the disentanglement of speaker and language representations in non-autoregressive cross-lingual TTS models from various aspects. We propose a phoneme length regulator that solves the length mismatch problem between the IPA input sequence and monolingual alignment results. Using the phoneme length regulator, we present a FastPitch-based cross-lingual model with IPA symbols as input representations. Our experiments show that language-independent input representations (e.g. IPA symbols), an increasing number of training speakers, and explicit modeling of speech variance information all encourage non-autoregressive cross-lingual TTS models to disentangle speaker and language representations. The subjective evaluation shows that our proposed model can achieve decent naturalness and speaker similarity in cross-language voice cloning.
Submitted 30 August, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Revisiting IPA-based Cross-lingual Text-to-speech
Authors:
Haitong Zhang,
Haoyue Zhan,
Yang Zhang,
Xinyuan Yu,
Yue Lin
Abstract:
International Phonetic Alphabet (IPA) has been widely used in cross-lingual text-to-speech (TTS) to achieve cross-lingual voice cloning (CL VC). However, IPA itself has been understudied in cross-lingual TTS. In this paper, we report some empirical findings of building a cross-lingual TTS model using IPA as inputs. Experiments show that the way to process the IPA and suprasegmental sequence has a negligible impact on the CL VC performance. Furthermore, we find that using a dataset including one speaker per language to build an IPA-based TTS system would fail CL VC since the language-unique IPA and tone/stress symbols could leak the speaker information. In addition, we experiment with different combinations of speakers in the training dataset to further investigate the effect of the number of speakers on the CL VC performance.
Submitted 18 October, 2021; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Relationship Explainable Multi-objective Optimization Via Vector Value Function Based Reinforcement Learning
Authors:
Huixin Zhan,
Yongcan Cao
Abstract:
Solving multi-objective optimization problems is important in various applications where users are interested in obtaining optimal policies subject to multiple, yet often conflicting objectives. A typical approach to obtain optimal policies is to first construct a loss function that is based on the scalarization of individual objectives, and then find the optimal policy that minimizes the loss. However, optimizing the scalarized (and weighted) loss does not necessarily provide a guarantee of high performance on each possibly conflicting objective. In this paper, we propose a vector value based reinforcement learning approach that seeks to explicitly learn the inter-objective relationship and optimize multiple objectives based on the learned relationship. In particular, the proposed method first defines a relationship matrix, a mathematical representation of the inter-objective relationship, and then creates one actor and multiple critics that can co-learn the relationship matrix and action selection. The proposed approach can quantify the inter-objective relationship via reinforcement learning when the impact of one objective on another is unknown a priori. We also provide rigorous convergence analysis of the proposed approach and present a quantitative evaluation of the approach based on two testing scenarios.
Submitted 2 October, 2019;
originally announced October 2019.
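The limitation of scalarization noted in the abstract above can be illustrated with a small sketch. It is not taken from the paper: the candidate policies, their per-objective losses, and the helper names `scalarize` and `best_policy` are hypothetical and chosen only to show how a fixed weight vector can select a policy that performs poorly on one objective.

```python
def scalarize(losses, weights):
    """Weighted sum of per-objective losses (linear scalarization)."""
    return sum(w * l for w, l in zip(weights, losses))


def best_policy(candidates, weights):
    """Name of the candidate with the smallest scalarized loss."""
    return min(candidates, key=lambda name: scalarize(candidates[name], weights))


# Hypothetical policies with (loss on objective 1, loss on objective 2).
candidates = {
    "A": (0.0, 1.0),  # ideal on objective 1, worst on objective 2
    "B": (0.4, 0.4),  # balanced
    "C": (1.0, 0.0),  # worst on objective 1, ideal on objective 2
}

# Weights favouring objective 1 select A, which is the worst on objective 2.
skewed_choice = best_policy(candidates, (0.8, 0.2))
# Equal weights select the balanced policy B.
balanced_choice = best_policy(candidates, (0.5, 0.5))
```

The selected policy depends entirely on weights that must be fixed in advance, which is the gap the abstract's learned inter-objective relationship is meant to address.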
-
Relationship Explainable Multi-objective Reinforcement Learning with Semantic Explainability Generation
Authors:
Huixin Zhan,
Yongcan Cao
Abstract:
Solving multi-objective optimization problems is important in various applications where users are interested in obtaining optimal policies subject to multiple, yet often conflicting objectives. A typical approach to obtain optimal policies is to first construct a loss function that is based on the scalarization of individual objectives, and then find the optimal policy that minimizes the loss. However, optimizing the scalarized (and weighted) loss does not necessarily provide a guarantee of high performance on each possibly conflicting objective because it is challenging to assign the right weights without knowing the relationship among these objectives. Moreover, the effectiveness of these gradient descent algorithms is limited by the agent's ability to explain its decisions and actions to human users. The purpose of this study is two-fold. First, we propose a vector value function based multi-objective reinforcement learning (V2f-MORL) approach that seeks to quantify the inter-objective relationship via reinforcement learning (RL) when the impact of one objective on others is unknown a priori. In particular, we construct one actor and multiple critics that can co-learn the policy and inter-objective relationship matrix (IORM), quantifying the impact of objectives on each other, in an iterative way. Second, we provide a semantic representation that can uncover the trade-off of decision policies made by users to reconcile conflicting objectives based on the proposed V2f-MORL approach for the explainability of the generated behaviors subject to given optimization objectives. We demonstrate the effectiveness of the proposed approach via a MuJoCo based robotics case study.
Submitted 26 September, 2019;
originally announced September 2019.
-
Controllable Planning, Responsibility, and Information in Automatic Driving Technology
Authors:
Dan Wan,
Hao Zhan
Abstract:
People expect automated driving technology to always remain in a stable and controllable state, which, more precisely, can be divided into controllable planning, controllable responsibility, and controllable information. Otherwise, it would give rise to the problems of the trolley dilemma, responsibility attribution, and information leakage and security. This article discusses these three types of issues separately and clarifies some misunderstandings.
Submitted 27 June, 2019; v1 submitted 18 June, 2019;
originally announced June 2019.