-
Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems
Authors:
Yiyang Zhu,
Enyu Shi,
Ziheng Liu,
Jiayi Zhang,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising technique for achieving high spectral efficiency (SE) using multiple distributed access points (APs). However, harsh propagation environments often lead to significant communication performance degradation due to high penetration loss. To overcome this issue, we introduce the reconfigurable intelligent surface (RIS) into the CF mMIMO system as a low-cost and power-efficient solution. In this paper, we focus on optimizing the joint precoding design of the RIS-aided CF mMIMO system to maximize the sum SE. This involves optimizing the precoding matrix at the APs and the reflection coefficients at the RIS. To tackle this problem, we propose a fully distributed multi-agent reinforcement learning (MARL) algorithm that incorporates fuzzy logic (FL). Unlike conventional approaches that rely on alternating optimization techniques, our FL-based MARL algorithm only requires local channel state information, which reduces the need for high backhaul capacity. Simulation results demonstrate that our proposed FL-MARL algorithm effectively reduces computational complexity while achieving similar performance as conventional MARL methods.
Submitted 22 April, 2024;
originally announced April 2024.
-
Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models
Authors:
Rita Frieske,
Bertram E. Shi
Abstract:
Hallucinations are a type of output error produced by deep neural networks. While they have been studied in natural language processing, they have not previously been researched in automatic speech recognition. Here, we define hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. The similarity of hallucinations to probable natural language outputs of the model creates a danger of deception and impacts the credibility of the system. We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models. To address this, we propose a perturbation-based method for assessing the susceptibility of an automatic speech recognition (ASR) model to hallucination at test time, which does not require access to the training dataset. We demonstrate that this method helps to distinguish between hallucinatory and non-hallucinatory models that have similar baseline word error rates. We further explore the relationship between the types of ASR errors and the types of dataset noise to determine what types of noise are most likely to create hallucinatory outputs. We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency. Finally, we discover how to induce hallucinations with a random noise injection to the utterance.
Submitted 3 January, 2024;
originally announced January 2024.
-
RIS-Aided Cell-Free Massive MIMO Systems for 6G: Fundamentals, System Design, and Applications
Authors:
Enyu Shi,
Jiayi Zhang,
Hongyang Du,
Bo Ai,
Chau Yuen,
Dusit Niyato,
Khaled B. Letaief,
Xuemin Shen
Abstract:
An introduction of intelligent interconnectivity for people and things has posed higher demands and more challenges for sixth-generation (6G) networks, such as high spectral efficiency and energy efficiency, ultra-low latency, and ultra-high reliability. Cell-free (CF) massive multiple-input multiple-output (mMIMO) and reconfigurable intelligent surface (RIS), also called intelligent reflecting surface (IRS), are two promising technologies for coping with these unprecedented demands. Given their distinct capabilities, integrating the two technologies to further enhance wireless network performances has received great research and development attention. In this paper, we provide a comprehensive survey of research on RIS-aided CF mMIMO wireless communication systems. We first introduce system models focusing on system architecture and application scenarios, channel models, and communication protocols. Subsequently, we summarize the relevant studies on system operation and resource allocation, providing in-depth analyses and discussions. Following this, we present practical challenges faced by RIS-aided CF mMIMO systems, particularly those introduced by RIS, such as hardware impairments and electromagnetic interference. We summarize corresponding analyses and solutions to further facilitate the implementation of RIS-aided CF mMIMO systems. Furthermore, we explore an interplay between RIS-aided CF mMIMO and other emerging 6G technologies, such as next-generation multiple-access (NGMA), simultaneous wireless information and power transfer (SWIPT), and millimeter wave (mmWave). Finally, we outline several research directions for future RIS-aided CF mMIMO systems.
Submitted 22 May, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Uplink Performance of RIS-aided Cell-Free Massive MIMO System with Electromagnetic Interference
Authors:
Enyu Shi,
Jiayi Zhang,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (MIMO) and reconfigurable intelligent surface (RIS) are two promising technologies for realizing future beyond-fifth generation (B5G) networks. In this paper, we consider a practical spatially correlated RIS-aided CF massive MIMO system with multi-antenna access points (APs) over spatially correlated fading channels. Different from previous work, the electromagnetic interference (EMI) at RIS is considered to further characterize the system performance of the actual environment. Then, we derive the closed-form expression for the system spectral efficiency (SE) with the maximum ratio (MR) combining at the APs and the large-scale fading decoding (LSFD) at the central processing unit (CPU). Moreover, to counteract the near-far effect and EMI, we propose practical fractional power control (FPC) and max-min power control algorithms to further improve the system performance. We unveil the impact of EMI, channel correlations, and different signal processing methods on the uplink SE of user equipments (UEs). The accuracy of our derived analytical results is verified by extensive Monte-Carlo simulations. Our results show that the EMI can substantially degrade the SE, especially for those UEs with unsatisfactory channel conditions. Besides, increasing the number of RIS elements is always beneficial in terms of the SE, but with diminishing returns when the number of RIS elements is sufficiently large. Furthermore, the existence of spatial correlations among RIS elements can deteriorate the system performance when RIS is impaired by EMI.
Submitted 14 June, 2023;
originally announced June 2023.
-
Uplink Performance of RIS-aided Cell-Free Massive MIMO System Over Spatially Correlated Channels
Authors:
Enyu Shi,
Jiayi Zhang,
Zhe Wang,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
We consider a practical spatially correlated reconfigurable intelligent surface (RIS)-aided cell-free (CF) massive multiple-input-multiple-output (mMIMO) system with multi-antenna access points (APs) over spatially correlated Rician fading channels. The minimum mean square error (MMSE) channel estimator is adopted to estimate the aggregated RIS channels. Then, we investigate the uplink spectral efficiency (SE) with the maximum ratio (MR) and the local minimum mean squared error (L-MMSE) combining at the APs and obtain the closed-form expression for characterizing the performance of the former. The accuracy of our derived analytical results has been verified by extensive Monte-Carlo simulations. Our results show that increasing the number of RIS elements is always beneficial, but with diminishing returns when the number of RIS elements is sufficiently large. Furthermore, the effect of the number of AP antennas on system performance is more pronounced under a small number of RIS elements, while the spatial correlation of RIS elements imposes a more severe negative impact on the system performance than that of the AP antennas.
Submitted 28 September, 2022;
originally announced September 2022.
-
Uplink Performance of High-Mobility Cell-Free Massive MIMO-OFDM Systems
Authors:
Jiakang Zheng,
Jiayi Zhang,
Enyu Shi,
Jing Jiang,
Bo Ai
Abstract:
High-speed train (HST) communications with orthogonal frequency division multiplexing (OFDM) techniques have received significant attention in recent years. Besides, cell-free (CF) massive multiple-input multiple-output (MIMO) is considered a promising technology to achieve the ultimate performance limit. In this paper, we focus on the performance of CF massive MIMO-OFDM systems with both matched filter and large-scale fading decoding (LSFD) receivers in HST communications. HST communications with small cell and cellular massive MIMO-OFDM systems are also analyzed for comparison. Considering the bad effect of Doppler frequency offset (DFO) on system performance, exact closed-form expressions for uplink spectral efficiency (SE) of all systems are derived. According to the simulation results, we find that the CF massive MIMO-OFDM system with LSFD achieves both larger SE and lower SE drop percentages than other systems. In addition, increasing the number of access points (APs) and antennas per AP can effectively compensate for the performance loss from the DFO. Moreover, there is an optimal vertical distance between APs and HST to achieve the maximum SE.
Submitted 24 January, 2022;
originally announced January 2022.
-
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
Authors:
Tiezheng Yu,
Rita Frieske,
Peng Xu,
Samuel Cahyawijaya,
Cheuk Tung Shadow Yiu,
Holy Lovenia,
Wenliang Dai,
Elham J. Barezi,
Qifeng Chen,
Xiaojuan Ma,
Bertram E. Shi,
Pascale Fung
Abstract:
Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong. It comprises philosophy, politics, education, culture, lifestyle and family domains, covering a wide range of topics. We also review all existing Cantonese datasets and analyze them according to their speech type, data source, total size and availability. We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset. In addition, we create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.
Submitted 17 January, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
Multitask Emotion Recognition with Incomplete Labels
Authors:
Didan Deng,
Zhaokang Chen,
Bertram E. Shi
Abstract:
We train a unified model to perform three tasks: facial action unit detection, expression classification, and valence-arousal estimation. We address two main challenges of learning the three tasks. First, most existing datasets are highly imbalanced. Second, most existing datasets do not contain labels for all three tasks. To tackle the first challenge, we apply data balancing techniques to experimental datasets. To tackle the second challenge, we propose an algorithm for the multitask model to learn from missing (incomplete) labels. This algorithm has two steps. We first train a teacher model to perform all three tasks, where each instance is trained by the ground truth label of its corresponding task. Secondly, we refer to the outputs of the teacher model as the soft labels. We use the soft labels and the ground truth to train the student model. We find that most of the student models outperform their teacher model on all the three tasks. Finally, we use model ensembling to boost performance further on the three tasks.
Submitted 10 March, 2020; v1 submitted 10 February, 2020;
originally announced February 2020.
-
Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features
Authors:
Didan Deng,
Yuqian Zhou,
Jimin Pi,
Bertram E. Shi
Abstract:
The integration of information across multiple modalities and across time is a promising way to enhance the emotion recognition performance of affective systems. Much previous work has focused on instantaneous emotion recognition. The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which was held in conjunction with the IEEE World Congress on Computational Intelligence, encouraged participants to address long-term emotion recognition by integrating cues from multiple modalities, including facial expression, audio and language. Intuitively, a multi-modal inference network should be able to leverage information from each modality and their correlations to improve recognition over that achievable by a single modality network. We describe here a multi-modal neural architecture that integrates visual information over time using an LSTM, and combines it with utterance level audio and text cues to recognize human sentiment from multimodal clips. Our model outperforms the unimodal baseline, achieving the concordance correlation coefficients (CCC) of 0.400 on the arousal task, and 0.353 on the valence task.
Submitted 4 May, 2018; v1 submitted 2 May, 2018;
originally announced May 2018.