Search | arXiv e-print repository

arXiv:2406.15222 [pdf]

Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.

Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: under peer review

arXiv:2406.06842 [pdf, ps, other]

Aerial Relay to Achieve Covertness and Security

Authors: Jiacheng Jiang, Hongjiang Lei, Ki-Hong Park, Gaofeng Pan, Mohamed-Slim Alouini

Abstract: In this work, a delay-tolerant unmanned aerial vehicle (UAV) relayed covert and secure communication framework is investigated. In this framework, a legitimate UAV serves as an aerial relay to realize communication when the direct link between the terrestrial transmitter and receiver is blocked and also acts as a friendly jammer to suppress the malicious nodes presented on the ground. Subsequently, considering the uncertainty of malicious nodes' positions, a robust fractional programming optimization problem is built to maximize energy efficiency by jointly optimizing the trajectory of the UAV, the transmit power of the transmitter, and the time-switching factor. For the extremely complicated covert constraint, Pinsker's inequality, Jensen's inequality, and the bisection search method are employed to construct a tractable shrunken one. After this, an alternate optimization-based algorithm is proposed to solve the fractional programming optimization problem. To achieve low complexity, we design the primal-dual search-based algorithm and the successive convex approximation-based algorithm, respectively, for each sub-problem. Numerical results show the effectiveness of our proposed algorithm.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 12 pages, 6 figures, submitted to IEEE Journal for review

arXiv:2406.01313 [pdf, ps, other]

3D Trajectory Design for Energy-constrained Aerial CRNs Under Probabilistic LoS Channel

Authors: Hongjiang Lei, Xiaqiu Wu, Ki-Hong Park, Gaofeng Pan

Abstract: Unmanned aerial vehicles (UAVs) have been attracting significant attention because there is a high probability of line-of-sight links being obtained between them and terrestrial nodes in high-rise urban areas. In this work, we investigate cognitive radio networks (CRNs) by jointly designing three-dimensional (3D) trajectory, the transmit power of the UAV, and user scheduling. Considering the UAV's onboard energy consumption, an optimization problem is formulated in which the average achievable rate of the considered system is maximized by jointly optimizing the UAV's 3D trajectory, transmission power, and user scheduling. Due to the non-convex optimization problem, a lower bound on the average achievable rate is utilized to reduce the complexity of the solution. Subsequently, the original optimization problem is decoupled into four subproblems by using block coordinate descent, and each subproblem is transformed into manageable convex optimization problems by introducing slack variables and successive convex approximation. Numerical results validate the effectiveness of our proposed algorithm and demonstrate that the 3D trajectories of UAVs can enhance the average achievable rate of aerial CRNs.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures, submitted to the IEEE journal for review

arXiv:2405.15717 [pdf, other]

Integrated Design for Wave Energy Converter Farms: Assessing Plant, Control, Layout, and Site Selection Coupling in the Presence of Irregular Waves

Authors: Saeed Azad, Suraj Khanal, Daniel R. Herber, Gaofeng Jia

Abstract: A promising direction towards reducing the levelized cost of energy for wave energy converter (WEC) farms is to improve their performance. WEC design studies generally focus on a single design domain (e.g., geometry, control, or layout) to improve the farm's performance under simplifying assumptions, such as regular waves. This strategy, however, has resulted in design recommendations that are impractical or limited in scope because WEC farms are complex systems that exhibit strong coupling among geometry, control, and layout domains. In addition, the location of the candidate site, which has a large impact on the performance of the farm, is often overlooked. Motivated by some of the limitations observed in WEC literature, this study uses an integrated design framework, based on simultaneous control co-design (CCD) principles, to discuss the impact of site selection and wave type on WEC farm design. Interactions among plant, control, and layout are also investigated and discussed using a wide range of simulations and optimization studies. All of the studies were conducted using frequency-domain heaving cylinder WEC devices within a farm with a linear reactive controller in the presence of irregular probabilistic waves. The results provide high-level guidelines to help the WEC design community move toward an integrated design perspective.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 12 pages and 7 figures

arXiv:2405.10691 [pdf, other]

LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

Abstract: The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants' brain development with higher accuracy, statistical power and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet, existing deep generative models are facing difficulties in missing brain image completion, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the developing brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion, which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness with consistent performance on missing infant brain MR completion even in big gap scenarios, aiding in better delineation of early developmental trajectories.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.08306 [pdf, other]

Flight Path Optimization with Optimal Control Method

Authors: Gaofeng Su, Xi Cheng, Siyuan Feng, Ke Liu, Jilin Song, Jianan Chen, Chen Zhu, Hui Lin

Abstract: This paper is based on a crucial issue in the aviation world: how to optimize the trajectory and controls given to the aircraft in order to optimize flight time and fuel consumption. This study aims to provide elements of a response to this problem and to define, under certain simplifying assumptions, an optimal response, using Constrained Finite Time Optimal Control (CFTOC). The first step is to define the dynamic model of the aircraft in accordance with the controllable inputs and wind disturbances. Then we identify a precise optimization objective and implement an optimization program to solve it under a simulated real flight situation. Finally, the optimization result is validated and discussed across different scenarios.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.06794 [pdf, other]

Site-dependent Solutions of Wave Energy Converter Farms with Surrogate Models, Control Co-design, and Layout Optimization

Authors: Saeed Azad, Daniel R. Herber, Suraj Khanal, Gaofeng Jia

Abstract: Design of wave energy converter farms entails multiple domains that are coupled, and thus, their concurrent representation and consideration in early-stage design optimization has the potential to offer new insights and promising solutions with improved performance. Concurrent optimization of physical attributes (e.g., plant) and the control system design is often known as control co-design or CCD. To further improve performance, the layout of the farm must be carefully optimized in order to ensure that constructive effects from hydrodynamic interactions are leveraged, while destructive effects are avoided. The variations in the joint probability distribution of waves, stemming from distinct site locations, affect the farm's performance and can potentially influence decisions regarding optimal plant selection, control strategies, and layout configurations. Therefore, this paper undertakes a concurrent exploration of control co-design and layout optimization for a farm comprising five devices, modeled as heaving cylinders in the frequency domain, situated across four distinct site locations: Alaskan Coasts, East Coast, Pacific Islands, and West Coast. The challenge of efficiently and accurately estimating hydrodynamic coefficients within the optimization loop was mitigated through the application of surrogate modeling and many-body expansion principles. Results indicate the optimized solutions exhibit variations in plant, control, and layout for each candidate site, signifying the importance of system-level design with environmental considerations from the early stages of the design process.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 9 pages, 9 figures

arXiv:2403.12467 [pdf, other]

Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

Authors: Heng Wang, Jianhua Zhang, Gaofeng Nie, Li Yu, Zhiqiang Yuan, Tongjie Li, Jialin Wang, Guangyi Liu

Abstract: Digital twin channel (DTC) is the real-time mapping of a wireless channel from the physical world to the digital world, which is expected to provide significant performance enhancements for the sixth-generation (6G) air-interface design. In this work, we first define five evolution levels of channel twins with the progression of wireless communication. The fifth level, autonomous DTC, is elaborated with multi-dimensional factors such as methodology, characterization precision, and data category. Then, we provide detailed insights into the requirements and architecture of a complete DTC for 6G. Subsequently, a sensing-enhanced real-time channel prediction platform and experimental validations are exhibited. Finally, drawing from the vision of the 6G network, we explore the potential applications and the open issues in future DTC research.

Submitted 31 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 7 pages, 5 figures, 15 references. It is submitted to IEEE journal

arXiv:2402.14099 [pdf, other]

EXACT-Net: EHR-guided lung tumor auto-segmentation for non-small cell lung cancer radiotherapy

Authors: Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Rui Zhang, Quan Chen, Kai Ding

Abstract: Lung cancer is a devastating disease with the highest mortality rate among cancer types. Over 60% of non-small cell lung cancer (NSCLC) patients, which accounts for 87% of diagnoses, require radiation therapy. Rapid treatment initiation significantly increases the patient's survival rate and reduces the mortality rate. Accurate tumor segmentation is a critical step in the diagnosis and treatment of NSCLC. Manual segmentation is time- and labor-consuming and causes delays in treatment initiation. Although many lung nodule detection methods, including deep learning-based models, have been proposed, there is still a long-standing problem of high false positives (FPs) with most of these methods. Here, we developed an electronic health record (EHR) guided lung tumor auto-segmentation called EXACT-Net (EHR-enhanced eXACtitude in Tumor segmentation), where the extracted information from EHRs, using a pre-trained large language model (LLM), was used to remove the FPs and keep the true-positive (TP) nodules only. The auto-segmentation model was trained on NSCLC patients' computed tomography (CT), and the pre-trained LLM was used with the zero-shot learning approach. Our approach resulted in a 250% boost in successful nodule detection using the data from ten NSCLC patients treated in our institution.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2312.10287 [pdf, other]

Towards 6G Digital Twin Channel Using Radio Environment Knowledge Pool

Authors: Jialin Wang, Jianhua Zhang, Yuxiang Zhang, Yutong Sun, Gaofeng Nie, Lianzheng Shi, ** Zhang, Guangyi Liu

Abstract: The digital twin channel (DTC) is crucial for 6G wireless autonomous networks as it replicates the wireless channel fading states in 6G air interface transmissions. It is well known that the physical environment influences channels. A key task for accurately twinning channels in complex 6G scenarios is establishing precise relationships between the environment and the channels. In this article, the radio environment knowledge pool (REKP) is proposed, with its core function being to construct and store as much knowledge between the environment and channels as possible. Firstly, the research progress related to DTC is summarized, and a comparative analysis of these achievements on key indicators in digital twin is conducted, proposing the challenges faced in knowledge construction. Secondly, instructions on how to construct and update REKP are given. Then, a typical case is presented to demonstrate the great potential of REKP in enabling DTC. Finally, how to utilize REKP to address open issues in the 6G wireless communication system is discussed, including enhancing performance, reducing costs, and keeping a trustworthy DTC.

Submitted 26 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2311.06825 [pdf, ps, other]

Secure Rate-Splitting Multiple Access Transmissions in LMS Systems

Authors: Minjue He, Hui Zhao, Xiaqing Miao, Shuai Wang, Gaofeng Pan

Abstract: This letter investigates the secure delivery performance of the rate-splitting multiple access scheme in land mobile satellite (LMS) systems, considering that the private messages intended by a terminal can be eavesdropped by any others from the broadcast signals. Specifically, the considered system has an N-antenna satellite and numerous single-antenna land users. Maximum ratio transmission (MRT) and matched-filtering (MF) precoding techniques are adopted at the satellite separately for the common messages (CMs) and for the private messages (PMs), which are both implemented based on the estimated LMS channels suffering from the Shadowed-Rician fading. Then, closed-form expressions are derived for the ergodic rates for decoding the CM, and for decoding the PM at the intended user respectively, and more importantly, we also derive the ergodic secrecy rate against eavesdropping. Finally, numerical results are provided to validate the correctness of the proposed analysis models, as well as to show some interesting comparisons.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 5 pages, 3 figures, 1 table

arXiv:2310.13932 [pdf, ps, other]

Trajectory and Power Design for Aerial Multi-User Covert Communications

Authors: Hongjiang Lei, Jiacheng Jiang, Imran Shafique Ansari, Gaofeng Pan, Mohamed-Slim Alouini

Abstract: Unmanned aerial vehicles (UAVs) can provide wireless access to terrestrial users, regardless of geographical constraints, and will be an important part of future communication systems. In this paper, a multi-user downlink dual-UAV enabled covert communication system is investigated, in which a UAV transmits secure information to ground users in the presence of multiple wardens while a friendly jammer UAV transmits artificial jamming signals to counter the wardens. The scenario of wardens being outfitted with a single antenna is considered, and the detection error probability (DEP) of wardens with finite observations is researched. Then, considering the uncertainty of wardens' location, a robust optimization problem with worst-case covertness constraint is formulated to maximize the average covert rate by jointly optimizing power allocation and trajectory. To cope with the optimization problem, an algorithm based on successive convex approximation methods is proposed. Thereafter, the results are extended to the case where all the wardens are equipped with multiple antennas. After analyzing the DEP in this scenario, a tractable lower bound of the DEP is obtained by utilizing Pinsker's inequality. Subsequently, the non-convex optimization problem is established and efficiently solved by utilizing a similar algorithm as in the single-antenna scenario. Numerical results indicate the effectiveness of our proposed algorithm.

Submitted 21 October, 2023; originally announced October 2023.

Comments: 30 pages, 9 figures, submitted to the IEEE journal for review

arXiv:2310.13931 [pdf, ps, other]

Trajectory and power design for aerial CRNs with colluding eavesdroppers

Authors: Hongjiang Lei, Jiacheng Jiang, Haosi Yang, Ki-Hong Park, Imran Shafique Ansari, Gaofeng Pan, Mohamed-Slim Alouini

Abstract: Unmanned aerial vehicles (UAVs) can provide wireless access services to terrestrial users without geographical limitations and will become an essential part of the future communication system. However, the openness of wireless channels and the mobility of UAVs make the security of UAV-based communication systems particularly challenging. This work investigates the security of aerial cognitive radio networks (CRNs) with multiple uncertainties colluding eavesdroppers. A cognitive aerial base station transmits messages to cognitive terrestrial users using the spectrum resource of the primary users. All secondary terrestrial users and illegitimate receivers jointly decode the received message. The average secrecy rate of the aerial CRNs is maximized by jointly optimizing the UAV's trajectory and transmission power. An iterative algorithm based on block coordinate descent and successive convex approximation is proposed to solve the non-convex mixed-variable optimization problem. Numerical results verify the effectiveness of our proposed algorithm and show that our scheme improves the secrecy performance of airborne CRNs.

Submitted 21 October, 2023; originally announced October 2023.

Comments: 10 pages, 7 figures, submitted to the IEEE journal for review

arXiv:2310.02550 [pdf, ps, other]

doi 10.1109/TGCN.2023.3309657

Convergence Analysis and Latency Minimization for Semi-Federated Learning in Massive IoT Networks

Authors: Jianyang Ren, Wanli Ni, Hui Tian, Gaofeng Nie

Abstract: As the number of sensors becomes massive in Internet of Things (IoT) networks, the amount of data is humongous. To process data in real-time while protecting user privacy, federated learning (FL) has been regarded as an enabling technique to push edge intelligence into IoT networks with massive devices. However, FL latency increases dramatically due to the increase of the number of parameters in deep neural networks and the limited computation and communication capabilities of IoT devices. To address this issue, we propose a semi-federated learning (SemiFL) paradigm in which network pruning and over-the-air computation are efficiently applied. To be specific, each small base station collects the raw data from its served sensors and trains its local pruned model. After that, the global aggregation of local gradients is achieved through over-the-air computation. We first analyze the performance of the proposed SemiFL by deriving its convergence upper bound. To reduce latency, a convergence-constrained SemiFL latency minimization problem is formulated. By decoupling the original problem into several sub-problems, iterative algorithms are designed to solve them efficiently. Finally, numerical simulations are conducted to verify the effectiveness of our proposed scheme in reducing latency and guaranteeing the identification accuracy.

Submitted 3 October, 2023; originally announced October 2023.

Comments: This paper has been accepted by IEEE Transactions on Green Communications and Networking

arXiv:2308.06547 [pdf, other]

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

Authors: Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Abstract: When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. Previous works attempted to mitigate this issue by either filtering out the noisiest pseudo-labels or improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to entirely eliminate incorrect tokens in pseudo-labels. In this work, we propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting incorrect tokens in the predicted pseudo-labels. In this work, we adopt a confidence-based error detection method that identifies the incorrect tokens by comparing their confidence scores with a given threshold, thus necessitating the confidence score to be discriminative. Hence, the second proposed technique is the contrastive CTC loss function that widens the confidence gap between the correctly and incorrectly predicted tokens, thereby improving the error detection ability. Additionally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, thus saving the pain of manual tuning.

Submitted 12 August, 2023; originally announced August 2023.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

arXiv:2307.02351 [pdf, other]

doi 10.1109/TASLP.2020.2987752

Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture

Authors: Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Abstract: Recently, there has been increasing progress in end-to-end automatic speech recognition (ASR) architecture, which transcribes speech to text without any pre-trained alignments. One popular end-to-end approach is the hybrid Connectionist Temporal Classification (CTC) and attention (CTC/attention) based ASR architecture. However, how to deploy hybrid CTC/attention systems for online speech recognition is still a non-trivial problem. This article describes our proposed online hybrid CTC/attention end-to-end ASR architecture, which replaces all the offline components of conventional CTC/attention ASR architecture with their corresponding streaming components. Firstly, we propose stable monotonic chunk-wise attention (sMoChA) to stream the conventional global attention, and further propose monotonic truncated attention (MTA) to simplify sMoChA and solve the training-and-decoding mismatch problem of sMoChA. Secondly, we propose truncated CTC (T-CTC) prefix score to stream CTC prefix score calculation. Thirdly, we design dynamic waiting joint decoding (DWJD) algorithm to dynamically collect the predictions of CTC and attention in an online manner. Finally, we use latency-controlled bidirectional long short-term memory (LC-BLSTM) to stream the widely-used offline bidirectional encoder network. Experiments with LibriSpeech English and HKUST Mandarin tasks demonstrate that, compared with the offline CTC/attention model, our proposed online CTC/attention model improves the real time factor in human-computer interaction services and maintains its performance with moderate degradation. To the best of our knowledge, this is the first work to provide the full-stack online solution for CTC/attention end-to-end ASR architecture.

Submitted 5 July, 2023; originally announced July 2023.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 28, 2020, Pages 1452 - 1465

arXiv:2302.13222 [pdf, other]

Speech Corpora Divergence Based Unsupervised Data Selection for ASR

Authors: Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Abstract: Selecting application scenarios matching data is important for the automatic speech recognition (ASR) training, but it is difficult to measure the matching degree of the training corpus. This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora. We first use the self-supervised Hubert model to discretize the speech corpora into label sequence and calculate the N-gram probability distribution. Then we calculate the Kullback-Leibler divergence between the N-grams as the SCD. Finally, we can choose the subset which has minimum SCD to the target corpus for annotation and training. Compared to previous data selection method, the SCD data selection method can focus on more acoustic details and guarantee the diversity of the selected set. We evaluate our method on different accents from Common Voice. Experiments show that the proposed SCD data selection can realize 14.8% relative improvements to the random selection, comparable or even superior to the result of supervised selection.

Submitted 25 February, 2023; originally announced February 2023.
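
The selection procedure described in the abstract above (discrete label sequences, N-gram distributions, Kullback-Leibler divergence, minimum-divergence subset) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the speech has already been discretized into integer label sequences, and the function names (`ngram_counts`, `scd`, `select_subset`), the bigram order, the additive smoothing constant, and the direction KL(target || candidate) are all illustrative assumptions that the abstract does not specify.

```python
from collections import Counter
from math import log


def ngram_counts(sequences, n=2):
    """Count N-grams over a corpus given as a list of label sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


def scd(target_counts, candidate_counts, smoothing=1e-3):
    """KL(target || candidate) over the union of observed N-grams.

    Additive smoothing (an assumption of this sketch) keeps the
    divergence finite when an N-gram is missing from one corpus.
    """
    vocab = set(target_counts) | set(candidate_counts)
    t_total = sum(target_counts.values()) + smoothing * len(vocab)
    c_total = sum(candidate_counts.values()) + smoothing * len(vocab)
    kl = 0.0
    for gram in vocab:
        p = (target_counts[gram] + smoothing) / t_total
        q = (candidate_counts[gram] + smoothing) / c_total
        kl += p * log(p / q)
    return kl


def select_subset(target, candidates, n=2):
    """Return the name of the candidate corpus with the smallest divergence."""
    target_counts = ngram_counts(target, n)
    return min(
        candidates,
        key=lambda name: scd(target_counts, ngram_counts(candidates[name], n)),
    )


# Toy label sequences standing in for discretized speech (hypothetical data).
target = [[1, 2, 3, 1, 2, 3]]
candidates = {"a": [[1, 2, 3, 1, 2]], "b": [[3, 3, 3, 3]]}
best = select_subset(target, candidates)
```

Here `best` names the candidate whose bigram distribution is closest to the target's. A practical system would score many candidate subsets or build the subset greedily, which this sketch leaves out.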

arXiv:2210.06091 [pdf]

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

Authors: Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Abstract: Code-switching automatic speech recognition becomes one of the most challenging and the most valuable scenarios of automatic speech recognition, due to the code-switching phenomenon between multilingual language and the frequent occurrence of code-switching phenomenon in daily life. The ISCSLP 2022 Chinese-English Code-Switching Automatic Speech Recognition (CSASR) Challenge aims to promote the development of code-switching automatic speech recognition. The ISCSLP 2022 CSASR challenge provided two training sets, TAL_CSASR corpus and MagicData-RAMC corpus, a development and a test set for participants, which are used for CSASR model training and evaluation. Along with the challenge, we also provide the baseline system performance for reference. As a result, more than 40 teams participated in this challenge, and the winner team achieved 16.70% Mixture Error Rate (MER) performance on the test set and has achieved 9.8% MER absolute improvement compared with the baseline system. In this paper, we will describe the datasets, the associated baselines system and the requirements, and summarize the CSASR challenge results and major techniques and tricks used in the submitted systems.

Submitted 13 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: accepted by ISCSLP 2022

arXiv:2210.05868 [pdf, ps, other]

On Secure Uplink Transmission in Hybrid RF-FSO Cooperative Satellite-Aerial-Terrestrial Networks

Authors: Yuanyuan Ma, Tiejun Lv, Gaofeng Pan, Yunfei Chen, Mohamed-Slim Alouini

Abstract: This work investigates the secrecy outage performance of the uplink transmission of a radio-frequency (RF)-free-space optical (FSO) hybrid cooperative satellite-aerial-terrestrial network (SATN). Specifically, in the considered cooperative SATN, a terrestrial source (S) transmits its information to a satellite receiver (D) via the help of a cache-enabled aerial relay (R) terminal with the most popular content caching scheme, while a group of eavesdropping aerial terminals (Eves) trying to overhear the transmitted confidential information. Moreover, RF and FSO transmissions are employed over S-R and R-D links, respectively. Considering the randomness of R, D, and Eves, and employing a stochastic geometry framework, the secrecy outage performance of the cooperative uplink transmission in the considered SATN is investigated and a closed-form analytical expression for the end-to-end secrecy outage probability is derived. Finally, Monte-Carlo simulations are shown to verify the accuracy of our analysis.

Submitted 11 October, 2022; originally announced October 2022.

Comments: 14 pages, 9 figures, accepted by IEEE Transactions on Communications

arXiv:2208.08042 [pdf, other]

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

Authors: Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

Abstract: The conversation scenario is one of the most important and most challenging scenarios for speech processing technologies because people in conversation respond to each other in a casual style. Detecting the speech activities of each person in a conversation is vital to downstream tasks, like natural language processing, machine translation, etc. People refer to the detection technology of "who speak when" as speaker diarization (SD). Traditionally, diarization error rate (DER) has been used as the standard evaluation metric of SD systems for a long time. However, DER fails to give enough importance to short conversational phrases, which are short but important on the semantic level. Also, a carefully and accurately manually-annotated testing dataset suitable for evaluating the conversational SD technologies is still unavailable in the speech community. In this paper, we design and describe the Conversational Short-phrases Speaker Diarization (CSSD) task, which consists of training and testing datasets, evaluation metric and baselines. In the dataset aspect, despite the previously open-sourced 180-hour conversational MagicData-RAMC dataset, we prepare an individual 20-hour conversational speech test dataset with carefully and artificially verified speakers timestamps annotations for the CSSD task. In the metric aspect, we design the new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level. In the baseline aspect, we adopt a commonly used method: Variational Bayes HMM x-vector system, as the baseline of the CSSD task. Our evaluation metric is publicly available at https://github.com/SpeechClub/CDER_Metric.

Submitted 16 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.16844

arXiv:2207.02495 [pdf, other]

Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies

Authors: Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan

Abstract: There is often a trade-off between performance and latency in streaming automatic speech recognition (ASR). Traditional methods such as look-ahead and chunk-based methods, usually require information from future frames to advance recognition accuracy, which incurs inevitable latency even if the computation is fast enough. A causal model that computes without any future frames can avoid this latency, but its performance is significantly worse than traditional methods. In this paper, we propose corresponding revision strategies to improve the causal model. Firstly, we introduce a real-time encoder states revision strategy to modify previous states. Encoder forward computation starts once the data is received and revises the previous encoder states after several frames, which is no need to wait for any right context. Furthermore, a CTC spike position alignment decoding algorithm is designed to reduce time costs brought by the revision strategy. Experiments are all conducted on Librispeech datasets. Fine-tuning on the CTC-based wav2vec2.0 model, our best method can achieve 3.7/9.2 WERs on test-clean/other sets, which is also competitive with the chunk-based methods and the knowledge distillation methods.

Submitted 6 July, 2022; originally announced July 2022.

Comments: Accepted by Interspeech 2022

arXiv:2206.13760 [pdf, other]

Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization

Authors: Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Abstract: For online speaker diarization, samples arrive incrementally, and the overall distribution of the samples is invisible. Moreover, in most existing clustering-based methods, the training objective of the embedding extractor is not designed specially for clustering. To improve online speaker diarization performance, we propose a unified online clustering framework, which provides an interactive manner between embedding extractors and clustering algorithms. Specifically, the framework consists of two highly coupled parts: clustering-guided recurrent training (CGRT) and truncated beam searching clustering (TBSC). The CGRT introduces the clustering algorithm into the training process of embedding extractors, which could provide not only cluster-aware information for the embedding extractor, but also crucial parameters for the clustering process afterward. And with these parameters, which contain preliminary information of the metric space, the TBSC penalizes the probability score of each cluster, in order to output more accurate clustering results in online fashion with low latency. With the above innovations, our proposed online clustering system achieves 14.48% DER with collar 0.25 at 2.5s latency on the AISHELL-4, while the DER of the offline agglomerative hierarchical clustering is 14.57%.

Submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted by Interspeech 2022

arXiv:2206.09783 [pdf, other]

Boosting Cross-Domain Speech Recognition with Self-Supervision

Authors: Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan

Abstract: The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at acoustic and linguistic levels, it is challenging to perform unsupervised domain adaptation (UDA) for ASR. Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by exploiting the self-supervisions of unlabeled data. However, these self-supervisions also face performance degradation in mismatched domain distributions, which previous work fails to address. This work presents a systematic UDA framework to fully utilize the unlabeled data with self-supervision in the pre-training and fine-tuning paradigm. On the one hand, we apply continued pre-training and data replay techniques to mitigate the domain mismatch of the SSL pre-trained model. On the other hand, we propose a domain-adaptive fine-tuning approach based on the PL technique with three unique modifications: Firstly, we design a dual-branch PL method to decrease the sensitivity to the erroneous pseudo-labels; Secondly, we devise an uncertainty-aware confidence filtering strategy to improve pseudo-label correctness; Thirdly, we introduce a two-step PL approach to incorporate target domain linguistic knowledge, thus generating more accurate target domain pseudo-labels. Experimental results on various cross-domain scenarios demonstrate that the proposed approach effectively boosts the cross-domain performance and significantly outperforms previous approaches.

Submitted 30 July, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

arXiv:2206.09102 [pdf, other]

Decoupled Federated Learning for ASR with Non-IID Data

Authors: Han Zhu, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Abstract: Automatic speech recognition (ASR) with federated learning (FL) makes it possible to leverage data from multiple clients without compromising privacy. The quality of FL-based ASR could be measured by recognition performance, communication and computation costs. When data among different clients are not independently and identically distributed (non-IID), the performance could degrade significantly. In this work, we tackle the non-IID issue in FL-based ASR with personalized FL, which learns personalized models for each client. Concretely, we propose two types of personalized FL approaches for ASR. Firstly, we adapt the personalization layer based FL for ASR, which keeps some layers locally to learn personalization models. Secondly, to reduce the communication and computation costs, we propose decoupled federated learning (DecoupleFL). On one hand, DecoupleFL moves the computation burden to the server, thus decreasing the computation on clients. On the other hand, DecoupleFL communicates secure high-level features instead of model parameters, thus reducing communication cost when models are large. Experiments demonstrate two proposed personalized FL-based ASR approaches could reduce WER by 2.3% - 3.4% compared with FedAvg. Among them, DecoupleFL has only 11.4% communication and 75% computation cost compared with FedAvg, which is also significantly less than the personalization layer based FL.

Submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted by Interspeech 2022

arXiv:2206.05428 [pdf, ps, other]

doi 10.1109/TVT.2022.3182507

Effect of Strong Time-Varying Transmission Distance on LEO Satellite-Terrestrial Deliveries

Authors: Yuanyuan Ma, Tiejun Lv, Tingting Li, Gaofeng Pan, Yunfei Chen, Mohamed-Slim Alouini

Abstract: In this paper, we investigate the effect of the strong time-varying transmission distance on the performance of the low-earth orbit (LEO) satellite-terrestrial transmission (STT) system. We propose a new analytical framework using finite-state Markov channel (FSMC) model and time discretization method. Moreover, to demonstrate the applications of the proposed framework, the performances of two adaptive transmissions, rate-adaptive transmission (RAT) and power-adaptive transmission (PAT) schemes, are evaluated for the cases when the transmit power or the transmission rate at the LEO satellite is fixed. Closed-form expressions for the throughput, energy efficiency (EE), and delay outage rate (DOR) of the considered systems are derived and verified, which are capable of addressing the capacity, energy efficiency, and outage rate performance of the considered LEO STT scenarios with the proposed analytical framework.

Submitted 11 June, 2022; originally announced June 2022.

Comments: 13 pages, 10 figures, Accepted by IEEE Transactions on Vehicular Technology

arXiv:2203.16844 [pdf, ps, other]

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset

Authors: Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan

Abstract: This paper introduces a high-quality rich annotated Mandarin conversational (RAMC) speech dataset called MagicData-RAMC. The MagicData-RAMC corpus contains 180 hours of conversational speech data recorded from native speakers of Mandarin Chinese over mobile phones with a sampling rate of 16 kHz. The dialogs in MagicData-RAMC are classified into 15 diversified domains and tagged with topic labels, ranging from science and technology to ordinary life. Accurate transcription and precise speaker voice activity timestamps are manually labeled for each sample. Speakers' detailed information is also provided. As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc. We also conduct several relevant tasks and provide experimental results to help evaluate the dataset.

Submitted 31 March, 2022; originally announced March 2022.

Comments: Paper on submission to Interspeech2022

arXiv:2203.09294 [pdf, other]

A Differentiable Two-stage Alignment Scheme for Burst Image Reconstruction with Large Shift

Authors: Shi Guo, Xi Yang, Jianqi Ma, Gaofeng Ren, Lei Zhang

Abstract: Denoising and demosaicking are two essential steps to reconstruct a clean full-color image from the raw data. Recently, joint denoising and demosaicking (JDD) for burst images, namely JDD-B, has attracted much attention by using multiple raw images captured in a short time to reconstruct a single high-quality image. One key challenge of JDD-B lies in the robust alignment of image frames. State-of-the-art alignment methods in feature domain cannot effectively utilize the temporal information of burst images, where large shifts commonly exist due to camera and object motion. In addition, the higher resolution (e.g., 4K) of modern imaging devices results in larger displacement between frames. To address these challenges, we design a differentiable two-stage alignment scheme sequentially in patch and pixel level for effective JDD-B. The input burst images are firstly aligned in the patch level by using a differentiable progressive block matching method, which can estimate the offset between distant frames with small computational cost. Then we perform implicit pixel-wise alignment in full-resolution feature domain to refine the alignment results. The two stages are jointly trained in an end-to-end manner. Extensive experiments demonstrate the significant improvement of our method over existing JDD-B methods. Codes are available at https://github.com/GuoShi28/2StageAlign.

Submitted 17 March, 2022; originally announced March 2022.

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition 2022

arXiv:2203.03582 [pdf, other]

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

Authors: Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang

Abstract: Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models. Due to the conditional independence assumption, CTC-based models are always weaker than attention-based encoder-decoder models and require the assistance of external language models (LMs). To solve this issue, we propose two knowledge transferring methods that leverage pre-trained LMs, such as BERT and GPT2, to improve CTC-based models. The first method is based on representation learning, in which the CTC-based models use the representation produced by BERT as an auxiliary learning target. The second method is based on joint classification learning, which combines GPT2 for text modeling with a hybrid CTC/attention architecture. Experiment on AISHELL-1 corpus yields a character error rate (CER) of 4.2% on the test set. When compared to the vanilla CTC-based models fine-tuned from the wav2vec2.0 models, our knowledge transferring method reduces CER by 16.1% relatively without external LMs.

Submitted 22 February, 2022; originally announced March 2022.

Comments: ICASSP 2022

arXiv:2201.10103 [pdf, other]

Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

Authors: Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang

Abstract: While Transformers have achieved promising results in end-to-end (E2E) automatic speech recognition (ASR), their autoregressive (AR) structure becomes a bottleneck for speeding up the decoding process. For real-world deployment, ASR systems are desired to be highly accurate while achieving fast inference. Non-autoregressive (NAR) models have become a popular alternative due to their fast inference speed, but they still fall behind AR systems in recognition accuracy. To fulfill the two demands, in this paper, we propose a NAR CTC/attention model utilizing both pre-trained acoustic and language models: wav2vec2.0 and BERT. To bridge the modality gap between speech and text representations obtained from the pre-trained models, we design a novel modality conversion mechanism, which is more suitable for logographic languages. During inference, we employ a CTC branch to generate a target length, which enables the BERT to predict tokens in parallel. We also design a cache-based CTC/attention joint decoding method to improve the recognition accuracy while keeping the decoding speed fast. Experimental results show that the proposed NAR model greatly outperforms our strong wav2vec2.0 CTC baseline (15.1% relative CER reduction on AISHELL-1). The proposed NAR model significantly surpasses previous NAR systems on the AISHELL-1 benchmark and shows a potential for English tasks.

Submitted 26 January, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: Accepted by ICASSP2022

arXiv:2201.03063 [pdf, other]

Low Earth Orbit Satellite Security and Reliability: Issues, Solutions, and the Road Ahead

Authors: Pingyue Yue, Jianping An, Jiankang Zhang, Jia Ye, Gaofeng Pan, Shuai Wang, Pei Xiao, Lajos Hanzo

Abstract: Low Earth Orbit (LEO) satellites undergo a period of rapid development driven by ever-increasing user demands, reduced costs, and technological progress. Since there is a paucity of literature on the security and reliability issues of LEO Satellite Communication Systems (SCSs), we aim to fill this knowledge gap. Specifically, we critically appraise the inherent characteristics of LEO SCSs and elaborate on their security and reliability requirements. In light of this, we further discuss their vulnerabilities, including potential security attacks launched against them and reliability risks, followed by outlining the associated lessons learned. Subsequently, we discuss the corresponding security and reliability enhancement solutions, unveil a range of trade-offs, and summarize the lessons gleaned. Furthermore, we shed light on several promising future research directions for enhancing the security and reliability of LEO SCSs, such as integrated sensing and communication, computer vision aided communications, as well as challenges brought about by mega-constellation and commercialization. Finally, we summarize the lessons inferred and crystallize the take-away messages in our design guidelines.

Submitted 18 July, 2023; v1 submitted 9 January, 2022; originally announced January 2022.

arXiv:2112.12522 [pdf, other]

Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition

Authors: Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang

Abstract: Automatic speech recognition (ASR) has shown rapid advances in recent years but still degrades significantly in far-field and noisy environments. The recent development of self-supervised learning (SSL) technology can improve the ASR performance by pre-training the model with additional unlabeled speech and the SSL pre-trained model has achieved the state-of-the-art result on several speech benchmarks. Nevertheless, most of the previous SSL methods ignore the influence of the background noise or reverberation, which is crucial to deploying ASR systems in real-world speech applications. This study addresses the robust ASR by introducing a multi-variant consistency (MVC) based SSL method that adapts to different environments. The MVC-SSL is a robust SSL pre-training method designed for noisy and distant-talking speech in real-world applications. Compared to the previous SSL method, the MVC-SSL can calculate the contrastive loss among audios from different acoustic conditions or channels and can learn invariant representations with the change in the environment or the recording equipment. We also explore different SSL training pipelines to balance the noisy distant-talking speech and extra high resource clean speech. We evaluate the proposed method on the commercially-motivated dataset, CHiME-4, and the meeting dataset, AMI. With the help of the MVC-SSL and appropriate training pipeline, we can achieve up to 30% relative word error rate reductions over the baseline wav2vec2.0, one of the most successful SSL methods for ASR.

Submitted 4 May, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: 6 pages, 3 figures

arXiv:2110.04484 [pdf, other]

Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

Authors: Han Zhu, Li Wang, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Abstract: Self-supervised pre-training could effectively improve the performance of low-resource automatic speech recognition (ASR). However, existing self-supervised pre-training are task-agnostic, i.e., could be applied to various downstream tasks. Although it enlarges the scope of its application, the capacity of the pre-trained model is not fully utilized for the ASR task, and the learned representations may not be optimal for ASR. In this work, in order to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, where we use task-specific semi-supervised pre-training to refine the self-supervised pre-trained model for the ASR task thus more effectively utilize the capacity of the pre-trained model to generate task-specific representations for ASR. Experiments show that compared to wav2vec 2.0, wav2vec-S only requires a marginal increment of pre-training time but could significantly improve ASR performance on in-domain, cross-domain and cross-lingual datasets. Average relative WER reductions are 24.5% and 6.6% for 1h and 10h fine-tuning, respectively. Furthermore, we show that semi-supervised pre-training could close the representation gap between the self-supervised pre-trained model and the corresponding fine-tuned model through canonical correlation analysis.

Submitted 17 June, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

Comments: Accepted by Interspeech 2022

arXiv:2108.03799 [pdf, other]

COVID-view: Diagnosis of COVID-19 using Chest CT

Authors: Shreeraj Jadhav, Gaofeng Deng, Marlene Zawin, Arie E. Kaufman

Abstract: Significant work has been done towards deep learning (DL) models for automatic lung and lesion segmentation and classification of COVID-19 on chest CT data. However, comprehensive visualization systems focused on supporting the dual visual+DL diagnosis of COVID-19 are non-existent. We present COVID-view, a visualization application specially tailored for radiologists to diagnose COVID-19 from chest CT data. The system incorporates a complete pipeline of automatic lung segmentation, localization/isolation of lung abnormalities, followed by visualization, visual and DL analysis, and measurement/quantification tools. Our system combines the traditional 2D workflow of radiologists with newer 2D and 3D visualization techniques with DL support for a more comprehensive diagnosis. COVID-view incorporates a novel DL model for classifying the patients into positive/negative COVID-19 cases, which acts as a reading aid for the radiologist using COVID-view and provides the attention heatmap as an explainable DL for the model output. We designed and evaluated COVID-view through suggestions, close feedback, and case studies of real-world patient data by expert radiologists who have substantial experience diagnosing chest CT scans for COVID-19, pulmonary embolism, and other forms of lung infections. We present requirements and task analysis for the diagnosis of COVID-19 that motivate our design choices and result in a practical system that is capable of handling real-world patient cases.

Submitted 9 August, 2021; originally announced August 2021.

Comments: 11 pages, 10 figures, accepted to IEEE VIS 2021 conference and IEEE Transactions on Visualization and Computer Graphics

arXiv:2107.07907 [pdf, other]

Lightness Modulated Deep Inverse Tone Mapping

Authors: Kanglin Liu, Gaofeng Cao, Jiang Duan, Guoping Qiu

Abstract: Single-image HDR reconstruction or inverse tone mapping (iTM) is a challenging task. In particular, recovering information in over-exposed regions is extremely difficult because details in such regions are almost completely lost. In this paper, we present a deep learning based iTM method that takes advantage of the feature extraction and mapping power of deep convolutional neural networks (CNNs) and uses a lightness prior to modulate the CNN to better exploit observations in the surrounding areas of the over-exposed regions to enhance the quality of HDR image reconstruction. Specifically, we introduce a Hierarchical Synthesis Network (HiSN) for inferring an HDR image from an LDR input and a Lightness Adaptive Modulation Network (LAMN) to incorporate the lightness prior knowledge in the inferring process. The HiSN hierarchically synthesizes the high-brightness component and the low-brightness component of the HDR image whilst the LAMN uses a lightness adaptive mask that separates detail-less saturated bright pixels from well-exposed lower light pixels to enable HiSN to better infer the missing information, particularly in the difficult over-exposed detail-less areas. We present experimental results to demonstrate the effectiveness of the new technique based on quantitative measures and visual comparisons. In addition, we present ablation studies of HiSN and visualization of the activation maps inside LAMN to help gain a deeper understanding of the internal working of the new iTM algorithm and explain why it can achieve much improved performance over state-of-the-art algorithms.

Submitted 16 July, 2021; originally announced July 2021.

Comments: 11 pages, 10 figures

arXiv:2106.15773 [pdf, other]

doi 10.1109/JIOT.2021.3091849

Online Offloading Scheduling for NOMA-Aided MEC Under Partial Device Knowledge

Authors: Meihui Hua, Hui Tian, Xinchen Lyu, Wanli Ni, Gaofeng Nie

Abstract: By exploiting the superiority of non-orthogonal multiple access (NOMA), NOMA-aided mobile edge computing (MEC) can provide scalable and low-latency computing services for the Internet of Things. However, given the prevalent stochasticity of wireless networks and sophisticated signal processing of NOMA, it is critical but challenging to design an efficient task offloading algorithm for NOMA-aided MEC, especially under a large number of devices. This paper presents an online algorithm that jointly optimizes offloading decisions and resource allocation to maximize the long-term system utility (i.e., a measure of throughput and fairness). Since the optimization variables are temporally coupled, we first apply the Lyapunov technique to decouple the long-term stochastic optimization into a series of per-slot deterministic subproblems, which does not require any prior knowledge of network dynamics. Second, we propose to transform the non-convex per-slot subproblem of optimizing NOMA power allocation equivalently to a convex form by introducing a set of auxiliary variables, whereby the time complexity is reduced from exponential to $\mathcal{O} (M^{3/2})$. The proposed algorithm is proved to be asymptotically optimal, even under partial knowledge of the device states at the base station. Simulation results validate the superiority of the proposed algorithm in terms of system utility, stability improvement, and overhead reduction.

Submitted 29 June, 2021; originally announced June 2021.

Comments: 15 pages, 6 figures. Accepted for publication in IEEE Internet of Things Journal

arXiv:2104.09177 [pdf, ps, other]

Research on Resource Allocation for Efficient Federated Learning

Authors: Jianyang Ren, Wanli Ni, Gaofeng Nie, Hui Tian

Abstract: As a promising solution to achieve efficient learning among isolated data owners and solve data privacy issues, federated learning is receiving wide attention. Using the edge server as an intermediary can effectively collect sensor data, perform local model training, and upload model parameters for global aggregation. This paper therefore proposes a new framework for resource allocation in a hierarchical network supported by edge computing. In this framework, we minimize the weighted sum of system cost and learning cost by optimizing bandwidth, computing frequency, power allocation and subcarrier assignment. To solve this challenging mixed-integer non-linear problem, we first decouple the bandwidth optimization problem (P1) from the whole problem and obtain a closed-form solution. The remaining computational frequency, power, and subcarrier joint optimization problem (P2) can be further decomposed into two sub-problems: the latency and computational frequency optimization problem (P3) and the transmission power and subcarrier optimization problem (P4). P3 is a convex optimization problem that is easy to solve. In the joint optimization problem (P4), the optimal power under each subcarrier selection can be obtained first through the successive convex approximation (SCA) algorithm. Substituting the optimal power value obtained back into P4, the subproblem can be regarded as an assignment problem, so the Hungarian algorithm can be effectively used to solve it. The solution of problem P2 is accomplished by solving P3 and P4 iteratively. To verify the performance of the algorithm, we compare the proposed algorithm with five algorithms, namely Equal bandwidth allocation, Learning cost guaranteed, Greedy subcarrier allocation, System cost guaranteed and Time-biased algorithm. Numerical results show the significant performance gain and the robustness of the proposed algorithm in the face of parameter changes.

Submitted 12 September, 2022; v1 submitted 19 April, 2021; originally announced April 2021.

Comments: 14 pages, 13 figures

arXiv:2010.12875 [pdf, ps, other]

Stochastic Analysis of Cooperative Satellite-UAV Communications

Authors: Yu Tian, Gaofeng Pan, Mohamed-Slim Alouini

Abstract: In this paper, a dual-hop cooperative satellite-unmanned aerial vehicle (UAV) communication system including a satellite (S) and a group of cluster headers (CHs), each associated with a group of uniformly distributed UAVs, is considered. Specifically, these CHs serve as aerial decode-and-forward relays to forward the information transmitted by S to UAVs. Moreover, free-space optical (FSO) and radio frequency (RF) technologies are respectively adopted over S-CH and CH-UAV links to exploit FSO's high directivity over long-distance transmission and RF's omnidirectional coverage ability. The positions of the CHs in the 3-dimensional space follow the Matérn hard-core point process of type II, in which each CH cannot be closer to any other CH than a predefined distance. Three different cases over CH-UAV links are considered during the performance modeling: interference-free, interference-dominated, and interference-and-noise cases. Then, the coverage performance of the S-CH link and the CH-UAV links under the three cases is studied and the closed-form analytical expressions of the coverage probability (CP) over both links are derived. Also, the asymptotic expressions for the CP over the S-CH link and the CH-UAV link in the interference-free case are derived. Finally, numerical results are provided to validate our proposed analytical models, and some meaningful conclusions are drawn.

Submitted 18 March, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

arXiv:2009.07221 [pdf, ps, other]

On NOMA-Based mmWave Communications

Authors: Yu Tian, Gaofeng Pan, Mohamed-Slim

Abstract: Non-orthogonal multiple access (NOMA) and millimeter-wave (mmWave) communication are two promising techniques to increase the system capacity in the fifth-generation (5G) mobile network. The former can achieve high spectral efficiency by modulating the information in the power domain and the latter can provide extremely large spectrum resources. The fluctuating two-ray (FTR) channel model has already been shown in experiments to agree accurately with the small-scale fading effects in mmWave communications. In this paper, the performance of NOMA-based communications over FTR channels in mmWave communication systems is investigated in terms of outage probability (OP) and ergodic capacity (EC). Specifically, we consider the scenario in which one base station (BS) transmits signals to two users simultaneously under a NOMA scheme. The BS and users are all equipped with a single antenna. Two power allocation strategies are considered: the first one is a general (fixed) power allocation scheme under which we derive the OP and EC of NOMA users in closed form; the other one is an optimal power allocation scheme that can achieve the maximum sum rate for the whole system. Under the second scheme, not only the closed-form OP and EC but also the upper and lower bounds of EC are derived. Furthermore, we also derive the asymptotic expression for the OP in the high average SNR region to investigate the diversity order under these two schemes. Finally, we show the correctness and accuracy of our derived expressions by Monte-Carlo simulation.

Submitted 15 September, 2020; originally announced September 2020.

arXiv:2006.11854 [pdf, other]

Performance Analysis and Optimization of Cooperative Satellite-Aerial-Terrestrial Systems

Authors: Gaofeng Pan, Jia Ye, Yongqiang Zhang, Mohamed-Slim Alouini

Abstract: Aerial relays have been regarded as an alternative and promising solution to extend and improve satellite-terrestrial communications, as the probability of line-of-sight transmissions increases compared with adopting terrestrial relays. In this paper, a cooperative satellite-aerial-terrestrial system including a satellite transmitter (S), a group of terrestrial receivers (D), and an aerial relay (R) is considered. Specifically, considering the randomness of S and D and employing stochastic geometry, the coverage probability of R-D links in non-interference and interference scenarios is studied, and the outage performance of S-R link is investigated by deriving an approximated expression for the outage probability. Moreover, an optimization problem in terms of the transmit power and the transmission time over S-R and R-D links is formulated and solved to obtain the optimal end-to-end energy efficiency for the considered system. Finally, some numerical results are provided to validate our proposed analysis models, as well as to study the optimal energy efficiency performance of the considered system.

Submitted 21 June, 2020; originally announced June 2020.

Comments: 15 pages, 17 figures

arXiv:2006.11850 [pdf, other]

doi 10.1109/TWC.2020.3002230

On the Secrecy of UAV Systems With Linear Trajectory

Authors: Gaofeng Pan, Hongjiang Lei, Jianping An, Shuo Zhang, Mohamed-Slim Alouini

Abstract: Observing that moving in a straight line is a common flying behavior of unmanned aerial vehicles (UAVs) in normal applications, e.g., power line inspections and air patrols along highways, streets, and borders, in this paper we investigate the secrecy outage performance of a UAV system with linear trajectory, where a UAV ($S$) flies in a straight line and transmits its information over the downlink to a legitimate receiver ($D$) on the ground while an eavesdropping UAV ($E$) tries to overhear the information delivery between $S$ and $D$. Meanwhile, some information is delivered to $S$ over the uplink from $D$, such as commanding messages to control $S$'s detecting operations, which can also be eavesdropped by $E$. The locations of $S$, $D$, and $E$ are randomly distributed. We first characterize the statistical characteristics (including cumulative distribution functions and probability density function) of the received signal-to-noise ratio over both downlink and uplink, and then derive the closed-form analytical expressions for the lower bound of the secrecy outage probability of both downlink and uplink. Finally, Monte-Carlo simulations are given to verify our proposed analytical models.

Submitted 21 June, 2020; originally announced June 2020.

Comments: 27 pages, 13 figures

arXiv:2006.05782 [pdf, ps, other]

Applying Deep-Learning-Based Computer Vision to Wireless Communications: Methodologies, Opportunities, and Challenges

Authors: Yu Tian, Gaofeng Pan, Mohamed-Slim Alouini

Abstract: Deep learning (DL) has seen great success in the computer vision (CV) field, and related techniques have been used in security, healthcare, remote sensing, and many other fields. As a parallel development, visual data has become universal in daily life, easily generated by ubiquitous low-cost cameras. Therefore, exploring DL-based CV may yield useful information about objects, such as their number, locations, distribution, motion, etc. Intuitively, DL-based CV can also facilitate and improve the designs of wireless communications, especially in dynamic network scenarios. However, so far, such work is rare in the literature. The primary purpose of this article, then, is to introduce ideas about applying DL-based CV in wireless communications to bring some novel degrees of freedom to both theoretical research and engineering applications. To illustrate how DL-based CV can be applied in wireless communications, an example of using a DL-based CV with a millimeter-wave (mmWave) system is given to realize optimal mmWave multiple-input and multiple-output (MIMO) beamforming in mobile scenarios. In this example, we propose a framework to predict future beam indices from previously observed beam indices and images of street views using ResNet, 3-dimensional ResNext, and a long short-term memory network. The experimental results show that our frameworks achieve much higher accuracy than the baseline method, and that visual data can significantly improve the performance of the MIMO beamforming system. Finally, we discuss the opportunities and challenges of applying DL-based CV in wireless communications.

Submitted 2 December, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

arXiv:2005.12561 [pdf, other]

When Full-Duplex Transmission Meets Intelligent Reflecting Surface: Opportunities and Challenges

Authors: Gaofeng Pan, Jia Ye, Jianping An, Mohamed-Slim Alouini

Abstract: Full-duplex (FD) transmission has already been regarded and developed as a promising method to improve the utilization efficiency of the limited spectrum resource, as transmitting and receiving are allowed to simultaneously occur on the same frequency band. Nowadays, benefiting from the recent development of intelligent reflecting surface (IRS), some unique electromagnetic (EM) functionalities, like wavefront shaping, focusing, anomalous reflection, absorption, frequency shifting, and nonreciprocity can be realized by soft-controlled elements at the IRS, showing the capability of reconfiguring the wireless propagation environment with no hardware cost and nearly zero energy consumption. To jointly exploit the virtues of both FD transmission and IRS, in this article we first introduce several EM functionalities of IRS that are profitable for FD transmission; then, some designs of FD-enabled IRS systems are proposed and discussed, followed by numerical results to demonstrate the obtained benefits. Finally, the challenges and open problems of realizing FD-enabled IRS systems are outlined and elaborated upon.

Submitted 26 May, 2020; originally announced May 2020.

arXiv:2005.00832 [pdf, other]

Flying Car Transportation System: Advances, Techniques, and Challenges

Authors: Gaofeng Pan, Mohamed-Slim Alouini

Abstract: Since the development of transport systems, humans have exploited ground-level, below-ground, and high-altitude spaces for transportation purposes. However, with the increasing burden of expanding populations and rapid urbanization in recent decades, public transportation systems and freight traffic are suffering huge pressure, plaguing local governments and straining economies. Engineers and researchers have started to re-examine, propose, and develop the underused near-ground spaces (NGS) for transportation purposes. For instance, flying cars, which are not a totally novel idea, aim at solving the traffic congestion problem and releasing the strains on existing city transport networks by utilizing unoccupied NGS. Flying cars differ from traditional grounded transportation systems that are entirely limited by their physical space, such as trains on tracks or automobiles on roads. Flying cars do not occupy or compete for high-altitude spaces used by air traffic for long-distance transfer. However, there is a clear lack of specific literature on flying cars and flying car transportation systems (FCTS), which this paper aims to address by describing modern advances, techniques, and challenges of FCTS. We explore the inherent nature of NGS transportation and devise useful proposals to facilitate the construction and commercialization of FCTS. We begin with an introduction to the increasing need for NGS transportation and we address the advantages of using flying cars. Next, we present a brief overview of the history of the development of flying cars in terms of the historic timeline and technique development. Then, we discuss and compare the state of the art in the design of flying cars, including the take-off & landing (TOL) modes, pilot modes, operation modes, and power types, ...

Submitted 2 May, 2020; originally announced May 2020.

Comments: 15 pages, 15 figures

arXiv:2001.08290 [pdf, other]

Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

Authors: Haoran Miao, Gaofeng Cheng, Changfeng Gao, Pengyuan Zhang, Yonghong Yan

Abstract: Recently, Transformer has gained success in the automatic speech recognition (ASR) field. However, it is challenging to deploy a Transformer-based end-to-end (E2E) model for online speech recognition. In this paper, we propose the Transformer-based online CTC/attention E2E ASR architecture, which contains the chunk self-attention encoder (chunk-SAE) and the monotonic truncated attention (MTA) based self-attention decoder (SAD). Firstly, the chunk-SAE splits the speech into isolated chunks. To reduce the computational cost and improve the performance, we propose the state reuse chunk-SAE. Secondly, the MTA based SAD truncates the speech features monotonically and performs attention on the truncated features. To support the online recognition, we integrate the state reuse chunk-SAE and the MTA based SAD into the online CTC/attention architecture. We evaluate the proposed online models on the HKUST Mandarin ASR benchmark and achieve a 23.66% character error rate (CER) with a 320 ms latency. Our online model yields as little as 0.19% absolute CER degradation compared with the offline baseline, and achieves significant improvement over our prior work on Long Short-Term Memory (LSTM) based online E2E models.

Submitted 11 February, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: Accepted by ICASSP 2020

arXiv:1912.11613 [pdf, other]

Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation

Authors: Lu Huang, Gaofeng Cheng, Pengyuan Zhang, Yi Yang, Shumin Xu, Jiasong Sun

Abstract: Utterance-level permutation invariant training (uPIT) has achieved promising progress on the single-channel multi-talker speech separation task. Long short-term memory (LSTM) and bidirectional LSTM (BLSTM) are widely used as the separation networks of uPIT, i.e. uPIT-LSTM and uPIT-BLSTM. uPIT-LSTM has lower latency but worse performance, while uPIT-BLSTM has better performance but higher latency. In this paper, we propose using latency-controlled BLSTM (LC-BLSTM) during inference to achieve low-latency and good-performance speech separation. To find a better training strategy for BLSTM-based separation networks, chunk-level PIT (cPIT) and uPIT are compared. The experimental results show that uPIT outperforms cPIT when LC-BLSTM is used during inference. It is also found that the inter-chunk speaker tracing (ST) can further improve the separation performance of uPIT-LC-BLSTM. Evaluated on the WSJ0 two-talker mixed-speech separation task, the absolute gap of signal-to-distortion ratio (SDR) between uPIT-BLSTM and uPIT-LC-BLSTM is reduced to within 0.7 dB.

Submitted 25 December, 2019; originally announced December 2019.

Comments: Proceedings of APSIPA Annual Summit and Conference 2019, 18-21 November 2019, Lanzhou, China

arXiv:1905.12396 [pdf, ps, other]

doi 10.1049/el.2019.1104

Secrecy Outage Analysis Over Fluctuating Two-Ray Fading Channels

Authors: Hui Zhao, Liang Yang, Gaofeng Pan, Mohamed-Slim Alouini

Abstract: In this letter, we analyze the secrecy outage probability (SOP) over fluctuating two-ray fading channels but with a different definition from the one adopted in [5]. Following the newly defined SOP, we derive an analytical closed-form expression for our proposed SOP, as well as an asymptotic formula valid in the high signal-to-noise ratio region of the source to destination link. In the numerical results section, we perform some Monte-Carlo simulations to validate the accuracy of our derived expressions, and also present the probability gap between our proposed SOP and the SOP in [5].

Submitted 29 May, 2019; originally announced May 2019.

Comments: 2 Figures, 2 Pages

arXiv:1905.07589 [pdf, ps, other]

doi 10.1007/s11432-019-9892-y

Secure Analysis Over Generalized-K Channels

Authors: Luyao Zhang, Hui Zhao, Gaofeng Pan, Liang Yang, Jiawei Chen

Abstract: In this letter, we adopt the SOP definition in [4] and the simplified model of [8], and derive a closed-form expression for the proposed SOP over GK fading channels. To simplify this expression and obtain additional insights, we also perform an asymptotic analysis of the main link in the high SNR region.

Submitted 18 May, 2019; originally announced May 2019.

Comments: 1 figure, 3 pages

arXiv:1904.06168 [pdf, ps, other]

doi 10.1109/LWC.2019.2910530

Secure mmWave Communications in Cognitive Radio Networks

Authors: Hui Zhao, Jiayi Zhang, Liang Yang, Gaofeng Pan, Mohamed-Slim Alouini

Abstract: In this letter, the secrecy performance in cognitive radio networks (CRNs) over fluctuating two-ray (FTR) channels, which are used to model the millimetre wave channel, is investigated in terms of the secrecy outage probability (SOP). Specifically, we consider the case where a source (S) transmits confidential messages to a destination (D), and an eavesdropper wants to wiretap the information from S to D. In a CRN framework, we assume that the primary user shares its spectrum with S, where S adopts the underlay strategy to control its transmit power without impairing the quality of service of the primary user. After some mathematical manipulations, an exact analytical expression for the SOP is derived. In order to get physical and technical insights into the effect of the channel parameters on the SOP, we derive an asymptotic formula for the SOP in the high signal-to-noise ratio region of the S-D link. We finally show some selected Monte-Carlo simulation results to validate the correctness of our derived analytical expressions.

Submitted 12 April, 2019; originally announced April 2019.

Comments: 4 pages, 3 figures

Showing 1–48 of 48 results for author: Gaofeng