-
Hierarchical Control for Vehicle Repositioning in Autonomous Mobility on Demand Systems
Authors:
Pengbo Zhu,
Giancarlo Ferrari-Trecate,
Nikolas Geroliminis
Abstract:
Balancing passenger demand and vehicle availability is crucial for ensuring the sustainability and effectiveness of urban transportation systems. To address this challenge, we propose a novel hierarchical strategy for the efficient distribution of empty vehicles in urban areas. The proposed approach employs a data-enabled predictive control algorithm to develop a high-level controller, which guides the inter-regional allocation of idle vehicles. This algorithm utilizes historical data on passenger demand and vehicle supply in each region to construct a non-parametric representation of the system, enabling it to determine the optimal number of vehicles to be repositioned or retained in their current regions without modeling the system. At the low level, a coverage control-based controller is designed to provide intra-regional position guidance, determining the desired road intersection each vehicle should target. With the objective of optimizing area coverage, it aligns the vehicle distribution with the demand across different districts within a single region. The effectiveness of the proposed method is validated through simulation experiments on the real road network of Shenzhen, China. The integration of the two layers provides better performance compared to applying either layer in isolation, demonstrating its potential to reduce passenger waiting time and answer more requests, thus promoting the development of more efficient and sustainable transportation systems.
Submitted 13 June, 2024;
originally announced June 2024.
-
Multi-Static ISAC based on Network-Assisted Full-Duplex Cell-Free Networks: Performance Analysis and Duplex Mode Optimization
Authors:
Fan Zeng,
Ruoyun Liu,
Xiaoyu Sun,
**gxuan Yu,
Jiamin Li,
Pengchen Zhu,
Dongming Wang,
Xiaohu You
Abstract:
Multi-static integrated sensing and communication (ISAC) technology, which can achieve a wider coverage range and avoid self-interference, is an important trend for the future development of ISAC. Existing multi-static ISAC designs are unable to support asymmetric uplink (UL)/downlink (DL) communication requirements while simultaneously achieving optimal sensing performance. This paper proposes a multi-static ISAC design based on network-assisted full-duplex (NAFD) cell-free networks that addresses these problems. Under this design, closed-form expressions for the individual communication rate and localization error rate are derived under imperfect channel state information, which are respectively utilized to assess the communication and sensing performances. Then, we propose a deep Q-network-based access point (AP) duplex mode optimization algorithm to obtain the trade-off between communication and sensing from the UL and DL perspectives of the APs. Simulation results demonstrate that the NAFD-based ISAC system proposed in this paper can achieve significantly better communication performance than other ISAC systems while ensuring minimal impact on sensing performance. We also validate the accuracy of the derived closed-form expressions. Furthermore, the proposed optimization algorithm achieves performance comparable to that of the exhaustive search method with low complexity.
Submitted 12 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion
Authors:
Ziqian Ning,
Shuai Wang,
Pengcheng Zhu,
Zhichao Wang,
Jixun Yao,
Lei Xie,
Mengxiao Bi
Abstract:
Streaming voice conversion has become increasingly popular for its potential in real-time applications. The recently proposed DualVC 2 has achieved robust and high-quality streaming voice conversion with a latency of about 180 ms. Nonetheless, the recognition-synthesis framework hinders end-to-end optimization, and the instability of the automatic speech recognition (ASR) model with short chunks makes it challenging to further reduce latency. To address these issues, we propose an end-to-end model, DualVC 3. With speaker-independent semantic tokens to guide the training of the content encoder, the dependency on ASR is removed and the model can operate under extremely small chunks, with cascading errors eliminated. A language model is trained on the content encoder output to produce pseudo context by iteratively predicting future frames, providing more contextual information for the decoder to improve conversion quality. Experimental results demonstrate that DualVC 3 achieves comparable performance to DualVC 2 in subjective and objective metrics, with a latency of only 50 ms.
Submitted 11 June, 2024;
originally announced June 2024.
-
Distributed Invariant Kalman Filter for Cooperative Localization using Matrix Lie Groups
Authors:
Yizhi Zhou,
Yufan Liu,
Pengxiang Zhu,
Xuan Wang
Abstract:
This paper studies the problem of Cooperative Localization (CL) for multi-robot systems, where a group of mobile robots jointly localize themselves by using measurements from onboard sensors and shared information from other robots. We propose a novel distributed invariant Kalman Filter (DInEKF) based on Lie group theory to solve the CL problem in a 3-D environment. Unlike the standard EKF, which computes the Jacobians based on the linearization at the state estimate, DInEKF defines the robots' motion model on matrix Lie groups and offers the advantage of state estimate-independent Jacobians. This significantly improves the consistency of the estimator. Moreover, the proposed algorithm is fully distributed, relying solely on each robot's ego-motion measurements and information received from its one-hop communication neighbors. The effectiveness of the proposed algorithm is validated in both Monte-Carlo simulations and real-world experiments. The results show that the proposed DInEKF outperforms the standard distributed EKF in terms of both accuracy and consistency.
Submitted 7 May, 2024;
originally announced May 2024.
-
A Minimal Set of Parameters Based Depth-Dependent Distortion Model and Its Calibration Method for Stereo Vision Systems
Authors:
Xin Ma,
Puchen Zhu,
Xiao Li,
Xiaoyin Zheng,
Jianshu Zhou,
Xuchen Wang,
Kwok Wai Samuel Au
Abstract:
Depth position strongly affects lens distortion, especially in close-range photography, which limits the measurement accuracy of existing stereo vision systems. Moreover, traditional depth-dependent distortion models and their calibration methods have remained complicated. In this work, we propose a minimal set of parameters based depth-dependent distortion model (MDM), which considers the radial and decentering distortions of the lens to improve the accuracy of stereo vision systems and simplify their calibration process. In addition, we present an easy and flexible calibration method for the MDM of stereo vision systems with a commonly used planar pattern, which requires cameras to observe the planar pattern in different orientations. The proposed technique is easy to use and flexible compared with classical calibration techniques for depth-dependent distortion models, in which the lens must be perpendicular to the planar pattern. The experimental validation of the MDM and its calibration method showed that the MDM improved the calibration accuracy by 56.55% and 74.15% compared with Li's distortion model and the traditional Brown distortion model, respectively. In addition, an iteration-based reconstruction method is proposed to iteratively estimate the depth information in the MDM during three-dimensional reconstruction. The results showed that the accuracy of the iteration-based reconstruction method was improved by 9.08% compared with that of the non-iteration reconstruction method.
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Trainable Joint Channel Estimation, Detection and Decoding for MIMO URLLC Systems
Authors:
Yi Sun,
Hong Shen,
Bingqing Li,
Wei Xu,
Pengcheng Zhu,
Nan Hu,
Chunming Zhao
Abstract:
The receiver design for multi-input multi-output (MIMO) ultra-reliable and low-latency communication (URLLC) systems can be a tough task due to the use of short channel codes and few pilot symbols. Consequently, error propagation can occur in traditional turbo receivers, leading to performance degradation. Moreover, the processing delay induced by information exchange between different modules may also be undesirable for URLLC. To address these issues, we advocate performing joint channel estimation, detection, and decoding (JCDD) for MIMO URLLC systems encoded by short low-density parity-check (LDPC) codes. Specifically, we develop two novel JCDD problem formulations based on the maximum a posteriori (MAP) criterion for Gaussian MIMO channels and sparse mmWave MIMO channels, respectively, which integrate the pilots, the bit-to-symbol mapping, the LDPC code constraints, as well as the channel statistical information. Both challenging large-scale non-convex problems are then solved based on the alternating direction method of multipliers (ADMM) algorithms, where closed-form solutions are achieved in each ADMM iteration. Furthermore, two JCDD neural networks, called JCDDNet-G and JCDDNet-S, are built by unfolding the derived ADMM algorithms and introducing trainable parameters. Simulations show that the proposed trainable JCDD receivers can outperform the turbo receivers with affordable computational complexities.
Submitted 11 April, 2024;
originally announced April 2024.
-
Accent-VITS: accent transfer for end-to-end TTS
Authors:
Linhan Ma,
Yongmao Zhang,
Xinfa Zhu,
Yi Lei,
Ziqian Ning,
Pengcheng Zhu,
Lei Xie
Abstract:
Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent, which are entangled in speech. This paper presents a VITS-based end-to-end accent transfer model named Accent-VITS. Based on the main structure of VITS, Accent-VITS makes substantial improvements to enable effective and stable accent transfer. We leverage a hierarchical CVAE structure to model accent pronunciation information and acoustic features, respectively, using bottleneck features and mel spectrums as constraints. Moreover, the text-to-wave mapping in VITS is decomposed into text-to-accent and accent-to-wave mappings in Accent-VITS. In this way, the disentanglement of accent and speaker timbre becomes more stable and effective. Experiments on multi-accent and Mandarin datasets show that Accent-VITS achieves higher speaker similarity, accent similarity and speech naturalness as compared with a strong baseline.
Submitted 29 December, 2023; v1 submitted 28 December, 2023;
originally announced December 2023.
-
A Coverage Control-based Idle Vehicle Rebalancing Approach for Autonomous Mobility-on-Demand Systems
Authors:
Pengbo Zhu,
Isik Ilber Sirmatel,
Giancarlo Ferrari-Trecate,
Nikolas Geroliminis
Abstract:
As an emerging mode of urban transportation, Autonomous Mobility-on-Demand (AMoD) systems show potential for improving mobility in cities through timely and door-to-door services. However, the spatiotemporal imbalances between mobility demand and supply may lead to inefficiencies and a low quality of service. Vehicle rebalancing (i.e., dispatching idle vehicles to high-demand areas) is a potential solution for efficient AMoD fleet management. In this paper, we formulate the vehicle rebalancing problem as a coverage control problem for the deployment of a fleet of mobile agents for AMoD operation in urban areas. Performance is demonstrated via microscopic simulations representing a large urban road network of Shenzhen, China. Results reveal the potential of the proposed method in improving service rates and decreasing passenger waiting times.
Submitted 20 November, 2023;
originally announced November 2023.
-
Passive Integrated Sensing and Communication Scheme based on RF Fingerprint Information Extraction for Cell-Free RAN
Authors:
**gxuan Yu,
Fan Zeng,
Jiamin Li,
Feiyang Liu,
Pengcheng Zhu,
Dongming Wang,
Xiaohu You
Abstract:
This paper investigates how to achieve integrated sensing and communication (ISAC) based on a cell-free radio access network (CF-RAN) architecture with a minimum footprint of communication resources. We propose a new passive sensing scheme. The scheme is based on the radio frequency (RF) fingerprint learning of the RF radio unit (RRU) to build an RF fingerprint library of RRUs. The source RRU is identified by comparing the RF fingerprints carried by the signal at the receiver side. The receiver extracts the channel parameters from the signal and estimates the channel environment, thus locating the reflectors in the environment. The proposed scheme can effectively solve the problem of interference between signals in the same time-frequency domain but in different spatial domains when multiple RRUs jointly serve users in CF-RAN architecture. Simulation results show that the proposed passive ISAC scheme can effectively detect reflector location information in the environment without degrading the communication performance.
Submitted 10 November, 2023;
originally announced November 2023.
-
Personalizing Keyword Spotting with Speaker Information
Authors:
Beltrán Labrador,
Pai Zhu,
Guanlong Zhao,
Angelo Scorza Scarpati,
Quan Wang,
Alicia Lozano-Diez,
Alex Park,
Ignacio López Moreno
Abstract:
Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups. To address this challenge, we propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM), a recent method for learning from multiple sources of information. We explore both Text-Dependent and Text-Independent speaker recognition systems to extract speaker information, and we experiment on extracting this information from both the input audio and pre-enrolled user audio. We evaluate our systems on a diverse dataset and achieve a substantial improvement in keyword detection accuracy, particularly among underrepresented speaker groups. Moreover, our proposed approach only requires a small 1% increase in the number of parameters, with a minimum impact on latency and computational cost, which makes it a practical solution for real-world applications.
Submitted 6 November, 2023;
originally announced November 2023.
-
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
Authors:
Ziqian Ning,
Yuepeng Jiang,
Pengcheng Zhu,
Shuai Wang,
Jixun Yao,
Lei Xie,
Mengxiao Bi
Abstract:
Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC encounters several problems that limit its performance. First, the autoregressive decoder inherently accumulates errors and also limits the inference speed. Second, the causal convolution enables streaming capability but cannot sufficiently use future information within chunks. Third, the model is unable to effectively address the noise in the unvoiced segments, lowering the sound quality. In this paper, we propose DualVC 2 to address these issues. Specifically, the model backbone is migrated to a Conformer-based architecture, enabling parallel inference. Causal convolution is replaced by non-causal convolution with dynamic chunk mask to make better use of within-chunk future information. Also, quiet attention is introduced to enhance the model's noise robustness. Experiments show that DualVC 2 outperforms DualVC and other baseline systems in both subjective and objective metrics, with only 186.4 ms latency. Our audio samples are made publicly available.
Submitted 18 January, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
Signal Processing and Learning for Next Generation Multiple Access in 6G
Authors:
Wei Chen,
Yuanwei Liu,
Hamid Jafarkhani,
Yonina C. Eldar,
Peiying Zhu,
Khaled B Letaief
Abstract:
Wireless communication systems to date primarily rely on the orthogonality of resources to facilitate the design and implementation, from user access to data transmission. Emerging applications and scenarios in the sixth generation (6G) wireless systems will require massive connectivity and transmission of a deluge of data, which calls for more flexibility in the design concept that goes beyond orthogonality. Furthermore, recent advances in signal processing and learning have attracted considerable attention, as they provide promising approaches to various complex and previously intractable problems of signal processing in many fields. This article provides an overview of research efforts to date in the field of signal processing and learning for next-generation multiple access (NGMA), with an emphasis on massive random access and non-orthogonal multiple access. The promising interplay with new technologies and the challenges in learning-based NGMA are discussed.
Submitted 9 September, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Sensiverse: A dataset for ISAC study
Authors:
Jia** Luo,
Baojian Zhou,
Yang Yu,
** Zhang,
Xiaohui Peng,
Jianglei Ma,
Peiying Zhu,
Jianmin Lu,
Wen Tong
Abstract:
In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data and associations with Tx/Rx deployment. The file structure and usage of the dataset are also described, and finally the use of the dataset is illustrated with examples through the evaluation of use cases such as 3D environment reconstruction and moving targets.
Submitted 26 August, 2023;
originally announced August 2023.
-
Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models
Authors:
Heyang Xue,
Shuai Guo,
Pengcheng Zhu,
Mengxiao Bi
Abstract:
Despite imperfect score-matching causing drift in training and sampling distributions of diffusion models, recent advances in diffusion-based acoustic models have revolutionized data-sufficient single-speaker Text-to-Speech (TTS) approaches, with Grad-TTS being a prime example. However, the sampling drift problem causes these approaches to struggle in multi-speaker scenarios in practice, because the target data distribution is more complex than in single-speaker scenarios. In this paper, we present Multi-GradSpeech, a multi-speaker diffusion-based acoustic model that introduces the Consistent Diffusion Model (CDM) as a generative modeling approach. We enforce the consistency property of CDM during the training process to alleviate the sampling drift problem in the inference stage, resulting in significant improvements in multi-speaker TTS performance. Our experimental results corroborate that our proposed approach can improve the performance of different speakers involved in multi-speaker TTS compared to Grad-TTS, even outperforming the fine-tuning approach. Audio samples are available at https://welkinyang.github.io/multi-gradspeech/
Submitted 31 August, 2023; v1 submitted 20 August, 2023;
originally announced August 2023.
-
Joint Uplink and Downlink Resource Allocation Towards Energy-efficient Transmission for URLLC
Authors:
Kang Li,
Pengcheng Zhu,
Yan Wang,
Fu-Chun Zheng,
Xiaohu You
Abstract:
Ultra-reliable and low-latency communications (URLLC) was first proposed in 5G networks and is expected to support applications with the most stringent quality-of-service (QoS). However, since the wireless channels vary dynamically, the transmit power for ensuring the QoS requirements of URLLC may be very high, which conflicts with the power limitation of a real system. To achieve successful URLLC transmission with finite transmit power, we propose an energy-efficient packet delivery mechanism that incorporates frequency-hopping and proactive dropping in this paper. To reduce uplink outage probability, frequency-hopping provides more chances for transmission so that failure rarely occurs. To avoid downlink outage from queue clearing, proactive dropping controls overall reliability by introducing an extra error component. With the proposed packet delivery mechanism, we jointly optimize bandwidth allocation and power control of uplink and downlink, antenna configuration, and subchannel assignment to minimize the average total power under the constraint of URLLC transmission requirements. Via theoretical analysis (e.g., the convexity with respect to bandwidth, the independence of bandwidth allocation, the convexity of antenna configuration with inactive constraints), the simplification of finding the global optimal solution for resource allocation is addressed. A three-step method is then proposed to find the optimal solution for resource allocation. Simulation results validate the analysis and show the performance gain by optimizing resource allocation with the proposed packet delivery mechanism.
Submitted 25 May, 2023;
originally announced May 2023.
-
Resource Allocation in Cell-Free MU-MIMO Multicarrier System with Finite and Infinite Blocklength
Authors:
Jiafei Fu,
Pengcheng Zhu,
Bo Ai,
Jiangzhou Wang,
Xiaohu You
Abstract:
The explosive growth of data makes spectrum resources increasingly scarce. It is important to optimize the system performance under limited resources. In this paper, we investigate how to achieve weighted throughput (WTP) maximization for cell-free (CF) multiuser MIMO (MU-MIMO) multicarrier (MC) systems through resource allocation (RA), in both the finite blocklength (FBL) and infinite blocklength (INFBL) regimes. To ensure the quality of service (QoS) of each user, particularly for the block error rate (BLER) and latency in the FBL regime, the WTP is maximized under the constraints of total power consumption and required QoS metrics. Since the channels vary in different subcarriers (SCs) and inter-user interference strengths, the WTP can be maximized by scheduling the best users in each time-frequency (TF) resource and advanced beamforming design, while the resources can be fully utilized. With this motivation, we propose a joint user scheduling (US) and beamforming design algorithm based on the successive convex approximation (SCA) and gene-aided (GA) algorithms, to address a mixed integer nonlinear programming (MINLP) problem. Numerical results demonstrate that the proposed RA outperforms the comparison schemes. Moreover, the CF system in our scenario is capable of achieving higher spectral efficiency than the centralized antenna systems (CAS).
Submitted 25 May, 2023;
originally announced May 2023.
-
Grouping Method for mmWave Massive MIMO System: Exploitation of Angular Multiplexing Gain
Authors:
Peng Jiang,
Pengcheng Zhu,
Jiamin Li,
Dongming Wang
Abstract:
A future millimeter-wave (mmWave) massive multiple-input and multiple-output (MIMO) system may serve hundreds or thousands of users at the same time; thus, research on multiple access technology is particularly important. Moreover, due to the short-wavelength nature of mmWave signals, large-scale arrays are easier to implement than at microwave frequencies, while their directivity and sparseness make the physical beamforming effect of precoding more prominent. Considering the mmWave angle division multiple access (ADMA) system based on precoding, this paper investigates the influence of the angle distribution on system performance, which is denoted as the angular multiplexing gain. Furthermore, inspired by the above research, we transform the ADMA user grouping problem of maximizing the system sum-rate into an inter-user angular spacing equalization problem. Then, the form of the optimal solution for the approximate problem is derived, and the corresponding grouping algorithm is proposed. The simulation results demonstrate that the proposed algorithm performs better than the comparison methods. Finally, a complexity analysis also shows that the proposed algorithm has extremely low complexity.
Submitted 25 May, 2023;
originally announced May 2023.
-
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding
Authors:
Ziqian Ning,
Yuepeng Jiang,
Pengcheng Zhu,
Jixun Yao,
Shuai Wang,
Lei Xie,
Mengxiao Bi
Abstract:
Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities. Unlike typical (non-streaming) voice conversion, which can leverage the entire utterance as full context, streaming voice conversion faces significant challenges due to the missing future information, resulting in degraded intelligibility, speaker similarity, and sound quality. To address this challenge, we propose DualVC, a dual-mode neural voice conversion approach that supports both streaming and non-streaming modes using jointly trained separate network parameters. Furthermore, we propose intra-model knowledge distillation and hybrid predictive coding (HPC) to enhance the performance of streaming conversion. Additionally, we incorporate data augmentation to train a noise-robust autoregressive decoder, improving the model's performance on long-form speech conversion. Experimental results demonstrate that the proposed model outperforms the baseline models in the context of streaming voice conversion, while maintaining comparable performance to the non-streaming topline system that leverages the complete context, albeit with a latency of only 252.8 ms.
Submitted 30 May, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Model-driven CT reconstruction algorithm for nano-resolution X-ray phase contrast imaging
Authors:
Xuebao Cai,
Yuhang Tan,
Ting Su,
Dong Liang,
Hairong Zheng,
Jinyou Xu,
Peiping Zhu,
Yongshuai Ge
Abstract:
The low-density imaging performance of a zone plate based nano-resolution hard X-ray computed tomography (CT) system can be significantly improved by incorporating a grating-based Lau interferometer. Due to diffraction, however, the acquired nano-resolution phase signal may suffer from a splitting problem, which impedes the direct reconstruction of phase contrast CT (nPCT) images. To overcome this, a new model-driven nPCT image reconstruction algorithm is developed in this study. In it, the diffraction procedure is mathematically modeled as a matrix B, from which the projections without signal splitting can be generated by inversion. Furthermore, a penalized weighted least-squares model with total variation (PWLS-TV) is employed to denoise these projections, from which nPCT images with high accuracy are directly reconstructed. Numerical and physical experiments demonstrate that this new algorithm is able to work with phase projections having any splitting distance. Results also reveal that nPCT images with higher signal-to-noise ratio (SNR) are reconstructed from projections with larger signal splittings. In conclusion, a novel model-driven nPCT image reconstruction algorithm with high accuracy and robustness is verified for Lau interferometer based hard X-ray nano-resolution phase contrast imaging.
Submitted 13 October, 2023; v1 submitted 14 May, 2023;
originally announced May 2023.
-
On the Road to 6G: Visions, Requirements, Key Technologies and Testbeds
Authors:
Cheng-Xiang Wang,
Xiaohu You,
Xiqi Gao,
Xiuming Zhu,
Zixin Li,
Chuan Zhang,
Haiming Wang,
Yongming Huang,
Yunfei Chen,
Harald Haas,
John S. Thompson,
Erik G. Larsson,
Marco Di Renzo,
Wen Tong,
Peiying Zhu,
Xuemin Shen,
H. Vincent Poor,
Lajos Hanzo
Abstract:
Fifth generation (5G) mobile communication systems have entered the stage of commercial development, providing users with new services and improved user experiences as well as offering a host of novel opportunities to various industries. However, 5G still faces many challenges. To address these challenges, international industrial, academic, and standards organizations have commenced research on sixth generation (6G) wireless communication systems. A series of white papers and survey papers have been published, which aim to define 6G in terms of requirements, application scenarios, key technologies, etc. Although ITU-R has been working on the 6G vision and it is expected to reach a consensus on what 6G will be by mid-2023, the related global discussions are still wide open and the existing literature has identified numerous open issues. This paper first provides a comprehensive portrayal of the 6G vision, technical requirements, and application scenarios, covering the current common understanding of 6G. Then, a critical appraisal of the 6G network architecture and key technologies is presented. Furthermore, existing testbeds and advanced 6G verification platforms are detailed for the first time. In addition, future research directions and open challenges are identified for stimulating the on-going global debate. Finally, lessons learned to date concerning 6G networks are discussed.
Submitted 28 February, 2023;
originally announced February 2023.
-
Spectral Efficiency and Scalability Analysis for Multi-Level Cooperative Cell-Free Massive MIMO Systems
Authors:
Jiamin Li,
Xiaoyu Sun,
Pengcheng Zhu,
Dongming Wang,
Xiaohu You
Abstract:
This paper proposes a multi-level cooperative architecture to balance the spectral efficiency and scalability of cell-free massive multiple-input multiple-output (MIMO) systems. In the proposed architecture, spatial expansion units (SEUs) are introduced to avoid a large amount of computation at the access points (APs) and increase the degree of cooperation among APs. We first derive the closed-form expressions of the uplink user achievable rates under the multi-level cooperative architecture with maximal ratio combining (MRC) and zero-forcing (ZF) receivers. The accuracy of the closed-form expressions is verified. Moreover, numerical results demonstrate that the proposed multi-level cooperative architecture achieves a better trade-off between spectral efficiency and scalability than other forms of cell-free massive MIMO architectures.
Submitted 16 February, 2023;
originally announced February 2023.
-
Network-Assisted Full-Duplex Cell-Free mmWave Massive MIMO Systems with DAC Quantization and Fronthaul Compression
Authors:
Jiamin Li,
Qingrui Fan,
Yu Zhang,
Pengcheng Zhu,
Dongming Wang,
Hao Wu,
Xiaohu You
Abstract:
In this paper, we investigate network-assisted full-duplex (NAFD) cell-free millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems with digital-to-analog converter (DAC) quantization and fronthaul compression. We propose to maximize the weighted uplink and downlink sum rate by jointly optimizing the power allocation of both the transmitting remote antenna units (T-RAUs) and uplink users and the variances of the downlink and uplink fronthaul compression noises. To deal with this challenging problem, we further apply a successive convex approximation (SCA) method to handle the non-convex bidirectional limited-capacity fronthaul constraints. The simulation results verify the convergence of the proposed SCA-based algorithm and analyze the impact of fronthaul capacity and DAC quantization on the spectral efficiency of the NAFD cell-free mmWave massive MIMO systems. Moreover, some insightful conclusions are obtained through the comparisons of spectral efficiency, which shows that NAFD achieves better performance gains than co-time co-frequency full-duplex cloud radio access network (CCFD C-RAN) in the cases of practical limited-resolution DACs. Specifically, their performance gaps with 8-bit DAC quantization are larger than that with 1-bit DAC quantization, which attains a 5.5-fold improvement.
Submitted 17 February, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models
Authors:
Pengfei Zhu,
Chao Pang,
Yekun Chai,
Lei Li,
Shuohuan Wang,
Yu Sun,
Hao Tian,
Hua Wu
Abstract:
In recent years, the burgeoning interest in diffusion models has led to significant advances in image and speech generation. Nevertheless, the direct synthesis of music waveforms from unrestricted textual prompts remains a relatively underexplored domain. In response to this lacuna, this paper introduces a pioneering contribution in the form of a text-to-waveform music generation model, underpinned by the utilization of diffusion models. Our methodology hinges on the innovative incorporation of free-form textual prompts as conditional factors to guide the waveform generation process within the diffusion model framework. Addressing the challenge of limited text-music parallel data, we undertake the creation of a dataset by harnessing web resources, a task facilitated by weak supervision techniques. Furthermore, a rigorous empirical inquiry is undertaken to contrast the efficacy of two distinct prompt formats for text conditioning, namely, music tags and unconstrained textual descriptions. The outcomes of this comparative analysis affirm the superior performance of our proposed model in terms of enhancing text-music relevance. Finally, our work culminates in a demonstrative exhibition of the excellent capabilities of our model in text-to-music generation. We further demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.
Submitted 21 September, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
HAPS for 6G Networks: Potential Use Cases, Open Challenges, and Possible Solutions
Authors:
Omid Abbasi,
Animesh Yadav,
Halim Yanikomeroglu,
Ngoc Dung Dao,
Gamini Senarath,
Peiying Zhu
Abstract:
High altitude platform station (HAPS), which is deployed in the stratosphere at an altitude of 20-50 kilometres, has attracted much attention in recent years due to its large footprint, line-of-sight links, and fixed position relative to the Earth. Compared with existing network infrastructure, HAPS has a much larger coverage area than terrestrial base stations and is much closer than satellites to the ground users. Besides small-cells and macro-cells, a HAPS can offer one mega-cell, which can complement legacy networks in 6G and beyond wireless systems. This paper explores potential use cases and discusses relevant open challenges of integrating HAPS into legacy networks, while also suggesting some solutions to these challenges. The cumulative density functions of spectral efficiency of the integrated network and cell-edge users are studied and compared with those of a terrestrial network. The results show that the capacity gains achieved by the integrated network are beneficial to cell-edge users. Furthermore, the advantages of a HAPS for backhauling aerial base stations are demonstrated by the simulation results.
Submitted 11 April, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Optimization of the energy efficiency in Smart Internet of Vehicles assisted by MEC
Authors:
Jiafei Fu,
Pengcheng Zhu,
Jingyu Hua,
Jiamin Li,
Jiangang Wen
Abstract:
Smart Internet of Vehicles (IoV), a promising application of the Internet of Things (IoT), is emerging with the development of fifth generation mobile communication (5G). Nevertheless, the heterogeneous requirements of sufficient battery capacity, powerful computing ability and energy efficiency for electric vehicles face great challenges due to the explosive data growth in 5G and the sixth generation of mobile communication (6G) networks. To alleviate the deficiencies mentioned above, this paper proposes a mobile edge computing (MEC) enabled IoV system, in which electric vehicle nodes (eVNs) upload and download data through an anchor node (AN) that is integrated with an MEC server. Meanwhile, the anchor node transmits radio signals to electric vehicles with simultaneous wireless information and power transfer (SWIPT) technology so as to compensate for the battery limitation of electric vehicles. Moreover, the spectrum efficiency is further improved by multi-input and multi-output (MIMO) and full-duplex (FD) technologies, with which the anchor node is equipped. In consideration of the issues above, we maximize the average energy efficiency of electric vehicles by jointly optimizing the CPU frequency, vehicle transmitting power, computing tasks and uplink rate. Since the problem is nonconvex, we propose a novel alternate interior-point iterative scheme (AIIS) under the constraints of computing tasks, energy consumption and time latency. The results and discussion section verifies the effectiveness of the proposed AIIS scheme in comparison with the benchmark schemes.
Submitted 13 January, 2023;
originally announced January 2023.
-
Performance Analysis and Optimization of Network-Assisted Full-Duplex Systems under Low-Resolution ADCs
Authors:
Xiangning Song,
Zhenhao Ji,
Jiamin Li,
Pengcheng Zhu,
Dongming Wang,
Xiaohu You
Abstract:
Network-assisted full-duplex (NAFD) distributed massive multiple input multiple output (M-MIMO) enables the in-band full-duplex with existing half-duplex devices at the network level, which exceptionally improves spectral efficiency. This paper analyzes the impact of low-resolution analog-to-digital converters (ADCs) on NAFD distributed M-MIMO and designs an efficient bit allocation algorithm for low-resolution ADCs. The beamforming training mechanism relieves the heavy pilot overhead for channel estimation, which remarkably enhances system performance by guiding the interference cancellation and coherence detection. Furthermore, closed-form expressions for spectral and energy efficiency with low-resolution ADCs are derived. The multi-objective optimization problem (MOOP) for spectral and energy efficiency is solved by the deep Q network and the non-dominated sorting genetic algorithm II. The simulation results corroborate the theoretical derivation and verify the effectiveness of introducing low-resolution ADCs in NAFD distributed M-MIMO systems. Meanwhile, a set of Pareto-optimal solutions for ADC accuracy flexibly provide guidelines for deploying in a practical NAFD distributed M-MIMO system.
Submitted 17 December, 2022;
originally announced December 2022.
-
Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features
Authors:
Ziqian Ning,
Qicong Xie,
Pengcheng Zhu,
Zhichao Wang,
Liumeng Xue,
Jixun Yao,
Lei Xie,
Mengxiao Bi
Abstract:
Voice conversion for highly expressive speech is challenging. Current approaches struggle with the balancing between speaker similarity, intelligibility and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conversion framework that leverages advantages from both neural bottleneck feature (BNF) approach and information perturbation approach. Specifically, we use a BNF encoder and a Perturbed-Wav encoder to form a content extractor to learn linguistic and para-linguistic features respectively, where BNFs come from a robust pre-trained ASR model and the perturbed wave becomes speaker-irrelevant after signal perturbation. We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query, which result from a prosody encoder with target speaker embedding and normalized pitch and energy of source speech as input. Finally the decoder consumes the integrated features and the speaker-dependent prosody feature to generate the converted speech. Experiments demonstrate that Expressive-VC is superior to several state-of-the-art systems, achieving both high expressiveness captured from the source speech and high speaker similarity with the target speaker; meanwhile intelligibility is well maintained.
Submitted 9 November, 2022;
originally announced November 2022.
-
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Authors:
Xiaoran Fan,
Chao Pang,
Tian Yuan,
He Bai,
Renjie Zheng,
Pengfei Zhu,
Shuohuan Wang,
Junkun Chen,
Zeyu Chen,
Liang Huang,
Yu Sun,
Hua Wu
Abstract:
Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference without any finetuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods.
Submitted 4 December, 2022; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Low Altitude 3-D Coverage Performance Analysis in Cell-Free Distributed Collaborative Massive MIMO Systems
Authors:
Jiamin Li,
Qijun Pan,
Pengcheng Zhu,
Dongming Wang,
Xiaohu You
Abstract:
To improve the poor performance of distributed operation and non-scalability of centralized operation in traditional cell-free massive MIMO, we propose a cell-free distributed collaborative (CFDC) massive multiple-input multiple-output (MIMO) system based on a novel two-layer model to take advantage of the distributed cloud-edge-end collaborative architecture in the beyond 5G (B5G) internet of things (IoT) environment to provide strong flexibility and scalability. We further utilize the proposed CFDC massive MIMO system to support the low altitude three-dimensional (3-D) coverage scenario with unmanned aerial vehicles (UAVs), while accounting for 3-D Rician channel estimation, user-centric association and different scalable receiving schemes. Since coexisting UAVs and ground users (GUEs) cause greater interference, we utilize a user-centric association strategy and minimum-mean-square error (MMSE) channel state information (CSI) estimation to obtain the estimated CSI of UAVs and GUEs. Under the CFDC scenarios, scalable receiving schemes such as maximum ratio combining (MRC), partial zero-forcing (P-ZF) and partial minimum-mean-square error (P-MMSE) can be performed at edge servers, and the closed-form expressions for uplink spectral efficiency (SE) are derived. Based on the derived expressions, we propose an efficient power control algorithm by solving a multi-objective optimization problem (MOOP) that maximizes the average SE of UAVs and GUEs simultaneously with a Deep Q-Network (DQN). Numerical results verify the accuracy of the derived closed-form expressions and the effectiveness of the coexisting UAV and GUE transmission scheme in CFDC massive MIMO systems. The SE analysis under various system parameters offers considerable flexibility for system optimization.
Submitted 28 March, 2023; v1 submitted 27 June, 2022;
originally announced June 2022.
-
WKGM: Weight-K-space Generative Model for Parallel Imaging Reconstruction
Authors:
Zongjiang Tu,
Die Liu,
Xiaoqing Wang,
Chen Jiang,
Pengwen Zhu,
Minghui Zhang,
Shanshan Wang,
Dong Liang,
Qiegen Liu
Abstract:
Deep learning based parallel imaging (PI) has made great progress in recent years in accelerating magnetic resonance imaging (MRI). Nevertheless, it still has some limitations, such as the insufficient robustness and flexibility of existing methods. In this work, we propose a method to explore k-space domain learning via robust generative modeling for flexible calibration-less PI reconstruction, coined weight-k-space generative model (WKGM). Specifically, WKGM is a generalized k-space domain model, where the k-space weighting technology and high-dimensional space augmentation design are efficiently incorporated for score-based generative model training, resulting in good and robust reconstructions. In addition, WKGM is flexible and thus can be synergistically combined with various traditional k-space PI models, which can make full use of the correlation between multi-coil data and realize calibration-less PI. Even though our model was trained on only 500 images, experimental results with varying sampling patterns and acceleration factors demonstrate that WKGM can attain state-of-the-art reconstruction results with the well-learned k-space generative prior.
Submitted 24 November, 2022; v1 submitted 8 May, 2022;
originally announced May 2022.
-
One-step Method for Material Quantitation using In-line Tomography with Single Scanning
Authors:
Suyu Liao,
Shiwo Deng,
Yining Zhu,
Huitao Zhang,
Peiping Zhu,
Kai Zhang,
Xing Zhao
Abstract:
Objective: Quantitative technique based on In-line phase-contrast computed tomography with single scanning attracts more attention in application due to the flexibility of the implementation. However, the quantitative results usually suffer from artifacts and noise, since the phase retrieval and reconstruction are independent ("two-steps") without feedback from the original data. Our goal is to develop a method for material quantitative imaging based on a priori information specifically for the single-scanning data. Method: An iterative method that directly reconstructs the refractive index decrement delta and imaginary beta of the object from observed data ("one-step") within single object-to-detector distance (ODD) scanning. Simultaneously, high-quality quantitative reconstruction results are obtained by using a linear approximation that achieves material decomposition in the iterative process. Results: By comparing the equivalent atomic number of the material decomposition results in experiments, the accuracy of the proposed method is greater than 97.2%. Conclusion: The quantitative reconstruction and decomposition results are effectively improved, and there are feedback and corrections during the iteration, which effectively reduce the impact of noise and errors. Significance: This algorithm has the potential for quantitative imaging research, especially for imaging live samples and human breast preclinical studies.
Submitted 17 April, 2022;
originally announced April 2022.
-
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
Authors:
Heyang Xue,
Xinsheng Wang,
Yongmao Zhang,
Lei Xie,
Pengcheng Zhu,
Mengxiao Bi
Abstract:
Building a high-quality singing corpus for a person who is not good at singing is non-trivial, thus making it challenging to create a singing voice synthesizer for this person. Learn2Sing is dedicated to synthesizing the singing voice of a speaker without his or her singing data by learning from data recorded by others, i.e., the singing teacher. Inspired by the fact that pitch is the key style factor to distinguish singing from speaking voice, the proposed Learn2Sing 2.0 first generates the preliminary acoustic feature with averaged pitch value in the phone level, which allows the training of this process for different styles, i.e., speaking or singing, share same conditions except for the speaker information. Then, conditioned on the specific style, a diffusion decoder, which is accelerated by a fast sampling algorithm during the inference stage, is adopted to gradually restore the final acoustic feature. During the training, to avoid the information confusion of the speaker embedding and the style embedding, mutual information is employed to restrain the learning of speaker embedding and style embedding. Experiments show that the proposed approach is capable of synthesizing high-quality singing voice for the target speaker without singing data with 10 decoding steps.
Submitted 26 May, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Label-efficient Hybrid-supervised Learning for Medical Image Segmentation
Authors:
Junwen Pan,
Qi Bi,
Yanzhan Yang,
Pengfei Zhu,
Cheng Bian
Abstract:
Due to the lack of expertise for medical image annotation, the investigation of label-efficient methodology for medical image segmentation becomes a heated topic. Recent progresses focus on the efficient utilization of weak annotations together with few strongly-annotated labels so as to achieve comparable segmentation performance in many unprofessional scenarios. However, these approaches only concentrate on the supervision inconsistency between strongly- and weakly-annotated instances but ignore the instance inconsistency inside the weakly-annotated instances, which inevitably leads to performance degradation. To address this problem, we propose a novel label-efficient hybrid-supervised framework, which considers each weakly-annotated instance individually and learns its weight guided by the gradient direction of the strongly-annotated instances, so that the high-quality prior in the strongly-annotated instances is better exploited and the weakly-annotated instances are depicted more precisely. Specially, our designed dynamic instance indicator (DII) realizes the above objectives, and is adapted to our dynamic co-regularization (DCR) framework further to alleviate the erroneous accumulation from distortions of weak annotations. Extensive experiments on two hybrid-supervised medical segmentation datasets demonstrate that with only 10% strong labels, the proposed framework can leverage the weak labels efficiently and achieve competitive performance against the 100% strong-label supervised scenario.
Submitted 10 March, 2022;
originally announced March 2022.
-
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Authors:
Yu Wang,
Xinsheng Wang,
Pengcheng Zhu,
Jie Wu,
Hanzhao Li,
Heyang Xue,
Yongmao Zhang,
Lei Xie,
Mengxiao Bi
Abstract:
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. Audio files are recorded with studio quality at a sampling rate of 44,100 Hz and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the released data and to provide a baseline for future research, we built baseline deep neural network-based SVS models and evaluated them with both objective metrics and subjective mean opinion score (MOS) measure. Experimental results show that the best SVS model trained on our database achieves 3.70 MOS, indicating the reliability of the provided corpus. Opencpop is released to the open-source community WeNet, and the corpus, as well as synthesized demos, can be found on the project homepage.
Submitted 19 January, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation
Authors:
Zhichao Wang,
Qicong Xie,
Tao Li,
Hongqiang Du,
Lei Xie,
Pengcheng Zhu,
Mengxiao Bi
Abstract:
One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting the training data, which causes low speaker similarity and a lack of expressiveness. In this paper, we build on the recognition-synthesis framework and propose a one-shot voice conversion approach for style transfer based on speaker adaptation. First, a speaker normalization module is adopted to remove speaker-related information in bottleneck features extracted by ASR. Second, we adopt weight regularization in the adaptation process to prevent over-fitting caused by using only one utterance from the target speaker as training data. Finally, to comprehensively decouple the speech factors, i.e., content, speaker, and style, and to transfer the source style to the target, a prosody module is used to extract a prosody representation. Experiments show that our approach is superior to the state-of-the-art one-shot VC systems in terms of style and speaker similarity; additionally, our approach also maintains good speech quality.
Submitted 21 February, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis
Authors:
Yongmao Zhang,
Jian Cong,
Heyang Xue,
Lei Xie,
Pengcheng Zhu,
Mengxiao Bi
Abstract:
In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates audio waveform from lyrics and musical score. Our approach is inspired by VITS, which adopts VAE-based posterior encoder augmented with normalizing flow-based prior encoder and adversarial decoder to realize complete end-to-end speech generation. VISinger follows the main architecture of VITS, but makes substantial improvements to the prior encoder based on the characteristics of singing. First, instead of using phoneme-level mean and variance of acoustic features, we introduce a length regulator and a frame prior network to get the frame-level mean and variance on acoustic features, modeling the rich acoustic variation in singing. Second, we further introduce an F0 predictor to guide the frame prior network, leading to stabler singing performance. Finally, to improve the singing rhythm, we modify the duration predictor to specifically predict the phoneme to note duration ratio, helped with singing note normalization. Experiments on a professional Mandarin singing corpus show that VISinger significantly outperforms FastSpeech+Neural-Vocoder two-stage approach and the oracle VITS; ablation study demonstrates the effectiveness of different contributions.
Submitted 24 February, 2022; v1 submitted 17 October, 2021;
originally announced October 2021.
-
Disarranged Zone Learning (DZL): An unsupervised and dynamic automatic stenosis recognition methodology based on coronary angiography
Authors:
Yanan Dai,
Pengxiong Zhu,
Bangde Xue,
Yun Ling,
Xibao Shi,
Liang Geng,
Qi Zhang,
Jun Liu
Abstract:
We propose a novel unsupervised methodology named Disarranged Zone Learning (DZL) to automatically recognize stenosis in coronary angiography. The methodology first disarranges the frames in a video, then generates an effective zone, and finally trains an encoder-decoder GRU model to learn to recover the disarranged frames. The breakthrough of our study is to discover and validate that the Sequence Intensity (Recover Difficulty) is a measure of Coronary Artery Stenosis Status. Hence, the prediction accuracy of DZL is used as an approximator of the coronary stenosis indicator. DZL is an unsupervised methodology that needs no label engineering effort; the sub GRU model in DZL works as a self-supervised approach. DZL could therefore, in theory, use arbitrarily large amounts of coronary angiographies to learn and improve performance without laborious data labeling. DZL has no data preprocessing precondition because it dynamically utilizes the whole video; hence it is easy to implement and generalizes to overcome the data heterogeneity of coronary angiography. The overall average precision score reaches 0.93 and the AUC reaches 0.8 for this pure methodology. The highest segmented average precision score is 0.98 and the highest segmented AUC is 0.87 for the coronary occlusion indicator. Finally, we developed a software demo to implement the DZL methodology.
Submitted 2 October, 2021;
originally announced October 2021.
-
Cell Multi-Bernoulli (Cell-MB) Sensor Control for Multi-object Search-While-Tracking (SWT)
Authors:
Keith A. LeGrand,
**** Zhu,
Silvia Ferrari
Abstract:
Information-driven control can be used to develop intelligent sensors that can optimize their measurement value based on environmental feedback. In object tracking applications, sensor actions are chosen based on the expected reduction in uncertainty also known as information gain. Random finite set (RFS) theory provides a formalism for quantifying and estimating information gain in multi-object tracking problems. However, estimating information gain in these applications remains computationally challenging. This paper presents a new tractable approximation of the RFS expected information gain applicable to sensor control for multi-object search and tracking. Unlike existing RFS approaches, the information gain approximation presented in this paper considers the contributions of non-ideal noisy measurements, missed detections, false alarms, and object appearance/disappearance. The effectiveness of the information-driven sensor control is demonstrated through two multi-vehicle search-while-tracking experiments using real video data from remote terrestrial and satellite sensors.
Submitted 11 July, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
Spectral Efficiency Analysis of Cell-free Distributed Massive MIMO Systems with Imperfect Covariance Matrix
Authors:
Feng Ye,
Jiamin Li,
Pengcheng Zhu,
Dongming Wang,
Xiaohu You
Abstract:
In this paper, the impacts of imperfect channel covariance matrix on the spectral efficiency (SE) of cell-free distributed massive multiple-input multiple-output (MIMO) systems are analyzed. We propose to estimate the channel covariance matrix by alternately using the assigned pilots and their phase-shifted pilots in different coherent blocks, which improves the accuracy of channel estimation with imperfect covariance matrix and reduces pilot overhead. Under this scheme, the closed-form expressions of SE with maximum ratio combination (MRC) and zero-forcing (ZF) receivers are derived, which enables us to select key parameters for better system performance. Simulation results verify the effectiveness of the proposed channel estimation method and the accuracy of the derived closed-form expressions. When more coherent blocks are used to estimate the covariance matrix, we can get better system performance. Moreover, some insightful conclusions are arrived at from the SE comparisons between different receiving schemes (ZF and MRC) and different pilot allocation schemes (orthogonal pilot and pilot reuse).
Submitted 24 August, 2021;
originally announced August 2021.
-
On Performance Loss of DOA Measurement Using Massive MIMO Receiver with Mixed-ADCs
Authors:
Baihua Shi,
Lingling Zhu,
Wenlong Cai,
Nuo Chen,
Tong Shen,
Pengcheng Zhu,
Feng Shu,
Jiangzhou Wang
Abstract:
High hardware cost and high power consumption of massive multiple-input multiple-output (MIMO) are two challenges for future wireless communications, including beyond fifth generation (B5G) and sixth generation (6G). Adopting low-resolution analog-to-digital converters (ADCs) is viewed as a promising solution. Additionally, direction of arrival (DOA) estimation is an indispensable technology for beam alignment and tracking in massive MIMO systems. Thus, in this paper, the performance of DOA estimation with a mixed-ADC structure is first investigated. The Cramer-Rao lower bound (CRLB) for this architecture is derived based on the additive quantization noise model. Then, a performance loss factor and the associated energy efficiency factor are defined for detailed analysis. Simulation results show that the mixed-ADC architecture can strike a good balance among performance loss, circuit cost and energy efficiency. More importantly, just a few bits (up to 4 bits) of low-resolution ADCs can achieve a satisfactory performance for DOA measurement.
Submitted 18 January, 2022; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Privacy-preserving Channel Estimation in Cell-free Hybrid Massive MIMO Systems
Authors:
Jun Xu,
Xiaodong Wang,
Pengcheng Zhu,
Xiaohu You
Abstract:
We consider a cell-free hybrid massive multiple-input multiple-output (MIMO) system with $K$ users and $M$ access points (APs), each with $N_a$ antennas and $N_r< N_a$ radio frequency (RF) chains. When $K\ll M{N_a}$, efficient uplink channel estimation and data detection with reduced number of pilots can be performed based on low-rank matrix completion. However, such a scheme requires the central processing unit (CPU) to collect received signals from all APs, which may enable the CPU to infer the private information of user locations. We therefore develop and analyze privacy-preserving channel estimation schemes under the framework of differential privacy (DP). As the key ingredient of the channel estimator, two joint differentially private noisy matrix completion algorithms based respectively on Frank-Wolfe iteration and singular value decomposition are presented. We provide an analysis on the tradeoff between the privacy and the channel estimation error. In particular, we show that the estimation error can be mitigated while maintaining the same privacy level by increasing the payload size with fixed pilot size; and the scaling laws of both the privacy-induced and privacy-independent error components in terms of payload size are characterized. Simulation results are provided to further demonstrate the tradeoff between privacy and channel estimation performance.
Submitted 26 January, 2021;
originally announced January 2021.
-
A Vision of Self-Evolving Network Management for Future Intelligent Vertical HetNet
Authors:
Tasneem Darwish,
Gunes Karabulut Kurt,
Halim Yanikomeroglu,
Gamini Senarath,
Peiying Zhu
Abstract:
Future integrated terrestrial-aerial-satellite networks will have to exhibit some unprecedented characteristics for the provision of both communications and computation services, and security for a tremendous number of devices with very broad and demanding requirements across multiple networks, operators, and ecosystems. Although 3GPP introduced the concept of self-organization networks (SONs) in 4G and 5G documents to automate network management, even this progressive concept will face several challenges as it may not be sufficiently agile in coping with the immense levels of complexity, heterogeneity, and mobility in the envisioned beyond-5G integrated networks. In the presented vision, we discuss how future integrated networks can be intelligently and autonomously managed to efficiently utilize resources, reduce operational costs, and achieve the targeted Quality of Experience (QoE). We introduce the novel concept of "self-evolving networks (SENs)" framework, which utilizes artificial intelligence, enabled by machine learning (ML) algorithms, to make future integrated networks fully automated and intelligently evolve with respect to the provision, adaptation, optimization, and management aspects of networking, communications, computation, and infrastructure nodes' mobility. To envisage the concept of SEN in future integrated networks, we use the Intelligent Vertical Heterogeneous Network (I-VHetNet) architecture as our reference. The paper discusses five prominent scenarios where SEN plays the main role in providing automated network management. Numerical results provide an insight on how the SEN framework improves the performance of future integrated networks. The paper presents the leading enablers and examines the challenges associated with the application of SEN concept in future integrated networks.
Submitted 9 March, 2021; v1 submitted 6 September, 2020;
originally announced September 2020.
-
High Altitude Platform Station based Super Macro Base Station (HAPS-SMBS) Constellations
Authors:
Md Sahabul Alam,
Gunes Karabulut Kurt,
Halim Yanikomeroglu,
Peiying Zhu,
Ngoc Dũng Đào
Abstract:
High altitude platform station (HAPS) systems have recently attracted renewed attention. While terrestrial and satellite technologies are well-established for providing connectivity services, they face certain shortcomings and challenges, which could be addressed by complementing them with HAPS systems. In this paper, we envision a HAPS as a super macro base station, which we refer to as HAPS-SMBS, to provide connectivity in a plethora of applications. Unlike a conventional HAPS, which targets broad coverage for remote areas or disaster recovery, we envision next-generation HAPS-SMBS to have the necessary capabilities to address the high capacity, low latency, and computing requirements especially for highly populated metropolitan areas. This article focuses mainly on the potential opportunities, target use cases, and challenges that we expect to be associated with the design and implementation of the HAPS-SMBS based future wireless access architecture.
Submitted 22 September, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Aerial Platforms with Reconfigurable Smart Surfaces for 5G and Beyond
Authors:
Safwan Alfattani,
Wael Jaafar,
Yassine Hmamouche,
Halim Yanikomeroglu,
Abbas Yongaçoglu,
Ngọc Dũng Đào,
Peiying Zhu
Abstract:
Aerial platforms are expected to deliver enhanced and seamless connectivity in the fifth generation (5G) wireless networks and beyond (B5G). This is generally achievable by supporting advanced onboard communication features embedded in heavy and energy-intensive equipment. Alternatively, reconfigurable smart surfaces (RSS), which smartly exploit/recycle signal reflections in the environment, are increasingly being recognized as a new wireless communication paradigm to improve communication links. In fact, their reduced cost, low power use, light weight, and flexible deployment make them an attractive candidate for integration with 5G/B5G technologies. In this article, we discuss comprehensive approaches to the integration of RSS in aerial platforms. First, we present a review of RSS technology, its operations and types of communication. Next, we describe how RSS can be used in aerial platforms, and we propose a control architecture workflow. Then, several potential use cases are presented and discussed. Finally, associated research challenges are identified.
Submitted 4 November, 2020; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Deep Learning-based Modulation Detection for NOMA Systems
Authors:
Wenwu Xie,
Jian Xiao,
**xia Yang,
Xin Peng,
Chao Yu,
Peng Zhu
Abstract:
Since the signal with strong power should be demodulated first for successive interference cancellation (SIC) demodulation in non-orthogonal multiple access (NOMA) systems, the base station (BS) should inform the near user terminal (UT), which has allocated higher power, of the modulation mode of the far user terminal. To avoid unnecessary signaling overhead in this process, a blind detection algorithm for the NOMA signal modulation mode is designed in this paper. Taking the joint constellation density diagrams of the NOMA signal as the detection features, a deep residual network is built for classification, so as to detect the modulation mode of the NOMA signal. Because the joint constellation diagrams are easily corrupted by high-intensity noise and lose their real distribution pattern, a wavelet denoising method is adopted to improve the quality of the constellations. The simulation results show that the proposed algorithm can achieve satisfactory detection accuracy in NOMA systems. In addition, the factors affecting the recognition performance are also verified and analyzed.
Submitted 16 October, 2020; v1 submitted 24 May, 2020;
originally announced May 2020.
-
Training Keyword Spotting Models on Non-IID Data with Federated Learning
Authors:
Andrew Hard,
Kurt Partridge,
Cameron Nguyen,
Niranjan Subrahmanya,
Aishanee Shah,
Pai Zhu,
Ignacio Lopez Moreno,
Rajiv Mathews
Abstract:
We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model. To overcome the algorithmic constraints associated with fitting on-device data (which are inherently non-independent and identically distributed), we conduct thorough empirical studies of optimization algorithms and hyperparameter configurations using large-scale federated simulations. To overcome resource constraints, we replace memory intensive MTR data augmentation with SpecAugment, which reduces the false reject rate by 56%. Finally, to label examples (given the zero visibility into on-device data), we explore teacher-student training.
Submitted 4 June, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning
Authors:
Yiming Sun,
Bing Cao,
Pengfei Zhu,
Qinghua Hu
Abstract:
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image. It empowers smart city traffic management and disaster rescue. Researchers have made a great deal of effort in this area and achieved considerable progress. Nevertheless, it is still a challenge when the objects are hard to distinguish, especially in low light conditions. To tackle this problem, we construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle. Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night. Due to the great gap between RGB and infrared images, cross-modal images provide both effective information and redundant information. To address this dilemma, we further propose an uncertainty-aware cross-modality vehicle detection (UA-CMDet) framework to extract complementary information from cross-modal images, which can significantly improve the detection performance in low light conditions. An uncertainty-aware module (UAM) is designed to quantify the uncertainty weights of each modality, which are calculated from the cross-modal Intersection over Union (IoU) and the RGB illumination value. Furthermore, we design an illumination-aware cross-modal non-maximum suppression algorithm to better integrate the modal-specific information in the inference phase. Extensive experiments on the DroneVehicle dataset demonstrate the flexibility and effectiveness of the proposed method for cross-modality vehicle detection. The dataset can be downloaded from https://github.com/VisDrone/DroneVehicle.
Submitted 14 October, 2021; v1 submitted 5 March, 2020;
originally announced March 2020.
-
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
Authors:
Peihao Zhu,
Rameen Abdal,
Yipeng Qin,
Peter Wonka
Abstract:
We propose semantic region-adaptive normalization (SEAN), a simple but effective building block for Generative Adversarial Networks conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, we can build a network architecture that can control the style of each semantic region individually, e.g., we can specify one style reference image per region. SEAN is better suited to encode, transfer, and synthesize style than the best previous method in terms of reconstruction quality, variability, and visual quality. We evaluate SEAN on multiple datasets and report better quantitative metrics (e.g. FID, PSNR) than the current state of the art. SEAN also pushes the frontier of interactive image editing. We can interactively edit images by changing segmentation masks or the style for any given region. We can also interpolate styles from two reference images per region.
Submitted 24 May, 2020; v1 submitted 28 November, 2019;
originally announced November 2019.
-
Power Minimization for Wireless Backhaul Based Ultra-Dense Cache-enabled C-RAN
Authors:
Jun Xu,
Pengcheng Zhu,
Jiamin Li,
Xiaohu You
Abstract:
This correspondence paper investigates joint design of small base station (SBS) clustering, multicast beamforming for access and backhaul links, as well as frequency allocation in backhaul transmission to minimize the total power consumption for wireless backhaul based ultra-dense cache-enabled cloud radio access network (C-RAN). To solve this nontrivial problem, we develop a low-complexity algorithm, which is a combination of smoothed $\ell_0$-norm approximation and convex-concave procedure. Simulation results show that the proposed algorithm converges fast and greatly reduces the backhaul traffic.
Submitted 3 September, 2019;
originally announced September 2019.
-
Deep Learning Based Pilot Design for Multi-user Distributed Massive MIMO Systems
Authors:
Jun Xu,
Pengcheng Zhu,
Jiamin Li,
Xiaohu You
Abstract:
This letter proposes a deep learning based pilot design scheme to minimize the sum mean square error (MSE) of channel estimation for multi-user distributed massive multiple-input multiple-output (MIMO) systems. The pilot signal of each user is expressed as a weighted superposition of orthonormal pilot sequence basis, where the power assigned to each pilot sequence is the corresponding weight. A multi-layer fully connected deep neural network (DNN) is designed to optimize the power allocated to each pilot sequence to minimize the sum MSE, which takes the channel large-scale fading coefficients as input and outputs the pilot power allocation vector. The loss function of the DNN is defined as the sum MSE, and we leverage the unsupervised learning strategy to train the DNN. Simulation results show that the proposed scheme achieves better sum MSE performance than other methods with low complexity.
Submitted 18 March, 2019;
originally announced March 2019.