-
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Authors:
Yiming Zhang,
Yicheng Gu,
Yanhong Zeng,
Zhening Xing,
Yuancheng Wang,
Zhizheng Wu,
Kai Chen
Abstract:
We study Neural Foley, the automatic generation of high-quality sound effects synchronized with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to simultaneously synthesizing high-quality and video-aligned (i.e., semantically relevant and temporally synchronized) sounds. To overcome these limitations, we propose FoleyCrafter, a novel framework that leverages a pre-trained text-to-audio model to ensure high-quality audio generation. FoleyCrafter comprises two key components: the semantic adapter for semantic alignment and the temporal controller for precise audio-video synchronization. The semantic adapter utilizes parallel cross-attention layers to condition audio generation on video features, producing realistic sound effects that are semantically relevant to the visual content. Meanwhile, the temporal controller incorporates an onset detector and a timestamp-based adapter to achieve precise audio-video alignment. One notable advantage of FoleyCrafter is its compatibility with text prompts, enabling the use of text descriptions to achieve controllable and diverse video-to-audio generation according to user intents. We conduct extensive quantitative and qualitative experiments on standard benchmarks to verify the effectiveness of FoleyCrafter. Models and code are available at https://github.com/open-mmlab/FoleyCrafter.
Submitted 1 July, 2024;
originally announced July 2024.
-
Multi-Functional Beamforming Design for Integrated Sensing, Communication, and Computation
Authors:
Yapeng Zhao,
Qingqing Wu,
Wen Chen,
Yong Zeng,
Ruiqi Liu,
Weidong Mei,
Fen Hou,
Shaodan Ma
Abstract:
Integrated sensing and communication (ISAC) systems may face a heavy computation burden since the sensory data needs to be further processed. This paper studies a novel system that integrates sensing, communication, and computation, aiming to provide services for different objectives efficiently. This system consists of a multi-antenna multi-functional base station (BS), an edge server, a target, and multiple single-antenna communication users. The BS needs to allocate the available resources to efficiently provide sensing, communication, and computation services. Due to the heavy service burden and limited power budget, the BS can partially offload the tasks to the nearby edge server instead of computing them locally. We consider the estimation of the target response matrix, a general problem in radar sensing, and utilize the Cramér-Rao bound (CRB) as the corresponding performance metric. To tackle the non-convex optimization problem, we propose both semidefinite relaxation (SDR)-based alternating optimization and SDR-based successive convex approximation (SCA) algorithms to minimize the CRB of radar sensing while meeting the requirements of the communication users and the need for task computing. Furthermore, we demonstrate that the optimal rank-one solutions of both the alternating and SCA algorithms can be directly obtained via the solver or further constructed even when dealing with multiple functionalities. Simulation results show that the proposed algorithms can provide higher target estimation performance than state-of-the-art benchmarks while satisfying the communication and computation constraints.
Submitted 1 July, 2024;
originally announced July 2024.
-
Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation
Authors:
Bowei Yao,
Yi Zeng,
Haizhao Dai,
Qing Wu,
Youshen Xiao,
Fei Gao,
Yuyao Zhang,
Jingyi Yu,
Xiran Cai
Abstract:
Photoacoustic tomography is a hybrid biomedical imaging technology that combines the advantages of acoustic and optical imaging. However, with conventional image reconstruction methods, image quality is visibly degraded by artifacts under sparse sampling. In this paper, a novel model-based sparse reconstruction method via implicit neural representation is proposed to improve the quality of images reconstructed from sparse data. Specifically, the initial acoustic pressure distribution is modeled as a continuous function of spatial coordinates and parameterized by a multi-layer perceptron. The weights of the multi-layer perceptron are determined by training the network in a self-supervised manner, and a total variation regularization term is used to provide prior knowledge. We compare our results with ablation studies, and the results show that our method outperforms existing methods on simulation and experimental data. Under the sparse sampling condition, our method effectively suppresses artifacts and alleviates the ill-posedness of the problem, reconstructing images with higher signal-to-noise ratio and contrast-to-noise ratio than traditional methods. The high-quality results for sparse data give the proposed method the potential to further decrease the hardware cost of photoacoustic tomography systems.
Submitted 25 June, 2024;
originally announced June 2024.
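The total variation regularization term mentioned in the abstract above can be illustrated with a minimal sketch. This is the common anisotropic form on a small pixel grid, not the paper's implementation (which applies the prior to a neural representation); the function name `total_variation` and the toy images are assumptions made for illustration.

```python
def total_variation(img):
    """Anisotropic total variation of a 2D image given as a list of rows.

    Sums the absolute differences between each pixel and its lower and
    right neighbours. Piecewise-smooth images score low; images with
    streak-like or checkerboard artifacts score high.
    """
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # vertical neighbour
                tv += abs(img[r + 1][c] - img[r][c])
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(img[r][c + 1] - img[r][c])
    return tv


# Hypothetical toy images: a flat patch and a checkerboard-like patch.
smooth = [[1.0, 1.0], [1.0, 1.0]]
noisy = [[1.0, 0.0], [0.0, 1.0]]

tv_smooth = total_variation(smooth)
tv_noisy = total_variation(noisy)
```

Adding a weighted term of this kind to a data-fidelity loss penalizes the oscillating artifacts that sparse sampling tends to produce, which is the role of the prior described in the abstract.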
-
Sparse MIMO for ISAC: New Opportunities and Challenges
Authors:
Xinrui Li,
Hongqi Min,
Yong Zeng,
Shi Jin,
Linglong Dai,
Yifei Yuan,
Rui Zhang
Abstract:
Multiple-input multiple-output (MIMO) has been a key technology of wireless communications for decades. A typical MIMO system employs antenna arrays with the inter-antenna spacing being half of the signal wavelength, which we term compact MIMO. Looking forward towards the future sixth-generation (6G) mobile communication networks, MIMO systems will achieve even finer spatial resolution to not only enhance the spectral efficiency of wireless communications, but also enable more accurate wireless sensing. To this end, by removing the restriction of half-wavelength antenna spacing, sparse MIMO has been proposed as a new architecture that is able to significantly enlarge the array aperture as compared to conventional compact MIMO with the same number of array elements. In addition, sparse MIMO leads to a new form of virtual MIMO systems for sensing with their virtual apertures considerably larger than physical apertures. As sparse MIMO is expected to be a viable technology for 6G, we provide in this article a comprehensive overview of it, especially focusing on its appealing advantages for integrated sensing and communication (ISAC) towards 6G. Specifically, assorted sparse MIMO architectures are first introduced, followed by their new benefits as well as challenges. We then discuss the main design issues of sparse MIMO, including beam pattern synthesis, signal processing, grating lobe suppression, beam codebook design, and array geometry optimization. Last, we provide numerical results to evaluate the performance of sparse MIMO for ISAC and point out promising directions for future research.
Submitted 18 June, 2024;
originally announced June 2024.
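The trade-off named in the abstract above (larger aperture versus grating lobes) can be seen in a small sketch. It evaluates the textbook array factor of a uniform linear array with a configurable element spacing; the uniformly spaced sparse array, the 8-element size, and the function name `array_gain` are assumptions for illustration and are not taken from the article.

```python
import cmath
import math


def array_gain(theta_deg, num_elements, spacing_wl, steer_deg=0.0):
    """Normalized array-factor magnitude of a uniform linear array.

    spacing_wl is the inter-element spacing in wavelengths
    (0.5 corresponds to the compact, half-wavelength case).
    The array is steered towards steer_deg; the return value is in [0, 1].
    """
    delta = math.sin(math.radians(theta_deg)) - math.sin(math.radians(steer_deg))
    total = sum(
        cmath.exp(2j * math.pi * spacing_wl * n * delta)
        for n in range(num_elements)
    )
    return abs(total) / num_elements


# Same number of elements, steered to broadside (0 degrees).
compact_5 = array_gain(5.0, 8, 0.5)    # compact array, slightly off boresight
sparse_5 = array_gain(5.0, 8, 2.0)     # sparse array, slightly off boresight
compact_30 = array_gain(30.0, 8, 0.5)  # compact array at 30 degrees
sparse_30 = array_gain(30.0, 8, 2.0)   # sparse array at 30 degrees
```

With two-wavelength spacing the gain 5 degrees off boresight drops well below the compact array's value, reflecting the narrower main lobe of the larger aperture. At 30 degrees the sparse array returns to nearly full gain (a grating lobe) while the compact array sits at a null, which is why grating lobe suppression appears among the design issues listed in the abstract.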
-
Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior
Authors:
Baiang Li,
Sizhuo Ma,
Yanhong Zeng,
Xiaogang Xu,
Youqing Fang,
Zhao Zhang,
Jian Wang,
Kai Chen
Abstract:
Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while keeping the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io
Submitted 13 June, 2024;
originally announced June 2024.
-
Rethinking Waveform for 6G: Harnessing Delay-Doppler Alignment Modulation
Authors:
Zhiqiang Xiao,
Xianda Liu,
Yong Zeng,
J. Andrew Zhang,
Shi Jin,
Rui Zhang
Abstract:
Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands to further boost data transmission rates and provide ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design. In this article, by leveraging the super spatial resolution of large antenna arrays and the multi-path spatial sparsity of high-frequency wireless channels, we introduce a new approach for waveform design based on the recently proposed delay-Doppler alignment modulation (DDAM). In particular, DDAM makes a paradigm shift of waveform design from the conventional manner of tolerating channel delay and Doppler spreads to actively manipulating them. First, we review the fundamental constraints and performance limitations of orthogonal frequency division multiplexing (OFDM) and introduce new opportunities for 6G waveform design. Next, the motivations and basic principles of DDAM are presented, followed by its various extensions to different wireless system setups. Finally, the main design considerations for DDAM are discussed and the new opportunities for future research are highlighted.
Submitted 13 June, 2024;
originally announced June 2024.
-
CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels
Authors:
Ye Zeng,
Li Qiao,
Zhen Gao,
Tong Qin,
Zhonghuai Wu,
Sheng Chen,
Mohsen Guizani
Abstract:
In massive multiple-input multiple-output (MIMO) systems, reliably acquiring downlink channel state information (CSI) with low overhead is challenging. In this work, by integrating the generative pre-trained Transformer (GPT) with federated-tuning, we propose a CSI-GPT approach to realize efficient downlink CSI acquisition. Specifically, we first propose a Swin Transformer-based channel acquisition network (SWTCAN) to acquire downlink CSI, where pilot signals, downlink channel estimation, and uplink CSI feedback are jointly designed. Furthermore, to solve the problem of insufficient training data, we propose a variational auto-encoder-based channel sample generator (VAE-CSG), which can generate sufficient CSI samples based on a limited number of high-quality CSI data obtained from the current cell. The CSI dataset generated from VAE-CSG will be used for pre-training SWTCAN. To fine-tune the pre-trained SWTCAN for improved performance, we propose an online federated-tuning method, where only a small number of SWTCAN parameters are unfrozen and updated using over-the-air computation, avoiding the high communication overhead caused by aggregating the complete CSI samples from user equipment (UEs) to the BS for centralized fine-tuning. Simulation results verify the advantages of the proposed SWTCAN and the communication efficiency of the proposed federated-tuning method.
Submitted 5 June, 2024;
originally announced June 2024.
-
Performance Analysis of Hybrid Cellular and Cell-free MIMO Network
Authors:
Zhuoyin Dai,
Jingran Xu,
Xiaoli Xu,
Ruoguang Li,
Yong Zeng
Abstract:
Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a long-term evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communication network and collaborate with the existing cellular base stations (BSs). To further explore the performance limits of hybrid cellular and cell-free networks, this paper develops a hybrid network model based on stochastic geometric toolkits, which reveals the coupling of the signal and interference from both the cellular and cell-free networks. Specifically, conjugate beamforming is applied in hybrid cellular and cell-free networks, which enables user equipment (UE) to benefit from both cellular BSs and cell-free APs. The aggregate signal received from the hybrid network is approximated via moment matching, and coverage probability is characterized by deriving the Laplace transform of the interference. The analysis of signal strength and coverage probability is verified by extensive simulations.
Submitted 3 June, 2024;
originally announced June 2024.
-
Environment-aware UAV Communications: CKM Construction and Predictive Beamforming
Authors:
Shiqi Zeng,
Xiaoli Xu,
Yong Zeng
Abstract:
Predictive millimeter-wave (mmWave) beamforming is a promising technique to enable low-latency and high-rate ground-air communications for cellular-connected unmanned aerial vehicles (UAVs). However, the high vulnerability of mmWave to blockages poses practical challenges to the implementation of such a technology. In this paper, we tackle the challenges by proposing a channel knowledge map (CKM)-assisted predictive beamforming approach based on the echoed joint communication and sensing signal, whereby the line-of-sight (LoS) link identification is performed via hypothesis testing using prior information provided by CKM. Depending on the identification result, extended Kalman filtering (EKF) is adopted to reliably track the target UAV. Furthermore, if the non-line-of-sight (NLoS) state is identified, the target UAV will be immediately connected to a candidate base station (BS), namely a handover will be triggered to alleviate the communication outage. The simulation results show that the proposed method can significantly enhance the UAV tracking and mmWave communication performance compared to the benchmarking schemes without using CKM or LoS identification.
Submitted 18 April, 2024;
originally announced April 2024.
-
Little Pilot is Needed for Channel Estimation with Integrated Super-Resolution Sensing and Communication
Authors:
Jingran Xu,
Huizhi Wang,
Yong Zeng,
Xiaoli Xu
Abstract:
Integrated super-resolution sensing and communication (ISSAC) is a promising technology to achieve extremely high sensing performance for critical parameters, such as the angles of the wireless channels. In this paper, we propose an ISSAC-based channel estimation method, which requires few or even no pilots, yet still achieves accurate channel state information (CSI) estimation. The key idea is to exploit the fact that subspace-based super-resolution algorithms such as multiple signal classification (MUSIC) do not require a priori known pilots for accurate parameter estimation. Therefore, in the proposed method, the angles of the multi-path channel components are first estimated in a pilot-free manner while communication data symbols are sent. After that, the multi-path channel coefficients are estimated, for which very few pilots are needed. The reasons are twofold. First, compared to the conventional channel estimation methods purely relying on channel training, far fewer parameters need to be estimated once the multi-path angles are accurately estimated. Second, with the angles obtained, the beamforming gain is also enjoyed when pilots are sent to estimate the channel path gains. To rigorously study the performance of the proposed method, we first consider the basic line-of-sight (LoS) channel. By analyzing the minimum mean square error (MMSE) of channel estimation and the resulting beamforming gains, we show that our proposed method significantly outperforms the conventional methods purely based on channel training. We then extend the study to the more general multi-path channels. Simulation results are provided to demonstrate our theoretical results.
Submitted 15 April, 2024;
originally announced April 2024.
-
Fractional Delay Alignment Modulation for Spatially Sparse Wireless Communications
Authors:
Zhiwen Zhou,
Zhiqiang Xiao,
Yong Zeng
Abstract:
Delay alignment modulation (DAM) is a novel transmission technique for wireless systems with high spatial resolution that leverages delay compensation and path-based beamforming to mitigate the inter-symbol interference (ISI) without resorting to complex channel equalization or multi-carrier transmission. However, most existing studies on DAM consider a simplified scenario by assuming that the channel multi-path delays are integer multiples of the signal sampling interval. This paper investigates DAM for the more general and practical scenarios with fractional multi-path delays. We first analyze the impact of fractional multi-path delays on the existing DAM design, termed integer DAM (iDAM), which can only achieve delay compensations that are integer multiples of the sampling interval. It is revealed that the existence of fractional multi-path delays makes it impossible for iDAM to achieve perfect delay alignment. To address this issue, we propose a more generic DAM design called fractional DAM (fDAM), which achieves fractional delay pre-compensation via upsampling and fractional delay filtering. By leveraging the Farrow filter structure, the proposed approach can eliminate ISI without real-time computation of filter coefficients, as typically required in traditional channel equalization techniques. Simulation results demonstrate that the proposed fDAM outperforms the existing iDAM and orthogonal frequency division multiplexing (OFDM) in terms of symbol error rate (SER) and spectral efficiency, while maintaining a peak-to-average power ratio (PAPR) comparable to that of iDAM, which is considerably lower than that of OFDM.
Submitted 28 March, 2024;
originally announced March 2024.
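Fractional delay filtering, as referred to in the abstract above, can be sketched with a Lagrange-interpolation FIR filter. This is a generic textbook construction and not the paper's fDAM design; the function names `lagrange_fd_taps` and `fir`, the filter order, and the test signal are assumptions for illustration. Each tap is a polynomial in the delay value, which is the property a Farrow structure exploits so that changing the delay does not require recomputing the filter from scratch.

```python
def lagrange_fd_taps(delay, order=3):
    """Taps h[0..order] of a Lagrange FIR filter approximating a delay of
    `delay` samples (fractional values allowed; accuracy is best when the
    delay lies near order / 2)."""
    taps = []
    for k in range(order + 1):
        h = 1.0
        for m in range(order + 1):
            if m != k:
                # Each factor is linear in `delay`, so h is a polynomial in it.
                h *= (delay - m) / (k - m)
        taps.append(h)
    return taps


def fir(x, taps):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [
        sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]


# Delay a ramp x[n] = n by 1.5 samples. After the start-up transient the
# output equals n - 1.5, because Lagrange interpolation of this order is
# exact for polynomial signals of degree up to 3.
ramp = [float(n) for n in range(10)]
taps = lagrange_fd_taps(1.5)
delayed = fir(ramp, taps)
```

An integer delay such as `lagrange_fd_taps(1.0)` collapses to a single unit tap, which corresponds to the integer-delay case described in the abstract; non-integer values spread the energy over several taps.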
-
Prototyping and Experimental Results for Environment-Aware Millimeter Wave Beam Alignment via Channel Knowledge Map
Authors:
Zhuoyin Dai,
Di Wu,
Zhenjun Dong,
Kun Li,
Dingyang Ding,
Sihan Wang,
Yong Zeng
Abstract:
Channel knowledge map (CKM), which aims to directly reflect the intrinsic channel properties of the local wireless environment, is a novel technique for achieving environment-aware communication. In this paper, to alleviate the large training overhead in millimeter wave (mmWave) beam alignment, an environment-aware and training-free beam alignment prototype is established based on a typical CKM, termed beam index map (BIM). To this end, a general CKM construction method is first presented, and an indoor BIM is constructed offline to learn the candidate transmit and receive beam index pairs for each grid in the experimental area. Furthermore, based on the location information of the receiver (or the dynamic obstacles) from the ultra-wide band (UWB) positioning system, the established BIM is used to achieve training-free beam alignment by directly providing the beam indexes for the transmitter and receiver. Three typical scenarios are considered in the experiment, including a quasi-static environment with a line-of-sight (LoS) link, a quasi-static environment without a LoS link, and a dynamic environment. Besides, the receiver orientation measured from the gyroscope is also used to help CKM predict more accurate beam indexes. The experiment results show that compared with the benchmark location-based beam alignment strategy, the CKM-based beam alignment strategy can achieve much higher received power, which is close to that achieved by exhaustive beam search, but with significantly reduced training overhead.
Submitted 12 March, 2024;
originally announced March 2024.
-
Multiple Latent Space Mapping for Compressed Dark Image Enhancement
Authors:
Yi Zeng,
Zhengning Wang,
Yuxuan Liu,
Tianjiao Zeng,
Xuhang Liu,
Xinglong Luo,
Shuaicheng Liu,
Shuyuan Zhu,
Bing Zeng
Abstract:
Dark image enhancement aims at converting dark images to normal-light images. Existing dark image enhancement methods take uncompressed dark images as inputs and achieve great performance. However, in practice, dark images are often compressed before storage or transmission over the Internet. Current methods perform poorly when processing compressed dark images. Artifacts hidden in the dark regions are amplified by current methods, which results in uncomfortable visual effects for observers. Based on this observation, this study aims at enhancing compressed dark images while avoiding compression artifacts amplification. Since texture details intertwine with compression artifacts in compressed dark images, detail enhancement and blocking artifacts suppression contradict each other in image space. Therefore, we handle the task in latent space. To this end, we propose a novel latent mapping network based on variational auto-encoder (VAE). Firstly, different from previous VAE-based methods with single-resolution features only, we exploit multiple latent spaces with multi-resolution features, to reduce the detail blur and improve image fidelity. Specifically, we train two multi-level VAEs to project compressed dark images and normal-light images into their latent spaces respectively. Secondly, we leverage a latent mapping network to transform features from compressed dark space to normal-light space. Specifically, since the degradation models of darkness and compression are different from each other, the latent mapping process is divided into an enlightening branch and a deblocking branch. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art performance in compressed dark image enhancement.
Submitted 12 March, 2024;
originally announced March 2024.
-
Can Channels be Fully Inferred Between Two Antenna Panels?
Authors:
Y. Qiu,
D. W,
Y. Zeng
Abstract:
This letter considers a two-panel massive multiple-input multiple-output (MIMO) communication system, where the base station (BS) is equipped with two antenna panels that may use different frequency bands for communication. By exploiting the geometric relationships between antenna panels, efficient channel inference methods across antenna panels are proposed to reduce the overhead of real-time channel estimation. Four scenarios are considered, namely far-field free-space, near-field free-space, multi-path sharing far-field scatterers, and multi-path sharing near-field scatterers. For both far-field and near-field free-space scenarios, we show that the channel of one panel can be fully inferred from that of the other panel, as long as the multi-path components (MPCs) composing the channel can be resolved. On the other hand, for the multi-path scenarios sharing far-field or near-field scatterers, only the angles or range of angles of the MPCs can be inferred, respectively. Simulation results based on commercial 3D ray-tracing software are presented to validate our developed channel inference techniques.
Submitted 7 February, 2024;
originally announced February 2024.
-
DMCE: Diffusion Model Channel Enhancer for Multi-User Semantic Communication Systems
Authors:
Youcheng Zeng,
Xinxin He,
Xu Chen,
Haonan Tong,
Zhaohui Yang,
Yijun Guo,
Jianjun Hao
Abstract:
To achieve continuous massive data transmission with significantly reduced data payload, the users can adopt semantic communication techniques to compress the redundant information by transmitting semantic features instead. However, current works on semantic communication mainly focus on high compression ratio, neglecting the wireless channel effects including dynamic distortion and multi-user interference, which significantly limit the fidelity of semantic communication. To address this, this paper proposes a diffusion model (DM)-based channel enhancer (DMCE) for improving the performance of multi-user semantic communication, with the DM learning the particular data distribution of channel effects on the transmitted semantic features. In the considered system model, multiple users (such as road cameras) transmit semantic features of multi-source data to a receiver by applying the joint source-channel coding (JSCC) techniques, and the receiver fuses the semantic features from multiple users to complete specific tasks. Then, we propose DMCE to enhance the channel state information (CSI) estimation for improving the restoration of the received semantic features. Finally, the fusion results at the receiver are significantly enhanced, demonstrating a robust performance even under low signal-to-noise ratio (SNR) regimes, enabling the generation of effective object segmentation images. Extensive simulation results with a traffic scenario dataset show that the proposed scheme can improve the mean Intersection over Union (mIoU) by more than 25% at low SNR regimes, compared with the benchmark schemes.
Submitted 29 January, 2024;
originally announced January 2024.
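The mean Intersection over Union (mIoU) metric cited in the abstract above can be sketched as follows. This is the usual per-class definition applied to flattened label maps, with classes absent from both maps skipped; the paper's exact evaluation protocol is not given in the listing, and the function name `mean_iou` and the toy labels are assumptions for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union between two flattened label maps.

    For each class, IoU = |pred == c and target == c| / |pred == c or target == c|.
    Classes that appear in neither map are left out of the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Hypothetical 4-pixel segmentation with two classes.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2 = 7/12
```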
-
A Novel Geometric Solution for Moving Target Localization through Multistatic Sensing in the ISAC System
Authors:
S. Zhuge,
Y. Ma,
Z. Lin,
Y. Zeng
Abstract:
This paper proposes a novel geometric solution for tracking a moving target through multistatic sensing. In contrast to existing two-step weighted least square (2SWLS) methods which use the bistatic range (BR) and bistatic range rate (BRR) measurements, the proposed method incorporates an additional direction of arrival (DOA) measurement of the target obtained from a communication receiver in an integrated sensing and communication (ISAC) system. Unlike the existing 2SWLS methods that require at least three transmitter-receiver (TX-RX) pairs to operate, the proposed algorithm can conduct location estimation with a single TX-RX pair and velocity estimation with two TX-RX pairs. Simulations reveal that the proposed method exhibits superior performance compared to existing 2SWLS methods, particularly when dealing with moderate levels of noise in DOA measurements.
Submitted 29 January, 2024;
originally announced January 2024.
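The claim in the abstract above, that a single TX-RX pair suffices for location estimation once a DOA measurement is available, follows from elementary geometry. The sketch below is a noise-free 2-D illustration, not the paper's estimator: writing the target as p = rx + d·u (u being the unit DOA vector at the receiver) and substituting into the bistatic range r = |p - tx| + |p - rx| gives d = (r² - |b|²) / (2(r + u·b)) with b = rx - tx. The function name and argument layout are hypothetical.

```python
import math


def locate_from_br_doa(tx, rx, bistatic_range, doa_rad):
    """Noise-free 2-D target position from one TX-RX pair.

    bistatic_range: |p - tx| + |p - rx|
    doa_rad: angle of the target as seen from the receiver (radians)
    """
    ux, uy = math.cos(doa_rad), math.sin(doa_rad)
    bx, by = rx[0] - tx[0], rx[1] - tx[1]
    denom = 2.0 * (bistatic_range + ux * bx + uy * by)
    if abs(denom) < 1e-12:
        raise ValueError("degenerate geometry: target lies on the baseline extension")
    # Distance from receiver to target along the DOA direction.
    d = (bistatic_range ** 2 - (bx * bx + by * by)) / denom
    return (rx[0] + d * ux, rx[1] + d * uy)


# Synthetic check: build the measurements from a known target, then invert them.
tx, rx, target = (0.0, 0.0), (10.0, 0.0), (4.0, 3.0)
br = math.dist(target, tx) + math.dist(target, rx)
doa = math.atan2(target[1] - rx[1], target[0] - rx[0])
estimate = locate_from_br_doa(tx, rx, br, doa)
```

With noisy BR and DOA measurements this closed form would only serve as an initial guess; how the paper weights and refines the measurements is not described in the abstract.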
-
Integrated Sensing, Communication, and Powering (ISCAP): Towards Multi-functional 6G Wireless Networks
Authors:
Yilong Chen,
Zixiang Ren,
Jie Xu,
Yong Zeng,
Derrick Wing Kwan Ng,
Shuguang Cui
Abstract:
This article presents a novel multi-functional system for a sixth-generation (6G) wireless network with integrated sensing, communication, and powering (ISCAP), which unifies integrated sensing and communication (ISAC) and wireless information and power transfer (WIPT) techniques. The multi-functional ISCAP network promises to enhance resource utilization efficiency, reduce network costs, and improve overall performance through versatile operational modes. Specifically, a multi-functional base station (BS) can enable multi-functional transmission, by exploiting the same radio signals to perform target/environment sensing, wireless communication, and wireless power transfer (WPT), simultaneously. Besides, the three functions can be intelligently coordinated to pursue mutual benefits, i.e., wireless sensing can be leveraged to enable light-training or even training-free WIPT by providing side-channel information, and the BS can utilize WPT to wirelessly charge low-power devices for ensuring sustainable ISAC. Furthermore, multiple multi-functional BSs can cooperate in both transmission and reception phases for efficient interference management, multi-static sensing, and distributed energy beamforming. For these operational modes, we discuss the technical challenges and potential solutions, particularly focusing on the fundamental performance tradeoff limits, transmission protocol design, as well as waveform and beamforming optimization. Finally, interesting research directions are identified.
Submitted 7 January, 2024;
originally announced January 2024.
-
Coordinated Active-Reactive Power Management of ReP2H Systems with Multiple Electrolyzers
Authors:
Yangjun Zeng,
Buxiang Zhou,
Jie Zhu,
Jiarong Li,
Bosen Yang,
Jin Lin,
Yiwei Qiu
Abstract:
Utility-scale renewable power-to-hydrogen (ReP2H) production typically uses thyristor rectifiers (TRs) to supply power to multiple electrolyzers (ELZs). They exhibit a nonlinear and non-decouplable relation between active and reactive power. The on-off scheduling and load allocation of multiple ELZs simultaneously impact energy conversion efficiency and AC-side active and reactive power flow. Improper scheduling may result in excessive reactive power demand, causing voltage violations and increased network losses, compromising safety and economy. To address these challenges, this paper first explores trade-offs between the efficiency and the reactive load of the electrolyzers. Subsequently, we propose a coordinated approach for scheduling the active and reactive power in the ReP2H system. A mixed-integer second-order cone programming (MISOCP) model is established to jointly optimize active and reactive power by coordinating the ELZs, renewable energy sources, energy storage (ES), and var compensations. Case studies demonstrate that the proposed method reduces losses by 3.06% in an off-grid ReP2H system while increasing hydrogen production by 5.27% on average.
Submitted 22 December, 2023;
originally announced December 2023.
-
How Much Data is Needed for Channel Knowledge Map Construction?
Authors:
Xiaoli Xu,
Yong Zeng
Abstract:
Channel knowledge map (CKM) has been recently proposed to enable environment-aware communications by utilizing historical or simulation-generated wireless channel data. This paper studies the construction of one particular type of CKM, namely channel gain map (CGM), by using a finite number of measurements or simulation-generated data, with model-based spatial channel prediction. We try to answer the following question: How much data is sufficient for CKM construction? To this end, we first derive the average mean square error (AMSE) of the channel gain prediction as a function of the sample density of data collection for offline CGM construction, as well as the number of data points used for online spatial channel gain prediction. To model the spatial variation of the wireless environment even within each cell, we divide the CGM into subregions and estimate the channel parameters from the local data within each subregion. The parameter estimation error and the channel prediction error based on estimated channel parameters are derived as functions of the number of data points within the subregion. The analytical results provide a useful guide for CGM construction and utilization by determining the required spatial sample density for offline data collection and the number of data points to be used for online channel prediction, so that the desired level of channel prediction accuracy is guaranteed.
Submitted 11 December, 2023;
originally announced December 2023.
-
Active Reconfigurable Intelligent Surface Enhanced Spectrum Sensing for Cognitive Radio Networks
Authors:
Jungang Ge,
Ying-Chang Liang,
Sumei Sun,
Yonghong Zeng,
Zhidong Bai
Abstract:
In opportunistic cognitive radio networks, when the primary signal is very weak compared to the background noise, the secondary user requires long sensing time to achieve a reliable spectrum sensing performance, leading to little remaining time for the secondary transmission. To tackle this issue, we propose an active reconfigurable intelligent surface (RIS) assisted spectrum sensing system, where the received signal strength from the interested primary user can be enhanced and underlying interference within the background noise can be mitigated as well. In comparison with the passive RIS, the active RIS can not only adapt the phase shift of each reflecting element but also amplify the incident signals. Notably, we study the reflecting coefficient matrix (RCM) optimization problem to improve the detection probability given a maximum tolerable false alarm probability and limited sensing time. Then, we show that the formulated problem can be equivalently transformed to a weighted mean square error minimization problem using the principle of the well-known weighted minimum mean square error (WMMSE) algorithm, and an iterative optimization approach is proposed to obtain the optimal RCM. In addition, to fairly compare passive RIS and active RIS, we study the required power budget of the RIS to achieve a target detection probability under a special case where the direct links are neglected and the RIS-related channels are line-of-sight. Via extensive simulations, the effectiveness of the WMMSE-based RCM optimization approach is demonstrated. Furthermore, the results reveal that the active RIS can outperform the passive RIS when the underlying interference within the background noise is relatively weak, whereas the passive RIS performs better in strong interference scenarios because the same power budget can support a vast number of passive reflecting elements for interference mitigation.
Submitted 26 April, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
An Event-Based Synchronization Framework for Controller Hardware-in-the-loop Simulation of Electric Railway Power Electronics Systems
Authors:
Jialin Zheng,
Yangbin Zeng,
Han Xu,
Weicheng Liu,
Di Mou,
Zhengming Zhao
Abstract:
The Controller Hardware-in-the-loop (CHIL) simulation is gaining popularity as a cost-effective, efficient, and reliable tool in the design and development process of fast-growing electrified transportation power converters. However, it is challenging to implement the conventional CHIL simulations on the railway power converters with complex topologies and high switching frequencies due to strict real-time constraints. Therefore, this paper proposes an event-based synchronization CHIL (ES-CHIL) framework for high-fidelity simulation of these electrified railway power converters. Different from conventional CHIL simulations synchronized through the time axis, the ES-CHIL framework is synchronized through the event axis. Therefore, it can ease the real-time constraint and broaden the upper bound on the system size and switching frequency. Besides, models and algorithms with higher accuracy, such as the diode model with natural commutation processes, can be used in the ES-CHIL framework. The proposed framework is validated for a 350 kW wireless power transformer system containing 24 fully controlled devices and 36 diodes by comparing it with Simulink and physical experiments. This research improves the fidelity and application range of the power converters CHIL simulation. Thus, it helps to accelerate the prototype design and performance evaluation process for electrified railways and other applications with such complex converters.
Submitted 12 November, 2023;
originally announced November 2023.
-
Accurate Time-segmented Loss Model for SiC MOSFETs in Electro-thermal Multi-Rate Simulation
Authors:
Jialin Zheng,
Zhengming Zhao,
Han Xu,
Weicheng Liu,
Yangbin Zeng
Abstract:
Compared with silicon (Si) power devices, silicon carbide (SiC) devices have the advantages of fast switching speed and low on-resistance. However, the effects of non-ideal characteristics of SiC MOSFETs and stray parameters (especially parasitic inductance) on switching losses need to be further evaluated. In this paper, a transient loss model based on SiC MOSFET and SiC Schottky barrier diode (SBD) switching pairs is proposed. The transient process analysis is simplified by time segmentation of the transient process of power switching devices. The electro-thermal simulation calculates the junction temperature and updates the temperature-related parameters with the proposed loss model and the thermal network model. A multi-rate data exchange strategy is proposed to solve the problem of disparity in timescales between circuit simulation and thermal network simulation. The CREE CMF20120D SiC MOSFET device is used for the experimental verification. The experimental results verify the accuracy of the model, which provides guidance for the circuit design of SiC MOSFETs. All the parameters of the loss model can be extracted from the datasheet, which is practical in power electronics design.
Submitted 12 November, 2023;
originally announced November 2023.
-
FPGA-Based Implicit-Explicit Real-time Simulation Solver for Railway Wireless Power Transfer with Nonlinear Magnetic Coupling Components
Authors:
Han Xu,
Yangbin Zeng,
Jialin Zheng,
Kainan Chen,
Weicheng Liu,
Zhengming Zhao
Abstract:
Railway Wireless Power Transfer (WPT) is a promising non-contact power supply solution, but constructing prototypes for controller testing can be both costly and unsafe. Real-time hardware-in-the-loop simulation is an effective and secure testing tool, but simulating the dynamic charging process of railway WPT systems is challenging due to the continuous changes in the nonlinear magnetic coupling components. To address this challenge, we propose an FPGA-based half-step implicit-explicit (IMEX) simulation solver. The proposed solver adopts an IMEX algorithm to solve the piecewise linear and nonlinear parts of the system separately, which enables FPGAs to solve nonlinear components while achieving high numerical stability. Additionally, we divide a complete integration step into two half-steps to reduce computational time delays. Our proposed method offers a promising solution for the real-time simulation of railway WPT systems. The novelty of our approach lies in the use of the IMEX algorithm and the half-step integration method, which significantly improves the accuracy and efficiency of the simulation. Our simulations and experiments demonstrate the effectiveness and accuracy of the proposed solver, which provides a new approach for simulating and optimizing railway WPT systems with nonlinear magnetic coupling components.
Submitted 27 October, 2023;
originally announced October 2023.
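The implicit-explicit (IMEX) idea mentioned in the abstract above can be illustrated on a scalar toy problem. The sketch below is a plain first-order IMEX Euler step, not the paper's half-step FPGA solver: for y' = -a·y + g(y), the stiff linear term is treated implicitly and the nonlinear term explicitly, giving y_{n+1} = (y_n + h·g(y_n)) / (1 + h·a). The test equation, step size, and function names are assumptions chosen for illustration.

```python
import math


def imex_euler_step(y, h, a, g):
    """One IMEX Euler step for y' = -a*y + g(y): implicit in -a*y, explicit in g."""
    return (y + h * g(y)) / (1.0 + h * a)


def explicit_euler_step(y, h, a, g):
    """Fully explicit Euler step, shown only for contrast."""
    return y + h * (-a * y + g(y))


a, h = 1000.0, 0.01  # h*a = 10, far outside the explicit Euler stability limit
y_imex = 1.0
for _ in range(200):
    y_imex = imex_euler_step(y_imex, h, a, math.cos)

y_expl = 1.0
for _ in range(50):
    y_expl = explicit_euler_step(y_expl, h, a, math.cos)

# y_imex settles at the steady state a*y = cos(y); y_expl grows without bound.
```

The implicit part needs only a division here because the stiff term is linear, which is the property that makes such splitting attractive for fixed-latency hardware.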
-
Integrated Sensing and Channel Estimation by Exploiting Dual Timescales for Delay-Doppler Alignment Modulation
Authors:
Zhiqiang Xiao,
Yong Zeng,
Fuxi Wen,
Zaichen Zhang,
Derrick Wing Kwan Ng
Abstract:
For integrated sensing and communication (ISAC) systems, the channel information essential for communication and sensing tasks fluctuates across different timescales. Specifically, wireless sensing primarily focuses on acquiring path state information (PSI) (e.g., delay, angle, and Doppler) of individual multi-path components to sense the environment, which usually evolves much more slowly than the composite channel state information (CSI) required for communications. Typically, the CSI is approximately unchanged during the channel coherence time, which characterizes the statistical properties of wireless communication channels. However, this concept is less appropriate for wireless sensing. To this end, in this paper, we introduce a new timescale to study the variation of the PSI from a channel geometric perspective, termed path invariant time, during which the PSI largely remains constant. Our analysis indicates that the path invariant time considerably exceeds the channel coherence time. Thus, capitalizing on these dual timescales of the wireless channel, in this paper, we propose a novel ISAC framework exploiting the recently proposed delay-Doppler alignment modulation (DDAM) technique. Different from most existing studies on DDAM that assume the availability of perfect PSI, in this work, we propose a novel algorithm, termed adaptive simultaneously orthogonal matching pursuit with support refinement (ASOMP-SR), for joint environment sensing and PSI estimation. We also analyze the performance of DDAM with imperfectly sensed PSI. Simulation results unveil that the proposed DDAM-based ISAC can achieve superior spectral efficiency and a reduced peak-to-average power ratio (PAPR) compared to standard orthogonal frequency division multiplexing (OFDM).
Submitted 17 October, 2023;
originally announced October 2023.
-
A Tutorial on Near-Field XL-MIMO Communications Towards 6G
Authors:
Haiquan Lu,
Yong Zeng,
Changsheng You,
Yu Han,
Jiayi Zhang,
Zhe Wang,
Zhenjun Dong,
Shi Jin,
Cheng-Xiang Wang,
Tao Jiang,
Xiaohu You,
Rui Zhang
Abstract:
Extremely large-scale multiple-input multiple-output (XL-MIMO) is a promising technology for the sixth-generation (6G) mobile communication networks. By significantly boosting the antenna number or size to at least an order of magnitude beyond current massive MIMO systems, XL-MIMO is expected to unprecedentedly enhance the spectral efficiency and spatial resolution for wireless communication. The evolution from massive MIMO to XL-MIMO is not simply an increase in the array size, but faces new design challenges, in terms of near-field channel modelling, performance analysis, channel estimation, and practical implementation. In this article, we give a comprehensive tutorial overview on near-field XL-MIMO communications, aiming to provide useful guidance for tackling the above challenges. First, the basic near-field modelling for XL-MIMO is established, by considering the new characteristics of non-uniform spherical wave (NUSW) and spatial non-stationarity. Next, based on the near-field modelling, the performance analysis of XL-MIMO is presented, including the near-field signal-to-noise ratio (SNR) scaling laws, beam focusing pattern, achievable rate, and degrees-of-freedom (DoF). Furthermore, various XL-MIMO design issues such as near-field beam codebook, beam training, channel estimation, and delay alignment modulation (DAM) transmission are elaborated. Finally, we point out promising directions to inspire future research on near-field XL-MIMO communications.
Submitted 3 April, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms
Authors:
Joseph Konan,
Ojas Bhargave,
Shikhar Agnihotri,
Shuo Han,
Yunyang Zeng,
Ankit Shah,
Bhiksha Raj
Abstract:
Within the ambit of VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis. This research, rooted in the exploration of proprietary sender-side denoising effects, meticulously evaluates platforms such as Google Meets and Zoom. The study draws upon the Deep Noise Suppression (DNS) 2020 dataset, ensuring a structured examination tailored to various denoising settings and receiver interfaces. A methodological novelty is introduced via the Oaxaca decomposition, traditionally an econometric tool, repurposed herein to analyze acoustic-phonetic perturbations within VoIP systems. To further ground the implications of these transformations, psychoacoustic metrics, specifically PESQ and STOI, were harnessed to furnish a comprehensive understanding of speech alterations. Cumulatively, the insights garnered underscore the intricate landscape of VoIP-influenced acoustic dynamics. In addition to the primary findings, a multitude of metrics are reported, extending the research purview. Moreover, out-of-domain benchmarking for both time and time-frequency domain speech enhancement models is included, thereby enhancing the depth and applicability of this inquiry. Repository: github.com/deepology/VoIP-DNS-Challenge
Submitted 21 November, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
-
A Tutorial on Environment-Aware Communications via Channel Knowledge Map for 6G
Authors:
Yong Zeng,
Junting Chen,
Jie Xu,
Di Wu,
Xiaoli Xu,
Shi Jin,
Xiqi Gao,
David Gesbert,
Shuguang Cui,
Rui Zhang
Abstract:
Sixth-generation (6G) mobile communication networks are expected to have dense infrastructures, large antenna size, wide bandwidth, cost-effective hardware, diversified positioning methods, and enhanced intelligence. Such trends bring both new challenges and opportunities for the practical design of 6G. On one hand, acquiring channel state information (CSI) in real time for all wireless links becomes quite challenging in 6G. On the other hand, there would be numerous data sources in 6G containing high-quality location-tagged channel data, e.g., the estimated channels or beams between base station (BS) and user equipment (UE), making it possible to better learn the local wireless environment. By exploiting this new opportunity and for tackling the CSI acquisition challenge, there is a promising paradigm shift from the conventional environment-unaware communications to the new environment-aware communications based on the novel approach of channel knowledge map (CKM). This article aims to provide a comprehensive overview on environment-aware communications enabled by CKM to fully harness its benefits for 6G. First, the basic concept of CKM is presented, followed by the comparison of CKM with various existing channel inference techniques. Next, the main techniques for CKM construction are discussed, including both environment model-free and environment model-assisted approaches. Furthermore, a general framework is presented for the utilization of CKM to achieve environment-aware communications, followed by some typical CKM-aided communication scenarios. Finally, important open problems in CKM research are highlighted and potential solutions are discussed to inspire future work.
Submitted 6 February, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Delay-Doppler Alignment Modulation for Spatially Sparse Massive MIMO Communication
Authors:
Haiquan Lu,
Yong Zeng
Abstract:
Delay alignment modulation (DAM) is an emerging technique for achieving inter-symbol interference (ISI)-free wideband communications using spatial-delay processing, without relying on channel equalization or multi-carrier transmission. However, existing works on DAM only consider multiple-input single-output (MISO) communication systems and assume time-invariant channels. In this paper, by extending DAM to time-variant frequency-selective multiple-input multiple-output (MIMO) channels, we propose a novel technique termed delay-Doppler alignment modulation (DDAM). Specifically, by leveraging delay-Doppler compensation and path-based beamforming, the Doppler effect of each multi-path can be eliminated and all multi-path signal components may reach the receiver concurrently and constructively. We first show that by applying path-based zero-forcing (ZF) precoding and receive combining, DDAM can transform the original time-variant frequency-selective channels into time-invariant ISI-free channels. The necessary and/or sufficient conditions to achieve such a transformation are derived. Then an asymptotic analysis is provided by showing that when the number of base station (BS) antennas is much larger than that of channel paths, DDAM enables time-invariant ISI-free channels with the simple delay-Doppler compensation and path-based maximal-ratio transmission (MRT) beamforming. Furthermore, for the general DDAM design with some tolerable ISI, the path-based transmit precoding and receive combining matrices are optimized to maximize the spectral efficiency. Numerical results are provided to compare the proposed DDAM technique with various benchmarking schemes, including MIMO-orthogonal time frequency space (OTFS), MIMO-orthogonal frequency-division multiplexing (OFDM) without or with carrier frequency offset (CFO) compensation, and beam alignment along the dominant path.
Submitted 1 September, 2023;
originally announced September 2023.
-
Achievable Rate Region and Path-Based Beamforming for Multi-User Single-Carrier Delay Alignment Modulation
Authors:
Xingwei Wang,
Haiquan Lu,
Yong Zeng,
Xiaoli Xu,
Jie Xu
Abstract:
Delay alignment modulation (DAM) is a novel wideband transmission technique for mmWave massive MIMO systems, which exploits the high spatial resolution and multi-path sparsity to mitigate ISI, without relying on channel equalization or multi-carrier transmission. In particular, DAM leverages the delay pre-compensation and path-based beamforming to effectively align the multi-path components, thus achieving the constructive multi-path combination for eliminating the ISI while preserving the multi-path power gain. Different from the existing works only considering single-user DAM, this paper investigates the DAM technique for multi-user mmWave massive MIMO communication. First, we consider the asymptotic regime when the number of antennas Mt at BS is sufficiently large. It is shown that by employing the simple delay pre-compensation and per-path-based MRT beamforming, the single-carrier DAM is able to perfectly eliminate both ISI and IUI. Next, we consider the general scenario with Mt being finite. In this scenario, we characterize the achievable rate region of the multi-user DAM system by finding its Pareto boundary. Specifically, we formulate a rate-profile-constrained sum rate maximization problem by optimizing the per-path-based beamforming. Furthermore, we present three low-complexity per-path-based beamforming strategies based on the MRT, zero-forcing, and regularized zero-forcing principles, respectively, based on which the achievable sum rates are studied. Finally, we provide simulation results to demonstrate the performance of our proposed strategies as compared to two benchmark schemes based on the strongest-path-based beamforming and the prevalent OFDM, respectively. It is shown that DAM achieves higher spectral efficiency and/or lower peak-to-average-ratio, for systems with high spatial resolution and multi-path diversity.
Submitted 1 September, 2023;
originally announced September 2023.
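The two ingredients of DAM named in the abstract above, delay pre-compensation and per-path MRT beamforming, can be sketched for a single user in an idealized setting. The toy below assumes real-valued, mutually orthogonal path vectors (mimicking the large-antenna asymptotic regime), integer path delays, and no noise; all function names are hypothetical. Each path l with delay n_l carries its own beam f_l, and the symbol stream on that beam is pre-delayed by n_max - n_l, so every path delivers the symbols with the same total delay n_max.

```python
import math


def mrt_beams(paths):
    """Per-path MRT beams f_l = h_l / ||h_l|| (real-valued toy vectors)."""
    beams = []
    for h in paths:
        norm = math.sqrt(sum(v * v for v in h))
        beams.append([v / norm for v in h])
    return beams


def dam_transmit(symbols, beams, delays):
    """x[n] = sum_l f_l * s[n - (n_max - n_l)]: delay pre-compensation per path."""
    n_max = max(delays)
    num_ant = len(beams[0])
    x = [[0.0] * num_ant for _ in range(len(symbols) + n_max)]
    for f, n_l in zip(beams, delays):
        kappa = n_max - n_l
        for n, s in enumerate(symbols):
            for m in range(num_ant):
                x[n + kappa][m] += f[m] * s
    return x


def multipath_channel(x, paths, delays):
    """y[n] = sum_l h_l^T x[n - n_l] (noise-free)."""
    y = [0.0] * (len(x) + max(delays))
    for h, n_l in zip(paths, delays):
        for n, x_n in enumerate(x):
            y[n + n_l] += sum(hv * xv for hv, xv in zip(h, x_n))
    return y


# Three orthogonal paths over four antennas, with delays of 0, 2 and 5 symbols.
paths = [[1.0, 1.0, 1.0, 1.0], [0.5, -0.5, 0.5, -0.5], [0.8, 0.8, -0.8, -0.8]]
delays = [0, 2, 5]
symbols = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]

beams = mrt_beams(paths)
y = multipath_channel(dam_transmit(symbols, beams, delays), paths, delays)
n_max = max(delays)
gain = sum(sum(hv * fv for hv, fv in zip(h, f)) for h, f in zip(paths, beams))
aligned = y[n_max:n_max + len(symbols)]  # equals gain * symbols, with no ISI
```

Because the path vectors are exactly orthogonal here, the cross terms vanish and the received stream is a scaled, delayed copy of the symbols. With a finite number of antennas and non-orthogonal paths, residual ISI remains, which is where the ZF and regularized ZF designs mentioned in the abstract come in.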
-
Multi-User Modular XL-MIMO Communications: Near-Field Beam Focusing Pattern and User Grouping
Authors:
Xinrui Li,
Zhenjun Dong,
Yong Zeng,
Shi Jin,
Rui Zhang
Abstract:
In this paper, we investigate multi-user modular extremely large-scale multiple-input multiple-output (XL-MIMO) communication systems, where modular extremely large-scale uniform linear array (XL-ULA) is deployed at the base station (BS) to serve multiple single-antenna users. By exploiting the unique modular array architecture and considering the potential near-field propagation, we develop sub-array based uniform spherical wave (USW) models for distinct versus common angles of arrival/departure (AoAs/AoDs) with respect to different sub-arrays/modules, respectively. Under such USW models, we analyze the beam focusing patterns at the near-field observation location by using near-field beamforming. The analysis reveals that compared to the conventional XL-MIMO with collocated antenna elements, modular XL-MIMO can provide better spatial resolution by benefiting from its larger array aperture. However, it also incurs undesired grating lobes due to the large inter-module separation. Moreover, it is found that for multi-user modular XL-MIMO communications, the achievable signal-to-interference-plus-noise ratio (SINR) for users may be degraded by the grating lobes of the beam focusing pattern. To address this issue, an efficient user grouping method is proposed for multi-user transmission scheduling, so that users located within the grating lobes of each other are not allocated to the same time-frequency resource block (RB) for their communications. Numerical results are presented to verify the effectiveness of the proposed user grouping method, as well as the superior performance of modular XL-MIMO over its collocated counterpart with densely distributed users.
△ Less
Submitted 22 August, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Delay Alignment Modulation with Hybrid beamforming for Spatially Sparse Communications
Authors:
Jieni Zhang,
Yong Zeng
Abstract:
For millimeter wave (mmWave) or Terahertz (THz) communications, by leveraging the high spatial resolution offered by large antenna arrays and the multi-path sparsity of mmWave/THz channels, a novel inter-symbol interference (ISI) mitigation technique called delay alignment modulation (DAM) has been recently proposed. The key ideas of DAM are delay pre-compensation and path-based beamforming. However, existing research on DAM is based on fully digital beamforming, where the number of radio frequency (RF) chains required is equal to the number of antennas. This leads to high hardware cost and power consumption for mmWave/THz massive multiple-input multiple-output (MIMO) communications. Thus, this paper proposes the hybrid analog/digital beamforming based DAM. The analog and digital beamforming matrices are designed to achieve performance close to DAM based on fully digital beamforming. The effectiveness of the proposed technique is verified by simulation results.
Submitted 16 July, 2023;
originally announced July 2023.
-
Can Sparse Arrays Outperform Collocated Arrays for Future Wireless Communications?
Authors:
Huizhi Wang,
Yong Zeng
Abstract:
Multiple-input multiple-output (MIMO) has become a key technology for contemporary wireless communication systems. For typical MIMO systems, antenna arrays are separated by half of the signal wavelength, which are termed collocated arrays. In this paper, we ask the following question: For future wireless communication systems, is it possible to achieve better performance than collocated arrays by using sparse arrays, whose element spacing is larger than half wavelength? The answer to this question is not immediately clear since while sparse arrays may achieve narrower beam for the main lobe, they also generate undesired grating lobes. In this paper, we show that the answer to the above question is affirmative. To this end, we first provide an insightful explanation by investigating the key properties of beam patterns of sparse and collocated arrays, together with the typical distribution of spatial angle difference Δ, which all critically impact the inter-user interference (IUI). In particular, we show that sparse arrays are less likely to experience severe IUI than collocated arrays, since the probability of Δ typically reduces as |Δ| increases. This naturally helps to reject those higher-order grating lobes of sparse arrays, especially when users are densely located. Then we provide a rigorous derivation of the achievable data rate for sparse and collocated arrays, and derive the condition under which sparse arrays strictly outperform collocated counterparts. Finally, numerical results are provided to validate our theoretical studies.
Submitted 15 July, 2023;
originally announced July 2023.
-
Sim2Plan: Robot Motion Planning via Message Passing between Simulation and Reality
Authors:
Yizhou Zhao,
Yuanhong Zeng,
Qian Long,
Ying Nian Wu,
Song-Chun Zhu
Abstract:
Simulation-to-real is the task of training and developing machine learning models and deploying them in real settings with minimal additional training. This approach is becoming increasingly popular in fields such as robotics. However, there is often a gap between the simulated environment and the real world, and machine learning models trained in simulation may not perform as well in the real world. We propose a framework that utilizes a message-passing pipeline to minimize the information gap between simulation and reality. The message-passing pipeline is comprised of three modules: scene understanding, robot planning, and performance validation. First, the scene understanding module aims to match the scene layout between the real environment set-up and its digital twin. Then, the robot planning module solves a robotic task through trial and error in the simulation. Finally, the performance validation module varies the planning results by constantly checking the status difference of the robot and object status between the real set-up and the simulation. In the experiment, we perform a case study that requires a robot to make a cup of coffee. Results show that the robot is able to complete the task under our framework successfully. The robot follows the steps programmed into its system and utilizes its actuators to interact with the coffee machine and other tools required for the task. The results of this case study demonstrate the potential benefits of our method that drive robots for tasks that require precision and efficiency. Further research in this area could lead to the development of even more versatile and adaptable robots, opening up new possibilities for automation in various industries.
Submitted 15 July, 2023;
originally announced July 2023.
-
Performance Analysis and Approximate Message Passing Detection of Orthogonal Time Sequency Multiplexing Modulation
Authors:
Zeping Sui,
Shefeng Yan,
Hongming Zhang,
Sumei Sun,
Yonghong Zeng,
Lie-Liang Yang,
Lajos Hanzo
Abstract:
In orthogonal time sequency multiplexing (OTSM) modulation, the information symbols are conveyed in the delay-sequency domain upon exploiting the inverse Walsh Hadamard transform (IWHT). It has been shown that OTSM is capable of attaining a bit error ratio (BER) similar to that of orthogonal time-frequency space (OTFS) modulation at a lower complexity, owing to the saving of multiplication operations in the IWHT. Hence we provide its BER performance analysis and characterize its detection complexity. We commence by deriving its generalized input-output relationship and its unconditional pairwise error probability (UPEP). Then, its BER upper bound is derived in closed form under both ideal and imperfect channel estimation conditions, which is shown to be tight at moderate to high signal-to-noise ratios (SNRs). Moreover, a novel approximate message passing (AMP) aided OTSM detection framework is proposed. Specifically, to circumvent the high residual BER of the conventional AMP detector, we propose a vector AMP-based expectation-maximization (VAMP-EM) detector for performing joint data detection and noise variance estimation. The variance auto-tuning algorithm based on the EM algorithm is designed for the VAMP-EM detector to further improve the convergence performance. The simulation results illustrate that the VAMP-EM detector is capable of striking a more attractive BER vs. complexity trade-off than the state-of-the-art schemes as well as providing a better convergence. Finally, we propose AMP and VAMP-EM turbo receivers for low-density parity-check (LDPC)-coded OTSM systems. It is demonstrated that our proposed VAMP-EM turbo receiver is capable of providing both BER and convergence performance improvements over the conventional AMP solution.
Submitted 6 July, 2023;
originally announced July 2023.
-
Coverage Enhancement Strategy in WMSNs Based on a Novel Swarm Intelligence Algorithm: Army Ant Search Optimizer
Authors:
Yindi Yao,
Qin Wen,
Yanpeng Cui,
Feng Zhao,
Bozhan Zhao,
Yaoping Zeng
Abstract:
As one of the most crucial scenarios of the Internet of Things (IoT), wireless multimedia sensor networks (WMSNs) pay more attention to the information-intensive data (e.g., audio, video, image) for remote environments. The area coverage reflects the perception of WMSNs to the surrounding environment, where a good coverage effect can ensure effective data collection. Given the harsh and complex physical environment of WMSNs, random deployment easily produces overlapping sensing regions and coverage holes. The intention of our research is to deal with the optimization problem of maximizing the coverage rate in WMSNs. After proving that the coverage enhancement of WMSNs is NP-hard, and inspired by the predation behavior of army ants, this article proposes a novel swarm intelligence (SI) technology, the army ant search optimizer (AASO), to solve the above problem. It is implemented by five operators: army ant and prey initialization, recruited by prey, attack prey, update prey, and build ant bridge. The simulation results demonstrate that the optimizer shows good performance in terms of exploration and exploitation on benchmark suites when compared to other representative SI algorithms. More importantly, AASO-based coverage enhancement in WMSNs achieves a better coverage effect than existing approaches.
Submitted 2 July, 2023;
originally announced July 2023.
-
Near-Field Beam Management for Extremely Large-Scale Array Communications
Authors:
Changsheng You,
Yunpu Zhang,
Chenyu Wu,
Yong Zeng,
Beixiong Zheng,
Li Chen,
Linglong Dai,
A. Lee Swindlehurst
Abstract:
Extremely large-scale arrays (XL-arrays) have emerged as a promising technology to achieve super-high spectral efficiency and spatial resolution in future wireless systems. The large aperture of XL-arrays means that spherical rather than planar wavefronts must be considered, and a paradigm shift from far-field to near-field communications is necessary. Unlike existing works that have mainly considered far-field beam management, we study the new near-field beam management for XL-arrays. We first provide an overview of near-field communications and introduce various applications of XL-arrays in both outdoor and indoor scenarios. Then, three typical near-field beam management methods for XL-arrays are discussed: near-field beam training, beam tracking, and beam scheduling. We point out their main design issues and propose promising solutions to address them. Moreover, other important directions in near-field communications are also highlighted to motivate future research.
Submitted 28 June, 2023;
originally announced June 2023.
-
Single-Image-Based Deep Learning for Segmentation of Early Esophageal Cancer Lesions
Authors:
Haipeng Li,
Dingrui Liu,
Yu Zeng,
Shuaicheng Liu,
Tao Gan,
Nini Rao,
Jinlin Yang,
Bing Zeng
Abstract:
Accurate segmentation of lesions is crucial for diagnosis and treatment of early esophageal cancer (EEC). However, neither traditional nor deep learning-based methods up to today can meet the clinical requirements, with the mean Dice score - the most important metric in medical image analysis - hardly exceeding 0.75. In this paper, we present a novel deep learning approach for segmenting EEC lesions. Our approach stands out for its uniqueness, as it relies solely on a single image coming from one patient, forming the so-called "You-Only-Have-One" (YOHO) framework. On one hand, this "one-image-one-network" learning ensures complete patient privacy as it does not use any images from other patients as the training data. On the other hand, it avoids nearly all generalization-related problems since each trained network is applied only to the input image itself. In particular, we can push the training to "over-fitting" as much as possible to increase the segmentation accuracy. Our technical details include an interaction with clinical physicians to utilize their expertise, a geometry-based rendering of a single lesion image to generate the training set (the biggest novelty), and an edge-enhanced UNet. We have evaluated YOHO over an EEC data-set created by ourselves and achieved a mean Dice score of 0.888, which represents a significant advance toward clinical applications.
Submitted 9 June, 2023;
originally announced June 2023.
-
Domestic Activities Classification from Audio Recordings Using Multi-scale Dilated Depthwise Separable Convolutional Network
Authors:
Yufei Zeng,
Yanxiong Li,
Zhenfeng Zhou,
Ruiqi Wang,
Difeng Lu
Abstract:
Domestic activities classification (DAC) from audio recordings aims at classifying audio recordings into pre-defined categories of domestic activities, which is an effective way for estimation of daily activities performed in home environment. In this paper, we propose a method for DAC from audio recordings using a multi-scale dilated depthwise separable convolutional network (DSCN). The DSCN is a lightweight neural network with small size of parameters and thus suitable to be deployed in portable terminals with limited computing resources. To expand the receptive field with the same size of DSCN's parameters, dilated convolution, instead of normal convolution, is used in the DSCN for further improving the DSCN's performance. In addition, the embeddings of various scales learned by the dilated DSCN are concatenated as a multi-scale embedding for representing property differences among various classes of domestic activities. Evaluated on a public dataset of the Task 5 of the 2018 challenge on Detection and Classification of Acoustic Scenes and Events (DCASE-2018), the results show that: both dilated convolution and multi-scale embedding contribute to the performance improvement of the proposed method; and the proposed method outperforms the methods based on state-of-the-art lightweight network in terms of classification accuracy.
Submitted 8 June, 2023;
originally announced June 2023.
-
Near-Field Beam Focusing Pattern and Grating Lobe Characterization for Modular XL-Array
Authors:
Xinrui Li,
Zhenjun Dong,
Yong Zeng,
Shi Jin,
Rui Zhang
Abstract:
In this paper, we investigate the near-field modelling and analyze the beam focusing pattern for modular extremely large-scale array (XL-array) communications. As modular XL-array is physically and electrically large in general, the accurate characterization of amplitude and phase variations across its array elements requires the non-uniform spherical wave (NUSW) model, which, however, is difficult for performance analysis and optimization. To address this issue, we first present two ways to simplify the NUSW model by exploiting the unique regular structure of modular XL-array, termed sub-array based uniform spherical wave (USW) models with different or common angles, respectively. Based on the developed models, the near-field beam focusing patterns of XL-array communications are derived. It is revealed that compared to the existing collocated XL-array with the same number of array elements, modular XL-array can significantly enhance the spatial resolution, but at the cost of generating undesired grating lobes. Fortunately, different from the conventional far-field uniform plane wave (UPW) model, the near-field USW model for modular XL-array exhibits a higher grating lobe suppression capability, thanks to the non-linear phase variations across the array elements. Finally, simulation results are provided to verify the near-field beam focusing pattern and grating lobe characteristics of modular XL-array.
Submitted 9 May, 2023;
originally announced May 2023.
-
Integrated Super-Resolution Sensing and Communication with 5G NR Waveform: Signal Processing with Uneven CPs and Experiments
Authors:
Chaoyue Zhang,
Zhiwen Zhou,
Huizhi Wang,
Yong Zeng
Abstract:
Integrated sensing and communication (ISAC) is a promising technology to simultaneously provide high-performance wireless communication and radar sensing services in future networks. In this paper, we propose the concept of integrated super-resolution sensing and communication (ISSAC), which uses super-resolution algorithms in ISAC systems to achieve extreme sensing performance for those critical parameters, such as delay, Doppler, and angle of the sensing targets. Based on practical fifth generation (5G) New Radio (NR) waveforms, the signal processing techniques of ISSAC are investigated and prototyping experiments are performed to verify the achievable performance. To this end, we first study the effect of uneven cyclic prefix (CP) lengths of 5G NR orthogonal frequency division multiplexing (OFDM) waveforms on various sensing algorithms. Specifically, the performance of the standard Periodogram based radar processing method, together with the two classical super-resolution algorithms, namely, MUltiple SIgnal Classification (MUSIC) and Estimating Signal Parameter via Rotational Invariance Techniques (ESPRIT) are analyzed in terms of the delay and Doppler estimation. To resolve the uneven CP issue, a new structure of steering vector for MUSIC and a new selection of submatrices for ESPRIT are proposed. Furthermore, an ISSAC experiment platform is set up to validate the theoretical analysis, and the experimental results show that the performance degradation caused by unequal CP length is insignificant and high-resolution delay and Doppler estimation of the target can be achieved with 5G NR waveforms.
Submitted 8 May, 2023;
originally announced May 2023.
-
Multiuser beam steering OWC system based on NOMA
Authors:
Y. Zeng,
Sanaa H. Mohamed,
Ahmad Qidan,
Taisir E. H. El-Gorashi,
Jaafar M. H. Elmirghani
Abstract:
In this paper, we propose applying Non-Orthogonal Multiple Access (NOMA) technology in a multiuser beam steering OWC system. We study the performance of the NOMA-based multiuser beam steering system in terms of the achievable rate and Bit Error Rate (BER). We investigate the impact of the power allocation factor of NOMA and the number of users in the room. The results show that the power allocation factor is a vital parameter in NOMA-based transmission that affects the performance of the network in terms of data rate and BER.
Submitted 10 April, 2023;
originally announced April 2023.
-
Relay Assisted Multiuser OWC Systems under Human Blockage
Authors:
Y. Zeng,
Sanaa H. Mohamed,
Ahmad Qidan,
Taisir E. H. El-Gorashi,
Jaafar M. H. Elmirghani
Abstract:
This paper proposes using cooperative communication based on optoelectronic (O-E-O) amplify-and-forward relay terminals to reduce the influence of the blockage and shadowing resulting from human movement in a beam steering Optical Wireless Communication (OWC) system. The simulation results indicate that on average, the outage probability of the cooperative communication mode with O-E-O relay terminals is two orders of magnitude lower than the outage probability of the system without relay terminals.
Submitted 10 April, 2023;
originally announced April 2023.
-
Multi-Channel Attentive Feature Fusion for Radio Frequency Fingerprinting
Authors:
Yuan Zeng,
Yi Gong,
Jiawei Liu,
Shangao Lin,
Zidong Han,
Ruoxiao Cao,
Kaibin Huang,
Khaled Ben Letaief
Abstract:
Radio frequency fingerprinting (RFF) is a promising device authentication technique for securing the Internet of things. It exploits the intrinsic and unique hardware impairments of the transmitters for RF device identification. In real-world communication systems, hardware impairments across transmitters are subtle, which are difficult to model explicitly. Recently, due to the superior performance of deep learning (DL)-based classification models on real-world datasets, DL networks have been explored for RFF. Most existing DL-based RFF models use a single representation of radio signals as the input. Multi-channel input model can leverage information from different representations of radio signals and improve the identification accuracy of the RF fingerprint. In this work, we propose a novel multi-channel attentive feature fusion (McAFF) method for RFF. It utilizes multi-channel neural features extracted from multiple representations of radio signals, including IQ samples, carrier frequency offset, fast Fourier transform coefficients and short-time Fourier transform coefficients, for better RF fingerprint identification. The features extracted from different channels are fused adaptively using a shared attention module, where the weights of neural features from multiple channels are learned during training the McAFF model. In addition, we design a signal identification module using a convolution-based ResNeXt block to map the fused features to device identities. To evaluate the identification performance of the proposed method, we construct a WiFi dataset, named WFDI, using commercial WiFi end-devices as the transmitters and a Universal Software Radio Peripheral (USRP) as the receiver. ...
Submitted 23 June, 2023; v1 submitted 19 March, 2023;
originally announced March 2023.
-
Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms
Authors:
Joseph Konan,
Ojas Bhargave,
Shikhar Agnihotri,
Hojeong Lee,
Ankit Shah,
Shuo Han,
Yunyang Zeng,
Amanda Shu,
Haohui Liu,
Xuankai Chang,
Hamza Khalid,
Minseon Gwak,
Kawon Lee,
Minjeong Kim,
Bhiksha Raj
Abstract:
In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications. Our approach involves adapting the DNS 2020 models to the specific acoustic characteristics of VoIP communications, which include distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose a multi-task learning framework for VoIP-DNS that jointly optimizes noise suppression and VoIP-specific acoustics for speech enhancement. We evaluate our approach on diverse VoIP scenarios and show that it outperforms both industry performance and state-of-the-art methods for speech enhancement on VoIP applications. Our results demonstrate the potential of models trained on DNS-2020 to be improved and tailored to different VoIP platforms using VoIP-DNS, whose findings have important applications in areas such as speech recognition, voice assistants, and telecommunication.
Submitted 15 March, 2023;
originally announced March 2023.
-
Cramér-Rao Bounds for Near-Field Sensing with Extremely Large-Scale MIMO
Authors:
Huizhi Wang,
Zhiqiang Xiao,
Yong Zeng
Abstract:
Mobile communication networks were designed to mainly support ubiquitous wireless communications, yet they are also expected to achieve radio sensing capabilities in the near future. However, most prior studies on radio sensing usually rely on far-field assumption with uniform plane wave (UPW) models. With the ever-increasing antenna size, together with the growing demands to sense nearby targets, the conventional far-field UPW assumption may become invalid. Therefore, this paper studies near-field radio sensing with extremely large-scale (XL) antenna arrays, where the more general uniform spherical wave (USW) sensing model is considered. Closed-form expressions of the Cramér-Rao Bounds (CRBs) for both angle and range estimations are derived for near-field XL-MIMO radar mode and XL-phased array radar mode, respectively. Our results reveal that different from the conventional UPW model where the CRB for angle decreases unboundedly as the number of antennas increases, for XL-MIMO radar-based near-field sensing, the CRB decreases with diminishing return and approaches a certain limit as the number of antennas increases. Besides, different from the far-field model where the CRB for range is infinity since it has no range estimation capability, that for the near-field case is finite. Furthermore, it is revealed that the commonly used spherical wave model based on second-order Taylor approximation is insufficient for near-field CRB analysis. Extensive simulation results are provided to validate our derived CRBs.
Submitted 10 March, 2023;
originally announced March 2023.
-
i2LQR: Iterative LQR for Iterative Tasks in Dynamic Environments
Authors:
Yifan Zeng,
Suiyi He,
Han Hoang Nguyen,
Yihan Li,
Zhongyu Li,
Koushil Sreenath,
Jun Zeng
Abstract:
This work introduces a novel control strategy called Iterative Linear Quadratic Regulator for Iterative Tasks (i2LQR), which aims to improve closed-loop performance with local trajectory optimization for iterative tasks in a dynamic environment. The proposed algorithm is reference-free and utilizes historical data from previous iterations to enhance the performance of the autonomous system. Unlike existing algorithms, the i2LQR computes the optimal solution in an iterative manner at each timestamp, rendering it well-suited for iterative tasks with changing constraints at different iterations. To evaluate the performance of the proposed algorithm, we conduct numerical simulations for an iterative task aimed at minimizing completion time. The results show that i2LQR achieves an optimized performance with respect to learning-based MPC (LMPC) as the benchmark in static environments, and outperforms LMPC in dynamic environments with both static and dynamic obstacles.
Submitted 6 September, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement
Authors:
Muqiao Yang,
Joseph Konan,
David Bick,
Yunyang Zeng,
Shuo Han,
Anurag Kumar,
Shinji Watanabe,
Bhiksha Raj
Abstract:
Despite rapid advancement in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality, by using domain knowledge of acoustic-phonetics. We identify temporal acoustic parameters -- such as spectral tilt, spectral flux, shimmer, etc. -- that are non-differentiable, and we develop a neural network estimator that can accurately predict their time-series values across an utterance. We also model phoneme-specific weights for each feature, as the acoustic parameters are known to show different behavior in different phonemes. We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features. Experimentally we show that it improves speech enhancement workflows in both time-domain and time-frequency domain, as measured by standard evaluation metrics. We also provide an analysis of phoneme-dependent improvement on acoustic parameters, demonstrating the additional interpretability that our method provides. This analysis can suggest which features are currently the bottleneck for improvement.
Submitted 16 February, 2023;
originally announced February 2023.
-
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
Authors:
Yunyang Zeng,
Joseph Konan,
Shuo Han,
David Bick,
Muqiao Yang,
Anurag Kumar,
Shinji Watanabe,
Bhiksha Raj
Abstract:
Speech enhancement models have greatly progressed in recent years, but still show limits in perceptual quality of their speech outputs. We propose an objective for perceptual quality based on temporal acoustic parameters. These are fundamental speech features that play an essential role in various applications, including speaker recognition and paralinguistic analysis. We provide a differentiable estimator for four categories of low-level acoustic descriptors: frequency-related parameters, energy or amplitude-related parameters, spectral balance parameters, and temporal features. Unlike prior work that looks at aggregated acoustic parameters or a few categories of acoustic parameters, our temporal acoustic parameter (TAP) loss enables auxiliary optimization and improvement of many fine-grained speech characteristics in enhancement workflows. We show that adding TAPLoss as an auxiliary objective in speech enhancement produces speech with improved perceptual quality and intelligibility. We use data from the Deep Noise Suppression 2020 Challenge to demonstrate that both time-domain models and time-frequency domain models can benefit from our method.
Submitted 15 February, 2023;
originally announced February 2023.
-
Wireless Communication Using Metal Reflectors: Reflection Modelling and Experimental Verification
Authors:
Zhi Yu,
Chao Feng,
Yong Zeng,
Teng Li,
Shi Jin
Abstract:
Wireless communication using fully passive metal reflectors is a promising technique for coverage expansion, signal enhancement, rank improvement and blind-zone compensation, thanks to its appealing features including zero energy consumption, ultra-low cost, signaling- and maintenance-free operation, easy deployment and full compatibility with existing and future wireless systems. However, a prevalent understanding of reflection by metal plates is based on Snell's Law, i.e., a signal can only be received when the observation angle equals the incident angle, which is valid only when the electrical dimension of the metal plate is extremely large. In this paper, we rigorously derive a general reflection model that is applicable to metal reflectors of any size, any orientation, and any linear polarization. The derived model is given compactly in terms of the radar cross section (RCS) of the metal plate, as a function of its physical dimensions and orientation vectors, as well as the wave polarization and the wave deflection vector, i.e., the change of direction from the incident wave direction to the observation direction. Furthermore, experimental results based on actual field measurements are provided to validate the accuracy of our developed model and demonstrate the great potential of communications using metal reflectors.
Submitted 15 November, 2022;
originally announced November 2022.
-
Multi-User Delay Alignment Modulation for Millimeter Wave Massive MIMO
Authors:
Xingwei Wang,
Haiquan Lu,
Yong Zeng
Abstract:
Delay alignment modulation (DAM) is a novel wideband communication technique, which exploits the high spatial resolution and multi-path sparsity of millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems to mitigate inter-symbol interference (ISI), without relying on conventional techniques like channel equalization or multi-carrier transmission. In this paper, we extend the DAM technique to multi-user mmWave massive MIMO communication systems. We first provide asymptotic analysis by showing that when the number of base station (BS) antennas is much larger than the total number of channel paths, DAM is able to eliminate both ISI and inter-user interference (IUI) with simple delay pre-compensation and per-path-based maximal ratio transmission (MRT) beamforming. We then study the general multi-user DAM design by considering the three classical transmit beamforming strategies on a per-path basis, namely MRT, zero-forcing (ZF) and regularized zero-forcing (RZF). Simulation results demonstrate that multi-user DAM can significantly outperform the benchmarking single-carrier ISI mitigation technique that only uses the strongest channel path of each user.
Submitted 13 November, 2022;
originally announced November 2022.