Search | arXiv e-print repository

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the audio infilling task. Unlike many previous works, it does not require additional components (e.g., duration model, grapheme-to-phoneme) or complex techniques (e.g., monotonic alignment search). Despite its simplicity, E2 TTS achieves state-of-the-art zero-shot TTS capabilities that are comparable to or surpass previous works, including Voicebox and NaturalSpeech 3. The simplicity of E2 TTS also allows for flexibility in the input representation. We propose several variants of E2 TTS to improve usability during inference. See https://aka.ms/e2tts/ for demo samples.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16083 [pdf, other]

Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning

Authors: Ruisheng Gao, Zeyu Xiao, Zhiwei Xiong

Abstract: Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution by effectively modeling long-range spatial-angular correlations, but their quadratic complexity hinders the efficient processing of high resolution 4D inputs, resulting in slow inference speed and high memory cost. As a compromise, most prior work adopts a patch-based strategy, which fails to leverage the full information from the entire input LFs. The recently proposed selective state-space model, Mamba, has gained popularity for its efficient long-range sequence modeling. In this paper, we propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy. Specifically, we tokenize 4D LFs into subspace sequences and conduct bi-directional scanning on each subspace. Based on our scanning strategy, we then design the Mamba-based Global Interaction (MGI) module to capture global information and the local Spatial-Angular Modulator (SAM) to complement local details. Additionally, we introduce a Transformer-to-Mamba (T2M) loss to further enhance overall performance. Extensive experiments on public benchmarks demonstrate that MLFSR surpasses CNN-based models and rivals Transformer-based methods in performance while maintaining higher efficiency. With quicker inference speed and reduced memory demand, MLFSR facilitates full-image processing of high-resolution 4D LFs with enhanced performance.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 17 pages, 7 figures

arXiv:2406.09190 [pdf, other]

Rethinking Waveform for 6G: Harnessing Delay-Doppler Alignment Modulation

Authors: Zhiqiang Xiao, Xianda Liu, Yong Zeng, J. Andrew Zhang, Shi Jin, Rui Zhang

Abstract: Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design. In this article, by leveraging the super spatial resolution of large antenna arrays and the multi-path spatial sparsity of high-frequency wireless channels, we introduce a new approach for waveform design based on the recently proposed delay-Doppler alignment modulation (DDAM). In particular, DDAM makes a paradigm shift of waveform design from the conventional manner of tolerating channel delay and Doppler spreads to actively manipulating them. First, we review the fundamental constraints and performance limitations of orthogonal frequency division multiplexing (OFDM) and introduce new opportunities for 6G waveform design. Next, the motivations and basic principles of DDAM are presented, followed by its various extensions to different wireless system setups. Finally, the main design considerations for DDAM are discussed and the new opportunities for future research are highlighted.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.04281 [pdf, other]

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

Abstract: Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker characteristics, has been underexplored. In this work, we propose a novel total-duration-aware (TDA) duration model for TTS, where phoneme durations are predicted not only from the text input but also from an additional input of the total target duration. We also propose a MaskGIT-based duration model that enhances the diversity and quality of the predicted phoneme durations. Our results demonstrate that the proposed TDA duration models achieve better intelligibility and speaker similarity for various speech rate configurations compared to the baseline models. We also show that the proposed MaskGIT-based model can generate phoneme durations with higher quality and diversity compared to its regression or flow-matching counterparts.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2406.00604 [pdf, other]

Multipath Exploitation for Fluctuating Target Detection in RIS-Assisted ISAC Systems

Authors: Shoushuo Zhang, Zichao Xiao, Rang Liu, Ming Li, Wei Wang, Qian Liu

Abstract: Integrated sensing and communication (ISAC) systems are typically deployed in multipath environments, which is usually deemed as a challenging issue for wireless communications. However, the multipath propagation can also provide extra illumination and observation perspectives for radar sensing, which offers spatial diversity gain for detecting targets with spatial radar cross-section (RCS) fluctuations. In this letter, we propose to utilize reconfigurable intelligent surfaces (RIS) in ISAC systems to provide high-quality and controllable multipath propagation for improving the performance of fluctuating target detection and simultaneously enhancing the quality of communication services. To effectively exploit the spatial diversity offered by RIS-empowered multipath, the dual-functional transmit beamforming and the RIS reflection beamforming are jointly designed to maximize the expectation of radar signal-to-noise ratio (SNR). To solve the resulting complex non-convex optimization problem, we develop an efficient alternating optimization algorithm that utilizes majorization-minimization (MM) and alternating direction method of multipliers (ADMM) algorithms. Simulation results illustrate the advantages of multipath exploitation and the proposed beamforming design algorithm for fluctuating target detection in RIS-assisted ISAC systems.

Submitted 1 June, 2024; originally announced June 2024.

Comments: submitted to IEEE WCL

arXiv:2405.06971 [pdf, other]

Controlling network-coupled neural dynamics with nonlinear network control theory

Authors: Zhongye Xia, Weibin Li, Zhichao Liang, Kexin Lou, Quanying Liu

Abstract: This paper addresses the problem of controlling the temporal dynamics of complex nonlinear network-coupled dynamical systems, specifically in terms of neurodynamics. Based on the Lyapunov direct method, we derive a control strategy with theoretical guarantees of controllability. To verify the performance of the derived control strategy, we perform numerical experiments on two nonlinear network-coupled dynamical systems that emulate phase synchronization and neural population dynamics. The results demonstrate the feasibility and effectiveness of our control strategy.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2404.15643 [pdf, ps, other]

Dynamic Beam Coverage for Satellite Communications Aided by Movable-Antenna Array

Authors: Lipeng Zhu, Xiangyu Pi, Wenyan Ma, Zhenyu Xiao, Rui Zhang

Abstract: Due to the ultra-dense constellation, efficient beam coverage and interference mitigation are crucial to low-earth orbit (LEO) satellite communication systems, while the conventional directional antennas and fixed-position antenna (FPA) arrays both have limited degrees of freedom (DoFs) in beamforming to adapt to the time-varying coverage requirement of terrestrial users. To address this challenge, we propose in this paper utilizing movable antenna (MA) arrays to enhance the satellite beam coverage and interference mitigation. Specifically, given the satellite orbit and the coverage requirement within a specific time interval, the antenna position vector (APV) and antenna weight vector (AWV) of the satellite-mounted MA array are jointly optimized over time to minimize the average signal leakage power to the interference area of the satellite, subject to the constraints of the minimum beamforming gain over the coverage area, the continuous movement of MAs, and the constant modulus of AWV. The corresponding continuous-time decision process for the APV and AWV is first transformed into a more tractable discrete-time optimization problem. Then, an alternating optimization (AO)-based algorithm is developed by iteratively optimizing the APV and AWV, where the successive convex approximation (SCA) technique is utilized to obtain locally optimal solutions during the iterations. Moreover, to further reduce the antenna movement overhead, a low-complexity MA scheme is proposed by using an optimized common APV over all time slots. Simulation results validate that the proposed MA array-aided beam coverage schemes can significantly decrease the interference leakage of the satellite compared to conventional FPA-based schemes, while the low-complexity MA scheme can achieve a performance comparable to the continuous-movement scheme.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2403.19951 [pdf, ps, other]

Fractional Delay Alignment Modulation for Spatially Sparse Wireless Communications

Authors: Zhiwen Zhou, Zhiqiang Xiao, Yong Zeng

Abstract: Delay alignment modulation (DAM) is a novel transmission technique for wireless systems with high spatial resolution by leveraging delay compensation and path-based beamforming, to mitigate the inter-symbol interference (ISI) without resorting to complex channel equalization or multi-carrier transmission. However, most existing studies on DAM consider a simplified scenario by assuming that the channel multi-path delays are integer multiples of the signal sampling interval. This paper investigates DAM for the more general and practical scenarios with fractional multi-path delays. We first analyze the impact of fractional multi-path delays on the existing DAM design, termed integer DAM (iDAM), which can only achieve delay compensations that are integer multiples of the sampling interval. It is revealed that the existence of fractional multi-path delays renders iDAM no longer possible to achieve perfect delay alignment. To address this issue, we propose a more generic DAM design called fractional DAM (fDAM), which achieves fractional delay pre-compensation via upsampling and fractional delay filtering. By leveraging the Farrow filter structure, the proposed approach can eliminate ISI without real-time computation of filter coefficients, as typically required in traditional channel equalization techniques. Simulation results demonstrate that the proposed fDAM outperforms the existing iDAM and orthogonal frequency division multiplexing (OFDM) in terms of symbol error rate (SER) and spectral efficiency, while maintaining a comparable peak-to-average power ratio (PAPR) as iDAM, which is considerably lower than OFDM.

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE WCNC 2024

arXiv:2402.15185 [pdf, other]

Pre-Chirp-Domain Index Modulation for Affine Frequency Division Multiplexing

Authors: Guangyao Liu, Tianqi Mao, Ruiqi Liu, Zhenyu Xiao

Abstract: Affine frequency division multiplexing (AFDM), tailored as a novel multicarrier technique utilizing chirp signals for high-mobility communications, exhibits marked advantages compared to traditional orthogonal frequency division multiplexing (OFDM). AFDM is based on the discrete affine Fourier transform (DAFT) with two modifiable parameters of the chirp signals, termed as the pre-chirp parameter and post-chirp parameter, respectively. These parameters can be fine-tuned to avoid overlapping channel paths with different delays or Doppler shifts, leading to performance enhancement especially for doubly dispersive channel. In this paper, we propose a novel AFDM structure with the pre-chirp index modulation (PIM) philosophy (AFDM-PIM), which can embed additional information bits into the pre-chirp parameter design for both spectral and energy efficiency enhancement. Specifically, we first demonstrate that the application of distinct pre-chirp parameters to various subcarriers in the AFDM modulation process maintains the orthogonality among these subcarriers. Then, different pre-chirp parameters are flexibly assigned to each AFDM subcarrier according to the incoming bits. By such arrangement, aside from classical phase/amplitude modulation, extra binary bits can be implicitly conveyed by the indices of selected pre-chirping parameters realizations without additional energy consumption. At the receiver, both a maximum likelihood (ML) detector and a reduced-complexity ML-minimum mean square error (ML-MMSE) detector are employed to recover the information bits. It has been shown via simulations that the proposed AFDM-PIM exhibits superior bit error rate (BER) performance compared to classical AFDM, OFDM and IM-aided OFDM algorithms.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.11445 [pdf, other]

Balanced Truncation of Linear Systems with Quadratic Outputs in Limited Time and Frequency Intervals

Authors: Qiu-Yan Song, Umair Zulfiqar, Zhi-Hua Xiao, Mohammad Monir Uddin, Victor Sreeram

Abstract: Model order reduction involves constructing a reduced-order approximation of a high-order model while retaining its essential characteristics. This reduced-order model serves as a substitute for the original one in various applications such as simulation, analysis, and design. Often, there's a need to maintain high accuracy within a specific time or frequency interval, while errors beyond this limit can be tolerated. This paper addresses time-limited and frequency-limited model order reduction scenarios for linear systems with quadratic outputs, by generalizing the recently introduced structure-preserving balanced truncation algorithm. To that end, limited interval system Gramians are defined, and the corresponding generalized Lyapunov equations governing their computation are derived. Additionally, low-rank solutions for these equations are investigated. Next, balanced truncation algorithms are proposed for time-limited and frequency-limited scenarios, each utilizing its corresponding limited-interval system Gramians. The proposed algorithms ensure accurate results within specified time and frequency intervals while preserving the quadratic-output structure. Two benchmark numerical examples are presented to demonstrate the effectiveness of the algorithms, showcasing their ability to achieve superior accuracy within the desired time or frequency interval.

Submitted 5 March, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.07383 [pdf, other]

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Authors: Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

Abstract: Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing and variety of the laughter to be generated. In this work, we propose ELaTE, a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt with precise control of laughter timing and expression. Specifically, ELaTE works on the audio prompt to mimic the voice characteristic, the text prompt to indicate the contents of the generated speech, and the input to control the laughter expression, which can be either the start and end times of laughter, or the additional audio prompt that contains laughter to be mimicked. We develop our model based on the foundation of conditional flow-matching-based zero-shot TTS, and fine-tune it with frame-level representation from a laughter detector as additional conditioning. With a simple scheme to mix small-scale laughter-conditioned data with large-scale pre-training data, we demonstrate that a pre-trained zero-shot TTS model can be readily fine-tuned to generate natural laughter with precise controllability, without losing any quality of the pre-trained zero-shot TTS model. Through objective and subjective evaluations, we show that ELaTE can generate laughing speech with significantly higher quality and controllability compared to conventional models. See https://aka.ms/elate/ for demo samples.

Submitted 4 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

Comments: See https://aka.ms/elate/ for demo samples, v2: subjective evaluation has been added

arXiv:2401.14612 [pdf, ps, other]

On Inhomogeneous Infinite Products of Stochastic Matrices and Applications

Authors: Zhaoyue Xia, Jun Du, Chunxiao Jiang, H. Vincent Poor, Zhu Han, Yong Ren

Abstract: With the growth of magnitude of multi-agent networks, distributed optimization holds considerable significance within complex systems. Convergence, a pivotal goal in this domain, is contingent upon the analysis of infinite products of stochastic matrices (IPSMs). In this work, convergence properties of inhomogeneous IPSMs are investigated. The convergence rate of inhomogeneous IPSMs towards an absolute probability sequence $π$ is derived. We also show that the convergence rate is nearly exponential, which coincides with existing results on ergodic chains. The methodology employed relies on delineating the interrelations among Sarymsakov matrices, scrambling matrices, and positive-column matrices. Based on the theoretical results on inhomogeneous IPSMs, we propose a decentralized projected subgradient method for time-varying multi-agent systems with graph-related stretches in (sub)gradient descent directions. The convergence of the proposed method is established for convex objective functions, and extended to non-convex objectives that satisfy Polyak-Lojasiewicz conditions. To corroborate the theoretical findings, we conduct numerical simulations, aligning the outcomes with the established theoretical framework.

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.08974 [pdf, ps, other]

Performance Analysis and Optimization for Movable Antenna Aided Wideband Communications

Authors: Lipeng Zhu, Wenyan Ma, Zhenyu Xiao, Rui Zhang

Abstract: Movable antenna (MA) has emerged as a promising technology to enhance wireless communication performance by enabling the local movement of antennas at the transmitter (Tx) and/or receiver (Rx) for achieving more favorable channel conditions. As the existing studies on MA-aided wireless communications have mainly considered narrow-band transmission in flat fading channels, we investigate in this paper the MA-aided wideband communications employing orthogonal frequency division multiplexing (OFDM) in frequency-selective fading channels. Under the general multi-tap field-response channel model, the wireless channel variations in both space and frequency are characterized with different positions of the MAs. Unlike the narrow-band transmission where the optimal MA position at the Tx/Rx simply maximizes the single-tap channel amplitude, the MA position in the wideband case needs to balance the amplitudes and phases over multiple channel taps in order to maximize the OFDM transmission rate over multiple frequency subcarriers. First, we derive an upper bound on the OFDM achievable rate in closed form when the size of the Tx/Rx region for antenna movement is arbitrarily large. Next, we develop a parallel greedy ascent (PGA) algorithm to obtain locally optimal solutions to the MAs' positions for OFDM rate maximization subject to finite-size Tx/Rx regions. To reduce computational complexity, a simplified PGA algorithm is also provided to optimize the MAs' positions more efficiently. Simulation results demonstrate that the proposed PGA algorithms can approach the OFDM rate upper bound closely with the increase of Tx/Rx region sizes and outperform conventional systems with fixed-position antennas (FPAs) under the wideband channel setup.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.17454 [pdf, ps, other]

Sparsity Exploitation via Joint Receive Processing and Transmit Beamforming Design for MIMO-OFDM ISAC Systems

Authors: Zichao Xiao, Rang Liu, Ming Li, Wei Wang, Qian Liu

Abstract: Integrated sensing and communication (ISAC) is widely recognized as a pivotal enabling technique for the advancement of future wireless networks. This paper aims to efficiently exploit the inherent sparsity of echo signals for the multi-input-multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) based ISAC system. A novel joint receive echo processing and transmit beamforming design is presented to achieve this goal. Specifically, we first propose a compressive sensing (CS)-assisted estimation approach to facilitate ISAC receive echo processing, which can not only enable accurate recovery of target information, but also allow substantial reduction in the number of sensing subcarriers to be sampled and processed. Then, based on the proposed CS-assisted processing method, the associated transmit beamforming design is formulated with the objective of maximizing the sum-rate of multiuser communications while satisfying the transmit power budget and ensuring the received signal-to-noise ratio (SNR) for the designated sensing subcarriers. In order to address the formulated non-convex problem involving high-dimensional variables, an effective iterative algorithm employing majorization minimization (MM), fractional programming (FP), and the nonlinear equality alternative direction method of multipliers (neADMM) with closed-form solutions has been developed. Finally, extensive numerical simulations are conducted to verify the effectiveness of the proposed algorithm and the superior performance of the introduced sparsity exploitation strategy.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 13 pages, 6 Figures, submitted to IEEE Trans

arXiv:2312.15863 [pdf, other]

PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning

Authors: Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, Jiangjin Yin

Abstract: Designing better deep networks and better reinforcement learning (RL) algorithms are both important for deep RL. This work studies the former. Specifically, the Perception and Decision-making Interleaving Transformer (PDiT) network is proposed, which cascades two Transformers in a very natural way: the perceiving one focuses on the environmental perception by processing the observation at the patch level, whereas the deciding one pays attention to the decision-making by conditioning on the history of the desired returns, the perceiver's outputs, and the actions. Such a network design is generally applicable to a lot of deep RL settings, e.g., both the online and offline RL algorithms under environments with either image observations, proprioception observations, or hybrid image-language observations. Extensive experiments show that PDiT can not only achieve superior performance than strong baselines in different settings but also extract explainable feature representations. Our code is available at https://github.com/maohangyu/PDiT.

Submitted 25 December, 2023; originally announced December 2023.

Comments: Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024, full paper with oral presentation). Cover our preliminary study: arXiv:2212.14538

arXiv:2312.06969 [pdf, ps, other]

Channel Estimation for Movable Antenna Communication Systems: A Framework Based on Compressed Sensing

Authors: Zhenyu Xiao, Songqi Cao, Lipeng Zhu, Yanming Liu, Xiang-Gen Xia, Rui Zhang

Abstract: Movable antenna (MA) is a new technology with great potential to improve communication performance by enabling local movement of antennas for pursuing better channel conditions. In particular, the acquisition of complete channel state information (CSI) between the transmitter (Tx) and receiver (Rx) regions is an essential problem for MA systems to reap performance gains. In this paper, we propose a general channel estimation framework for MA systems by exploiting the multi-path field response channel structure. Specifically, the angles of departure (AoDs), angles of arrival (AoAs), and complex coefficients of the multi-path components (MPCs) are jointly estimated by employing the compressed sensing method, based on multiple channel measurements at designated positions of the Tx-MA and Rx-MA. Under this framework, the Tx-MA and Rx-MA measurement positions fundamentally determine the measurement matrix for compressed sensing, of which the mutual coherence is analyzed from the perspective of Fourier transform. Moreover, two criteria for MA measurement positions are provided to guarantee the successful recovery of MPCs. Then, we propose several MA measurement position setups and compare their performance. Finally, comprehensive simulation results show that the proposed framework is able to estimate the complete CSI between the Tx and Rx regions with a high accuracy.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.05256 [pdf, other]

Holistic Evaluation of GPT-4V for Biomedical Imaging

Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.

Submitted 10 November, 2023; originally announced December 2023.

arXiv:2311.05478 [pdf, other]

Robust Retraining-free GAN Fingerprinting via Personalized Normalization

Authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni

Abstract: In recent years, there has been significant growth in the commercial applications of generative models, licensed and distributed by model developers to users, who in turn use them to offer services. In this scenario, there is a need to track and identify the responsible user in the presence of a violation of the license agreement or any kind of malicious usage. Although there are methods enabling Generative Adversarial Networks (GANs) to include invisible watermarks in the images they produce, generating a model with a different watermark, referred to as a fingerprint, for each user is time- and resource-consuming due to the need to retrain the model to include the desired fingerprint. In this paper, we propose a retraining-free GAN fingerprinting method that allows model developers to easily generate model copies with the same functionality but different fingerprints. The generator is modified by inserting additional Personalized Normalization (PN) layers whose parameters (scaling and bias) are generated by two dedicated shallow networks (ParamGen Nets) taking the fingerprint as input. A watermark decoder is trained simultaneously to extract the fingerprint from the generated images. The proposed method can embed different fingerprints inside the GAN by just changing the input of the ParamGen Nets and performing a feedforward pass, without finetuning or retraining. The performance of the proposed method in terms of robustness against both model-level and image-level attacks is also superior to the state-of-the-art.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.11326 [pdf, other]

Integrated Sensing and Channel Estimation by Exploiting Dual Timescales for Delay-Doppler Alignment Modulation

Authors: Zhiqiang Xiao, Yong Zeng, Fuxi Wen, Zaichen Zhang, Derrick Wing Kwan Ng

Abstract: For integrated sensing and communication (ISAC) systems, the channel information essential for communication and sensing tasks fluctuates across different timescales. Specifically, wireless sensing primarily focuses on acquiring path state information (PSI) (e.g., delay, angle, and Doppler) of individual multi-path components to sense the environment, which usually evolves much more slowly than the composite channel state information (CSI) required for communications. Typically, the CSI is approximately unchanged during the channel coherence time, which characterizes the statistical properties of wireless communication channels. However, this concept is less appropriate for describing that for wireless sensing. To this end, in this paper, we introduce a new timescale to study the variation of the PSI from a channel geometric perspective, termed path invariant time, during which the PSI largely remains constant. Our analysis indicates that the path invariant time considerably exceeds the channel coherence time. Thus, capitalizing on these dual timescales of the wireless channel, in this paper, we propose a novel ISAC framework exploiting the recently proposed delay-Doppler alignment modulation (DDAM) technique. Different from most existing studies on DDAM that assume the availability of perfect PSI, in this work, we propose a novel algorithm, termed as adaptive simultaneously orthogonal matching pursuit with support refinement (ASOMP-SR), for joint environment sensing and PSI estimation. We also analyze the performance of DDAM with imperfectly sensed PSI. Simulation results unveil that the proposed DDAM-based ISAC can achieve superior spectral efficiency and a reduced peak-to-average power ratio (PAPR) compared to standard orthogonal frequency division multiplexing (OFDM).

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2308.09512 [pdf, other]

Multiuser Communications with Movable-Antenna Base Station: Joint Antenna Positioning, Receive Combining, and Power Control

Authors: Zhenyu Xiao, Xiangyu Pi, Lipeng Zhu, Xiang-Gen Xia, Rui Zhang

Abstract: Movable antenna (MA) is an emerging technology which enables a local movement of the antenna in the transmitter/receiver region for improving the channel condition and communication performance. In this paper, we study the deployment of multiple MAs at the base station (BS) for enhancing the multiuser communication performance. First, we model the multiuser channel in the uplink to characterize the wireless channel variation due to MAs' movements at the BS. Then, an optimization problem is formulated to maximize the minimum achievable rate among multiple users for MA-aided uplink multiuser communications by jointly optimizing the MAs' positions, their receive combining at the BS, and the transmit power of users, under the constraints of finite moving region for MAs, minimum inter-MA distance, and maximum transmit power of each user. To solve this challenging non-convex optimization problem, a two-loop iterative algorithm is proposed by leveraging the particle swarm optimization (PSO) method. Specifically, the outer-loop updates the positions of a set of particles, where each particle's position represents one realization of the antenna position vector (APV) of all MAs. The inner-loop implements the fitness evaluation for each particle in terms of the max-min achievable rate of multiple users with its corresponding APV, where the receive combining matrix of the BS and the transmit power of each user are optimized by applying the block coordinate descent (BCD) technique. Simulation results show that the antenna position optimization for MAs-aided BSs can significantly improve the rate performance as compared to conventional BSs with fixed-position antennas (FPAs).

Submitted 18 August, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2308.05546

arXiv:2308.05546 [pdf, other]

Multiuser Communications with Movable-Antenna Base Station Via Antenna Position Optimization

Authors: Xiangyu Pi, Lipeng Zhu, Zhenyu Xiao, Rui Zhang

Abstract: This paper studies the deployment of multiple movable antennas (MAs) at the base station (BS) for enhancing the multiuser communication performance. First, we model the multiuser channel in the uplink to characterize the wireless channel variation caused by MAs' movement at the BS. Then, an optimization problem is formulated to maximize the minimum achievable rate among multiple users for MA-aided uplink multiuser communications by jointly optimizing the MAs' positions, their receive combining at the BS, and the transmit power of users, under the constraints of finite moving region of MAs, minimum inter-MA distance, and maximum transmit power of each user. To solve this challenging non-convex optimization problem, a two-loop iterative algorithm is proposed by leveraging the particle swarm optimization (PSO) method. Specifically, the outer-loop updates the positions of a set of particles, where each particle's position represents one realization of the antenna positioning vector (APV) of all MAs. The inner-loop implements the fitness evaluation for each particle in terms of the max-min achievable rate of multiple users with its corresponding APV, where the receive combining matrix of the BS and the transmit power of each user are optimized by applying the block coordinate descent (BCD) technique. Simulation results show that the antenna position optimization for MAs-aided BS can significantly improve the rate performance as compared to conventional BS with fixed-position antennas (FPAs).

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2308.04162 [pdf, other]

EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation

Authors: Jiajun Chen, Jiacheng Lin, Zhiqiang Xiao, Haolong Fu, Ke Nai, Kailun Yang, Zhiyong Li

Abstract: Audio-guided Video Object Segmentation (A-VOS) and Referring Video Object Segmentation (R-VOS) are two highly-related tasks, which both aim to segment specific objects from video sequences according to user-provided expression prompts. However, due to the challenges in modeling representations for different modalities, contemporary methods struggle to strike a balance between interaction flexibility and high-precision localization and segmentation. In this paper, we address this problem from two perspectives: the alignment representation of audio and text and the deep interaction among audio, text, and visual features. First, we propose a universal architecture, the Expression Prompt Collaboration Transformer, herein EPCFormer. Next, we propose an Expression Alignment (EA) mechanism for audio and text expressions. By introducing contrastive learning for audio and text expressions, the proposed EPCFormer realizes comprehension of the semantic equivalence between audio and text expressions denoting the same objects. Then, to facilitate deep interactions among audio, text, and video features, we introduce an Expression-Visual Attention (EVA) mechanism. The knowledge of video object segmentation in terms of the expression prompts can seamlessly transfer between the two tasks by deeply exploring complementary cues between text and audio. Experiments on well-recognized benchmarks demonstrate that our universal EPCFormer attains state-of-the-art results on both tasks. The source code of EPCFormer will be made publicly available at https://github.com/lab206/EPCFormer.

Submitted 8 August, 2023; originally announced August 2023.

Comments: The source code will be made publicly available at https://github.com/lab206/EPCFormer

arXiv:2308.03387 [pdf, other]

A Novel Joint Angle-Range-Velocity Estimation Method for MIMO-OFDM ISAC Systems

Authors: Zichao Xiao, Rang Liu, Ming Li, Qian Liu

Abstract: Integrated sensing and communications (ISAC) is emerging as a key technique for next-generation wireless systems. In order to expedite the practical implementation of ISAC within pervasive mobile networks, it is essential to equip widely-deployed base stations with radar sensing capabilities. Thus, the utilization of standardized multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) hardware architectures and waveforms becomes pivotal for realizing seamless integration of effective communication and sensing functionalities. In this paper, we introduce a novel joint angle-range-velocity estimation algorithm for the MIMO-OFDM ISAC system. This approach exclusively depends on conventional MIMO-OFDM communication waveforms, which are widely adopted in wireless communications. Specifically, the angle-range-velocity information of potential targets is jointly extracted by utilizing all the received echo signals within a coherent processing interval (CPI). Therefore, the proposed joint estimation algorithm can achieve larger processing gains and higher resolution by fully exploiting echo signals and jointly estimating the angle-range-velocity information. Theoretical analyses of the maximum unambiguous range, resolution, and processing gains are provided to verify the advantages of the proposed joint estimation algorithm. Finally, extensive numerical experiments are presented to demonstrate that the proposed joint estimation approach can achieve significantly lower root-mean-square-error (RMSE) of angle/range/velocity estimation for both single-target and multi-target scenarios.

Submitted 28 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: 13 pages, 8 figures, submitted to IEEE Trans

arXiv:2306.01598 [pdf, other]

Towards Source-free Domain Adaptive Semantic Segmentation via Importance-aware and Prototype-contrast Learning

Authors: Yihong Cao, Hui Zhang, Xiao Lu, Zheng Xiao, Kailun Yang, Yaonan Wang

Abstract: Domain adaptive semantic segmentation enables robust pixel-wise understanding in real-world driving scenes. Source-free domain adaptation, as a more practical technique, addresses the concerns of data privacy and storage limitations in typical unsupervised domain adaptation methods, making it especially relevant in the context of intelligent vehicles. It utilizes a well-trained source model and unlabeled target data to achieve adaptation in the target domain. However, in the absence of source data and target labels, current solutions cannot sufficiently reduce the impact of domain shift and fully leverage the information from the target data. In this paper, we propose an end-to-end source-free domain adaptation semantic segmentation method via Importance-Aware and Prototype-Contrast (IAPC) learning. The proposed IAPC framework effectively extracts domain-invariant knowledge from the well-trained source model and learns domain-specific knowledge from the unlabeled target domain. Specifically, considering the problem of domain shift in the prediction of the target domain by the source model, we put forward an importance-aware mechanism for the biased target prediction probability distribution to extract domain-invariant knowledge from the source model. We further introduce a prototype-contrast strategy, which includes a prototype-symmetric cross-entropy loss and a prototype-enhanced cross-entropy loss, to learn target intra-domain knowledge without relying on labels. A comprehensive variety of experiments on two domain adaptive semantic segmentation benchmarks demonstrates that the proposed end-to-end IAPC solution outperforms existing state-of-the-art methods. The source code is publicly available at https://github.com/yihong-97/Source-free-IAPC.

Submitted 26 March, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). The source code is publicly available at https://github.com/yihong-97/Source-free-IAPC

arXiv:2305.18994 [pdf, other]

Toward Real-World Light Field Super-Resolution

Authors: Zeyu Xiao, Ruisheng Gao, Yutong Liu, Yueyi Zhang, Zhiwei Xiong

Abstract: Deep learning has opened up new possibilities for light field super-resolution (SR), but existing methods trained on synthetic datasets with simple degradations (e.g., bicubic downsampling) suffer from poor performance when applied to complex real-world scenarios. To address this problem, we introduce LytroZoom, the first real-world light field SR dataset capturing paired low- and high-resolution light fields of diverse indoor and outdoor scenes using a Lytro ILLUM camera. Additionally, we propose the Omni-Frequency Projection Network (OFPNet), which decomposes the omni-frequency components and iteratively enhances them through frequency projection operations to address spatially variant degradation processes present in all frequency components. Experiments demonstrate that models trained on LytroZoom outperform those trained on synthetic datasets and are generalizable to diverse content and devices. Quantitative and qualitative evaluations verify the superiority of OFPNet. We believe this work will inspire future research in real-world light field SR.

Submitted 30 May, 2023; originally announced May 2023.

Comments: CVPRW 2023

arXiv:2305.18847 [pdf, other]

Low-Range-Sidelobe Waveform Design for MIMO-OFDM ISAC Systems

Authors: Peishi Li, Zichao Xiao, Ming Li, Rang Liu, Qian Liu

Abstract: Integrated sensing and communication (ISAC) is a promising technology in future wireless systems owing to its efficient hardware and spectrum utilization. In this paper, we consider a multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) ISAC system and propose a novel waveform design to provide better radar ranging performance by taking range sidelobe suppression into consideration. Specifically, we aim to design a MIMO-OFDM dual-function waveform that minimizes its integrated sidelobe level (ISL) while satisfying the quality of service (QoS) requirements of multi-user communications and the transmit power constraint. To achieve a lower ISL, the symbol-level precoding (SLP) technique is employed to fully exploit the degrees of freedom (DoFs) of the waveform design in both temporal and spatial domains. An efficient algorithm utilizing the majorization-minimization (MM) framework is developed to solve the non-convex waveform design problem. Simulation results reveal radar ranging performance improvement and demonstrate the benefits of the proposed SLP-based low-range-sidelobe waveform design in ISAC systems.

Submitted 23 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.11013 [pdf, other]

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

Authors: Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

Abstract: This paper introduces FunASR, an open-source speech recognition toolkit designed to bridge the gap between academic research and industrial applications. FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications. The toolkit's flagship model, Paraformer, is a non-autoregressive end-to-end speech recognition model that has been trained on a manually annotated Mandarin speech recognition dataset that contains 60,000 hours of speech. To improve the performance of Paraformer, we have added timestamp prediction and hotword customization capabilities to the standard Paraformer backbone. In addition, to facilitate model deployment, we have open-sourced a voice activity detection model based on the Feedforward Sequential Memory Network (FSMN-VAD) and a text post-processing punctuation model based on the controllable time-delay Transformer (CT-Transformer), both of which were trained on industrial corpora. These functional modules provide a solid foundation for building high-precision long audio speech recognition services. Compared to other models trained on open datasets, Paraformer demonstrates superior performance.

Submitted 18 May, 2023; originally announced May 2023.

Comments: 5 pages, 3 figures, accepted by INTERSPEECH 2023

arXiv:2304.09620 [pdf]

DCELANM-Net: Medical Image Segmentation based on Dual Channel Efficient Layer Aggregation Network with Learner

Authors: Chengzhun Lu, Zhangrun Xia, Krzysztof Przystupa, Orest Kochan, Jun Su

Abstract: The DCELANM-Net structure, which this article offers, is a model that ingeniously combines a Dual Channel Efficient Layer Aggregation Network (DCELAN) and a Micro Masked Autoencoder (Micro-MAE). On the one hand, for the DCELAN, the features are more effectively fitted by deepening the network structure; the deeper network can successfully learn and fuse the features, which can more accurately locate the local feature information; and the utilization of each layer of channels is more effectively improved by widening the network structure and residual connections. We adopted Micro-MAE as the learner of the model. In addition to being straightforward in its methodology, it also offers a self-supervised learning method, which has the benefit of being incredibly scalable for the model.

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2304.04267 [pdf, other]

From Data-driven Learning to Physics-inspired Inferring: A Novel Mobile MIMO Channel Prediction Scheme Based on Neural ODE

Authors: Zhuoran Xiao, Zhaoyang Zhang, Zirui Chen, Zhaohui Yang, Chongwen Huang, Xiaoming Chen

Abstract: In this paper, we propose an innovative learning-based channel prediction scheme so as to achieve higher prediction accuracy and reduce the requirements of huge amounts and strict sequential format of channel data. Inspired by the idea of the neural ordinary differential equation (Neural ODE), we first prove that the channel prediction problem can be modeled as an ODE problem with a known initial value by analyzing the physical process of electromagnetic wave propagation within a mobile environment. Then, we design a novel physics-inspired spatial channel gradient network (SCGnet), which represents the derivative process of channel varying as a special neural network and can obtain the gradients at any relative displacement needed for the ODE solving. With the SCGnet, the static channel at any location served by the base station is accurately inferred through consecutive propagation and integration. Finally, we design an efficient recurrent positioning algorithm based on some prior knowledge of user mobility to obtain the velocity vector and propose an approximate Doppler compensation method to make up the instantaneous angular-delay domain channel. Only discrete historical channel data is needed for the training, whereas only a few fresh channel measurements are needed for the prediction, which ensures the scheme's practicability. Comprehensive evaluations show that the proposed scheme is most efficient in representing, learning, and predicting mobile wireless channels.

Submitted 29 November, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

arXiv:2303.06324 [pdf, other]

OCCL: a Deadlock-free Library for GPU Collective Communication

Authors: Lichen Pan, Juncheng Liu, Jinhui Yuan, Rongkai Zhang, Pengze Li, Zhen Xiao

Abstract: Various distributed deep neural network (DNN) training technologies lead to increasingly complicated use of collective communications on GPU. The deadlock-prone collectives on GPU force researchers to guarantee that collectives are enqueued in a consistent order on each GPU to prevent deadlocks. In complex distributed DNN training scenarios, manual hardcoding is the only practical way for deadlock prevention, which poses significant challenges to the development of artificial intelligence. This paper presents OCCL, which is, to the best of our knowledge, the first deadlock-free collective communication library for GPU supporting dynamic decentralized preemption and gang-scheduling for collectives. Leveraging the preemption opportunity of collectives on GPU, OCCL dynamically preempts collectives in a decentralized way via the deadlock-free collective execution framework and allows dynamic decentralized gang-scheduling via the stickiness adjustment scheme. With the help of OCCL, researchers no longer have to struggle to get all GPUs to launch collectives in a consistent order to prevent deadlocks. We implement OCCL with several optimizations and integrate OCCL with a distributed deep learning framework OneFlow. Experimental results demonstrate that OCCL achieves comparable or better latency and bandwidth for collectives compared to NCCL, the state-of-the-art. When used in distributed DNN training, OCCL can improve the peak training throughput by up to 78% compared to statically sequenced NCCL, while introducing overheads of less than 6.5% across various distributed DNN training approaches.

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2303.05736 [pdf, ps, other]

Cramér-Rao Bounds for Near-Field Sensing with Extremely Large-Scale MIMO

Authors: Huizhi Wang, Zhiqiang Xiao, Yong Zeng

Abstract: Mobile communication networks were designed to mainly support ubiquitous wireless communications, yet they are also expected to achieve radio sensing capabilities in the near future. However, most prior studies on radio sensing usually rely on the far-field assumption with uniform plane wave (UPW) models. With the ever-increasing antenna size, together with the growing demands to sense nearby targets, the conventional far-field UPW assumption may become invalid. Therefore, this paper studies near-field radio sensing with extremely large-scale (XL) antenna arrays, where the more general uniform spheric wave (USW) sensing model is considered. Closed-form expressions of the Cramér-Rao Bounds (CRBs) for both angle and range estimations are derived for near-field XL-MIMO radar mode and XL-phased array radar mode, respectively. Our results reveal that, different from the conventional UPW model where the CRB for angle decreases unboundedly as the number of antennas increases, for XL-MIMO radar-based near-field sensing, the CRB decreases with diminishing return and approaches a certain limit as the number of antennas increases. Besides, different from the far-field model where the CRB for range is infinity since it has no range estimation capability, that for the near-field case is finite. Furthermore, it is revealed that the commonly used spherical wave model based on second-order Taylor approximation is insufficient for near-field CRB analysis. Extensive simulation results are provided to validate our derived CRBs.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2212.09930 [pdf, other]

Frequency-limited H$_2$ Model Order Reduction Based on Relative Error

Authors: Umair Zulfiqar, Xin Du, Qiuyan Song, Zhi-Hua Xiao, Victor Sreeram

Abstract: Frequency-limited model order reduction aims to approximate a high-order model with a reduced-order model that maintains high fidelity within a specific frequency range. Beyond this range, a decrease in accuracy is acceptable due to the nature of the problem. The quality of the reduced-order model is typically evaluated using absolute or relative measures of approximation error. Relative error, which represents the percentage error, becomes particularly relevant when reducing a plant model for the purpose of designing a reduced-order controller. This paper derives the necessary conditions for achieving a local optimum of the frequency-limited H2 norm for the relative error system. Based on these optimality conditions, an oblique projection algorithm is proposed to ensure a small relative error within the desired frequency interval. Unlike existing algorithms, the proposed approach does not necessitate solving large-scale Lyapunov and Riccati equations. Instead, the proposed algorithm relies on solving sparse-dense Sylvester equations, which typically emerge in the majority of H2 model order reduction algorithms, but can be efficiently solved. To evaluate the performance of the proposed algorithm, a comparison is conducted with three existing techniques: frequency-limited balanced truncation, frequency-limited balanced stochastic truncation, and frequency-limited iterative Rational Krylov algorithm. The comparative analysis focuses on designing reduced-order controllers for high-order plants. Numerical results confirm that the reduced-order controllers obtained using the proposed algorithm ensure superior robust closed-loop stability.

Submitted 24 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: arXiv admin note: text overlap with arXiv:2212.08247

arXiv:2212.08247 [pdf, other]

Relative Error-based Time-limited H2 Model Order Reduction via Oblique Projection

Authors: Umair Zulfiqar, Xin Du, Qiuyan Song, Zhi-Hua Xiao, Victor Sreeram

Abstract: In time-limited model order reduction, a reduced-order approximation of the original high-order model is obtained that accurately approximates the original model within the desired limited time interval. Accuracy outside that time interval is not that important. The error incurred when a reduced-order model is used as a surrogate for the original model can be quantified in absolute or relative terms to assess the performance of the model reduction algorithm. The relative error is generally more meaningful than an absolute error because if the original and reduced systems' responses are of small magnitude, the absolute error is small in magnitude as well. However, this does not necessarily mean that the reduced model is accurate. The relative error in such scenarios is useful and meaningful as it quantifies percentage error irrespective of the magnitude of the system's response. In this paper, the necessary conditions for a local optimum of the time-limited H2 norm of the relative error system are derived. Inspired by these conditions, an oblique projection algorithm is proposed that ensures small H2-norm relative error within the desired time interval. Unlike the existing relative error-based model reduction algorithms, the proposed algorithm does not require solutions of large-scale Lyapunov and Riccati equations. The proposed algorithm is compared with time-limited balanced truncation, time-limited balanced stochastic truncation, and time-limited iterative Rational Krylov algorithm. Numerical results confirm the superiority of the proposed algorithm over these existing algorithms.

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2212.05503 [pdf]

Low-rank Tensor Assisted K-space Generative Model for Parallel Imaging Reconstruction

Authors: Wei Zhang, Zengwei Xiao, Hui Tao, Minghui Zhang, Xiaoling Xu, Qiegen Liu

Abstract: Although recent deep learning methods, especially generative models, have shown good performance in fast magnetic resonance imaging, there is still much room for improvement in high-dimensional generation. Considering that internal dimensions in score-based generative models have a critical impact on estimating the gradient of the data distribution, we present a new idea, low-rank tensor assisted k-space generative model (LR-KGM), for parallel imaging reconstruction. This means that we transform original prior information into high-dimensional prior information for learning. More specifically, the multi-channel data is constructed into a large Hankel matrix and the matrix is subsequently folded into a tensor for prior learning. In the testing phase, the low-rank rotation strategy is utilized to impose low-rank constraints on the tensor output of the generative network. Furthermore, we alternately use traditional generative iterations and low-rank high-dimensional tensor iterations for reconstruction. Experimental comparisons with state-of-the-art methods demonstrated that the proposed LR-KGM method achieved better performance.

Submitted 11 December, 2022; originally announced December 2022.

arXiv:2212.05294 [pdf, ps, other]

Variational Speech Waveform Compression to Catalyze Semantic Communications

Authors: Shengshi Yao, Zixuan Xiao, Sixian Wang, **cheng Dai, Kai Niu, ** Zhang

Abstract: We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and estimate the probabilistic distribution of the speech feature more accurately, giving rise to better compression performance. In particular, the speech signals are analyzed and synthesized by a pair of nonlinear transforms, yielding latent features. An entropy model with hyperprior is built to capture the probabilistic distribution of latent features, followed by quantization and entropy coding. The proposed waveform codec can be optimized flexibly towards arbitrary rate, and the other appealing feature is that it can be easily optimized for any differentiable loss function, including perceptual loss used in semantic communications. To further improve the fidelity, we incorporate residual coding to mitigate the degradation arising from quantization distortion at the latent space. Results indicate that, at the same performance, the proposed method saves up to 27% of the coding rate compared with the widely used adaptive multi-rate wideband (AMR-WB) codec as well as emerging neural waveform coding methods.

Submitted 13 December, 2022; v1 submitted 10 December, 2022; originally announced December 2022.

arXiv:2212.00555 [pdf, other]

A Structure-guided Effective and Temporal-lag Connectivity Network for Revealing Brain Disorder Mechanisms

Authors: Zhengwang Xia, Tao Zhou, Saqib Mamoon, Amani Alfakih, Jianfeng Lu

Abstract: Brain networks provide important insights for the diagnosis of many brain disorders, and how to effectively model the brain structure has become one of the core issues in the domain of brain imaging analysis. Recently, various computational methods have been proposed to estimate the causal relationship (i.e., effective connectivity) between brain regions. Compared with traditional correlation-based methods, effective connectivity can provide the direction of information flow, which may provide additional information for the diagnosis of brain diseases. However, existing methods either ignore the fact that there is a temporal lag in the information transmission across brain regions, or simply set the temporal-lag value between all brain regions to a fixed value. To overcome these issues, we design an effective temporal-lag neural network (termed ETLN) to simultaneously infer the causal relationships and the temporal-lag values between brain regions, which can be trained in an end-to-end manner. In addition, we also introduce three mechanisms to better guide the modeling of brain networks. The evaluation results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database demonstrate the effectiveness of the proposed method.

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.12671 [pdf, other]

3-D Positioning and Resource Allocation for Multi-UAV Base Stations Under Blockage-Aware Channel Model

Authors: Pengfei Yi, Lipeng Zhu, Zhenyu Xiao, Rui Zhang, Zhu Han, Xiang-Gen Xia

Abstract: In this paper, we propose to deploy multiple unmanned aerial vehicle (UAV) mounted base stations to serve ground users in outdoor environments with obstacles. In particular, the geographic information is employed to capture the blockage effects for air-to-ground (A2G) links caused by buildings, and a realistic blockage-aware A2G channel model is proposed to characterize the continuous variation of the channels at different locations. Based on the proposed channel model, we formulate the joint optimization problem of UAV three-dimensional (3-D) positioning and resource allocation, by power allocation, user association, and subcarrier allocation, to maximize the minimum achievable rate among users. To solve this non-convex combinatorial programming problem, we introduce a penalty term to relax it and develop a suboptimal solution via a penalty-based double-loop iterative optimization framework. The inner loop solves the penalized problem by employing the block successive convex approximation (BSCA) technique, where the UAV positioning and resource allocation are alternately optimized in each iteration. The outer loop aims to obtain proper penalty multipliers to ensure the solution of the penalized problem converges to that of the original problem. Simulation results demonstrate the superiority of the proposed algorithm over other benchmark schemes in terms of the minimum achievable rate.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.02283 [pdf, ps, other]

Wireless Deep Speech Semantic Transmission

Authors: Zixuan Xiao, Shengshi Yao, **cheng Dai, Sixian Wang, Kai Niu, ** Zhang

Abstract: In this paper, we propose a new class of high-efficiency semantic coded transmission methods for end-to-end speech transmission over wireless channels. We name the whole system deep speech semantic transmission (DSST). Specifically, we introduce a nonlinear transform to map the speech source to a semantic latent space and feed semantic features into a source-channel encoder to generate the channel-input sequence. Guided by the variational modeling idea, we build an entropy model on the latent space to estimate the importance diversity among semantic feature embeddings. Accordingly, these semantic features of different importance can be allocated different coding rates reasonably, which maximizes the system coding gain. Furthermore, we introduce a channel signal-to-noise ratio (SNR) adaptation mechanism such that a single model can be applied over various channel states. The end-to-end optimization of our model leads to a flexible rate-distortion (RD) trade-off, supporting versatile wireless speech semantic transmission. Experimental results verify that our DSST system clearly outperforms current engineered speech transmission systems on both objective and subjective metrics. Compared with existing neural speech semantic transmission methods, our model saves up to 75% of channel bandwidth costs when achieving the same quality. An intuitive comparison of audio demos can be found at https://ximoo123.github.io/DSST.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2209.08456 [pdf, other]

Deep Learning-Based Rate-Splitting Multiple Access for Reconfigurable Intelligent Surface-Aided Tera-Hertz Massive MIMO

Authors: Minghui Wu, Zhen Gao, Yang Huang, Zhenyu Xiao, Derrick Wing Kwan Ng, Zhaoyang Zhang

Abstract: Reconfigurable intelligent surface (RIS) can significantly enhance the service coverage of Tera-Hertz massive multiple-input multiple-output (MIMO) communication systems. However, obtaining accurate high-dimensional channel state information (CSI) with limited pilot and feedback signaling overhead is challenging, severely degrading the performance of conventional spatial division multiple access. To improve the robustness against CSI imperfection, this paper proposes a deep learning (DL)-based rate-splitting multiple access (RSMA) scheme for RIS-aided Tera-Hertz multi-user MIMO systems. Specifically, we first propose a hybrid data-model driven DL-based RSMA precoding scheme, including the passive precoding at the RIS as well as the analog active precoding and the RSMA digital active precoding at the base station (BS). To realize the passive precoding at the RIS, we propose a Transformer-based data-driven RIS reflecting network (RRN). As for the analog active precoding at the BS, we propose a match-filter based analog precoding scheme considering that the BS and RIS adopt the LoS-MIMO antenna array architecture. As for the RSMA digital active precoding at the BS, we propose a low-complexity approximate weighted minimum mean square error (AWMMSE) digital precoding scheme. Furthermore, for better precoding performance as well as lower computational complexity, a model-driven deep unfolding active precoding network (DFAPN) is also designed by combining the proposed AWMMSE scheme with DL. Then, to acquire accurate CSI at the BS for the investigated RSMA precoding scheme to achieve higher spectral efficiency, we propose a CSI acquisition network (CAN) with low pilot and feedback signaling overhead, where the downlink pilot transmission, CSI feedback at the user equipments (UEs), and CSI reconstruction at the BS are modeled as an end-to-end neural network based on Transformer.

Submitted 5 December, 2022; v1 submitted 17 September, 2022; originally announced September 2022.

Comments: Accepted by IEEE Journal on Selected Areas in Communications

arXiv:2208.03055 [pdf, ps, other]

Joint Beamforming Design in DFRC Systems for Wideband Sensing and OFDM Communications

Authors: Zichao Xiao, Rang Liu, Ming Li, Yang Liu, Qian Liu

Abstract: Dual-function radar-communication (DFRC) systems, which can efficiently utilize the congested spectrum and costly hardware resources by employing one common waveform for both sensing and communication (S&C), have attracted increasing attention. While the orthogonal frequency division multiplexing (OFDM) technique has been widely adopted to support high-quality communications, it also has great potential for improving radar sensing performance and providing flexible S&C. In this paper, we propose to jointly design the dual-functional transmit signals occupying several subcarriers to realize multi-user OFDM communications and detect one moving target in the presence of clutter. Meanwhile, the signals in other frequency subcarriers can be optimized in a similar way to perform other tasks. The transmit beamforming and receive filter are jointly optimized to maximize the radar output signal-to-interference-plus-noise ratio (SINR), while satisfying the communication SINR requirement and the power budget. A majorization-minimization (MM) based algorithm is developed to solve the resulting non-convex optimization problem. Numerical results reveal the significant wideband sensing gain brought by jointly designing the transmit signals in different subcarriers, and demonstrate the advantages of our proposed scheme and the effectiveness of the developed algorithm.

Submitted 5 August, 2022; originally announced August 2022.

Comments: 6 pages, 3 figures, accepted by Globecom

arXiv:2208.00837 [pdf]

The feasibility of Q-band millimeter wave on hand-gesture recognition for indoor FTTR scenario

Authors: Yuxuan Hu, Zhaoyang Xia, Yanbo Zhao, Feng Xu

Abstract: The generalization for different scenarios and different users is an urgent problem for millimeter wave gesture recognition for indoor fiber-to-the-room (FTTR) scenario. In order to solve this problem and verify the feasibility of FTTR Q-band millimeter wave in gesture recognition, we build a real-time millimeter wave gesture recognition system. The moving hand-gestures are represented as a variety of time-variant spectrum features, such as micro-Doppler feature, and then the feature learning and classification is realized by using a convolution neural network (CNN). The experimental results show that the millimeter wave gesture recognition system can achieve the generalized gesture recognition for 2 scenarios and 4 users.

Submitted 5 July, 2023; v1 submitted 22 July, 2022; originally announced August 2022.

arXiv:2207.11945 [pdf, other]

Terahertz-Band Near-Space Communications: From a Physical-Layer Perspective

Authors: Tianqi Mao, Leyi Zhang, Zhenyu Xiao, Zhu Han, Xiang-Gen Xia

Abstract: Facilitated by rapid technological development of the near-space platform stations (NSPS), near-space communication (NS-COM) is envisioned to play a pivotal role in the space-air-ground integrated network for sixth-generation (6G) communications and beyond. In NS-COM, ultra-broadband wireless connectivity between NSPSs and various airborne/spaceborne platforms is required for a plethora of bandwidth-consuming applications, such as NSPS-based ad hoc networking, in-flight Internet and relaying technology. However, such a requirement seems to conflict with the scarcity of spectrum resources at conventional microwave frequencies, which motivates the exploitation of the terahertz (THz) band ranging from 0.1 to 10 THz. Due to the huge available bandwidth, THz signals are capable of supporting ultra-high-rate data transmission for NS-COM over 100 Gb/s, and are naturally suitable for the near-space environment with marginal path loss. To this end, this article provides an extensive investigation on the THz-band NS-COM (THz-NS-COM) from a physical-layer perspective. Firstly, we summarize the potential applications of THz communications in the near-space environment, where the corresponding technical barriers are analyzed. Then the channel characteristics of THz-NS-COM and the corresponding modeling strategies are discussed, respectively. Afterwards, three essential research directions are investigated to surpass the technical barriers of THz-NS-COM, i.e., robust beamforming for ultra-massive antenna array, signal processing algorithms against hybrid distortions, and integrated sensing and communications. Several open problems are also provided to unleash the full potential of THz-NS-COM.

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2207.11896 [pdf, ps, other]

LEO Satellite Access Network (LEO-SAN) Towards 6G: Challenges and Approaches

Authors: Zhenyu Xiao, Junyi Yang, Tianqi Mao, Chong Xu, Rui Zhang, Zhu Han, Xiang-Gen Xia

Abstract: With the rapid development of satellite communication technologies, the space-based access network has been envisioned as a promising complementary part of the future 6G network. Aside from terrestrial base stations, satellite nodes, especially the low-earth-orbit (LEO) satellites, can also serve as base stations for Internet access, and constitute the LEO-satellite-based access network (LEO-SAN). LEO-SAN is expected to provide seamless massive access and extended coverage with high signal quality. However, its practical implementation still faces significant technical challenges, e.g., high mobility and limited budget for communication payloads of LEO satellite nodes. This paper aims at revealing the main technical issues that have not been fully addressed by the existing LEO-SAN designs, from three aspects namely random access, beam management and Doppler-resistant transmission technologies. More specifically, the critical issues of random access in LEO-SAN are discussed regarding low flexibility, long transmission delay, and inefficient handshakes. Then the beam management for LEO-SAN is investigated in complex propagation environments under the constraints of high mobility and limited payload budget. Furthermore, the influence of Doppler shifts on LEO-SAN is explored. Correspondingly, promising technologies to address these challenges are also discussed, respectively. Finally, the future research directions are envisioned.

Submitted 24 July, 2022; originally announced July 2022.

arXiv:2207.11883 [pdf, other]

Near Space Communications (NS-COM): A New Regime in Space-Air-Ground Integrated Network (SAGIN)

Authors: Zhenyu Xiao, Tianqi Mao, Zhu Han, Xiang-Gen Xia

Abstract: Precipitated by the technological innovations of the near-space platform stations (NSPS), the near space communication (NS-COM) network has emerged as an indispensable part of the next-generation space-air-ground integrated network (SAGIN) that facilitates ubiquitous coverage and broadband data transfer. This paper aims to provide a comprehensive overview of NS-COM. Firstly, we investigate the differences between NS-COM and the existing terrestrial cellular networks as well as satellite-based and unmanned-aerial-vehicle (UAV)-based communication networks, which is followed by a review of the NS-COM development. Then, we explore the unique characteristics of NS-COM regarding the platforms and the propagation environment of the near space. The main issues of NS-COM are identified, resulting from the extremely long transmission distance, limitations of the communication payloads on NSPS and complex atmospheric constitution of the near space. Then various application scenarios of NS-COM are discussed, where the special technical requirements are also revealed, from the physical-layer aspect like transceiver design to the upper-layer aspect like computational offloading and NSPS placement. Furthermore, we investigate the co-existence of NS-COM and ground networks by treating each other as interferers or collaborators. Finally, we list several potential technologies for NS-COM from the perspective of spectrum usage, and highlight their technical challenges for future research.

Submitted 24 July, 2022; originally announced July 2022.

arXiv:2207.10969 [pdf, ps, other]

Convergence Theory of Generalized Distributed Subgradient Method with Random Quantization

Authors: Zhaoyue Xia, Jun Du, Yong Ren

Abstract: The distributed subgradient method (DSG) is a widely discussed algorithm to cope with large-scale distributed optimization problems in emerging machine learning applications. Most existing works on DSG focus on ideal communication between the cooperative agents such that the shared information between agents is exact and perfect. This assumption, however, could lead to potential privacy concerns and is not feasible when the wireless transmission links are not of good quality. To overcome the challenge, a common approach is to quantize the data locally before transmission, which avoids exposure of raw data and significantly reduces the size of data. Compared with perfect data, quantization poses fundamental challenges on loss of data accuracy, which further impacts the convergence of the algorithms. To settle the problem, we propose a generalized distributed subgradient method with random quantization, which can be interpreted as a two time-scale stochastic approximation method. We provide comprehensive results on the convergence of the algorithm and derive upper bounds on the convergence rates in terms of the quantization bit, stepsizes and the number of network agents. Our results extend the existing results, where only special cases are considered and general conclusions for the convergence rates are missing. Finally, numerical simulations are conducted on linear regression problems to support our theoretical results.

Submitted 23 August, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.03736 [pdf, other]

Mobile MIMO Channel Prediction with ODE-RNN: a Physics-Inspired Adaptive Approach

Authors: Zhuoran Xiao, Zhaoyang Zhang, Zirui Chen, Zhaohui Yang, Richeng **

Abstract: Obtaining accurate channel state information (CSI) is crucial and challenging for multiple-input multiple-output (MIMO) wireless communication systems. Conventional channel estimation methods cannot guarantee the accuracy of mobile CSI and require high signaling overhead. Through exploring the intrinsic correlation among a set of historical CSI instances randomly obtained in a certain communication environment, channel prediction can significantly increase CSI accuracy and save signaling overhead. In this paper, we propose a novel channel prediction method based on ordinary differential equation (ODE)-recurrent neural network (RNN) for accurate and flexible mobile MIMO channel prediction. Differing from existing works using sequential network structures for exploring the numerical correlation between observed data, our proposed method tries to represent the implicit physics process of path responses changing by a specially designed continuous learning network with ODE structure. Due to the targeted design of the learning network, our proposed method fits the mathematical features of CSI data better and enjoys higher network interpretability. Experimental results show that the proposed learning approach outperforms existing methods, especially for long time intervals of the CSI sequence and large channel measurement error.

Submitted 8 July, 2022; originally announced July 2022.

Comments: 7 pages, conference

arXiv:2207.03647 [pdf, ps, other]

Integrated Sensing and Communication with Delay Alignment Modulation: Performance Analysis and Beamforming Optimization

Authors: Zhiqiang Xiao, Yong Zeng

Abstract: Delay alignment modulation (DAM) has been recently proposed to enable manipulable channel delay spread for efficient single- or multi-carrier communications. In particular, with perfect delay alignment, inter-symbol interference (ISI) can be eliminated even with single-carrier (SC) transmission, without relying on sophisticated channel equalization. The key ideas of DAM are delay pre-compensation and path-based beamforming, so that all multi-path signal components may arrive at the receiver simultaneously and be superimposed constructively, rather than causing the detrimental ISI. Compared to the classic orthogonal frequency division multiplexing (OFDM) transmission, DAM-enabled SC communication has several appealing advantages, including low peak-to-average-power ratio (PAPR) and high tolerance for Doppler frequency shift, which renders DAM also appealing for radar sensing. Therefore, in this paper, DAM is investigated for integrated sensing and communication (ISAC) systems. We first study the output signal-to-noise ratios (SNRs) for ISI-free SC communication and radar sensing, and then derive the closed-form expressions for DAM-based sensing in terms of the ambiguity function (AF) and integrated sidelobe ratio (ISR). Furthermore, we study the beamforming design problem for DAM-based ISAC to maximize the communication SNR while guaranteeing the sensing performance in terms of the sensing SNR and ISR. Finally, we provide a performance comparison between DAM and OFDM for ISAC, and it is revealed that the DAM signal may achieve better communication and sensing performance, thanks to its low PAPR, reduced guard interval overhead, as well as higher tolerance for Doppler frequency shift. Simulation results are provided to show the great potential of DAM for ISAC.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.03714 [pdf, ps, other]

Integrated Sensing and Communication with Delay Alignment Modulation

Authors: Zhiqiang Xiao, Yong Zeng

Abstract: Delay alignment modulation (DAM) has been recently proposed to enable inter-symbol interference (ISI)-free single-carrier (SC) communication without relying on sophisticated channel equalization. The key idea of DAM is to pre-introduce deliberate symbol delays at the transmitter side, so that all multi-path signal components may arrive at the receiver simultaneously and be superimposed constructively, rather than causing the detrimental ISI. Compared to the classic orthogonal frequency division multiplexing (OFDM) transmission, DAM has several appealing advantages, including low peak-to-average-power ratio (PAPR) and high tolerance for Doppler frequency shift, which makes DAM also appealing for radar sensing. Therefore, in this paper, DAM is investigated for the emerging integrated sensing and communication (ISAC) setup. We first derive the output signal-to-noise ratios (SNRs) for ISI-free communication and radar sensing, respectively, and then propose an efficient beamforming design for DAM-ISAC to maximize the communication SNR while guaranteeing the sensing performance. The comparison analysis of DAM versus OFDM for ISAC is developed, and it is revealed that DAM enables higher sensing SNR and larger Doppler frequency estimation. Simulation results are provided to show the great potential of DAM for ISAC.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2205.00891 [pdf, ps, other]

Low-Complexity Designs of Symbol-Level Precoding for MU-MISO Systems

Authors: Zichao Xiao, Rang Liu, Ming Li, Yang Liu, Qian Liu

Abstract: Symbol-level precoding (SLP), which converts the harmful multi-user interference (MUI) into beneficial signals, can significantly improve symbol-error-rate (SER) performance in multi-user communication systems. While enjoying symbolic gain, however, the complicated non-linear symbol-by-symbol precoder design suffers from high computational complexity that grows exponentially with the number of users, which is unaffordable in realistic systems. In this paper, we propose a novel low-complexity grouped SLP (G-SLP) approach and develop efficient design algorithms for typical max-min fairness and power minimization problems. In particular, after dividing all users into several groups, the precoders for each group are separately designed on a symbol-by-symbol basis by only utilizing the symbol information of the users in that group, in which the intra-group MUI is exploited using the concept of constructive interference (CI) and the inter-group MUI is also effectively suppressed. In order to further reduce the computational complexity, we utilize the Lagrangian dual, Karush-Kuhn-Tucker (KKT) conditions and the majorization-minimization (MM) method to transform the resulting problems into more tractable forms, and develop efficient algorithms for obtaining closed-form solutions to them. Extensive simulation results illustrate that the proposed G-SLP strategy and design algorithms dramatically reduce the computational complexity without causing significant performance loss compared with the traditional SLP schemes.

Submitted 2 May, 2022; originally announced May 2022.

Comments: 15 pages, 10 figures, submitted to IEEE

arXiv:2203.10726 [pdf, other]

TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers

Authors: Di Liu, Yunhe Gao, Qilong Zhangli, Ligong Han, Xiaoxiao He, Zhaoyang Xia, Song Wen, Qi Chang, Zhennan Yan, Mu Zhou, Dimitris Metaxas

Abstract: Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose the Multi-Scale Attention (MSA) to collect global correspondence of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading performance against the state-of-the-art methods and opens up new perspectives for multi-view imaging integration towards robust medical image segmentation.

Submitted 5 September, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Showing 1–50 of 77 results for author: Xia, Z