Search | arXiv e-print repository

On the Coexistence of OTFS Modulation with OFDM-based Communication Systems

Authors: Akram Shafie, Jinhong Yuan, Paul Fitzpatrick, Taka Sakurai, Yuting Fang

Abstract: We investigate the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation of OTFS in the considered coexisting system. In this derivation, we consider (i) the inclusion of multiple cyclic prefixes (CPs) with unequal lengths to the OTFS signal and (ii) edge carrier unloading (ECU), to account for the impacts of CP length, frame structure, and subcarrier arrangement described in 3GPP standards for 4G/5G systems. Our analysis reveals that the inclusion of multiple CPs to the OTFS signal and ECU lead to the channel response exhibiting spreading effects/leakage along the Doppler and delay dimensions, respectively. Consequently, the effective sampled delay-Doppler (DD) domain channel model for OTFS in coexisting systems may exhibit reduced sparsity. We also show that the effective DD domain channel coefficients for OTFS in coexisting systems are influenced by the unequal lengths of CPs. Subsequently, we propose an interference cancellation-based channel estimation (CE) technique for OTFS in coexisting systems. Through numerical results, we validate our analysis, highlight the importance of not ignoring the unequal lengths of CPs during signal detection, and show the significance of the proposed CE technique.

Submitted 8 June, 2024; originally announced June 2024.

Comments: This paper has been submitted for publication in an IEEE Journal. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: text overlap with arXiv:2311.06850

arXiv:2406.18548 [pdf]

Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is added to the network for processing. The brain glioma MRI image dataset provided by cancer imaging archives was experimentally verified. A multi-scale segmentation method based on a weighted least squares filter was used to complete the 3D reconstruction of brain tumors. Thus, the accuracy of three-dimensional reconstruction is further improved. Experiments show that the local texture features obtained by the proposed algorithm are similar to those obtained by laser scanning. The algorithm is improved by using the U-Net method and an accuracy of 0.9851 is obtained. This approach significantly enhances the precision of image segmentation and boosts the efficiency of image classification.

Submitted 23 May, 2024; originally announced June 2024.

arXiv:2406.07410 [pdf, other]

Clever Hans Effect Found in Automatic Detection of Alzheimer's Disease through Speech

Authors: Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

Abstract: We uncover an underlying bias present in the audio recordings produced from the picture description task of the Pitt corpus, the largest publicly accessible database for Alzheimer's Disease (AD) detection research. Even by solely utilizing the silent segments of these audio recordings, we achieve nearly 100% accuracy in AD detection. However, employing the same methods to other datasets and preprocessed Pitt recordings results in typical levels (approximately 80%) of AD detection accuracy. These results demonstrate a Clever Hans effect in AD detection on the Pitt corpus. Our findings emphasize the crucial importance of maintaining vigilance regarding inherent biases in datasets utilized for training deep learning models, and highlight the necessity for a better understanding of the models' performance.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.04776 [pdf, ps, other]

OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications

Authors: Tongyang Xu, Shuangyang Li, Jinhong Yuan

Abstract: Traditional communications focus on regular and orthogonal signal waveforms for simplified signal processing and improved spectral efficiency. In contrast, the next-generation communications would aim for irregular and non-orthogonal signal waveforms to introduce new capabilities. This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc back to 1924, with the aim of enhancing performance in industrial Internet of things (IIoT). In time-critical IIoT applications, low-latency and time-jitter tolerance are two critical factors that significantly impact the performance and reliability. Recognizing the inevitability of latency and jitter in practice, this work aims to propose a waveform technique to mitigate these effects via reducing latency and enhancing the system robustness under time jitter effects. The utilization of irSinc yields a signal with increased spectral efficiency without sacrificing error performance. Integrating the irSinc in a two-stage framework, a single-carrier non-orthogonal frequency shaping (SC-NOFS) waveform is developed, showcasing perfect compatibility with 5G standards, enabling the direct integration of irSinc in existing industrial IoT setups. Through 5G standard signal configuration, our signal achieves faster data transmission within the same spectral bandwidth. Hardware experiments validate an 18% saving in timing resources, leading to either reduced latency or enhanced jitter tolerance.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02126 [pdf, other]

CityLight: A Universal Model Towards Real-world City-scale Traffic Signal Control Coordination

Authors: Jinwei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

Abstract: Traffic signal control (TSC) is a promising low-cost measure to enhance transportation efficiency without affecting existing road infrastructure. While various reinforcement learning-based TSC methods have been proposed and experimentally outperform conventional rule-based methods, none of them has been deployed in the real world. An essential gap lies in the oversimplification of the scenarios in terms of intersection heterogeneity and road network intricacy. To make TSC applicable in urban traffic management, we target TSC coordination in city-scale high-authenticity road networks, aiming to solve the three unique and important challenges: city-level scalability, heterogeneity of real-world intersections, and effective coordination among intricate neighbor connections. Since optimizing multiple agents in a parameter-sharing paradigm can boost the training efficiency and help achieve scalability, we propose our method, CityLight, based on the well-acknowledged optimization framework, parameter-sharing MAPPO. To ensure the unified policy network can learn to fit large-scale heterogeneous intersections and tackle the intricate between-neighbor coordination, CityLight proposes a universal representation module that consists of two key designs: heterogeneous intersection alignment and neighborhood impact alignment for coordination. To further boost coordination, CityLight adopts neighborhood-integrated rewards to transition from achieving local optimal to global optimal. Extensive experiments on datasets with hundreds to tens of thousands of real-world intersections and authentic traffic demands validate the surprising effectiveness and generalizability of CityLight, with an overall performance gain of 11.66% and a 22.59% improvement in transfer scenarios in terms of throughput.

Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.08295 [pdf, other]

SpeechVerse: A Large-scale Generalizable Audio Language Model

Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore develop SpeechVerse, a robust multi-task training and curriculum learning framework that combines pre-trained speech and text foundation models via a small set of learnable parameters, while keeping the pre-trained models frozen during training. The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions. We perform extensive benchmarking that includes comparing our model performance against traditional baselines across several datasets and tasks. Furthermore, we evaluate the model's capability for generalized instruction following by testing on out-of-domain datasets, novel prompts, and unseen tasks. Our empirical experiments reveal that our multi-task SpeechVerse model is even superior to conventional task-specific baselines on 9 out of the 11 tasks.

Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Single Column, 13 page

arXiv:2405.08288 [pdf, other]

Orthogonal Delay-Doppler Division Multiplexing Modulation with Tomlinson-Harashima Precoding

Authors: Yiyan Ma, Akram Shafie, Jinhong Yuan, Guoyu Ma, Zhangdui Zhong, Bo Ai

Abstract: The orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has been recently proposed as a promising modulation scheme for next-generation communication systems with high mobility. Despite its benefits, ODDM modulation and other DD domain modulation schemes face the challenge of excessive equalization complexity. To address this challenge, we propose time domain Tomlinson-Harashima precoding (THP) for the ODDM transmitter, to make the DD domain single-tap equalizer feasible, thereby reducing the equalization complexity. In our design, we first pre-cancel the inter-symbol interference (ISI) using the linear time-varying (LTV) channel information. Second, different from classical THP designs, we introduce a modified modulo operation with an adaptive modulus, by which the joint DD domain data multiplexing and time domain ISI pre-cancellation can be realized without excessively increasing the bit errors. We then analytically study the losses encountered in this design, namely the power loss, the modulo noise loss, and the modulo signal loss. Based on this analysis, BER lower bounds of the ODDM system with time domain THP are derived when 4-QAM or 16-QAM modulations are adopted for symbol mapping in the DD domain. Finally, through numerical results, we validate our analysis and then demonstrate that the ODDM system with time domain THP is a promising solution to realize better BER performance over LTV channels compared to orthogonal frequency division multiplexing systems with single-tap equalizer and ODDM systems with maximum ratio combining.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07547 [pdf, other]

Channel Coding Toward 6G: Technical Overview and Outlook

Authors: Mohammad Rowshan, Min Qiu, Yixuan Xie, Xinyi Gu, Jinhong Yuan

Abstract: Channel coding plays a pivotal role in ensuring reliable communication over wireless channels. With the growing need for ultra-reliable communication in emerging wireless use cases, the significance of channel coding has amplified. Furthermore, minimizing decoding latency is crucial for critical-mission applications, while optimizing energy efficiency is paramount for mobile and the Internet of Things (IoT) communications. As the fifth generation (5G) of mobile communications is currently in operation and 5G-advanced is on the horizon, the objective of this paper is to assess prominent channel coding schemes in the context of recent advancements and the anticipated requirements for the sixth generation (6G). In this paper, after considering the potential impact of channel coding on key performance indicators (KPIs) of wireless networks, we review the evolution of mobile communication standards and the organizations involved in the standardization, from the first generation (1G) to the current 5G, highlighting the technologies integral to achieving targeted KPIs such as reliability, data rate, latency, energy efficiency, spectral efficiency, connection density, and traffic capacity. Following this, we delve into the anticipated requirements for potential use cases in 6G. The subsequent sections of the paper focus on a comprehensive review of three primary coding schemes utilized in past generations and their recent advancements: low-density parity-check (LDPC) codes, turbo codes (including convolutional codes), polar codes (alongside Reed-Muller codes). Additionally, we examine alternative coding schemes like Fountain codes and sparse regression codes. Our evaluation includes a comparative analysis of error correction performance and the performance of hardware implementation for these coding schemes, providing insights into their potential and suitability for the upcoming 6G era.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 102 pages, 87 figures, IEEE Open Journal of the Communications Society (invited paper)

arXiv:2404.16253 [pdf, other]

Mitigating Automotive Radar Interference using Onboard Intelligent Reflective Surface

Authors: Shree Prasad Maruthi, Karrthik G. K., Vijaya Krishna A., Mahbub Hassan, Jinhong Yuan

Abstract: The use of automotive radars is gaining popularity as a means to enhance a vehicle's sensing capabilities. However, these radars can suffer from interference caused by transmissions from other radars mounted on nearby vehicles. To address this issue, we investigate the use of an onboard intelligent reflective surface (IRS) to artificially increase a vehicle's effective radar cross section (RCS), or its "electromagnetic visibility." Our proposed method utilizes the IRS's ability to form a coherent reflection of the incident radar waveform back towards the source radar, thereby improving radar performance under interference. We evaluated both passive and active IRS options. Passive IRS, which does not support reflection amplification, was found to be counter-productive and actually decreased the vehicle's effective RCS instead of enhancing it. In contrast, active IRS, which can amplify the reflection power of individual elements, effectively combats all types of automotive radar interference when the reflective elements are configured with a 15-35 dB reflection gain.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 7 pages, 9 Figures

arXiv:2403.14192 [pdf, ps, other]

Fundamentals of Delay-Doppler Communications: Practical Implementation and Extensions to OTFS

Authors: Shuangyang Li, Peter Jung, Weijie Yuan, Zhiqiang Wei, Jinhong Yuan, Baoming Bai, Giuseppe Caire

Abstract: The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis functions aligning with the time-frequency (TF)-consistency condition, which are globally quasi-periodic and locally twisted-shifted. We unveil that these features are translated to unique signal structures in both time and frequency, which are beneficial for communication purposes. Then, we focus on the practical implementations of DD Nyquist communications, where we show that rectangular windows achieve perfect DD orthogonality, while truncated periodic signals can obtain sufficient DD orthogonality. Particularly, smoothed rectangular window with excess bandwidth can result in a slightly worse orthogonality but better pulse localization in the DD domain. Furthermore, we present a practical pulse shaping framework for general DD communications and derive the corresponding input-output relation under various shaping pulses. Our numerical results agree with our derivations and also demonstrate advantages of DD communications over conventional orthogonal frequency-division multiplexing (OFDM).

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.10323 [pdf, ps, other]

Joint Optimization for Achieving Covertness in MIMO Over-the-Air Computation Networks

Authors: Junteng Yao, Tuo Wu, Ming Jin, Cunhua Pan, Quanzhong Li, Jinhong Yuan

Abstract: This paper investigates covert data transmission within a multiple-input multiple-output (MIMO) over-the-air computation (AirComp) network, where sensors transmit data to the access point (AP) while guaranteeing covertness to the warden (Willie). Simultaneously, the AP introduces artificial noise (AN) to confuse Willie, meeting the covert requirement. We address the challenge of minimizing mean-square-error (MSE) of the AP, while considering transmit power constraints at both the AP and the sensors, as well as ensuring the covert transmission to Willie with a low detection error probability (DEP). However, obtaining globally optimal solutions for the investigated non-convex problem is challenging due to the interdependence of optimization variables. To tackle this problem, we introduce an exact penalty algorithm and transform the optimization problem into a difference-of-convex (DC) form problem to find a locally optimal solution. Simulation results showcase the superior performance in terms of our proposed scheme in comparison to the benchmark schemes.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.02012 [pdf, other]

OTFS vs OFDM: Which is Superior in Multiuser LEO Satellite Communications

Authors: Yu Liu, Ming Chen, Cunhua Pan, Tantao Gong, Jinhong Yuan, Jiangzhou Wang

Abstract: Orthogonal time frequency space (OTFS) modulation, a delay-Doppler (DD) domain communication scheme exhibiting strong robustness against the Doppler shifts, has the potential to be employed in LEO satellite communications. However, the performance comparison with the orthogonal frequency division multiplexing (OFDM) modulation and the resource allocation scheme for multiuser OTFS-based LEO satellite communication system have rarely been investigated. In this paper, we conduct a performance comparison under various channel conditions between the OTFS and OFDM modulations, encompassing evaluations of sum-rate and bit error ratio (BER). Additionally, we investigate the joint optimal allocation of power and delay-Doppler resource blocks aiming at maximizing sum-rate for multiuser downlink OTFS-based LEO satellite communication systems. Unlike the conventional modulations relying on complex input-output relations within the Time-Frequency (TF) domain, the OTFS modulation exploits both time and frequency diversities, i.e., delay and Doppler shifts remain constant during an OTFS frame, which facilitates a simple DD domain input-output relation for our investigation. We transform the resulting non-convex and combinatorial optimization problem into an equivalent difference of convex problem by decoupling the conditional constraints, and solve the transformed problem via penalty convex-concave procedure algorithm. Simulation results demonstrate that the OTFS modulation is robust to carrier frequency offsets (CFO) caused by high-mobility of LEO satellites, and has superior performance to the OFDM modulation. Moreover, numerical results indicate that our proposed resource allocation scheme has higher sum-rate than existing schemes for the OTFS modulation, such as delay divided multiple access and Doppler divided multiple access, especially in the high signal-to-noise ratio (SNR) regime.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures

arXiv:2402.12127 [pdf, other]

Rate-Splitting Multiple Access for Transmissive Reconfigurable Intelligent Surface Transceiver Empowered ISAC System

Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Jinhong Yuan, Shanshan Zhang, Zhendong Li, Jun Li

Abstract: In this paper, a novel transmissive reconfigurable intelligent surface (TRIS) transceiver empowered integrated sensing and communications (ISAC) system is proposed for future multi-demand terminals. To address interference management, we implement rate-splitting multiple access (RSMA), where the common stream is independently designed for the sensing service. We introduce the sensing quality of service (QoS) criteria based on this structure and construct an optimization problem with the sensing QoS criteria as the objective function to optimize the sensing stream precoding matrix and the communication stream precoding matrix. Due to the coupling of optimization variables, the formulated problem is a non-convex optimization problem that cannot be solved directly. To tackle the above-mentioned challenging problem, alternating optimization (AO) is utilized to decouple the optimization variables. Specifically, the problem is decoupled into three subproblems about the sensing stream precoding matrix, the communication stream precoding matrix, and the auxiliary variables, which is solved alternatively through AO until the convergence is reached. For solving the problem, successive convex approximation (SCA) is applied to deal with the sum-rate threshold constraints on communications, and difference-of-convex (DC) programming is utilized to solve rank-one non-convex constraints. Numerical simulation results verify the superiority of the proposed scheme in terms of improving the communication and sensing QoS.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.15164 [pdf, other]

AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations

Authors: Naresh Kumar Devulapally, Sidharth Anand, Sreyasee Das Bhattacharjee, Junsong Yuan, Yu-Ping Chang

Abstract: Analyzing individual emotions during group conversation is crucial in developing intelligent agents capable of natural human-machine interaction. While reliable emotion recognition techniques depend on different modalities (text, audio, video), the inherent heterogeneity between these modalities and the dynamic cross-modal interactions influenced by an individual's unique behavioral patterns make the task of emotion recognition very challenging. This difficulty is compounded in group settings, where the emotion and its temporal evolution are not only influenced by the individual but also by external contexts like audience reaction and context of the ongoing conversation. To meet this challenge, we propose a Multimodal Attention Network that captures cross-modal interactions at various levels of spatial abstraction by jointly learning its interactive bunch of mode-specific Peripheral and Central networks. The proposed MAN injects cross-modal attention via its Peripheral key-value pairs within each layer of a mode-specific Central query network. The resulting cross-attended mode-specific descriptors are then combined using an Adaptive Fusion technique that enables the model to integrate the discriminative and complementary mode-specific data patterns within an instance-specific multimodal descriptor. Given a dialogue represented by a sequence of utterances, the proposed AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level. This helps not only in delivering better classification performance (3-5% improvement in Weighted-F1 and 5-7% improvement in Accuracy) in large-scale public datasets but also helps the users in understanding the reasoning behind each emotion prediction made by the model via its Multimodal Explainability Visualization module.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.11058 [pdf, ps, other]

Low Complexity Turbo SIC-MMSE Detection for Orthogonal Time Frequency Space Modulation

Authors: Qi Li, Jinhong Yuan, Min Qiu, Shuangyang Li, Yixuan Xie

Abstract: Recently, orthogonal time frequency space (OTFS) modulation has garnered considerable attention due to its robustness against doubly-selective wireless channels. In this paper, we propose a low-complexity iterative successive interference cancellation based minimum mean squared error (SIC-MMSE) detection algorithm for zero-padded OTFS (ZP-OTFS) modulation. In the proposed algorithm, signals are detected based on layers processed by multiple SIC-MMSE linear filters for each sub-channel, with interference on the targeted signal layer being successively canceled either by hard or soft information. To reduce the complexity of computing individual layer filter coefficients, we also propose a novel filter coefficients recycling approach in place of generating the exact form of MMSE filter weights. Moreover, we design a joint detection and decoding algorithm for ZP-OTFS to enhance error performance. Compared to the conventional SIC-MMSE detection, our proposed algorithms outperform other linear detectors, e.g., maximal ratio combining (MRC), for ZP-OTFS with up to 3 dB gain while maintaining comparable computation complexity.

Submitted 19 January, 2024; originally announced January 2024.

Comments: 15 pages, 12 figures, accepted by IEEE Transactions on Communications

arXiv:2401.01433 [pdf, other]

Multiple Access Techniques for Intelligent and Multi-Functional 6G: Tutorial, Survey, and Outlook

Authors: Bruno Clerckx, Yijie Mao, Zhaohui Yang, Mingzhe Chen, Ahmed Alkhateeb, Liang Liu, Min Qiu, Jinhong Yuan, Vincent W. S. Wong, Juan Montojo

Abstract: Multiple access (MA) is a crucial part of any wireless system and refers to techniques that make use of the resource dimensions to serve multiple users/devices/machines/services, ideally in the most efficient way. Given the needs of multi-functional wireless networks for integrated communications, sensing, localization, and computing, coupled with the surge of machine learning / artificial intelligence (AI) in wireless networks, MA techniques are expected to experience a paradigm shift in 6G and beyond. In this paper, we provide a tutorial, survey and outlook of past, emerging and future MA techniques and pay particular attention to how wireless network intelligence and multi-functionality will lead to a re-thinking of those techniques. The paper starts with an overview of orthogonal, physical layer multicasting, space domain, power domain, rate-splitting, code domain MAs, and other domains, and highlights the importance of researching universal multiple access to shrink instead of grow the knowledge tree of MA schemes by providing a unified understanding of MA schemes across all resource dimensions. It then jumps into rethinking MA schemes in the era of wireless network intelligence, covering AI for MA such as AI-empowered resource allocation, optimization, channel estimation, receiver designs, user behavior predictions, and MA for AI such as federated learning/edge intelligence and over the air computation. We then discuss MA for network multi-functionality and the interplay between MA and integrated sensing, localization, and communications. We finish with studying MA for emerging intelligent applications before presenting a roadmap toward 6G standardization. We also point out numerous directions that are promising for future research.

Submitted 2 January, 2024; originally announced January 2024.

Comments: submitted for publication in Proceedings of the IEEE

arXiv:2311.15556 [pdf, other]

PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

Authors: Jiquan Yuan, Xinyan Cao, Changjin Li, Fanyi Yang, Jinlong Lin, Xixin Cao

Abstract: As image generation technology advances, AI-based image generation has been applied in various fields and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGI) may exhibit unique distortions compared to natural images, and not all generated images meet the requirements of the real world. Therefore, it is of great significance to evaluate AIGIs more comprehensively. Although previous work has established several human perception-based AIGC image quality assessment (AIGCIQA) databases for text-generated images, AI image generation technology includes scenarios like text-to-image and image-to-image, and assessing only the images generated by text-to-image models is insufficient. To address this issue, we establish a human perception-based image-to-image AIGCIQA database, named PKU-I2IQA. We conduct a well-organized subjective experiment to collect quality labels for AIGIs and then conduct a comprehensive analysis of the PKU-I2IQA database. Furthermore, we have proposed two benchmark models: NR-AIGCIQA based on the no-reference image quality assessment method and FR-AIGCIQA based on the full-reference image quality assessment method. Finally, leveraging this database, we conduct benchmark experiments and compare the performance of the proposed benchmark models. The PKU-I2IQA database and benchmarks will be released to facilitate future research at https://github.com/jiquan123/I2IQA.

Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 18 pages

arXiv:2311.13787 [pdf, other]

A Fast Power Spectrum Sensing Solution for Generalized Coprime Sampling

Authors: Kaili Jiang, Dechang Wang, Kailun Tian, Hancong Feng, Yuxin Zhao, Junyu Yuan, Bin Tang

Abstract: Given the growing scarcity of spectrum resources, wideband spectrum sensing is required to process a prohibitive volume of data at a high sampling rate. For some applications, spectrum estimation only requires second-order statistics. In this case, a fast power spectrum sensing solution is proposed based on the generalized coprime sampling. By exploring the inherent structure of the sensing vector, the autocorrelation sequence of inputs can be reconstructed from sub-Nyquist samples by only utilizing the parallel Fourier transform and simple multiplication operations. Thus, it takes less time than the state-of-the-art methods while maintaining the same performance, and it achieves higher performance than the existing methods within the same execution time, without the need for pre-estimating the number of inputs. Furthermore, model mismatch has only a minor impact on the estimation performance, which allows for more efficient use of the spectrum resource in a distributed swarm scenario. Simulation results demonstrate the low complexity in sampling and computation, making it a more practical solution for real-time and distributed wideband spectrum sensing applications.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.07238 [pdf, ps, other]

Time-Frequency Localization Characteristics of the Delay-Doppler Plane Orthogonal Pulse

Authors: Akram Shafie, Jinhong Yuan, Nan Yang, Hai Lin

Abstract: The orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has recently been proposed as a promising solution for ensuring reliable communications in high mobility scenarios. In this work, we investigate the time-frequency (TF) localization characteristics of the DD plane orthogonal pulse (DDOP), which is the prototype pulse of ODDM modulation. The TF localization characteristics examine how concentrated or spread out the energy of a pulse is in the joint TF domain. We first derive the TF localization metric, TF area (TFA), for the DDOP. Based on this result, we provide insights into the energy spread of the DDOP in the joint TF domain. Then, we delve into the potential advantages of the DDOP due to its energy spread, particularly in terms of leveraging both time and frequency diversities, and enabling high-resolution sensing. Furthermore, we determine the TFA for the recently proposed generalized design of the DDOP. Finally, we validate our analysis based on numerical results and show that the energy spread for the generalized design of the DDOP in the joint TF domain exhibits a step-wise increase as the duration of sub-pulses increases.

Submitted 13 November, 2023; originally announced November 2023.

Comments: This paper has been submitted for publication in an IEEE Conference. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.06850 [pdf, ps, other]

Coexistence of OTFS Modulation With OFDM-based Communication Systems

Authors: Akram Shafie, Jinhong Yuan, Yuting Fang, Paul Fitzpatrick, Taka Sakurai

Abstract: This study examines the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) wireless communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation (IOR) of OTFS when it coexists with an OFDM system while considering the impact of unequal lengths of the cyclic prefixes (CPs) in the OTFS signal. We show analytically that the inclusion of multiple CPs in the OTFS signal makes the effective sampled delay-Doppler (DD) domain channel response less sparse. We also show that the effective DD domain channel coefficients for OTFS in coexisting systems are influenced by the unequal lengths of the CPs. Subsequently, we propose an embedded pilot-aided channel estimation (CE) technique for OTFS in coexisting systems that leverages the derived IOR for accurate channel characterization. Using numerical results, we show that ignoring the impact of unequal lengths of the CPs during signal detection can degrade the bit error rate performance of OTFS in coexisting systems. We also show that the proposed CE technique for OTFS in coexisting systems outperforms the state-of-the-art threshold-based CE technique.

Submitted 12 November, 2023; originally announced November 2023.

Comments: This paper has been accepted for publication in IEEE Global Communications Conferences (GLOBECOM) 2023. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2310.19087 [pdf, other]

Transport-of-Intensity Model for Single-Mask X-ray Differential Phase Contrast Imaging

Authors: Jingcheng Yuan, Mini Das

Abstract: X-ray phase contrast imaging holds great promise for improving the visibility of light-element materials such as soft tissues and tumors. The single-mask differential phase contrast imaging method stands out as a simple and effective approach to yield differential phase contrast. In this work, we introduce a novel model for a single-mask phase imaging system based on the transport-of-intensity equation. Our model provides an accessible understanding of signal and contrast formation in single-mask X-ray phase imaging, offering a clear perspective on the image formation process, for example, the origin of alternate bright and dark fringes in phase contrast intensity images. Aided by our model, we present an efficient retrieval method that yields differential phase contrast imagery in a single acquisition step. Our model gives insight into the contrast generation and its dependence on the system geometry and imaging parameters in both the initial intensity image and the retrieved images. The validity of the model and of the proposed retrieval method is demonstrated via both experimental results on a system developed in-house and Monte Carlo simulations. In conclusion, our work not only provides a model for an intuitive visualization of image formation but also offers a method to optimize differential phase imaging setups, holding tremendous promise for advancing medical diagnostics and other applications.

Submitted 31 January, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: 10 pages, 6 figures

arXiv:2309.01823 [pdf]

Multi-dimension unified Swin Transformer for 3D Lesion Segmentation in Multiple Anatomical Locations

Authors: Shaoyan Pan, Yiqiao Liu, Sarah Halek, Michal Tomaszewski, Shubing Wang, Richard Baumgartner, Jianda Yuan, Gregory Goldmacher, Antong Chen

Abstract: In oncology research, accurate 3D segmentation of lesions from CT scans is essential for the modeling of lesion growth kinetics. However, following the RECIST criteria, radiologists routinely only delineate each lesion on the axial slice showing the largest transverse area, and delineate a small number of lesions in 3D for research purposes. As a result, we have plenty of unlabeled 3D volumes and labeled 2D images, and scarce labeled 3D volumes, which makes training a deep-learning 3D segmentation model a challenging task. In this work, we propose a novel model, denoted a multi-dimension unified Swin transformer (MDU-ST), for 3D lesion segmentation. The MDU-ST consists of a Shifted-window transformer (Swin-transformer) encoder and a convolutional neural network (CNN) decoder, allowing it to adapt to 2D and 3D inputs and learn the corresponding semantic information in the same encoder. Based on this model, we introduce a three-stage framework: 1) leveraging a large amount of unlabeled 3D lesion volumes through self-supervised pretext tasks to learn the underlying pattern of lesion anatomy in the Swin-transformer encoder; 2) fine-tuning the Swin-transformer encoder to perform 2D lesion segmentation with 2D RECIST slices to learn slice-level segmentation information; 3) further fine-tuning the Swin-transformer encoder to perform 3D lesion segmentation with labeled 3D volumes. The network's performance is evaluated by the Dice similarity coefficient (DSC) and Hausdorff distance (HD) using an internal 3D lesion dataset with 593 lesions extracted from multiple anatomical locations. The proposed MDU-ST demonstrates significant improvement over the competing models. The proposed method can be used to conduct automated 3D lesion segmentation to assist radiomics and tumor growth modeling studies. This paper has been accepted by the IEEE International Symposium on Biomedical Imaging (ISBI) 2023.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.00928 [pdf, other]

S$^3$-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer for Monocular 3D Object Detection

Authors: Xuan He, Kailun Yang, Junwei Zheng, Jin Yuan, Luis M. Bergasa, Hui Zhang, Zhiyong Li

Abstract: Recently, transformer-based methods have shown exceptional performance in monocular 3D object detection, which can predict 3D attributes from a single 2D image. These methods typically use visual and depth representations to generate query points on objects, whose quality plays a decisive role in the detection accuracy. However, current unsupervised attention mechanisms without any geometry appearance awareness in transformers are susceptible to producing noisy features for query points, which severely limits the network performance and also makes the model have a poor ability to detect multi-category objects in a single training process. To tackle this problem, this paper proposes a novel "Supervised Shape&Scale-perceptive Deformable Attention" (S$^3$-DA) module for monocular 3D object detection. Concretely, S$^3$-DA utilizes visual and depth features to generate diverse local features with various shapes and scales and predict the corresponding matching distribution simultaneously to impose valuable shape&scale perception for each query. Benefiting from this, S$^3$-DA effectively estimates receptive fields for query points belonging to any category, enabling them to generate robust query features. Besides, we propose a Multi-classification-based Shape$\&$Scale Matching (MSM) loss to supervise the above process. Extensive experiments on KITTI and Waymo Open datasets demonstrate that S$^3$-DA significantly improves the detection accuracy, yielding state-of-the-art performance of single-category and multi-category 3D object detection in a single training process compared to the existing approaches. The source code will be made publicly available at https://github.com/mikasa3lili/S3-MonoDETR.

Submitted 2 September, 2023; originally announced September 2023.

Comments: The source code will be made publicly available at https://github.com/mikasa3lili/S3-MonoDETR

arXiv:2308.08883 [pdf, other]

Coexistence of Heterogeneous Services in the Uplink with Discrete Signaling and Treating Interference as Noise

Authors: Min Qiu, Yu-Chih Huang, Jinhong Yuan

Abstract: The problem of enabling the coexistence of heterogeneous services, e.g., different ultra-reliable low-latency communications (URLLC) services and/or enhanced mobile broadband (eMBB) services, in the uplink is studied. Each service has its own error probability and blocklength constraints and the longer transmission block suffers from heterogeneous interference. Due to the latency concern, the decoding of URLLC messages cannot leverage successive interference cancellation (SIC) and should always be performed before the decoding of eMBB messages. This can significantly degrade the achievable rates of URLLC users when the interference from other users is strong. To overcome this issue, we propose a new transmission scheme based on discrete signaling and treating interference as noise decoding, i.e., without SIC. Guided by the deterministic model, we provide a systematic way to construct discrete signaling for handling heterogeneous interference effectively. We demonstrate theoretically and numerically that the proposed scheme can perform close to the benchmark scheme based on capacity-achieving Gaussian signaling with the assumption of perfect SIC.

Submitted 17 August, 2023; originally announced August 2023.

Comments: 7 pages, accepted for presentation at IEEE Global Communications Conference (GLOBECOM) 2023

arXiv:2308.08172 [pdf, other]

AATCT-IDS: A Benchmark Abdominal Adipose Tissue CT Image Dataset for Image Denoising, Semantic Segmentation, and Radiomics Evaluation

Authors: Zhiyu Ma, Chen Li, Tianming Du, Le Zhang, Dechao Tang, Deguo Ma, Shanchuan Huang, Yan Liu, Yihao Sun, Zhihao Chen, Jin Yuan, Qianqing Nie, Marcin Grzegorzek, Hongzan Sun

Abstract: Methods: In this study, a benchmark Abdominal Adipose Tissue CT Image Dataset (AATTCT-IDS) containing 300 subjects is prepared and published. AATTCT-IDS publishes 13,732 raw CT slices, and the researchers individually annotate the subcutaneous and visceral adipose tissue regions of 3,213 of those slices that have the same slice distance to validate denoising methods, train semantic segmentation models, and study radiomics. For different tasks, this paper compares and analyzes the performance of various methods on AATTCT-IDS by combining the visualization results and evaluation data, thereby verifying the research potential of this dataset in the above three types of tasks. Results: In the comparative study of image denoising, algorithms using a smoothing strategy suppress mixed noise at the expense of image details and obtain better evaluation data. Methods such as BM3D preserve the original image structure better, although the evaluation data are slightly lower. The results show significant differences among them. In the comparative study of semantic segmentation of abdominal adipose tissue, the segmentation results of adipose tissue by each model show different structural characteristics. Among them, BiSeNet obtains segmentation results only slightly inferior to U-Net with the shortest training time and effectively separates small and isolated adipose tissue. In addition, the radiomics study based on AATTCT-IDS reveals three adipose distributions in the subject population. Conclusion: AATTCT-IDS contains the ground truth of adipose tissue regions in abdominal CT slices. This open-source dataset can attract researchers to explore the multi-dimensional characteristics of abdominal adipose tissue and thus help physicians and patients in clinical practice. AATCT-IDS is freely published for non-commercial purpose at: https://figshare.com/articles/dataset/AATTCT-IDS/23807256.

Submitted 16 August, 2023; originally announced August 2023.

Comments: 17 pages, 7 figures

arXiv:2308.07079 [pdf]

Wideband Spectrum Acquisition for UAV Swarm Using the Sparse Coding Fourier Transform

Authors: Kaili Jiang, Kailun Tian, Hancong Feng, Junyu Yuan, Bin Tang

Abstract: As the trend towards small, safe, smart, speedy and swarm development grows, unmanned aerial vehicles (UAVs) are becoming increasingly popular for a wide range of applications. In this letter, the challenge of wideband spectrum acquisition for the UAV swarms is studied by proposing a processing method that features lower power consumption, higher compression rates, and a lower signal-to-noise ratio. Our system is equipped with multiple UAVs, each with a different sub-sampling rate. That allows for frequency bucketization and estimation based on sparse Fourier transform theory. Unlike other techniques, the collisions and iterations caused by non-sparse environments are considered. We introduce the sparse coding Fourier transform to address these issues. The key is to code the entire spectrum and decode it through spectrum correlation in the code. Simulation results show that our proposed method performs well in acquiring both narrowband and wideband signals simultaneously, compared to the other methods.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.07077 [pdf]

Distributed UAV Swarm Augmented Wideband Spectrum Sensing Using Nyquist Folding Receiver

Authors: Kaili Jiang, Kailun Tian, Hancong Feng, Yuxin Zhao, Dechang Wang, Sen Cao, Jian Gao, Xuying Zhang, Yanfei Li, Junyu Yuan, Ying Xiong, Bin Tang

Abstract: Distributed unmanned aerial vehicle (UAV) swarms are formed by multiple UAVs with increased portability, higher levels of sensing capabilities, and more powerful autonomy. These features make them attractive for many recent applications, potentially increasing the shortage of spectrum resources. In this paper, wideband spectrum sensing augmented technology is discussed for distributed UAV swarms to improve the utilization of spectrum. However, the sub-Nyquist sampling applied in existing schemes has high hardware complexity, power consumption, and low recovery efficiency for non-strictly sparse conditions. Thus, the Nyquist folding receiver (NYFR) is considered for the distributed UAV swarms, which can theoretically achieve full-band spectrum detection and reception using a single analog-to-digital converter (ADC) at low speed for all circuit components. There is a focus on the sensing model of two multichannel scenarios for the distributed UAV swarms, one with a complete functional receiver for the UAV swarm with RIS, and another with a decentralized UAV swarm equipped with a complete functional receiver for each UAV element. The key issue is to consider whether the application of RIS technology will bring advantages to spectrum sensing and the data fusion problem of decentralized UAV swarms based on the NYFR architecture. Therefore, the property for multiple pulse reconstruction is analyzed through the Gershgorin circle theorem, especially for very short pulses. Further, the block sparse recovery property is analyzed for wide bandwidth signals. The proposed technology can improve the processing capability for multiple signals and wide bandwidth signals while reducing interference from folded noise and subsampled harmonics. Experiment results show augmented spectrum sensing efficiency under non-strictly sparse conditions.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.07075 [pdf, other]

Wideband Power Spectrum Sensing: a Fast Practical Solution for Nyquist Folding Receiver

Authors: Kaili Jiang, Dechang Wang, Kailun Tian, Hancong Feng, Yuxin Zhao, Sen Cao, Jian Gao, Xuying Zhang, Yanfei Li, Junyu Yuan, Ying Xiong, Bin Tang

Abstract: The limited availability of spectrum resources has been growing into a critical problem in wireless communications, remote sensing, electronic surveillance, etc. To address the high-speed sampling bottleneck of wideband spectrum sensing, a fast and practical solution of power spectrum estimation for the Nyquist folding receiver (NYFR) is proposed in this paper. The NYFR architecture can theoretically achieve full-band signal sensing with a 100% probability of intercept. However, the existing algorithm is difficult to realize in real time due to its high complexity and complicated calculations. By exploring the sub-sampling principle inherent in NYFR, a computationally efficient method is introduced with compressive covariance sensing. It can be efficiently implemented via only the non-uniform fast Fourier transform, the fast Fourier transform, and some simple multiplication operations. Meanwhile, the state-of-the-art power spectrum reconstruction model for NYFR in the time domain and frequency domain is constructed in this paper as a comparison. Furthermore, the computational complexity of the proposed method scales linearly with the number of Nyquist-rate samples and the sparsity of spectrum occupancy. Simulation results and discussion demonstrate that the low complexity in sampling and computation makes the method a more practical solution for real-time wideband spectrum sensing applications.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.01802 [pdf, ps, other]

Multi-Carrier Modulation: An Evolution from Time-Frequency Domain to Delay-Doppler Domain

Authors: Hai Lin, Jinhong Yuan, Wei Yu, Jingxian Wu, Lajos Hanzo

Abstract: The recently proposed orthogonal delay-Doppler division multiplexing (ODDM) modulation, which is based on the new delay-Doppler (DD) domain orthogonal pulse (DDOP), is studied. A substantial benefit of the DDOP-based ODDM or general delay-Doppler domain multi-carrier (DDMC) modulation is that it achieves orthogonality with respect to the fine time and frequency resolutions of the DD domain. We first revisit the family of wireless channel models conceived for linear time-varying (LTV) channels, and then review the conventional multi-carrier (MC) modulation schemes and their design guidelines for both linear time-invariant (LTI) and LTV channels. Then we discuss the time-varying property of the LTV channels' DD domain impulse response and propose an impulse function based transmission strategy for equivalent sampled DD domain (ESDD) channels. Next, we take an in-depth look into the DDOP and the corresponding ODDM modulation to unveil its unique input-output relation for transmission over ESDD channels. Then, we point out that the conventional MC modulation design guidelines based on the Weyl-Heisenberg (WH) frame theory can be relaxed without compromising its orthogonality or without violating the WH frame theory. More specifically, for a communication system having given bandwidth and duration, MC modulation signals can be designed based on a WH subset associated with sufficient (bi)orthogonality, which governs the (bi)orthogonality of the MC signal within the bandwidth and duration. This novel design guideline could potentially open up opportunities for developing future waveforms required by new applications such as communication systems associated with high delay and/or Doppler shifts, as well as integrated sensing and communications, etc.

Submitted 3 August, 2023; originally announced August 2023.

Comments: This paper has been submitted to the IEEE for possible publication. The supplementary material of this work will be posted at https://www.omu.ac.jp/eng/ees-sic/oddm/

arXiv:2308.01147 [pdf, other]

Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation

Authors: Guo** Zhong, ** Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li

Abstract: The recently rising markup-to-image generation poses greater challenges as compared to natural image generation, due to its low tolerance for errors as well as the complex sequence and context correlations between markup and rendered image. This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM), which introduces contrastive positive/negative samples into the diffusion model to boost performance for markup-to-image generation. Technically, we design a fine-grained cross-modal alignment module to well explore the sequence similarity between the two modalities for learning robust feature representations. To improve the generalization ability, we propose a contrast-augmented diffusion model to explicitly explore positive and negative samples by maximizing a novel contrastive variational objective, which is mathematically inferred to provide a tighter bound for the model's optimization. Moreover, the context-aware cross attention module is developed to capture the contextual information within markup language during the denoising process, yielding better noise prediction results. Extensive experiments are conducted on four benchmark datasets from different domains, and the experimental results demonstrate the effectiveness of the proposed components in FSA-CDM, significantly exceeding state-of-the-art performance by about 2%-12% DTW improvements. The code will be released at https://github.com/zgj77/FSACDM.

Submitted 2 August, 2023; originally announced August 2023.

Comments: Accepted to ACM MM 2023. The code will be released at https://github.com/zgj77/FSACDM

arXiv:2307.06605 [pdf, ps, other]

Intelligent Omni Surfaces assisted Integrated Multi Target Sensing and Multi User MIMO Communications

Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, **hong Yuan

Abstract: Drawing inspiration from the advantages of intelligent reflecting surfaces (IRS) in wireless networks, this paper presents a novel design for intelligent omni surface (IOS) enabled integrated sensing and communications (ISAC). By harnessing the power of multi antennas and a multitude of elements, the dual-function base station (BS) and IOS collaborate to realize joint active and passive beamforming, enabling seamless 360-degree ISAC coverage. The objective is to maximize the minimum signal-to-interference-plus-noise ratio (SINR) of multi-target sensing, while ensuring the multi-user multi-stream communications. To achieve this, a comprehensive optimization approach is employed, encompassing the design of radar receive vector, transmit beamforming matrix, and IOS transmissive and reflective coefficients. Due to the non-convex nature of the formulated problem, an auxiliary variable is introduced to transform it into a more tractable form. Consequently, the problem is decomposed into three subproblems based on the block coordinate descent algorithm. Semidefinite relaxation and successive convex approximation methods are leveraged to convert the sub-problem into a convex problem, while the iterative rank minimization algorithm and penalty function method ensure the equivalence. Furthermore, the scenario is extended to mode switching and time switching protocols. Simulation results validate the convergence and superior performance of the proposed algorithm compared to other benchmark algorithms.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 30 pages, 7 figures

arXiv:2306.17451 [pdf, ps, other]

Self-Connected Spatially Coupled LDPC Codes with Improved Termination

Authors: Yihuan Liao, Min Qiu, **hong Yuan

Abstract: This paper investigates the design of self-connected spatially coupled low-density parity-check (SC-LDPC) codes. First, a termination method is proposed to reduce rate loss. Particularly, a single-side open SC-LDPC ensemble is introduced, which halves the rate loss of a conventional terminated SC-LDPC by reducing the number of check nodes. We further propose a self-connection method that allows reliable information to propagate from several directions to improve the decoding threshold. We demonstrate that the proposed ensembles not only achieve a better trade-off between rate loss and gap to capacity than several existing protograph SC-LDPC codes with short chain lengths but also exhibit threshold saturation behavior. Finite blocklength error performance is provided to exemplify the superiority of the proposed codes over conventional protograph SC-LDPC codes.

Submitted 30 June, 2023; originally announced June 2023.

Comments: 6 pages, 8 figures, accepted for publication in IEEE Communications Letters

arXiv:2306.08704 [pdf, ps, other]

On the Pulse Shaping for Delay-Doppler Communications

Authors: Shuangyang Li, Weijie Yuan, Zhiqiang Wei, **hong Yuan, Baoming Bai, Giuseppe Caire

Abstract: In this paper, we study the pulse shaping for delay-Doppler (DD) communications. We start with constructing a basis function in the DD domain following the properties of the Zak transform. Particularly, we show that the constructed basis functions are globally quasi-periodic while locally twisted-shifted, and their significance in time and frequency domains are then revealed. We further analyze the ambiguity function of the basis function, and show that fully localized ambiguity function can be achieved by constructing the basis function using periodic signals. More importantly, we prove that time and frequency truncating such basis functions naturally leads to approximate delay and Doppler orthogonalities, if the truncating windows are periodic within the support. Motivated by this, we propose a DD Nyquist pulse shaping scheme considering signals with periodicity. Finally, our conclusions are verified by using various strictly or approximately periodic pulses.

Submitted 21 November, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

arXiv:2306.06656 [pdf, other]

VPUFormer: Visual Prompt Unified Transformer for Interactive Image Segmentation

Authors: Xu Zhang, Kailun Yang, Jiacheng Lin, ** Yuan, Zhiyong Li, Shutao Li

Abstract: The integration of diverse visual prompts like clicks, scribbles, and boxes in interactive image segmentation could significantly facilitate user interaction as well as improve interaction efficiency. Most existing studies focus on a single type of visual prompt by simply concatenating prompts and images as input for segmentation prediction, which suffers from low-efficiency prompt representation and weak interaction issues. This paper proposes a simple yet effective Visual Prompt Unified Transformer (VPUFormer), which introduces a concise unified prompt representation with deeper interaction to boost the segmentation performance. Specifically, we design a Prompt-unified Encoder (PuE) by using Gaussian mapping to generate a unified one-dimensional vector for click, box, and scribble prompts, which well captures users' intentions as well as provides a denser representation of user prompts. In addition, we present a Prompt-to-Pixel Contrastive Loss (P2CL) that leverages user feedback to gradually refine candidate semantic features, aiming to bring image semantic features closer to the features that are similar to the user prompt, while pushing away those image semantic features that are dissimilar to the user prompt, thereby correcting results that deviate from expectations. On this basis, our approach injects prompt representations as queries into Dual-cross Merging Attention (DMA) blocks to perform a deeper interaction between image and query inputs. A comprehensive variety of experiments on seven challenging datasets demonstrates that the proposed VPUFormer with PuE, DMA, and P2CL achieves consistent improvements, yielding state-of-the-art segmentation performance. Our code will be made publicly available at https://github.com/XuZhang1211/VPUFormer.

Submitted 11 June, 2023; originally announced June 2023.

Comments: Code will be made publicly available at https://github.com/XuZhang1211/VPUFormer

arXiv:2306.01304 [pdf, other]

doi 10.24963/ijcai.2023/544

JEPOO: Highly Accurate Joint Estimation of Pitch, Onset and Offset for Music Information Retrieval

Authors: Haojie Wei, Jun Yuan, Rui Zhang, Yueguo Chen, Gang Wang

Abstract: Melody extraction is a core task in music information retrieval, and the estimation of pitch, onset and offset are key sub-tasks in melody extraction. Existing methods have limited accuracy, and work for only one type of data, either single-pitch or multi-pitch. In this paper, we propose a highly accurate method for joint estimation of pitch, onset and offset, named JEPOO. We address the challenges of joint learning optimization and handling both single-pitch and multi-pitch data through novel model design and a new optimization technique named Pareto modulated loss with loss weight regularization. This is the first method that can accurately handle both single-pitch and multi-pitch music data, and even a mix of them. A comprehensive experimental study on a wide range of real datasets shows that JEPOO outperforms state-of-the-art methods by up to 10.6%, 8.3% and 10.3% for the prediction of Pitch, Onset and Offset, respectively, and JEPOO is robust for various types of data and instruments. The ablation study shows the effectiveness of each component of JEPOO.

Submitted 7 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: This paper has been accepted by IJCAI 2023; 11 pages, 6 figures

arXiv:2305.14892 [pdf, ps, other]

Segmented GRAND: Combining Sub-patterns in Near-ML Order

Authors: Mohammad Rowshan, **hong Yuan

Abstract: The recently introduced maximum-likelihood (ML) decoding scheme called guessing random additive noise decoding (GRAND) has demonstrated a remarkably low time complexity in high signal-to-noise ratio (SNR) regimes. However, the complexity is not as low at low SNR regimes and low code rates. To mitigate this concern, we propose a scheme for a near-ML variant of GRAND called ordered reliability bits GRAND (or ORBGRAND), which divides codewords into segments based on the properties of the underlying code, generates sub-patterns for each segment consistent with the syndrome (thus reducing the number of inconsistent error patterns generated), and combines them in a near-ML order using two-level integer partitions of logistic weight. The numerical evaluation demonstrates that the proposed scheme, called segmented ORBGRAND, significantly reduces the average number of queries at any SNR regime. Moreover, the segmented ORBGRAND with abandonment also improves the error correction performance.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.07270 [pdf, other]

SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for Monocular 3D Object Detection

Authors: Xuan He, Fan Yang, Kailun Yang, Jiacheng Lin, Haolong Fu, Meng Wang, ** Yuan, Zhiyong Li

Abstract: Transformer-based methods have demonstrated superior performance for monocular 3D object detection recently, which aims at predicting 3D attributes from a single 2D image. Most existing transformer-based methods leverage both visual and depth representations to explore valuable query points on objects, and the quality of the learned query points has a great impact on detection accuracy. Unfortunately, existing unsupervised attention mechanisms in transformers are prone to generate low-quality query features due to inaccurate receptive fields, especially on hard objects. To tackle this problem, this paper proposes a novel "Supervised Scale-aware Deformable Attention" (SSDA) for monocular 3D object detection. Specifically, SSDA presets several masks with different scales and utilizes depth and visual features to adaptively learn a scale-aware filter for object query augmentation. Imposing the scale awareness, SSDA could well predict the accurate receptive field of an object query to support robust query feature generation. Aside from this, SSDA is assigned with a Weighted Scale Matching (WSM) loss to supervise scale prediction, which presents more confident results as compared to the unsupervised attention mechanisms. Extensive experiments on the KITTI and Waymo Open datasets demonstrate that SSDA significantly improves the detection accuracy, especially on moderate and hard objects, yielding state-of-the-art performance as compared to the existing approaches. Our code will be made publicly available at https://github.com/mikasa3lili/SSD-MonoDETR.

Submitted 1 September, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). Code will be made publicly available at https://github.com/mikasa3lili/SSD-MonoDETR

arXiv:2305.02599 [pdf, ps, other]

Transmissive Reconfigurable Intelligent Surface Transmitter Empowered Cognitive RSMA Networks

Authors: Ziwei Liu, Wen Chen, Zhendong Li, **hong Yuan, Qingqing Wu, Kunlun Wang

Abstract: In this paper, we investigated the downlink transmission problem of a cognitive radio network (CRN) equipped with a novel transmissive reconfigurable intelligent surface (TRIS) transmitter. In order to achieve low power consumption and high-rate multi-stream communication, time-modulated arrays (TMA) are implemented and users access the network using rate splitting multiple access (RSMA). With such a network framework, a multi-objective optimization problem with joint design of the precoding matrix and the common stream rate is constructed to achieve higher energy efficiency (EE) and spectral efficiency (SE). Since the objective function is a non-convex fractional function, we proposed a joint optimization algorithm based on difference-of-convex (DC) programming and successive convex approximation (SCA). Numerical results show that under this framework the proposed algorithm can considerably improve and balance the EE and SE.

Submitted 4 May, 2023; originally announced May 2023.

Comments: IEEE Communications Letters

arXiv:2305.01360 [pdf, other]

Self-supervised arbitrary scale super-resolution framework for anisotropic MRI

Authors: Haonan Zhang, Yuhan Zhang, Qing Wu, Jiangjie Wu, Zhiming Zhen, Feng Shi, Jianmin Yuan, Hongjiang Wei, Chen Liu, Yuyao Zhang

Abstract: In this paper, we propose an efficient self-supervised arbitrary-scale super-resolution (SR) framework to reconstruct isotropic magnetic resonance (MR) images from anisotropic MRI inputs without involving external training data. The proposed framework builds a training dataset using in-the-wild anisotropic MR volumes with arbitrary image resolution. We then formulate the 3D volume SR task as a SR problem for 2D image slices. The anisotropic volume's high-resolution (HR) plane is used to build the HR-LR image pairs for model training. We further adapt the implicit neural representation (INR) network to implement the 2D arbitrary-scale image SR model. Finally, we leverage the well-trained proposed model to up-sample the 2D LR plane extracted from the anisotropic MR volumes to their HR views. The isotropic MR volumes thus can be reconstructed by stacking and averaging the generated HR slices. Our proposed framework has two major advantages: (1) It only involves the arbitrary-resolution anisotropic MR volumes, which greatly improves the model practicality in real MR imaging scenarios (e.g., clinical brain image acquisition); (2) The INR-based SR model enables arbitrary-scale image SR from the arbitrary-resolution input image, which significantly improves model training efficiency. We perform experiments on a simulated public adult brain dataset and a real collected 7T brain dataset. The results indicate that our current framework greatly outperforms two well-known self-supervised models for anisotropic MR image SR tasks.

Submitted 2 May, 2023; originally announced May 2023.

Comments: 10 pages, 5 figures

arXiv:2304.07802 [pdf, other]

Reconfigurable Intelligent Surface-Enabled Gridless DoA Estimation System for NLoS Scenarios

Authors: Jiawen Yuan, Shaodan Ma, Gong Zhang, Henry Leung

Abstract: The conventional direction-of-arrival (DoA) estimation approaches are only effective when the line-of-sight (LoS) link exists, while in the case of the non-line-of-sight (NLoS) situation, the spatial angle cannot be captured and thus the DoA estimation performance would be significantly degraded. To address this challenge, a novel reconfigurable intelligent surface (RIS)-enabled gridless DoA estimation approach is proposed, where the RIS can establish the virtual LoS link between the base station (BS) and the targets. For extracting the statistics of the signal, the RIS-enabled signal model in the covariance domain is proposed. Then we estimate the noise variance by constraining the Frobenius norm of the measurement error matrix to obtain the RIS-enabled covariance matrix free of noise nuisance. Additionally, we reconstruct the Hermitian Toeplitz matrix by addressing the atomic norm minimization (ANM) problem. To ease the calculation burden, an efficient iterative approach is finally designed to solve the ANM problem via the alternating direction method of multipliers (ADMM). Numerical experiments validate the robustness of the proposed method against the benchmark in terms of computational efficiency and multi-source DoA estimation precision.

Submitted 7 November, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

arXiv:2303.12360 [pdf]

Automatically Predict Material Properties with Microscopic Image Example Polymer Compatibility

Authors: Zhilong Liang, Zhenzhi Tan, Ruixin Hong, Wanli Ouyang, **ying Yuan, Changshui Zhang

Abstract: Many material properties are manifested in the morphological appearance and characterized with microscopic image, such as scanning electron microscopy (SEM). Polymer miscibility is a key physical quantity of polymer material and commonly and intuitively judged by SEM images. However, human observation and judgement for the images is time-consuming, labor-intensive and hard to be quantified. Computer image recognition with machine learning method can make up the defects of artificial judging, giving accurate and quantitative judgement. We achieve automatic miscibility recognition utilizing convolution neural network and transfer learning method, and the model obtains up to 94% accuracy. We also put forward a quantitative criterion for polymer miscibility with this model. The proposed method can be widely applied to the quantitative characterization of the microstructure and properties of various materials.

Submitted 3 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.10727 [pdf, other]

ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement

Authors: Chaojian Li, Wenwan Chen, Jiayi Yuan, Yingyan Lin, Ashutosh Sabharwal

Abstract: Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measure (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the required computational complexity of state-of-the-art deep neural networks (DNNs) powered SAM solutions stands at odds with the often constrained resources on mobile devices. Furthermore, only limited labeled data is available or practical when it comes to SAM under clinical settings due to various privacy constraints and the required human effort, further challenging the achievable accuracy of on-device SAM solutions. To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). Specifically, our ERSAM framework can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, ERSAM-delivered DNNs only consume 40 mW x 12 h energy and 0.05 seconds processing latency for a 5 seconds audio segment on a Pixel 3 phone, while only achieving an error rate of 14.3% on a social ambiance dataset generated by LibriSpeech. We can expect that our ERSAM framework can pave the way for ubiquitous on-device SAM solutions which are in growing demand.

Submitted 24 March, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP'23

arXiv:2303.07626 [pdf, other]

CAT: Causal Audio Transformer for Audio Classification

Authors: Xiaoyu Liu, Hanlin Lu, Jianbo Yuan, Xinyu Li

Abstract: The attention-based Transformers have been increasingly applied to audio classification because of their global receptive field and ability to handle long-term dependency. However, the existing frameworks which are mainly extended from the Vision Transformers are not perfectly compatible with audio signals. In this paper, we introduce a Causal Audio Transformer (CAT) consisting of a Multi-Resolution Multi-Feature (MRMF) feature extraction with an acoustic attention block for more optimized audio modeling. In addition, we propose a causal module that alleviates over-fitting, helps with knowledge transfer, and improves interpretability. CAT obtains higher or comparable state-of-the-art classification performance on ESC50, AudioSet and UrbanSound8K datasets, and can be easily generalized to other Transformer-based models.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted to ICASSP 2023

arXiv:2303.06324 [pdf, other]

OCCL: a Deadlock-free Library for GPU Collective Communication

Authors: Lichen Pan, Juncheng Liu, **hui Yuan, Rongkai Zhang, Pengze Li, Zhen Xiao

Abstract: Various distributed deep neural network (DNN) training technologies lead to increasingly complicated use of collective communications on GPU. The deadlock-prone collectives on GPU force researchers to guarantee that collectives are enqueued in a consistent order on each GPU to prevent deadlocks. In complex distributed DNN training scenarios, manual hardcoding is the only practical way for deadlock prevention, which poses significant challenges to the development of artificial intelligence. This paper presents OCCL, which is, to the best of our knowledge, the first deadlock-free collective communication library for GPU supporting dynamic decentralized preemption and gang-scheduling for collectives. Leveraging the preemption opportunity of collectives on GPU, OCCL dynamically preempts collectives in a decentralized way via the deadlock-free collective execution framework and allows dynamic decentralized gang-scheduling via the stickiness adjustment scheme. With the help of OCCL, researchers no longer have to struggle to get all GPUs to launch collectives in a consistent order to prevent deadlocks. We implement OCCL with several optimizations and integrate OCCL with a distributed deep learning framework OneFlow. Experimental results demonstrate that OCCL achieves comparable or better latency and bandwidth for collectives compared to NCCL, the state-of-the-art. When used in distributed DNN training, OCCL can improve the peak training throughput by up to 78% compared to statically sequenced NCCL, while introducing overheads of less than 6.5% across various distributed DNN training approaches.

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2302.13869 [pdf, other]

doi 10.1016/j.bspc.2023.105280

EDMAE: An Efficient Decoupled Masked Autoencoder for Standard View Identification in Pediatric Echocardiography

Authors: Yiman Liu, Xiaoxiang Han, Tongtong Liang, Bin Dong, Jiajun Yuan, Menghan Hu, Qiaohong Liu, Jiangang Chen, Qingli Li, Yuqi Zhang

Abstract: This paper introduces the Efficient Decoupled Masked Autoencoder (EDMAE), a novel self-supervised method for recognizing standard views in pediatric echocardiography. EDMAE introduces a new proxy task based on the encoder-decoder structure. The EDMAE encoder is composed of a teacher and a student encoder. The teacher encoder extracts the potential representation of the masked image blocks, while the student encoder extracts the potential representation of the visible image blocks. The loss is calculated between the feature maps output by the two encoders to ensure consistency in the latent representations they extract. EDMAE uses pure convolution operations instead of the ViT structure in the MAE encoder. This improves training efficiency and convergence speed. EDMAE is pre-trained on a large-scale private dataset of pediatric echocardiography using self-supervised learning, and then fine-tuned for standard view recognition. The proposed method achieves high classification accuracy in 27 standard views of pediatric echocardiography. To further verify the effectiveness of the proposed method, the authors perform another downstream task of cardiac ultrasound segmentation on the public dataset CAMUS. The experimental results demonstrate that the proposed method outperforms some popular supervised and recent self-supervised methods, and is more competitive on different downstream tasks.

Submitted 3 August, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 15 pages, 5 figures, 8 tables, Published in Biomedical Signal Processing and Control

Journal ref: Biomedical Signal Processing and Control 86 (2023) 105280

arXiv:2301.09303 [pdf, other]

Downlink Transmission under Heterogeneous Blocklength Constraints: Discrete Signaling with Single-User Decoding

Authors: Min Qiu, Yu-Chih Huang, Jinhong Yuan

Abstract: In this paper, we consider the downlink broadcast channel under heterogeneous blocklength constraints, where each user experiences different interference statistics across its received symbols. Different from the homogeneous blocklength case, the strong users with short blocklength transmitted symbol blocks usually cannot wait to receive the entire transmission frame and perform successive interference cancellation (SIC) owing to their stringent latency requirements. Even if SIC is feasible, it may not be perfect under finite blocklength constraints. To cope with the heterogeneity in latency and reliability requirements, we propose a practical downlink transmission scheme with discrete signaling and single-user decoding, i.e., without SIC. In addition, we derive the finite blocklength achievable rate and use it for guiding the design of channel coding and modulations. Both achievable rate and error probability simulation show that the proposed scheme can operate close to the benchmark scheme which assumes capacity-achieving signaling and perfect SIC.

Submitted 23 January, 2023; originally announced January 2023.

Comments: 7 pages, 1 figure, accepted for presentation at IEEE ICC 2023. arXiv admin note: substantial text overlap with arXiv:2212.01736

arXiv:2301.06721 [pdf, ps, other]

doi 10.1109/GLOBECOM48099.2022.10001406

On Delay-Doppler Plane Orthogonal Pulse

Authors: Hai Lin, Jinhong Yuan

Abstract: In this paper, we analyze the recently discovered delay-Doppler plane orthogonal pulse (DDOP), which is essential for delay-Doppler plane multi-carrier modulation waveform. In particular, we introduce a local orthogonality property of pulses corresponding to Weyl-Heisenberg (WH) subset and justify the DDOP's existence, in contrast to global orthogonality corresponding to WH set governed by the WH frame theory. Then, sufficient conditions for locally-orthogonal pulses are presented and discussed. Based on the analysis, we propose a general DDOP design. We also derive the frequency domain representation of the DDOP, and compare the DDOP-based orthogonal delay-Doppler division multiplexing (ODDM) modulation with other modulation schemes, in terms of TF signal localization. Interestingly, we show perfect local orthogonality property of the DDOP with respect to delay-Doppler resolutions using its ambiguity function.

Submitted 17 January, 2023; originally announced January 2023.

Comments: This paper was presented at the IEEE GLOBECOM 2022

arXiv:2212.13059 [pdf]

OMSN and FAROS: OCTA Microstructure Segmentation Network and Fully Annotated Retinal OCTA Segmentation Dataset

Authors: Peng Xiao, Xiaodong Hu, Ke Ma, Gengyuan Wang, Ziqing Feng, Yuancong Huang, Jin Yuan

Abstract: The lack of efficient segmentation methods and fully-labeled datasets limits the comprehensive assessment of optical coherence tomography angiography (OCTA) microstructures like retinal vessel network (RVN) and foveal avascular zone (FAZ), which are of great value in ophthalmic and systematic diseases evaluation. Here, we introduce an innovative OCTA microstructure segmentation network (OMSN) by combining an encoder-decoder-based architecture with multi-scale skip connections and the split-attention-based residual network ResNeSt, paying specific attention to OCTA microstructural features while facilitating better model convergence and feature representations. The proposed OMSN achieves excellent single/multi-task performances for RVN or/and FAZ segmentation. Especially, the evaluation metrics on multi-task models outperform single-task models on the same dataset. On this basis, a fully annotated retinal OCTA segmentation (FAROS) dataset is constructed semi-automatically, filling the vacancy of a pixel-level fully-labeled OCTA dataset. OMSN multi-task segmentation model retrained with FAROS further certifies its outstanding accuracy for simultaneous RVN and FAZ segmentation.

Submitted 26 December, 2022; originally announced December 2022.

Comments: 10 pages, 6 figures, submitted to IEEE Transactions on Medical Imaging (TMI)

arXiv:2212.11594 [pdf, other]

Electromagnetic Based Communication Model for Dynamic Metasurface Antennas

Authors: Robin Jess Williams, Pablo Ramirez-Espinosa, Jide Yuan, Elisabeth De Carvalho

Abstract: Dynamic metasurface antennas (DMAs) arise as a promising technology in the field of massive multiple-input multiple-output (mMIMO) systems, offering the possibility of integrating a large number of antennas in a limited -- and potentially large -- aperture while keeping the required number of radio-frequency (RF) chains under control. Although envisioned as practical realizations of mMIMO systems, DMAs represent a new paradigm in the design of signal processing techniques (such as beamforming) due to the constraints inherent to their physical implementation, for which no complete models are available yet. In this work, we propose a complete and electromagnetic-compliant narrowband communication model for a generic DMA based system. Specifically, the model accounts for: i) the wave propagation and reflections throughout the waveguides that feed the antenna elements, ii) the mutual coupling both through the air and the waveguides, and iii) the insertion losses. Also, we integrate the electromagnetic model in the conventional digital communication model, providing a complete and useful framework to design and characterize the performance of these systems. Finally, the accuracy of the model is verified through full-wave simulations.

Submitted 22 December, 2022; originally announced December 2022.

arXiv:2212.10390 [pdf, other]

UniDA3D: Unified Domain Adaptive 3D Semantic Segmentation Pipeline

Authors: Ben Fei, Siyuan Huang, Jiakang Yuan, Botian Shi, Bo Zhang, Weidong Yang, Min Dou, Yikang Li

Abstract: State-of-the-art 3D semantic segmentation models are trained on off-the-shelf public benchmarks, but they will inevitably face the challenge of recognition accuracy drop when these well-trained models are deployed to a new domain. In this paper, we introduce a Unified Domain Adaptive 3D semantic segmentation pipeline (UniDA3D) to enhance the weak generalization ability, and bridge the point distribution gap between domains. Different from previous studies that only focus on a single adaptation task, UniDA3D can tackle several adaptation tasks in 3D segmentation field, by designing a unified source-and-target active sampling strategy, which selects a maximally-informative subset from both source and target domains for effective model adaptation. Besides, benefiting from the rise of multi-modal 2D-3D datasets, UniDA3D investigates the possibility of achieving a multi-modal sampling strategy, by developing a cross-modality feature interaction module that can extract a representative pair of image and point features to achieve a bi-directional image-point feature interaction for safe model adaptation. Experimentally, UniDA3D is verified to be effective in many adaptation tasks including: 1) unsupervised domain adaptation, 2) unsupervised few-shot domain adaptation, and 3) active domain adaptation. Their results demonstrate that, by easily coupling UniDA3D with off-the-shelf 3D segmentation baselines, domain generalization ability of these baselines can be enhanced.

Submitted 12 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Showing 1–50 of 141 results for author: Yuan, J