Search | arXiv e-print repository

VideoQA-SC: Adaptive Semantic Communication for Video Question Answering

Authors: Jiangyuan Guo, Wei Chen, Yuxuan Sun, Jialong Xu, Bo Ai

Abstract: Although semantic communication (SC) has shown its potential in efficiently transmitting multi-modal data such as text, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth eff… ▽ More Although semantic communication (SC) has shown its potential in efficiently transmitting multi-modal data such as text, speeches and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth efficiency and real-time performance of various intelligent tasks. The difficulty in such system design lies in the extraction of task-related compact semantic representations and their accurate delivery over noisy channels. In this paper, we propose an end-to-end SC system for video question answering (VideoQA) tasks called VideoQA-SC. Our goal is to accomplish VideoQA tasks directly based on video semantics over noisy or fading wireless channels, bypassing the need for video reconstruction at the receiver. To this end, we develop a spatiotemporal semantic encoder for effective video semantic extraction, and a learning-based bandwidth-adaptive deep joint source-channel coding (DJSCC) scheme for efficient and robust video semantic transmission. Experiments demonstrate that VideoQA-SC outperforms traditional and advanced DJSCC-based SC systems that rely on video reconstruction at the receiver under a wide range of channel conditions and bandwidth constraints. In particular, when the signal-to-noise ratio is low, VideoQA-SC can improve the answer accuracy by 5.17% while saving almost 99.5% of the bandwidth at the same time, compared with the advanced DJSCC-based SC system. Our results show the great potential of task-oriented SC system design for video applications. △ Less

Submitted 17 May, 2024; originally announced June 2024.

arXiv:2406.16946 [pdf, ps, other]

Networked ISAC for Low-Altitude Economy: Coordinated Transmit Beamforming and UAV Trajectory Design

Authors: Gaoyuan Cheng, Xianxin Song, Zhonghao Lyu, Jie Xu

Abstract: This paper exploits the networked integrated sensing and communications (ISAC) to support low-altitude economy (LAE), in which a set of networked ground base stations (GBSs) cooperatively transmit joint information and sensing signals to communicate with multiple authorized unmanned aerial vehicles (UAVs) and concurrently detect unauthorized objects over the interested region in the three-dimensio… ▽ More This paper exploits the networked integrated sensing and communications (ISAC) to support low-altitude economy (LAE), in which a set of networked ground base stations (GBSs) cooperatively transmit joint information and sensing signals to communicate with multiple authorized unmanned aerial vehicles (UAVs) and concurrently detect unauthorized objects over the interested region in the three-dimensional (3D) space. We assume that each GBS is equipped with uniform linear array (ULA) antennas, which are deployed either horizontally or vertically to the ground. We also consider two types of UAV receivers, which have and do not have the capability of canceling the interference caused by dedicated sensing signals, respectively. Under each setup, we jointly design the coordinated transmit beamforming at multiple GBSs together with the authorized UAVs' trajectory control and their GBS associations, for enhancing the authorized UAVs' communication performance while ensuring the sensing requirements. In particular, we aim to maximize the average sum rate of authorized UAVs over a given flight period, subject to the minimum illumination power constraints toward the interested 3D sensing region, the maximum transmit power constraints at individual GBSs, and the flight constraints of UAVs. These problems are highly non-convex and challenging to solve, due to the involvement of binary UAV-GBS association variables as well as the coupling of beamforming and trajectory variables. To solve these non-convex problems, we propose efficient algorithms by using the techniques of alternating optimization, successive convex approximation, and semi-definite relaxation. Numerical results show that the proposed joint coordinated transmit beamforming and UAV trajectory designs efficiently balance the sensing-communication performance tradeoffs and significantly outperform various benchmarks. △ Less

Submitted 19 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2405.07568

arXiv:2406.16868 [pdf, other]

Neural Network-based Two-Dimensional Filtering for OTFS Symbol Detection

Authors: Jiarui Xu, Karim Said, Lizhong Zheng, Lingjia Liu

Abstract: Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only the limited over-the-air (OTA) pilot symbols are utilized for training. However, the previous RC-based approach does not design… ▽ More Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only the limited over-the-air (OTA) pilot symbols are utilized for training. However, the previous RC-based approach does not design the RC architecture based on the properties of the OTFS system to fully unlock the potential of RC. This paper introduces a novel two-dimensional RC (2D-RC) approach for online symbol detection on a subframe basis in the OTFS system. The 2D-RC is designed to have a two-dimensional (2D) filtering structure to equalize the 2D circular channel effect in the delay-Doppler (DD) domain of the OTFS system. With the introduced architecture, the 2D-RC can operate in the DD domain with only a single neural network, unlike our previous work which requires multiple RCs to track channel variations in the time domain. Experimental results demonstrate the advantages of the 2D-RC approach over the previous RC-based approach and the compared model-based methods across different modulation orders. △ Less

Submitted 8 March, 2024; originally announced June 2024.

Comments: 6 pages, conference paper. arXiv admin note: substantial text overlap with arXiv:2311.08543

arXiv:2406.14973 [pdf, other]

LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement

Authors: Haodong Yang, Jisheng Xu, Zhiliang Lin, Jian** He

Abstract: Computer vision techniques have empowered underwater robots to effectively undertake a multitude of tasks, including object tracking and path planning. However, underwater optical factors like light refraction and absorption present challenges to underwater vision, which cause degradation of underwater images. A variety of underwater image enhancement methods have been proposed to improve the effe… ▽ More Computer vision techniques have empowered underwater robots to effectively undertake a multitude of tasks, including object tracking and path planning. However, underwater optical factors like light refraction and absorption present challenges to underwater vision, which cause degradation of underwater images. A variety of underwater image enhancement methods have been proposed to improve the effectiveness of underwater vision perception. Nevertheless, for real-time vision tasks on underwater robots, it is necessary to overcome the challenges associated with algorithmic efficiency and real-time capabilities. In this paper, we introduce Lightweight Underwater Unet (LU2Net), a novel U-shape network designed specifically for real-time enhancement of underwater images. The proposed model incorporates axial depthwise convolution and the channel attention module, enabling it to significantly reduce computational demands and model parameters, thereby improving processing speed. The extensive experiments conducted on the dataset and real-world underwater robots demonstrate the exceptional performance and speed of proposed model. It is capable of providing well-enhanced underwater images at a speed 8 times faster than the current state-of-the-art underwater image enhancement method. Moreover, LU2Net is able to handle real-time underwater video enhancement. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14664 [pdf, ps, other]

Experimental Validation of Cooperative RSS-based Localization with Unknown Transmit Power, Path Loss Exponent, and Precise Anchor Location

Authors: Yingquan Li, Bodhibrata Mukhopadhyay, Jiajie Xu, Mohamed-Slim Alouini

Abstract: Received signal strength (RSS)--based cooperative localization has gained significant attention due to its straightforward system architectures and cost-effectiveness. In this paper, we propose Cooperative Localization Techniques (with Unknown Parameters), referred to as CTUP(s), which consider uncertainty in anchor nodes' locations and assume the transmit power and \textcolor{blue}{path loss expo… ▽ More Received signal strength (RSS)--based cooperative localization has gained significant attention due to its straightforward system architectures and cost-effectiveness. In this paper, we propose Cooperative Localization Techniques (with Unknown Parameters), referred to as CTUP(s), which consider uncertainty in anchor nodes' locations and assume the transmit power and \textcolor{blue}{path loss exponent (PLE)} to be unknown. Unlike prior studies, CTUP(s) address unknowns by estimating these parameters, along with the location of target nodes. The non-convex and non-linear nature of the maximum likelihood (ML) estimator of the problem is addressed through relaxation techniques, employing Taylor series expansion, semidefinite relaxation (SDR), and the epigraph method. The resulting problem is solved using semidefinite second-order cone programming (SDP-SOCP), leveraging the precision of SDP and the simplicity of SOCP. We deployed an extensive network comprising 50 BLE nodes covering an area of 640~m $\times$ 180~m to gather RSS data. The precise location of the nodes is obtained using real-time kinematics global positioning system (RTK-GPS), which is treated as the ground truth. Furthermore, to replicate real-world scenarios, we recorded the positions of the anchor nodes using a standard GPS, thereby introducing uncertainty into the anchor node locations. Extensive simulation and hardware experimentation demonstrate the superior performance of CTUP compared to existing techniques. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14092 [pdf, other]

Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

Authors: **g Xu, Minglin Wu, Xixin Wu, Helen Meng

Abstract: Self-supervised (SSL) models have shown great performance in various downstream tasks. However, they are typically developed for limited languages, and may encounter new languages in real-world. Develo** a SSL model for each new language is costly. Thus, it is vital to figure out how to efficiently adapt existed SSL models to a new language without impairing its original abilities. We propose ad… ▽ More Self-supervised (SSL) models have shown great performance in various downstream tasks. However, they are typically developed for limited languages, and may encounter new languages in real-world. Develo** a SSL model for each new language is costly. Thus, it is vital to figure out how to efficiently adapt existed SSL models to a new language without impairing its original abilities. We propose adaptation methods which integrate LoRA to existed SSL models to extend new language. We also develop preservation strategies which include data combination and re-clustering to retain abilities on existed languages. Applied to mHuBERT, we investigate their effectiveness on speech re-synthesis task. Experiments show that our adaptation methods enable mHuBERT to be applied to a new language (Mandarin) with MOS value increased about 1.6 and the relative value of WER reduced up to 61.72%. Also, our preservation strategies ensure that the performance on both existed and new languages remains intact. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.12426 [pdf, other]

Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design

Authors: Yuan Fang, Xianghao Yu, Jie Xu, Ying-Jun Angela Zhang

Abstract: This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to… ▽ More This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to overcome the severe multi-hop-reflection path loss in NLoS sensing. In particular, we consider two sensing setups without and with dedicated sensors equipped at active IRSs. In the first case without dedicated sensors at IRSs, we investigate the cooperative sensing at the BS, where the target's direction-of-arrival (DoA) with respect to each IRS is estimated based on the echo signals received at the BS. In the other case with dedicated sensors at IRSs, we consider that each IRS is able to receive echo signals and estimate the target's DoA with respect to itself. For both sensing setups, we first derive the closed-form Cramér-Rao bound (CRB) for estimating target DoA. Then, the (maximum) CRB is minimized by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum amplification power and the maximum power amplification gain constraints at individual active IRSs. To tackle the resulting highly non-convex (max-)CRB minimization problems, we propose two efficient algorithms to obtain high-quality solutions for the two cases with sensing at the BS and at the IRSs, respectively, based on alternating optimization, successive convex approximation, and semi-definite relaxation. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2404.13536

arXiv:2406.11082 [pdf, other]

doi 10.1109/JPROC.2024.3405351

Simultaneously Transmitting and Reflecting Surfaces for Ubiquitous Next Generation Multiple Access in 6G and Beyond

Authors: Xidong Mu, Jiaqi Xu, Zhaolin Wang, Naofal Al-Dhahir

Abstract: The ultimate goal of next generation multiple access (NGMA) is to support massive terminals and facilitate multiple functionalities over the limited radio resources of wireless networks in the most efficient manner possible. However, the random and uncontrollable wireless radio environment is a major obstacle to realizing this NGMA vision. Given the prominent feature of achieving 360° smart radio… ▽ More The ultimate goal of next generation multiple access (NGMA) is to support massive terminals and facilitate multiple functionalities over the limited radio resources of wireless networks in the most efficient manner possible. However, the random and uncontrollable wireless radio environment is a major obstacle to realizing this NGMA vision. Given the prominent feature of achieving 360° smart radio environment, simultaneously transmitting and reflecting surfaces (STARS) are emerging as one key enabling technology among the family of reconfigurable intelligent surfaces for NGMA. This paper provides a comprehensive overview of the recent research progress of STARS, focusing on fundamentals, performance analysis, and full-space beamforming design, as well as promising employments of STARS in NGMA. In particular, we first introduce the basics of STARS by elaborating on the foundational principles and operating protocols as well as discussing different STARS categories and prototypes. Moreover, we systematically survey the existing performance analysis and beamforming design for STARS-aided wireless communications in terms of diverse objectives and different mathematical approaches. Given the superiority of STARS, we further discuss advanced STARS applications as well as the attractive interplay between STARS and other emerging techniques to motivate future works for realizing efficient NGMA. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 25 pages, 18 figures, 7 tables

arXiv:2406.10276 [pdf, other]

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Authors: Peidong Wang, Jian Xue, **yu Li, Junkun Chen, Aswin Shanmugam Subramanian

Abstract: Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the o… ▽ More Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the other languages. We accomplish this by introducing a simple and effective linear input network. The linear input network is initialized as an identity matrix, which ensures that the model can perform as well as, or better than, the original model. Experimental results show that the proposed method can successfully enhance the specified language, while kee** the language-agnostic ability of the many-to-one ST models. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.10236 [pdf, other]

Lightening Anything in Medical Images

Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, **gyi Xu, Lipeng Ma, Yatian Yang, **hong Zhou

Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous… ▽ More The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous paired images, presenting challenges in data collection and training costs, all while lacking the ability to generalize effectively. Here, we introduce a pioneering training-free Diffusion Model for Universal Medical Image Enhancement, named UniMIE. UniMIE demonstrates its unsupervised enhancement capabilities across various medical image modalities without the need for any fine-tuning. It accomplishes this by relying solely on a single pre-trained model from ImageNet. We conduct a comprehensive evaluation on 13 imaging modalities and over 15 medical types, demonstrating better qualities, robustness, and accuracy than other modality-specific and data-inefficient models. By delivering high-quality enhancement and corresponding accuracy downstream tasks across a wide range of tasks, UniMIE exhibits considerable potential to accelerate the advancement of diagnostic tools and customized treatment plans. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: 23 pages, 6 figures

arXiv:2406.09664 [pdf, other]

Frequency-mix Knowledge Distillation for Fake Speech Detection

Authors: Cunhang Fan, Shunbo Dong, Jun Xue, Yujie Chen, Jiangyan Yi, Zhao Lv

Abstract: In the telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA… ▽ More In the telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA method, Frequency-mix (Freqmix), and introduce the Freqmix knowledge distillation (FKD) to enhance model information extraction and generalization abilities. Specifically, we use Freqmix-enhanced data as input for the teacher model, while the student model's input undergoes time-domain DA method. We use a multi-level feature distillation approach to restore information and improve the model's generalization capabilities. Our approach achieves state-of-the-art results on ASVspoof 2021 LA dataset, showing a 31\% improvement over baseline and performs competitively on ASVspoof 2021 DF dataset. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.07662 [pdf, other]

Progress Towards Decoding Visual Imagery via fNIRS

Authors: Michel Adamic, Wellington Avelino, Anna Brandenberger, Bryan Chiang, Hunter Davis, Stephen Fay, Andrew Gregory, Aayush Gupta, Raphael Hotter, Grace Jiang, Fiona Leng, Stephen Polcyn, Thomas Ribeiro, Paul Scotti, Michelle Wang, Marley Xiong, Jonathan Xu

Abstract: We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 2… ▽ More We demonstrate the possibility of reconstructing images from fNIRS brain activity and start building a prototype to match the required specs. By training an image reconstruction model on downsampled fMRI data, we discovered that cm-scale spatial resolution is sufficient for image generation. We obtained 71% retrieval accuracy with 1-cm resolution, compared to 93% on the full-resolution fMRI, and 20% with 2-cm resolution. With simulations and high-density tomography, we found that time-domain fNIRS can achieve 1-cm resolution, compared to 2-cm resolution for continuous-wave fNIRS. Lastly, we share designs for a prototype time-domain fNIRS device, consisting of a laser driver, a single photon detector, and a time-to-digital converter system. △ Less

Submitted 22 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06086 [pdf, other]

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf… ▽ More Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use sinc Layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate embeddings, enhancing audio context representation and combining short- and long-range information. The results show that our proposed RawBMamba achieves a 34.1\% improvement over Rawformer on ASVspoof2021 LA dataset, and demonstrates competitive performance on other datasets. △ Less

Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.03714 [pdf, other]

Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

Authors: **long Xue, Yayue Deng, Yingming Gao, Ya Li

Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr… ▽ More Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current prompt-based TTS models choose the speech prompt manually or simply at random. Hence, in this paper, we adapt retrieval augmented generation (RAG) from LLMs to prompt-based TTS. Unlike traditional RAG methods, we additionally consider contextual information during the retrieval process and present a Context-Aware Contrastive Language-Audio Pre-training (CA-CLAP) model to extract context-aware, style-related features. The objective and subjective evaluations demonstrate that our proposed RAG method outperforms baselines, and our CA-CLAP achieves better results than text-only retrieval methods. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.03706 [pdf, other]

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Authors: **long Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr… ▽ More Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we introduce a novel audio codec-based TTS model to adapt context features with multiple enhancements. Inspired by the success of Qformer, we propose a multi-modal context-enhanced Qformer (MMCE-Qformer) to utilize additional multi-modal context information. Besides, we adapt a pretrained LLM to leverage its understanding ability to predict semantic tokens, and use a SoundStorm to generate acoustic tokens thereby enhancing audio quality and speaker similarity. The extensive objective and subjective evaluations show that our proposed method outperforms baselines across various context TTS scenarios. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.01922 [pdf, ps, other]

Performance Analysis of Hybrid Cellular and Cell-free MIMO Network

Authors: Zhuoyin Dai, **gran Xu, Xiaoli Xu, Ruoguang Li, Yong Zeng

Abstract: Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a longterm evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communicatio… ▽ More Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a longterm evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communication network and collaborate with the existing cellular base stations (BSs). To further explore the performance limits of hybrid cellular and cell-free networks, this paper develops a hybrid network model based on stochastic geometric toolkits, which reveals the coupling of the signal and interference from both the cellular and cell-free networks. Specifically, the conjugate beamforming is applied in hybrid cellular and cell-free networks, which enables user equipment (UE) to benefit from both cellular BSs and cell-free APs. The aggregate signal received from the hybrid network is approximated via moment matching, and coverage probability is characterized by deriving the Laplace transform of the interference. The analysis of signal strength and coverage probability is verified by extensive simulations. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.17329 [pdf, other]

Joint MIMO Transceiver and Reflector Design for Reconfigurable Intelligent Surface-Assisted Communication

Authors: Yaqiong Zhao, **dan Xu, Wei Xu, Kezhi Wang, Xinquan Ye, Chau Yuen, Xiaohu You

Abstract: In this paper, we consider a reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output communication system with multiple antennas at both the base station (BS) and the user. We plan to maximize the achievable rate through jointly optimizing the transmit precoding matrix, the receive combining matrix, and the RIS reflection matrix under the constraints of the transmit power… ▽ More In this paper, we consider a reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output communication system with multiple antennas at both the base station (BS) and the user. We plan to maximize the achievable rate through jointly optimizing the transmit precoding matrix, the receive combining matrix, and the RIS reflection matrix under the constraints of the transmit power at the BS and the unit-modulus reflection at the RIS. Regarding the non-trivial problem form, we initially reformulate it into an considerable problem to make it tractable by utilizing the relationship between the achievable rate and the weighted minimum mean squared error. Next, the transmit precoding matrix, the receive combining matrix, and the RIS reflection matrix are alternately optimized. In particular, the optimal transmit precoding matrix and receive combining matrix are obtained in closed forms. Furthermore, a pair of computationally efficient methods are proposed for the RIS reflection matrix, namely the semi-definite relaxation (SDR) method and the successive closed form (SCF) method. We theoretically prove that both methods are ensured to converge, and the SCF-based algorithm is able to converges to a Karush-Kuhn-Tucker point of the problem. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 14 pages, 12 figures

arXiv:2405.17250 [pdf, ps, other]

"Pass the butter": A study on desktop-classic multitasking robotic arm based on advanced YOLOv7 and BERT

Authors: Haohua Que, Wenbin Pan, Jie Xu, Hao Luo, Pei Wang, Li Zhang

Abstract: In recent years, various intelligent autonomous robots have begun to appear in daily life and production. Desktop-level robots are characterized by their flexible deployment, rapid response, and suitability for light workload environments. In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (by ROS) as a carrier, l… ▽ More In recent years, various intelligent autonomous robots have begun to appear in daily life and production. Desktop-level robots are characterized by their flexible deployment, rapid response, and suitability for light workload environments. In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (by ROS) as a carrier, locally deploying a natural language model (NLP-BERT), and integrating visual recognition (CV-YOLO) and speech recognition technology (ASR-Whisper) as inputs to achieve autonomous decision-making and rational action by the desktop robot. Three comprehensive experiments were designed to validate the robotic arm, and the results demonstrate excellent performance using this approach across all three experiments. In Task 1, the execution rates for speech recognition and action performance were 92.6% and 84.3%, respectively. In Task 2, the highest execution rates under the given conditions reached 92.1% and 84.6%, while in Task 3, the highest execution rates were 95.2% and 80.8%, respectively. Therefore, it can be concluded that the proposed solution integrating ASR, NLP, and other technologies on edge devices is feasible and provides a technical and engineering foundation for realizing multimodal desktop-level robots. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.13634 [pdf, other]

Secure Communications in Near-Filed ISCAP Systems with Extremely Large-Scale Antenna Arrays

Authors: Zixiang Ren, Siyao Zhang, Xinmin Li, Ling Qiu, Jie Xu, Derrick Wing Kwan Ng

Abstract: This paper investigates secure communications in a near-field multi-functional integrated sensing, communication, and powering (ISCAP) system with an extremely large-scale antenna arrays (ELAA) equipped at the base station (BS). In this system, the BS sends confidential messages to a single communication user (CU), and at the same time wirelessly senses a point target and charges multiple energy r… ▽ More This paper investigates secure communications in a near-field multi-functional integrated sensing, communication, and powering (ISCAP) system with an extremely large-scale antenna arrays (ELAA) equipped at the base station (BS). In this system, the BS sends confidential messages to a single communication user (CU), and at the same time wirelessly senses a point target and charges multiple energy receivers (ERs). It is assumed that the ERs and the sensing target are potential eavesdroppers that may attempt to intercept the confidential messages intended for the CU. We consider the joint transmit beamforming design to support secure communications while ensuring the sensing and powering requirements. In particular, the BS transmits dedicated sensing/energy beams in addition to the information beam, which also play the role of artificial noise (AN) for effectively jamming potential eavesdroppers. Building upon this, we maximize the secrecy rate at the CU, subject to the maximum \ac{crb} constraints for target sensing and the minimum harvested energy constraints for the ERs. Although the formulated joint beamforming problem is non-convex and challenging to solve, we acquire the optimal solution via the semi-definite relaxation (SDR) and fractional programming techniques together with a one-dimensional (1D) search. Subsequently, we present two alternative designs based on zero-forcing (ZF) beamforming and maximum ratio transmission (MRT), respectively. Finally, our numerical results show that our proposed approaches exploit both the distance-domain resolution of near-field ELAA and the joint beamforming design for enhancing secure communication performance while ensuring the sensing and powering requirements in ISCAP, especially when the CU and the target and ER eavesdroppers are located at the same angle (but different distances) with respect to the BS. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: 6 pages

arXiv:2405.10514 [pdf, other]

Secrecy Performance Analysis of Multi-Functional RIS-Assisted NOMA Networks

Authors: Yingjie Pei, Wanli Ni, ** Xu, Xinwei Yue, Xiaofeng Tao, Dusit Niyato

Abstract: Although reconfigurable intelligent surface (RIS) can improve the secrecy communication performance of wireless users, it still faces challenges such as limited coverage and double-fading effect. To address these issues, in this paper, we utilize a novel multi-functional RIS (MF-RIS) to enhance the secrecy performance of wireless users, and investigate the physical layer secrecy problem in non-ort… ▽ More Although reconfigurable intelligent surface (RIS) can improve the secrecy communication performance of wireless users, it still faces challenges such as limited coverage and double-fading effect. To address these issues, in this paper, we utilize a novel multi-functional RIS (MF-RIS) to enhance the secrecy performance of wireless users, and investigate the physical layer secrecy problem in non-orthogonal multiple access (NOMA) networks. Specifically, we derive closed-form expressions for the secrecy outage probability (SOP) and secrecy throughput of users in the MF-RIS-assisted NOMA networks with external and internal eavesdroppers. The asymptotic expressions for SOP and secrecy diversity order are also analyzed under high signal-to-noise ratio (SNR) conditions. Additionally, we examine the impact of receiver hardware limitations and error transmission-induced imperfect successive interference cancellation (SIC) on the secrecy performance. Numerical results indicate that: i) under the same power budget, the secrecy performance achieved by MF-RIS significantly outperforms active RIS and simultaneously transmitting and reflecting RIS; ii) with increasing power budget, residual interference caused by imperfect SIC surpasses thermal noise as the primary factor affecting secrecy capacity; and iii) deploying additional elements at the MF-RIS brings significant secrecy enhancements for the external eavesdrop** scenario, in contrast to the internal eavesdrop** case. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 14 pages, 9 figures, submitted to IEEE transactions on wireless communication

arXiv:2405.09353 [pdf, other]

Large coordinate kernel attention network for lightweight image super-resolution

Authors: Fangwei Hao, Jiesheng Wu, Haotian Lu, Ji Du, **g Xu

Abstract: The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building block with multi-scale receptive field for local modeling, and their LKA modules face a quadratic increase in comp… ▽ More The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building block with multi-scale receptive field for local modeling, and their LKA modules face a quadratic increase in computational and memory footprints as the convolutional kernel size increases. To address the first issue, we propose the multi-scale blueprint separable convolutions (MBSConv) as highly efficient building block with multi-scale receptive field, it can focus on the learning for the multi-scale information which is a vital component of discriminative representation. As for the second issue, we revisit the key properties of LKA in which we find that the adjacent direct interaction of local information and long-distance dependencies is crucial to provide remarkable performance. Thus, taking this into account and in order to mitigate the complexity of LKA, we propose a large coordinate kernel attention (LCKA) module which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1-D kernels. LCKA enables the adjacent direct interaction of local information and long-distance dependencies not only in the horizontal direction but also in the vertical. Besides, LCKA allows for the direct use of extremely large kernels in the depth-wise convolutional layers to capture more contextual information, which helps to significantly improve the reconstruction performance, and it incurs lower computational complexity and memory footprints. Integrating MBSConv and LCKA, we propose a large coordinate kernel attention network (LCAN). △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07568 [pdf, ps, other]

Networked ISAC for Low-Altitude Economy: Transmit Beamforming and UAV Trajectory Design

Authors: Gaoyuan Cheng, Xianxin Song, Zhonghao Lyu, Jie Xu

Abstract: This paper studies the exploitation of networked integrated sensing and communications (ISAC) to support low-altitude economy (LAE), in which a set of networked ground base stations (GBSs) transmit wireless signals to cooperatively communicate with multiple authorized unmanned aerial vehicles (UAVs) and concurrently use the echo signals to detect the invasion of unauthorized objects in interested… ▽ More This paper studies the exploitation of networked integrated sensing and communications (ISAC) to support low-altitude economy (LAE), in which a set of networked ground base stations (GBSs) transmit wireless signals to cooperatively communicate with multiple authorized unmanned aerial vehicles (UAVs) and concurrently use the echo signals to detect the invasion of unauthorized objects in interested airspace. Under this setup, we jointly design the cooperative transmit beamforming at multiple GBSs together with the trajectory control of authorized UAVs and their GBS associations, for enhancing the authorized UAVs' communication performance while ensuring the sensing requirements for airspace monitoring. In particular, our objective is to maximize the average sum rate of authorized UAVs over a particular flight period, subject to the minimum illumination power constraints for sensing over the interested airspace, the maximum transmit power constraints at individual GBSs, and the flight constraints at UAVs. This problem is non-convex and challenging to solve, due to the involvement of integer variables and the coupling of optimization variables. To solve this non-convex problem, we propose an efficient algorithm by using the techniques of alternating optimization (AO), successive convex approximation (SCA), and semi-definite relaxation (SDR). Numerical results show that the obtained transmit beamforming and UAV trajectory designs in the proposed algorithm efficiently balance the tradeoff between the sensing and communication performances, thus significantly outperforming various benchmarks. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.05007 [pdf, other]

HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation

Authors: Jiashu Xu

Abstract: Automatic medical image segmentation technology has the potential to expedite pathological diagnoses, thereby enhancing the efficiency of patient care. However, medical images often have complex textures and structures, and the models often face the problem of reduced image resolution and information loss due to downsampling. To address this issue, we propose HC-Mamba, a new medical image segmenta… ▽ More Automatic medical image segmentation technology has the potential to expedite pathological diagnoses, thereby enhancing the efficiency of patient care. However, medical images often have complex textures and structures, and the models often face the problem of reduced image resolution and information loss due to downsampling. To address this issue, we propose HC-Mamba, a new medical image segmentation model based on the modern state space model Mamba. Specifically, we introduce the technique of dilated convolution in the HC-Mamba model to capture a more extensive range of contextual information without increasing the computational cost by extending the perceptual field of the convolution kernel. In addition, the HC-Mamba model employs depthwise separable convolutions, significantly reducing the number of parameters and the computational power of the model. By combining dilated convolution and depthwise separable convolutions, HC-Mamba is able to process large-scale medical image data at a much lower computational cost while maintaining a high level of performance. We conduct comprehensive experiments on segmentation tasks including organ segmentation and skin lesion, and conduct extensive experiments on Synapse, ISIC17 and ISIC18 to demonstrate the potential of the HC-Mamba model in medical image segmentation. The experimental results show that HC-Mamba exhibits competitive performance on all these datasets, thereby proving its effectiveness and usefulness in medical image segmentation. △ Less

Submitted 11 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.00736 [pdf, other]

Joint Signal Detection and Automatic Modulation Classification via Deep Learning

Authors: Huijun Xing, Xuhui Zhang, Shuo Chang, **ke Ren, Zixun Zhang, Jie Xu, Shuguang Cui

Abstract: Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different ca… ▽ More Signal detection and modulation classification are two crucial tasks in various wireless communication systems. Different from prior works that investigate them independently, this paper studies the joint signal detection and automatic modulation classification (AMC) by considering a realistic and complex scenario, in which multiple signals with different modulation schemes coexist at different carrier frequencies. We first generate a coexisting RADIOML dataset (CRML23) to facilitate the joint design. Different from the publicly available AMC dataset ignoring the signal detection step and containing only one signal, our synthetic dataset covers the more realistic multiple-signal coexisting scenario. Then, we present a joint framework for detection and classification (JDM) for such a multiple-signal coexisting environment, which consists of two modules for signal detection and AMC, respectively. In particular, these two modules are interconnected using a designated data structure called "proposal". Finally, we conduct extensive simulations over the newly developed dataset, which demonstrate the effectiveness of our designs. Our code and dataset are now available as open-source (https://github.com/Singingkettle/ChangShuoRadioData). △ Less

Submitted 29 April, 2024; originally announced May 2024.

arXiv:2404.19265 [pdf, other]

Map** New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation

Authors: Zhenglin Li, Bo Guan, Yuanzhou Wei, Yiming Zhou, **gyu Zhang, **xin Xu

Abstract: Generative Adversarial Networks (GANs) have significantly advanced image processing, with Pix2Pix being a notable framework for image-to-image translation. This paper explores a novel application of Pix2Pix to transform abstract map images into realistic ground truth images, addressing the scarcity of such images crucial for domains like urban planning and autonomous vehicle training. We detail th… ▽ More Generative Adversarial Networks (GANs) have significantly advanced image processing, with Pix2Pix being a notable framework for image-to-image translation. This paper explores a novel application of Pix2Pix to transform abstract map images into realistic ground truth images, addressing the scarcity of such images crucial for domains like urban planning and autonomous vehicle training. We detail the Pix2Pix model's utilization for generating high-fidelity datasets, supported by a dataset of paired map and aerial images, and enhanced by a tailored training regimen. The results demonstrate the model's capability to accurately render complex urban features, establishing its efficacy and potential for broad real-world applications. △ Less

Submitted 30 April, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.16289 [pdf, other]

Deep Joint CSI Feedback and Multiuser Precoding for MIMO OFDM Systems

Authors: Yiran Guo, Wei Chen, Jialong Xu, Lun Li, Bo Ai

Abstract: The design of precoding plays a crucial role in achieving a high downlink sum-rate in multiuser multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. In this correspondence, we propose a deep learning based joint CSI feedback and multiuser precoding method in frequency division duplex systems, aiming at maximizing the downlink sum-rate performance in an e… ▽ More The design of precoding plays a crucial role in achieving a high downlink sum-rate in multiuser multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. In this correspondence, we propose a deep learning based joint CSI feedback and multiuser precoding method in frequency division duplex systems, aiming at maximizing the downlink sum-rate performance in an end-to-end manner. Specifically, the eigenvectors of the CSI matrix are compressed using deep joint source-channel coding techniques. This compression method enhances the resilience of the feedback CSI information against degradation in the feedback channel. A joint multiuser precoding module and a power allocation module are designed to adjust the precoding direction and the precoding power for users based on the feedback CSI information. Experimental results demonstrate that the downlink sum-rate can be significantly improved by using the proposed method, especially in scenarios with low signal-to-noise ratio and low feedback overhead. △ Less

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.13536 [pdf, other]

Joint Transmit and Reflective Beamforming for Multi-Active-IRS-Assisted Cooperative Sensing

Authors: Yuan Fang, Xianghao Yu, Jie Xu

Abstract: This paper studies multi-active intelligent-reflecting-surface (IRS) cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to help the base station (BS) provide multi-view sensing. We focus on the scenario where the sensing target is located in the non-line-of-sight (NLoS) area of the BS. Based on the received echo signal, the BS aims to estimate the target's dire… ▽ More This paper studies multi-active intelligent-reflecting-surface (IRS) cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to help the base station (BS) provide multi-view sensing. We focus on the scenario where the sensing target is located in the non-line-of-sight (NLoS) area of the BS. Based on the received echo signal, the BS aims to estimate the target's direction-of-arrival (DoA) with respect to each IRS. In addition, we leverage active IRSs to overcome the severe path loss induced by multi-hop reflections. Under this setup, we minimize the maximum Cramér-Rao bound (CRB) among all IRSs by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum transmit power and the maximum power amplification gain at individual IRSs. To tackle the resulting highly non-convex max-CRB minimization problem, we propose an efficient algorithm based on alternating optimization, successive convex approximation, and semi-definite relaxation, to obtain a high-quality solution. Finally, numerical results are provided to verify the effectiveness of our proposed design and the benefits of active IRS-assisted sensing compared to the counterpart with passive IRSs. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13391 [pdf, other]

Online Planning of Power Flows for Power Systems Against Bushfires Using Spatial Context

Authors: Jianyu Xu, Qiuzhuang Sun, Yang Yang, Huadong Mo, Daoyi Dong

Abstract: The 2019-20 Australia bushfire incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected due to bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the s… ▽ More The 2019-20 Australia bushfire incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected due to bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the stochastic nature of bushfire spread, we develop a model to capture such dynamics based on Moore's neighborhood model. Under a periodic inspection scheme that reveals the in-situ bushfire status, we propose an online optimization modeling framework that sequentially plans the power flows in the electricity network. Our framework assumes that the spread of bushfires is non-stationary over time, and the spread and containment probabilities are unknown. To meet these challenges, we develop a contextual online learning algorithm that treats the in-situ geographical information of the bushfire as a 'spatial context'. The online learning algorithm learns the unknown probabilities sequentially based on the observed data and then makes the OPF decision accordingly. The sequential OPF decisions aim to minimize the regret function, which is defined as the cumulative loss against the clairvoyant strategy that knows the true model parameters. We provide a theoretical guarantee of our algorithm by deriving a bound on the regret function, which outperforms the regret bound achieved by other benchmark algorithms. Our model assumptions are verified by the real bushfire data from NSW, Australia, and we apply our model to two power systems to illustrate its applicability. △ Less

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.10233 [pdf, ps, other]

Little Pilot is Needed for Channel Estimation with Integrated Super-Resolution Sensing and Communication

Authors: **gran Xu, Huizhi Wang, Yong Zeng, Xiaoli Xu

Abstract: Integrated super-resolution sensing and communication (ISSAC) is a promising technology to achieve extremely high sensing performance for critical parameters, such as the angles of the wireless channels. In this paper, we propose an ISSAC-based channel estimation method, which requires little or even no pilot, yet still achieves accurate channel state information (CSI) estimation. The key idea is… ▽ More Integrated super-resolution sensing and communication (ISSAC) is a promising technology to achieve extremely high sensing performance for critical parameters, such as the angles of the wireless channels. In this paper, we propose an ISSAC-based channel estimation method, which requires little or even no pilot, yet still achieves accurate channel state information (CSI) estimation. The key idea is to exploit the fact that subspace-based super-resolution algorithms such as multiple signal classification (MUSIC) do not require a priori known pilots for accurate parameter estimation. Therefore, in the proposed method, the angles of the multi-path channel components are first estimated in a pilot-free manner while communication data symbols are sent. After that, the multi-path channel coefficients are estimated, where very little pilots are needed. The reasons are two folds. First, compared to the conventional channel estimation methods purely relying on channel training, much fewer parameters need to be estimated once the multi-path angles are accurately estimated. Besides, with angles obtained, the beamforming gain is also enjoyed when pilots are sent to estimate the channel path gains. To rigorously study the performance of the proposed method, we first consider the basic line-of-sight (LoS) channel. By analyzing the minimum mean square error (MMSE) of channel estimation and the resulting beamforming gains, we show that our proposed method significantly outperforms the conventional methods purely based on channel training. We then extend the study to the more general multipath channels. Simulation results are provided to demonstrate our theoretical results. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures, accepted by IEEE WCNC 2024 workshops

arXiv:2404.09140 [pdf, other]

RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion

Authors: Guoxuan Chi, Zheng Yang, Chenshu Wu, **gao Xu, Yuchong Gao, Yunhao Liu, Tony Xiao Han

Abstract: Along with AIGC shines in CV and NLP, its potential in the wireless domain has also emerged in recent years. Yet, existing RF-oriented generative solutions are ill-suited for generating high-quality, time-series RF data due to limited representation capabilities. In this work, inspired by the stellar achievements of the diffusion model in CV and NLP, we adapt it to the RF domain and propose RF-Dif… ▽ More Along with AIGC shines in CV and NLP, its potential in the wireless domain has also emerged in recent years. Yet, existing RF-oriented generative solutions are ill-suited for generating high-quality, time-series RF data due to limited representation capabilities. In this work, inspired by the stellar achievements of the diffusion model in CV and NLP, we adapt it to the RF domain and propose RF-Diffusion. To accommodate the unique characteristics of RF signals, we first introduce a novel Time-Frequency Diffusion theory to enhance the original diffusion model, enabling it to tap into the information within the time, frequency, and complex-valued domains of RF signals. On this basis, we propose a Hierarchical Diffusion Transformer to translate the theory into a practical generative DNN through elaborated design spanning network architecture, functional block, and complex-valued operator, making RF-Diffusion a versatile solution to generate diverse, high-quality, and time-series RF data. Performance comparison with three prevalent generative models demonstrates the RF-Diffusion's superior performance in synthesizing Wi-Fi and FMCW signals. We also showcase the versatility of RF-Diffusion in boosting Wi-Fi sensing systems and performing channel estimation in 5G networks. △ Less

Submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted by MobiCom 2024

ACM Class: I.2.0

arXiv:2404.07448 [pdf, other]

Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Authors: **gxuan Xu, Wuyang Chen, Yao Zhao, Yunchao Wei

Abstract: Recent success of pre-trained foundation vision-language models makes Open-Vocabulary Segmentation (OVS) possible. Despite the promising performance, this approach introduces heavy computational overheads for two challenges: 1) large model sizes of the backbone; 2) expensive costs during the fine-tuning. These challenges hinder this OVS strategy from being widely applicable and affordable in real-… ▽ More Recent success of pre-trained foundation vision-language models makes Open-Vocabulary Segmentation (OVS) possible. Despite the promising performance, this approach introduces heavy computational overheads for two challenges: 1) large model sizes of the backbone; 2) expensive costs during the fine-tuning. These challenges hinder this OVS strategy from being widely applicable and affordable in real-world scenarios. Although traditional methods such as model compression and efficient fine-tuning can address these challenges, they often rely on heuristics. This means that their solutions cannot be easily transferred and necessitate re-training on different models, which comes at a cost. In the context of efficient OVS, we target achieving performance that is comparable to or even better than prior OVS works based on large vision-language foundation models, by utilizing smaller models that incur lower training costs. The core strategy is to make our efficiency principled and thus seamlessly transferable from one OVS framework to others without further customization. Comprehensive experiments on diverse OVS benchmarks demonstrate our superior trade-off between segmentation accuracy and computation costs over previous works. Our code is available on https://github.com/Xujxyang/OpenTrans △ Less

Submitted 3 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.03915 [pdf, other]

Nonlinear Kalman Filtering based on Self-Attention Mechanism and Lattice Trajectory Piecewise Linear Approximation

Authors: Jiaming Wang, Xinyu Geng, Jun Xu

Abstract: The traditional Kalman filter (KF) is widely applied in control systems, but it relies heavily on the accuracy of the system model and noise parameters, leading to potential performance degradation when facing inaccuracies. To address this issue, introducing neural networks into the KF framework offers a data-driven solution to compensate for these inaccuracies, improving the filter's performance… ▽ More The traditional Kalman filter (KF) is widely applied in control systems, but it relies heavily on the accuracy of the system model and noise parameters, leading to potential performance degradation when facing inaccuracies. To address this issue, introducing neural networks into the KF framework offers a data-driven solution to compensate for these inaccuracies, improving the filter's performance while maintaining interpretability. Nevertheless, existing studies mostly employ recurrent neural network (RNN), which fails to fully capture the dependencies among state sequences and lead to an unstable training process. In this paper, we propose a novel Kalman filtering algorithm named the attention Kalman filter (AtKF), which incorporates a self-attention network to capture the dependencies among state sequences. To address the instability in the recursive training process, a parallel pre-training strategy is devised. Specifically, this strategy involves piecewise linearizing the system via lattice trajectory piecewise linear (LTPWL) expression, and generating pre-training data through a batch estimation algorithm, which exploits the self-attention mechanism's parallel processing ability. Experimental results on a two-dimensional nonlinear system demonstrate that AtKF outperforms other filters under noise disturbances and model mismatches. △ Less

Submitted 5 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures

arXiv:2403.16353 [pdf, other]

Energy-Efficient Hybrid Beamforming with Dynamic On-off Control for Integrated Sensing, Communications, and Powering

Authors: Zeyu Hao, Yuan Fang, Xianghao Yu, Jie Xu, Ling Qiu, Lexi Xu, Shuguang Cui

Abstract: This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy… ▽ More This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy receivers (ERs) at the same time. To facilitate the energy-efficient design, we present a novel HAD architecture for the BS transmitter, which allows dynamic on-off control of its radio frequency (RF) chains and analog phase shifters (PSs) through a switch network. We also consider a practical and comprehensive power consumption model for the BS, by taking into account the power-dependent non-linear power amplifier (PA) efficiency, and the on-off non-transmission power consumption model of RF chains and PSs. We jointly design the hybrid beamforming and dynamic on-off control at the BS, aiming to minimize its total power consumption, while guaranteeing the performance requirements on communication rates, sensing Cramér-Rao bound (CRB), and harvested power levels. The formulation also takes into consideration the per-antenna transmit power constraint and the constant modulus constraints for the analog beamformer at the BS. The resulting optimization problem for ISCAP is highly non-convex. Please refer to the paper for a complete abstract. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: 13 pages, 6 figures, submitted to IEEE Transactions on Communications

arXiv:2403.15971 [pdf, other]

PSHop: A Lightweight Feed-Forward Method for 3D Prostate Gland Segmentation

Authors: Yi**g Yang, Vasileios Magoulianitis, Jiaxin Yang, **tang Xue, Masatomo Kaneko, Giovanni Cacciamani, Andre Abreu, Vinay Duddalwar, C. -C. Jay Kuo, Inderbir S. Gill, Chrysostomos Nikias

Abstract: Automatic prostate segmentation is an important step in computer-aided diagnosis of prostate cancer and treatment planning. Existing methods of prostate segmentation are based on deep learning models which have a large size and lack of transparency which is essential for physicians. In this paper, a new data-driven 3D prostate segmentation method on MRI is proposed, named PSHop. Different from dee… ▽ More Automatic prostate segmentation is an important step in computer-aided diagnosis of prostate cancer and treatment planning. Existing methods of prostate segmentation are based on deep learning models which have a large size and lack of transparency which is essential for physicians. In this paper, a new data-driven 3D prostate segmentation method on MRI is proposed, named PSHop. Different from deep learning based methods, the core methodology of PSHop is a feed-forward encoder-decoder system based on successive subspace learning (SSL). It consists of two modules: 1) encoder: fine to coarse unsupervised representation learning with cascaded VoxelHop units, 2) decoder: coarse to fine segmentation prediction with voxel-wise classification and local refinement. Experiments are conducted on the publicly available ISBI-2013 dataset, as well as on a larger private one. Experimental analysis shows that our proposed PSHop is effective, robust and lightweight in the tasks of prostate gland and zonal segmentation, achieving a Dice Similarity Coefficient (DSC) of 0.873 for the gland segmentation task. PSHop achieves a competitive performance comparatively to other deep learning methods, while kee** the model size and inference complexity an order of magnitude smaller. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: 11 pages, 5 figures, 5 tables

arXiv:2403.15969 [pdf, other]

PCa-RadHop: A Transparent and Lightweight Feed-forward Method for Clinically Significant Prostate Cancer Segmentation

Authors: Vasileios Magoulianitis, Jiaxin Yang, Yi**g Yang, **tang Xue, Masatomo Kaneko, Giovanni Cacciamani, Andre Abreu, Vinay Duddalwar, C. -C. Jay Kuo, Inderbir S. Gill, Chrysostomos Nikias

Abstract: Prostate Cancer is one of the most frequently occurring cancers in men, with a low survival rate if not early diagnosed. PI-RADS reading has a high false positive rate, thus increasing the diagnostic incurred costs and patient discomfort. Deep learning (DL) models achieve a high segmentation performance, although require a large model size and complexity. Also, DL models lack of feature interpreta… ▽ More Prostate Cancer is one of the most frequently occurring cancers in men, with a low survival rate if not early diagnosed. PI-RADS reading has a high false positive rate, thus increasing the diagnostic incurred costs and patient discomfort. Deep learning (DL) models achieve a high segmentation performance, although require a large model size and complexity. Also, DL models lack of feature interpretability and are perceived as ``black-boxes" in the medical field. PCa-RadHop pipeline is proposed in this work, aiming to provide a more transparent feature extraction process using a linear model. It adopts the recently introduced Green Learning (GL) paradigm, which offers a small model size and low complexity. PCa-RadHop consists of two stages: Stage-1 extracts data-driven radiomics features from the bi-parametric Magnetic Resonance Imaging (bp-MRI) input and predicts an initial heatmap. To reduce the false positive rate, a subsequent stage-2 is introduced to refine the predictions by including more contextual information and radiomics features from each already detected Region of Interest (ROI). Experiments on the largest publicly available dataset, PI-CAI, show a competitive performance standing of the proposed method among other deep DL models, achieving an area under the curve (AUC) of 0.807 among a cohort of 1,000 patients. Moreover, PCa-RadHop maintains orders of magnitude smaller model size and complexity. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: 13 pages, 7 figures, 5 tables

arXiv:2403.14911 [pdf, ps, other]

doi 10.1109/GCWkshps58843.2023.10465162

Secure Outage Analysis for RIS-Aided MISO Systems with Randomly Located Eavesdroppers

Authors: Wei Shi, **dan Xu, Wei Xu, Chau Yuen, A. Lee Swindlehurst, Xiaohu You, Chunming Zhao

Abstract: In this paper, we consider the physical layer security of an RIS-assisted multiple-antenna communication system with randomly located eavesdroppers. The exact distributions of the received signal-to-noise-ratios (SNRs) at the legitimate user and the eavesdroppers located according to a Poisson point process (PPP) are derived, and a closed-form expression for the secrecy outage probability (SOP) is… ▽ More In this paper, we consider the physical layer security of an RIS-assisted multiple-antenna communication system with randomly located eavesdroppers. The exact distributions of the received signal-to-noise-ratios (SNRs) at the legitimate user and the eavesdroppers located according to a Poisson point process (PPP) are derived, and a closed-form expression for the secrecy outage probability (SOP) is obtained. It is revealed that the secrecy performance is mainly affected by the number of RIS reflecting elements, and the impact of the transmit antennas and transmit power at the base station is marginal. In addition, when the locations of the randomly located eavesdroppers are unknown, deploying the RIS closer to the legitimate user rather than to the base station is shown to be more efficient. We also perform an analytical study demonstrating that the secrecy diversity order depends on the path loss exponent of the RIS-to-ground links. Finally, numerical simulations are conducted to verify the accuracy of these theoretical observations. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by 2023 IEEE Globecom Workshops (GC Wkshps). arXiv admin note: substantial text overlap with arXiv:2312.16814

arXiv:2403.13648 [pdf, other]

Priority-based Energy Allocation in Buildings for Distributed Model Predictive Control

Authors: Hongyi Li, Jun Xu

Abstract: Many countries are facing energy shortage today and most of the global energy is consumed by HVAC systems in buildings. For the scenarios where the energy system is not sufficiently supplied to HVAC systems, a priority-based allocation scheme based on distributed model predictive control is proposed in this paper, which distributes the energy rationally based on priority order. According to the sc… ▽ More Many countries are facing energy shortage today and most of the global energy is consumed by HVAC systems in buildings. For the scenarios where the energy system is not sufficiently supplied to HVAC systems, a priority-based allocation scheme based on distributed model predictive control is proposed in this paper, which distributes the energy rationally based on priority order. According to the scenarios, two distributed allocation strategies, i.e., one-to-one priority strategy and multi-to-one priority strategy, are developed in this paper and validated by simulation in a building containing three zones and a building containing 36 rooms, respectively. Both strategies fully exploit the potential of predictive control solutions. The experiment shows that our scheme has good scalability and achieve the performance of centralized strategy while making the calculation tractable. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13601 [pdf, other]

Lattice piecewise affine approximation of explicit model predictive control with application to satellite attitude control

Authors: Zhengqi Xu, Jun Xu, Ai-Guo Wu, Shuning Wang

Abstract: Satellite attitude cotrol is a crucial part of aerospace technology, and model predictive control(MPC) is one of the most promising controllers in this area, which will be less effective if real-time online optimization can not be achieved. Explicit MPC converts the online calculation into a table lookup process, however the solution is difficult to obtain if the system dimension is high or the co… ▽ More Satellite attitude cotrol is a crucial part of aerospace technology, and model predictive control(MPC) is one of the most promising controllers in this area, which will be less effective if real-time online optimization can not be achieved. Explicit MPC converts the online calculation into a table lookup process, however the solution is difficult to obtain if the system dimension is high or the constraints are complex. The lattice piecewise affine(PWA) function was used to represent the control law of explicit MPC, although the online calculation complexity is reduced, the offline calculation is still prohibitive for complex problems. In this paper, we use the sample points in the feasible region with their corresponding affine functions to construct the lattice PWA approximation of the optimal MPC controller designed for satellite attitude control. The asymptotic stability of satellite attitude control system under lattice PWA approximation has been proven, and simulations are executed to verify that the proposed method can achieve almost the same performance as linear online MPC with much lower online computational complexity and use less fuel than LQR method. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.11953 [pdf, other]

Advancing COVID-19 Detection in 3D CT Scans

Authors: Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

Abstract: To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior kno… ▽ More To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior knowledge. Our model achieves a Macro F1 Score of 0.94 on the validation set of the 4th COV19D Competition Challenge $\mathrm{I}$, surpassing the baseline by 16%. This indicates its effectiveness in distinguishing between COVID-19 and non-COVID-19 cases, making it a robust method for COVID-19 detection. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11809 [pdf, other]

Sensing-Enhanced Channel Estimation for Near-Field XL-MIMO Systems

Authors: Shicong Liu, Xianghao Yu, Zhen Gao, Jie Xu, Derrick Wing Kwan Ng, Shuguang Cui

Abstract: Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. The spherical wavefront characteristics in the near field introduce additional degrees of freedom (DoFs), namely distance and angle, into the channel model, which leads to unique challenges in channe… ▽ More Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. The spherical wavefront characteristics in the near field introduce additional degrees of freedom (DoFs), namely distance and angle, into the channel model, which leads to unique challenges in channel estimation (CE). In this paper, we propose a new sensing-enhanced uplink CE scheme for near-field XL-MIMO, which notably reduces the required quantity of baseband samples and the dictionary size. In particular, we first propose a sensing method that can be accomplished in a single time slot. It employs power sensors embedded within the antenna elements to measure the received power pattern rather than baseband samples. A time inversion algorithm is then proposed to precisely estimate the locations of users and scatterers, which offers a substantially lower computational complexity. Based on the estimated locations from sensing, a novel dictionary is then proposed by considering the eigen-problem based on the near-field transmission model, which facilitates efficient near-field CE with less baseband sampling and a more lightweight dictionary. Moreover, we derive the general form of the eigenvectors associated with the near-field channel matrix, revealing their noteworthy connection to the discrete prolate spheroidal sequence (DPSS). Simulation results unveil that the proposed time inversion algorithm achieves accurate localization with power measurements only, and remarkably outperforms various widely-adopted algorithms in terms of computational complexity. Furthermore, the proposed eigen-dictionary considerably improves the accuracy in CE with a compact dictionary size and a drastic reduction in baseband samples by up to 77%. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: 14 pages, 9 figures

arXiv:2403.11498 [pdf, other]

Domain Adaptation Using Pseudo Labels for COVID-19 Detection

Authors: Runtian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

Abstract: In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent he… ▽ More In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent health crises. The innovative approach of generating pseudo labels enables the model to iteratively refine its learning process, thereby improving its accuracy and adaptability across different hospitals and medical centres. Experimental results on COV19-CT-DB database showcase the model's potential to achieve high diagnostic precision, significantly contributing to efficient patient management and alleviating the strain on healthcare systems. Our method achieves 0.92 Macro F1 Score on the validation set of Covid-19 domain adaptation challenge. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.08758 [pdf]

Spatiotemporal Diffusion Model with Paired Sampling for Accelerated Cardiac Cine MRI

Authors: Shihan Qiu, Shaoyan Pan, Yikang Liu, Lin Zhao, Jian Xu, Qi Liu, Terrence Chen, Eric Z. Chen, Xiao Chen, Shanhui Sun

Abstract: Current deep learning reconstruction for accelerated cardiac cine MRI suffers from spatial and temporal blurring. We aim to improve image sharpness and motion delineation for cine MRI under high undersampling rates. A spatiotemporal diffusion enhancement model conditional on an existing deep learning reconstruction along with a novel paired sampling strategy was developed. The diffusion model prov… ▽ More Current deep learning reconstruction for accelerated cardiac cine MRI suffers from spatial and temporal blurring. We aim to improve image sharpness and motion delineation for cine MRI under high undersampling rates. A spatiotemporal diffusion enhancement model conditional on an existing deep learning reconstruction along with a novel paired sampling strategy was developed. The diffusion model provided sharper tissue boundaries and clearer motion than the original reconstruction in experts evaluation on clinical data. The innovative paired sampling strategy substantially reduced artificial noises in the generative results. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08749 [pdf]

Clinically Feasible Diffusion Reconstruction for Highly-Accelerated Cardiac Cine MRI

Authors: Shihan Qiu, Shaoyan Pan, Yikang Liu, Lin Zhao, Jian Xu, Qi Liu, Terrence Chen, Eric Z. Chen, Xiao Chen, Shanhui Sun

Abstract: The currently limited quality of accelerated cardiac cine reconstruction may potentially be improved by the emerging diffusion models, but the clinically unacceptable long processing time poses a challenge. We aim to develop a clinically feasible diffusion-model-based reconstruction pipeline to improve the image quality of cine MRI. A multi-in multi-out diffusion enhancement model together with fa… ▽ More The currently limited quality of accelerated cardiac cine reconstruction may potentially be improved by the emerging diffusion models, but the clinically unacceptable long processing time poses a challenge. We aim to develop a clinically feasible diffusion-model-based reconstruction pipeline to improve the image quality of cine MRI. A multi-in multi-out diffusion enhancement model together with fast inference strategies were developed to be used in conjunction with a reconstruction model. The diffusion reconstruction reduced spatial and temporal blurring in prospectively undersampled clinical data, as validated by experts inspection. The 1.5s per video processing time enabled the approach to be applied in clinical scenarios. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.07923 [pdf]

The Fusion of Deep Reinforcement Learning and Edge Computing for Real-time Monitoring and Control Optimization in IoT Environments

Authors: **gyu Xu, Weixiang Wan, Linying Pan, Wenjian Sun, Yuxiang Liu

Abstract: In response to the demand for real-time performance and control quality in industrial Internet of Things (IoT) environments, this paper proposes an optimization control system based on deep reinforcement learning and edge computing. The system leverages cloud-edge collaboration, deploys lightweight policy networks at the edge, predicts system states, and outputs controls at a high frequency, enabl… ▽ More In response to the demand for real-time performance and control quality in industrial Internet of Things (IoT) environments, this paper proposes an optimization control system based on deep reinforcement learning and edge computing. The system leverages cloud-edge collaboration, deploys lightweight policy networks at the edge, predicts system states, and outputs controls at a high frequency, enabling monitoring and optimization of industrial objectives. Additionally, a dynamic resource allocation mechanism is designed to ensure rational scheduling of edge computing resources, achieving global optimization. Results demonstrate that this approach reduces cloud-edge communication latency, accelerates response to abnormal situations, reduces system failure rates, extends average equipment operating time, and saves costs for manual maintenance and replacement. This ensures real-time and stable control. △ Less

Submitted 28 February, 2024; originally announced March 2024.

arXiv:2403.06993 [pdf]

Automatic driving lane change safety prediction model based on LSTM

Authors: Wenjian Sun, Linying Pan, **gyu Xu, Weixiang Wan, Yong Wang

Abstract: Autonomous driving technology can improve traffic safety and reduce traffic accidents. In addition, it improves traffic flow, reduces congestion, saves energy and increases travel efficiency. In the relatively mature automatic driving technology, the automatic driving function is divided into several modules: perception, decision-making, planning and control, and a reasonable division of labor can… ▽ More Autonomous driving technology can improve traffic safety and reduce traffic accidents. In addition, it improves traffic flow, reduces congestion, saves energy and increases travel efficiency. In the relatively mature automatic driving technology, the automatic driving function is divided into several modules: perception, decision-making, planning and control, and a reasonable division of labor can improve the stability of the system. Therefore, autonomous vehicles need to have the ability to predict the trajectory of surrounding vehicles in order to make reasonable decision planning and safety measures to improve driving safety. By using deep learning method, a safety-sensitive deep learning model based on short term memory (LSTM) network is proposed. This model can alleviate the shortcomings of current automatic driving trajectory planning, and the output trajectory not only ensures high accuracy but also improves safety. The cell state simulation algorithm simulates the trackability of the trajectory generated by this model. The research results show that compared with the traditional model-based method, the trajectory prediction method based on LSTM network has obvious advantages in predicting the trajectory in the long time domain. The intention recognition module considering interactive information has higher prediction and accuracy, and the algorithm results show that the trajectory is very smooth based on the premise of safe prediction and efficient lane change. And autonomous vehicles can efficiently and safely complete lane changes. △ Less

Submitted 28 February, 2024; originally announced March 2024.

arXiv:2403.05906 [pdf, other]

Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration

Authors: **gyun Xue, Tao Wang, Jun Wang, Kaihao Zhang, Wenhan Luo, Wenqi Ren, Zikun Liu, Hyunhee Park, Xiaochun Cao

Abstract: Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing… ▽ More Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing UDC image restoration methods predominantly utilize convolutional neural network architectures, whereas Transformer-based methods have exhibited superior performance in the majority of image restoration tasks. This is attributed to the Transformer's capability to sample global features for the local reconstruction of images, thereby achieving high-quality image restoration. In this paper, we observe that when using the Vision Transformer for UDC degraded image restoration, the global attention samples a large amount of redundant information and noise. Furthermore, compared to the ordinary Transformer employing dense attention, the Transformer utilizing sparse attention can alleviate the adverse impact of redundant information and noise. Building upon this discovery, we propose a Segmentation Guided Sparse Transformer method (SGSFormer) for the task of restoring high-quality images from UDC degraded images. Specifically, we utilize sparse self-attention to filter out redundant information and noise, directing the model's attention to focus on the features more relevant to the degraded regions in need of reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions. △ Less

Submitted 9 March, 2024; originally announced March 2024.

Comments: 13 pages, 10 figures, conference or other essential info

arXiv:2403.05179 [pdf, ps, other]

Device Fault Prediction Model based on LSTM and Random Forest

Authors: **g Xu, Yongbo Zhang

Abstract: The quality of power grid equipment forms the material foundation for the safety of the large power grid. Ensuring the quality of equipment entering the grid is a core task in material management. Currently, the inspection of incoming materials involves the generation of sampling plans, sampling, sealing, sample delivery, and testing. Due to the lack of a comprehensive control system and effective… ▽ More The quality of power grid equipment forms the material foundation for the safety of the large power grid. Ensuring the quality of equipment entering the grid is a core task in material management. Currently, the inspection of incoming materials involves the generation of sampling plans, sampling, sealing, sample delivery, and testing. Due to the lack of a comprehensive control system and effective control measures, it is not possible to trace the execution process of business operations afterward. This inability to trace hampers the investigation of testing issues and risk control, as it lacks effective data support. Additionally, a significant amount of original record information for key parameters in the testing process, which is based on sampling operation standards, has not been effectively utilized. To address these issues, we conduct researches on key monitoring technologies in the typical material inspection process based on the Internet of Things (IoT) and analyze the key parameters in inspection results. For purpose of complete the above tasks, this paper investigates the use of Long Short-Term Memory (LSTM) algorithms for quality prediction in material equipment based on key inspection parameters. In summary, this paper aims to provide professional and reliable quality data support for various business processes within the company, including material procurement, engineering construction, and equipment operation. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05079 [pdf, ps, other]

Sampling Model for Grid Material Inspection Based on Analytic Hierarchy Process with Absolute Measurement

Authors: **g Xu, Yongbo Zhang

Abstract: The quality of power grid equipment forms the material foundation for the safety of the large power grid. Ensuring the quality of equipment entering the grid is a core task in material management. Currently, the inspection of incoming materials involves the generation of sampling plans, sampling, sealing, sample delivery, and testing. Due to the lack of a comprehensive control system and effective… ▽ More The quality of power grid equipment forms the material foundation for the safety of the large power grid. Ensuring the quality of equipment entering the grid is a core task in material management. Currently, the inspection of incoming materials involves the generation of sampling plans, sampling, sealing, sample delivery, and testing. Due to the lack of a comprehensive control system and effective control measures, it is not possible to trace the execution process of business operations afterward. This inability to trace hampers the investigation of testing issues and risk control, as it lacks effective data support. Additionally, a significant amount of original record information for key parameters in the testing process, which is based on sampling operation standards, has not been effectively utilized. To address these issues, we conduct researches on key monitoring technologies in the typical material inspection process based on the Internet of Things (IoT) and analyze the key parameters in inspection results. For purpose of complete the above tasks, this paper investigates the use of Long Short-Term Memory (LSTM) algorithms for quality prediction in material equipment based on key inspection parameters. In summary, this paper aims to provide professional and reliable quality data support for various business processes within the company, including material procurement, engineering construction, and equipment operation. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05076 [pdf]

Correlation analysis technique of key parameters for transformer material inspection based on FP-tree and knowledge graph

Authors: **g Xu, Yongbo Zhang

Abstract: As one of the key equipment in the distribution system, the distribution transformer directly affects the reliability of the user power supply. The probability of accidents occurring in the operation of transformer equipment is high, so it has become a focus of material inspection in recent years. However, the large amount of raw data from sample testing is not being used effectively. Given the ab… ▽ More As one of the key equipment in the distribution system, the distribution transformer directly affects the reliability of the user power supply. The probability of accidents occurring in the operation of transformer equipment is high, so it has become a focus of material inspection in recent years. However, the large amount of raw data from sample testing is not being used effectively. Given the above problems, this paper aims to mine the relationship between the unqualified distribution transformer inspection items by using the association rule algorithm based on the distribution transformer inspection data collected from 2017 to 2021 and sorting out the key inspection items. At the same time, the unqualified judgment basis of the relevant items is given, and the internal relationship between the inspection items is clarified to a certain extent. Furthermore, based on material and equipment inspection reports, correlations between failed inspection items, and expert knowledge, the knowledge graph of material equipment inspection management is constructed in the graph database Neo4j. The experimental results show that the FP-Growth method performs significantly better than the Apriori method and can accurately assess the relationship between failed distribution transformer inspection items. Finally, the knowledge graph network is visualized to provide a systematic knowledge base for material inspection, which is convenient for knowledge query and management. This method can provide a scientific guidance program for operation and maintenance personnel to do equipment maintenance and also offers a reference for the state evaluation of other high-voltage equipment. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04145 [pdf, other]

A Crosstalk-Aware Timing Prediction Method in Routing

Authors: Leilei **, Jiajie Xu, Wenjie Fu, Hao Yan, Longxing Shi

Abstract: With shrinking interconnect spacing in advanced technology nodes, existing timing predictions become less precise due to the challenging quantification of crosstalk-induced delay. During the routing, the crosstalk effect is typically modeled by predicting coupling capacitance with congestion information. However, the timing estimation tends to be overly pessimistic, as the crosstalk-induced delay… ▽ More With shrinking interconnect spacing in advanced technology nodes, existing timing predictions become less precise due to the challenging quantification of crosstalk-induced delay. During the routing, the crosstalk effect is typically modeled by predicting coupling capacitance with congestion information. However, the timing estimation tends to be overly pessimistic, as the crosstalk-induced delay depends not only on the coupling capacitance but also on the signal arrival time. This work presents a crosstalk-aware timing estimation method using a two-step machine learning approach. Interconnects that are physically adjacent and overlap in signal timing windows are filtered first. Crosstalk delay is predicted by integrating physical topology and timing features without relying on post-routing results and the parasitic extraction. Experimental results show a match rate of over 99% for identifying crosstalk nets compared to the commercial tool on the OpenCores benchmarks, with prediction results being more accurate than those of other state-of-the-art methods. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 6 pages, 8 figures

ACM Class: I.6.4; B.7.3

Showing 1–50 of 413 results for author: Xue, J