Search | arXiv e-print repository

EFCNet: Every Feature Counts for Small Medical Object Segmentation

Authors: Lingjie Kong, Qiaoling Wei, Chengming Xu, Han Chen, Yanwei Fu

Abstract: This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation… ▽ More This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation may be attributed to information loss during their encoding and decoding process. In response to this challenge, we propose a novel model named EFCNet for small object segmentation in medical images. Our model incorporates two modules: the Cross-Stage Axial Attention Module (CSAA) and the Multi-Precision Supervision Module (MPS). These modules address information loss during encoding and decoding procedures, respectively. Specifically, CSAA integrates features from all stages of the encoder to adaptively learn suitable information needed in different decoding stages, thereby reducing information loss in the encoder. On the other hand, MPS introduces a novel multi-precision supervision mechanism to the decoder. This mechanism prioritizes attention to low-resolution features in the initial stages of the decoder, mitigating information loss caused by subsequent convolution and sampling processes and enhancing the model's global perception. We evaluate our model on two benchmark medical image datasets. The results demonstrate that EFCNet significantly outperforms previous segmentation methods designed for both medical and normal images. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18102 [pdf]

A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wen**g Xu, Huixiang Zhi

Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accuratel… ▽ More Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accurately predicting multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, facilitating a finer categorization of different types of lung diseases so as to offer precise treatment recommendations. To achieve this objective, we curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (nodules are labeled as bounding boxes) from 95 distinct patients. The quality of the dataset was evaluated using a variety of classical classification and detection models, and these promising results demonstrate that the dataset has a feasible application and further facilitate intelligent auxiliary diagnosis. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18088 [pdf, other]

LLM-Driven Multimodal Opinion Expression Identification

Authors: Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang

Abstract: Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world s… ▽ More Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world scenarios. Utilizing CMU MOSEI and IEMOCAP datasets, we construct the CI-MOEI dataset. Additionally, Text-to-Speech (TTS) technology is applied to the MPQA dataset to obtain the CIM-OEI dataset. We design a template for the OEI task to take full advantage of the generative power of large language models (LLMs). Advancing further, we propose an LLM-driven method STOEI, which combines speech and text modal to identify opinion expressions. Our experiments demonstrate that MOEI significantly improves the performance while our method outperforms existing methods by 9.20\% and obtains SOTA results. △ Less

Submitted 29 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 Figures, Accept by Interspeech 2024

arXiv:2406.18018 [pdf, other]

A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

Abstract: Recently, intelligent analysis of lung nodules with the assistant of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im… ▽ More Recently, intelligent analysis of lung nodules with the assistant of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in imaging data from various captured periods of lung cancer. If the evolution patterns of nodules across various periods in the patients' CT sequences can be explored, it will play a crucial role in guiding the precise screening identification of lung cancer. Therefore, a cross spatio-temporal lung nodule dataset with pathological information for nodule identification and diagnosis is constructed, which contains 328 CT sequences and 362 annotated nodules from 109 patients. This comprehensive database is intended to drive research in the field of CAD towards more practical and robust methods, and also contribute to the further exploration of precision medicine related field. To ensure patient confidentiality, we have removed sensitive information from the dataset. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17950 [pdf, other]

V2X Sidelink Positioning in FR1: From Ray-Tracing and Channel Estimation to Bayesian Tracking

Authors: Yu Ge, Maximilian Stark, Musa Furkan Keskin, Hui Chen, Guillaume Jornod, Thomas Hansen, Frank Hofmann, Henk Wymeersch

Abstract: Sidelink positioning research predominantly focuses on the snapshot positioning problem, often within the mmWave band. Only a limited number of studies have delved into vehicle-to-anything (V2X) tracking within sub-6 GHz bands. In this paper, we investigate the V2X sidelink tracking challenges over sub-6 GHz frequencies. We propose a Kalman-filter-based tracking approach that leverages the estimat… ▽ More Sidelink positioning research predominantly focuses on the snapshot positioning problem, often within the mmWave band. Only a limited number of studies have delved into vehicle-to-anything (V2X) tracking within sub-6 GHz bands. In this paper, we investigate the V2X sidelink tracking challenges over sub-6 GHz frequencies. We propose a Kalman-filter-based tracking approach that leverages the estimated error covariance lower bounds (EECLBs) as measurement covariance, alongside a gating method to augment tracking performance. Through simulations employing ray-tracing data and super-resolution channel parameter estimation, we validate the feasibility of sidelink tracking using our proposed tracking filter with two novel EECLBs. Additionally, we demonstrate the efficacy of the gating method in identifying line-of-sight paths and enhancing tracking performance. △ Less

Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16942 [pdf, other]

Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

Abstract: Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RE… ▽ More Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RETFound and UIOS, and got further improvement with thresholding strategy to 98.44%. In the external test sets obtained from other OCT devices, FMUE achieved an accuracy of 88.75% and 92.73% before and after thresholding. Our model is superior to two ophthalmologists with a higher F1 score (95.17% vs. 61.93% &71.72%). Besides, our model correctly predicts high uncertainty scores for samples with ambiguous features, of non-target-category diseases, or with low-quality to prompt manual checks and prevent misdiagnosis. FMUE provides a trustworthy method for automatic retinal anomalies detection in the real-world clinical open set environment. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE

arXiv:2406.16929 [pdf, other]

Modelling the 5G Energy Consumption using Real-world Data: Energy Fingerprint is All You Need

Authors: Tingwei Chen, Yantao Wang, Hanzhi Chen, Zijian Zhao, Xinhao Li, Nicola Piovesan, Guangxu Zhu, Qingjiang Shi

Abstract: The introduction of fifth-generation (5G) radio technology has revolutionized communications, bringing unprecedented automation, capacity, connectivity, and ultra-fast, reliable communications. However, this technological leap comes with a substantial increase in energy consumption, presenting a significant challenge. To improve the energy efficiency of 5G networks, it is imperative to develop sop… ▽ More The introduction of fifth-generation (5G) radio technology has revolutionized communications, bringing unprecedented automation, capacity, connectivity, and ultra-fast, reliable communications. However, this technological leap comes with a substantial increase in energy consumption, presenting a significant challenge. To improve the energy efficiency of 5G networks, it is imperative to develop sophisticated models that accurately reflect the influence of base station (BS) attributes and operational conditions on energy usage.Importantly, addressing the complexity and interdependencies of these diverse features is particularly challenging, both in terms of data processing and model architecture design. This paper proposes a novel 5G base stations energy consumption modelling method by learning from a real-world dataset used in the ITU 5G Base Station Energy Consumption Modelling Challenge in which our model ranked second. Unlike existing methods that omit the Base Station Identifier (BSID) information and thus fail to capture the unique energy fingerprint in different base stations, we incorporate the BSID into the input features and encoding it with an embedding layer for precise representation. Additionally, we introduce a novel masked training method alongside an attention mechanism to further boost the model's generalization capabilities and accuracy. After evaluation, our method demonstrates significant improvements over existing models, reducing Mean Absolute Percentage Error (MAPE) from 12.75% to 4.98%, leading to a performance gain of more than 60%. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.11519 [pdf, other]

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

Authors: Di Wang, Meiqi Hu, Yao **, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, **g Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,… ▽ More Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA, a vision transformer-based foundation model for HSI interpretation, scalable to over a billion parameters. To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, and real-world applicability. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

arXiv:2406.10724 [pdf, other]

Beyond the Visible: Jointly Attending to Spectral and Spatial Dimensions with HSI-Diffusion for the FINCH Spacecraft

Authors: Ian Vyse, Rishit Dagli, Dav Vrat Chadha, John P. Ma, Hector Chen, Isha Ruparelia, Prithvi Seran, Matthew Xie, Eesa Aamer, Aidan Armstrong, Naveen Black, Ben Borstein, Kevin Caldwell, Orrin Dahanaggamaarachchi, Joe Dai, Abeer Fatima, Stephanie Lu, Maxime Michet, Anoushka Paul, Carrie Ann Po, Shivesh Prakash, Noa Prosser, Riddhiman Roy, Mirai Shinjo, Iliya Shofman , et al. (4 additional authors not shown)

Abstract: Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and… ▽ More Satellite remote sensing missions have gained popularity over the past fifteen years due to their ability to cover large swaths of land at regular intervals, making them ideal for monitoring environmental trends. The FINCH mission, a 3U+ CubeSat equipped with a hyperspectral camera, aims to monitor crop residue cover in agricultural fields. Although hyperspectral imaging captures both spectral and spatial information, it is prone to various types of noise, including random noise, stripe noise, and dead pixels. Effective denoising of these images is crucial for downstream scientific tasks. Traditional methods, including hand-crafted techniques encoding strong priors, learned 2D image denoising methods applied across different hyperspectral bands, or diffusion generative models applied independently on bands, often struggle with varying noise strengths across spectral bands, leading to significant spectral distortion. This paper presents a novel approach to hyperspectral image denoising using latent diffusion models that integrate spatial and spectral information. We particularly do so by building a 3D diffusion model and presenting a 3-stage training approach on real and synthetically crafted datasets. The proposed method preserves image structure while reducing noise. Evaluations on both popular hyperspectral denoising datasets and synthetically crafted datasets for the FINCH mission demonstrate the effectiveness of this approach. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: To appear in 38th Annual Small Satellite Conference

arXiv:2406.09873 [pdf, other]

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

Abstract: Disordered speech recognition profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a… ▽ More Disordered speech recognition profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model. We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs, to improve model recognition of Chinese dysarthric speech. Experimental results from our Chinese dysarthric speech dataset demonstrate consistent improvements in recognition performance with Perceiver-Prompt. Relative reduction up to 13.04% in CER is obtained over the fine-tuned Whisper. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted by interspeech 2024

arXiv:2406.09696 [pdf, other]

MoME: Mixture of Multimodal Experts for Cancer Survival Prediction

Authors: Conghao Xiong, Hao Chen, Hao Zheng, Dong Wei, Yefeng Zheng, Joseph J. Y. Sung, Irwin King

Abstract: Survival analysis, as a challenging task, requires integrating Whole Slide Images (WSIs) and genomic data for comprehensive decision-making. There are two main challenges in this task: significant heterogeneity and complex inter- and intra-modal interactions between the two modalities. Previous approaches utilize co-attention methods, which fuse features from both modalities only once after separa… ▽ More Survival analysis, as a challenging task, requires integrating Whole Slide Images (WSIs) and genomic data for comprehensive decision-making. There are two main challenges in this task: significant heterogeneity and complex inter- and intra-modal interactions between the two modalities. Previous approaches utilize co-attention methods, which fuse features from both modalities only once after separate encoding. However, these approaches are insufficient for modeling the complex task due to the heterogeneous nature between the modalities. To address these issues, we propose a Biased Progressive Encoding (BPE) paradigm, performing encoding and fusion simultaneously. This paradigm uses one modality as a reference when encoding the other. It enables deep fusion of the modalities through multiple alternating iterations, progressively reducing the cross-modal disparities and facilitating complementary interactions. Besides modality heterogeneity, survival analysis involves various biomarkers from WSIs, genomics, and their combinations. The critical biomarkers may exist in different modalities under individual variations, necessitating flexible adaptation of the models to specific scenarios. Therefore, we further propose a Mixture of Multimodal Experts (MoME) layer to dynamically selects tailored experts in each stage of the BPE paradigm. Experts incorporate reference information from another modality to varying degrees, enabling a balanced or biased focus on different modalities during the encoding process. Extensive experimental results demonstrate the superior performance of our method on various datasets, including TCGA-BLCA, TCGA-UCEC and TCGA-LUAD. Codes are available at https://github.com/BearCleverProud/MoME. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 8 + 1/2 pages, early accepted to MICCAI2024

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered. △ Less

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07399 [pdf, other]

Redefining Automotive Radar Imaging: A Domain-Informed 1D Deep Learning Approach for High-Resolution and Efficient Performance

Authors: Ruxin Zheng, Shunqiao Sun, Holger Caesar, Honglei Chen, Jian Li

Abstract: Millimeter-wave (mmWave) radars are indispensable for perception tasks of autonomous vehicles, thanks to their resilience in challenging weather conditions. Yet, their deployment is often limited by insufficient spatial resolution for precise semantic scene interpretation. Classical super-resolution techniques adapted from optical imaging inadequately address the distinct characteristics of radar… ▽ More Millimeter-wave (mmWave) radars are indispensable for perception tasks of autonomous vehicles, thanks to their resilience in challenging weather conditions. Yet, their deployment is often limited by insufficient spatial resolution for precise semantic scene interpretation. Classical super-resolution techniques adapted from optical imaging inadequately address the distinct characteristics of radar signal data. In response, our study redefines radar imaging super-resolution as a one-dimensional (1D) signal super-resolution spectra estimation problem by harnessing the radar signal processing domain knowledge, introducing innovative data normalization and a domain-informed signal-to-noise ratio (SNR)-guided loss function. Our tailored deep learning network for automotive radar imaging exhibits remarkable scalability, parameter efficiency and fast inference speed, alongside enhanced performance in terms of radar imaging quality and resolution. Extensive testing confirms that our SR-SPECNet sets a new benchmark in producing high-resolution radar range-azimuth images, outperforming existing methods across varied antenna configurations and dataset sizes. Source code and new radar dataset will be made publicly available online. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.02640 [pdf, other]

Ghost imaging-based Non-contact Heart Rate Detection

Authors: Jianming Yu, Yuchen He, Bin Li, Hui Chen, Huaibin Zheng, Jianbin Liu, Zhuo Xu

Abstract: Remote heart rate measurement is an increasingly concerned research field, usually using remote photoplethysmography (rPPG) to collect heart rate information through video data collection. However, in certain specific scenarios (such as low light conditions, intense lighting, and non-line-of-sight situations), traditional imaging methods fail to capture image information effectively, that may lead… ▽ More Remote heart rate measurement is an increasingly concerned research field, usually using remote photoplethysmography (rPPG) to collect heart rate information through video data collection. However, in certain specific scenarios (such as low light conditions, intense lighting, and non-line-of-sight situations), traditional imaging methods fail to capture image information effectively, that may lead to difficulty or inability in measuring heart rate. To address these limitations, this study proposes using ghost imaging as a substitute for traditional imaging in the aforementioned scenarios. The mean absolute error between experimental measurements and reference true values is 4.24 bpm.Additionally, the bucket signals obtained by the ghost imaging system can be directly processed using digital signal processing techniques, thereby enhancing personal privacy protection. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: 4 pages, 6 figures

arXiv:2406.00582 [pdf]

Joint Detection and Classification of Communication and Radar Signals in Congested RF Environments Using YOLOv8

Authors: Xiwen Kang, Hua-mei Chen, Genshe Chen, Kuo-Chu Chang, Thomas M. Clemons

Abstract: In this paper, we present a comprehensive study on the application of YOLOv8, a state-of-the-art computer vision (CV) model, to the challenging problem of joint detection and classification of signals in a highly dynamic and congested RF environment. Using our synthetic RF datasets, we explored three different scenarios with congested communication and radar signals. In the first study, we applied… ▽ More In this paper, we present a comprehensive study on the application of YOLOv8, a state-of-the-art computer vision (CV) model, to the challenging problem of joint detection and classification of signals in a highly dynamic and congested RF environment. Using our synthetic RF datasets, we explored three different scenarios with congested communication and radar signals. In the first study, we applied YOLOv8 to detect and classify multiple digital modulation signals coexisting within a highly congested and dynamic spectral environment with significant overlap in both frequency and time domains. The trained model was able to achieve an impressive mean average precision (mAP) of 0.888 at an IoU threshold of 50%, signifying its robustness against spectral congestion. The second part of our research focuses on the detection and classification of multiple polyphase pulse radar signals, including Frank code and P1 through P4 codes. We were able to successfully train YOLOv8 to deliver a nearly perfect mAP50 score of 0.995 in a densely populated signal environment, further showcasing its capability in radar signal processing. In the last scenario, we demonstrated that the model can also be applied to the multi-target detection problem for continuous-wave radar. The synthetic datasets used in these experiments reflect a realistic mix of communication and radar signals with varying degrees of interference and congestion - a setup that has been overlooked by many past research efforts, which have primarily focused on ML-based classification of digital communication signal modulation schemes. Our study demonstrated the potential of advanced CV models in addressing spectrum sensing challenges in congested and dynamic RF environments involving both communication and radar signals. We hope our findings will spur further collaborative efforts to tackle the complexities of congested RF spectrum environments. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE MILCOM 2024

arXiv:2405.18167 [pdf, other]

Confidence-aware multi-modality learning for eye disease screening

Authors: Ke Zou, Tian Lin, Zongbo Han, Meng Wang, Xuedong Yuan, Haoyu Chen, Changqing Zhang, Xiao**g Shen, Huazhu Fu

Abstract: Multi-modal ophthalmic image classification plays a key role in diagnosing eye diseases, as it integrates information from different sources to complement their respective performances. However, recent improvements have mainly focused on accuracy, often neglecting the importance of confidence and robustness in predictions for diverse modalities. In this study, we propose a novel multi-modality evi… ▽ More Multi-modal ophthalmic image classification plays a key role in diagnosing eye diseases, as it integrates information from different sources to complement their respective performances. However, recent improvements have mainly focused on accuracy, often neglecting the importance of confidence and robustness in predictions for diverse modalities. In this study, we propose a novel multi-modality evidential fusion pipeline for eye disease screening. It provides a measure of confidence for each modality and elegantly integrates the multi-modality information using a multi-distribution fusion perspective. Specifically, our method first utilizes normal inverse gamma prior distributions over pre-trained models to learn both aleatoric and epistemic uncertainty for uni-modality. Then, the normal inverse gamma distribution is analyzed as the Student's t distribution. Furthermore, within a confidence-aware fusion framework, we propose a mixture of Student's t distributions to effectively integrate different modalities, imparting the model with heavy-tailed properties and enhancing its robustness and reliability. More importantly, the confidence-aware multi-modality ranking regularization term induces the model to more reasonably rank the noisy single-modal and fused-modal confidence, leading to improved reliability and accuracy. Experimental results on both public and internal datasets demonstrate that our model excels in robustness, particularly in challenging scenarios involving Gaussian noise and modality missing conditions. Moreover, our model exhibits strong generalization capabilities to out-of-distribution data, underscoring its potential as a promising solution for multimodal eye disease screening. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 27 pages, 7 figures, 9 tables

arXiv:2405.18092 [pdf]

LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins

Authors: Yuchen Xia, Daniel Dittler, Nasser Jazdi, Haonan Chen, Michael Weyrich

Abstract: This paper presents a novel design of a multi-agent system framework that applies a large language model (LLM) to automate the parametrization of process simulations in digital twins. We propose a multi-agent framework that includes four types of agents: observation, reasoning, decision and summarization. By enabling dynamic interaction between LLM agents and simulation model, the developed system… ▽ More This paper presents a novel design of a multi-agent system framework that applies a large language model (LLM) to automate the parametrization of process simulations in digital twins. We propose a multi-agent framework that includes four types of agents: observation, reasoning, decision and summarization. By enabling dynamic interaction between LLM agents and simulation model, the developed system can automatically explore the parametrization of the simulation and use heuristic reasoning to determine a set of parameters to control the simulation to achieve an objective. The proposed approach enhances the simulation model by infusing it with heuristics from LLM and enables autonomous search for feasible parametrization to solve a user task. Furthermore, the system has the potential to increase user-friendliness and reduce the cognitive load on human users by assisting in complex decision-making processes. The effectiveness and functionality of the system are demonstrated through a case study, and the visualized demos are available at a GitHub Repository: https://github.com/YuchenXia/LLMDrivenSimulation △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE-ETFA2024, under peer-review

arXiv:2405.17836 [pdf, other]

An Innovative Networks in Federated Learning

Authors: Zavareh Bozorgasl, Hao Chen

Abstract: This paper presents the development and application of Wavelet Kolmogorov-Arnold Networks (Wav-KAN) in federated learning. We implemented Wav-KAN \cite{wav-kan} in the clients. Indeed, we have considered both continuous wavelet transform (CWT) and also discrete wavelet transform (DWT) to enable multiresolution capabaility which helps in heteregeneous data distribution across clients. Extensive exp… ▽ More This paper presents the development and application of Wavelet Kolmogorov-Arnold Networks (Wav-KAN) in federated learning. We implemented Wav-KAN \cite{wav-kan} in the clients. Indeed, we have considered both continuous wavelet transform (CWT) and also discrete wavelet transform (DWT) to enable multiresolution capabaility which helps in heteregeneous data distribution across clients. Extensive experiments were conducted on different datasets, demonstrating Wav-KAN's superior performance in terms of interpretability, computational speed, training and test accuracy. Our federated learning algorithm integrates wavelet-based activation functions, parameterized by weight, scale, and translation, to enhance local and global model performance. Results show significant improvements in computational efficiency, robustness, and accuracy, highlighting the effectiveness of wavelet selection in scalable neural network design. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.13365 [pdf, other]

Clipped Uniform Quantizers for Communication-Efficient Federated Learning

Authors: Zavareh Bozorgasl, Hao Chen

Abstract: This paper introduces an approach to employ clipped uniform quantization in federated learning settings, aiming to enhance model efficiency by reducing communication overhead without compromising accuracy. By employing optimal clip** thresholds and adaptive quantization schemes, our method significantly curtails the bit requirements for model weight transmissions between clients and the server.… ▽ More This paper introduces an approach to employ clipped uniform quantization in federated learning settings, aiming to enhance model efficiency by reducing communication overhead without compromising accuracy. By employing optimal clip** thresholds and adaptive quantization schemes, our method significantly curtails the bit requirements for model weight transmissions between clients and the server. We explore the implications of symmetric clip** and uniform quantization on model performance, highlighting the utility of stochastic quantization to mitigate quantization artifacts and improve model robustness. Through extensive simulations on the MNIST dataset, our results demonstrate that the proposed method achieves near full-precision performance while ensuring substantial communication savings. Specifically, our approach facilitates efficient weight averaging based on quantization errors, effectively balancing the trade-off between communication efficiency and model accuracy. The comparative analysis with conventional quantization methods further confirms the superiority of our technique. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.12832 [pdf, other]

Wav-KAN: Wavelet Kolmogorov-Arnold Networks

Authors: Zavareh Bozorgasl, Hao Chen

Abstract: In this paper, we introduce Wav-KAN, an innovative neural network architecture that leverages the Wavelet Kolmogorov-Arnold Networks (Wav-KAN) framework to enhance interpretability and performance. Traditional multilayer perceptrons (MLPs) and even recent advancements like Spl-KAN face challenges related to interpretability, training speed, robustness, computational efficiency, and performance. Wa… ▽ More In this paper, we introduce Wav-KAN, an innovative neural network architecture that leverages the Wavelet Kolmogorov-Arnold Networks (Wav-KAN) framework to enhance interpretability and performance. Traditional multilayer perceptrons (MLPs) and even recent advancements like Spl-KAN face challenges related to interpretability, training speed, robustness, computational efficiency, and performance. Wav-KAN addresses these limitations by incorporating wavelet functions into the Kolmogorov-Arnold network structure, enabling the network to capture both high-frequency and low-frequency components of the input data efficiently. Wavelet-based approximations employ orthogonal or semi-orthogonal basis and maintain a balance between accurately representing the underlying data structure and avoiding overfitting to the noise. While continuous wavelet transform (CWT) has a lot of potentials, we also employed discrete wavelet transform (DWT) for multiresolution analysis, which obviated the need for recalculation of the previous steps in finding the details. Analogous to how water conforms to the shape of its container, Wav-KAN adapts to the data structure, resulting in enhanced accuracy, faster training speeds, and increased robustness compared to Spl-KAN and MLPs. Our results highlight the potential of Wav-KAN as a powerful tool for develo** interpretable and high-performance neural networks, with applications spanning various fields. This work sets the stage for further exploration and implementation of Wav-KAN in frameworks such as PyTorch and TensorFlow, aiming to make wavelets in KAN as widespread as activation functions like ReLU and sigmoid in universal approximation theory (UAT). The codes to replicate the simulations are available at https://github.com/zavareh1/Wav-KAN. △ Less

Submitted 27 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: Work in progress; codes are available at are available at https://github.com/zavareh1/Wav-KAN

arXiv:2405.10157 [pdf]

Incorporating ESO into Deep Koopman Operator Modelling for Control of Autonomous Vehicles

Authors: Hao Chen, Chen Lv

Abstract: Koopman operator theory is a kind of data-driven modelling approach that accurately captures the nonlinearities of mechatronic systems such as vehicles against physics-based methods. However, the infinite-dimensional Koopman operator is impossible to implement in real-world applications. To approximate the infinite-dimensional Koopman operator through collection dataset rather than manual trial an… ▽ More Koopman operator theory is a kind of data-driven modelling approach that accurately captures the nonlinearities of mechatronic systems such as vehicles against physics-based methods. However, the infinite-dimensional Koopman operator is impossible to implement in real-world applications. To approximate the infinite-dimensional Koopman operator through collection dataset rather than manual trial and error, we adopt deep neural networks (DNNs) to extract basis functions by offline training and map the nonlinearities of vehicle planar dynamics into a linear form in the lifted space. Besides, the effects of the dimensions of basis functions on the model accuracy are explored. Further, the extended state observer (ESO) is introduced to online estimate the total disturbance in the lifted space and compensate for the modelling errors and residuals of the learned deep Koopman operator (DK) while also improving its generalization. Then, the proposed model is applied to predict vehicle states within prediction horizons and later formulates the constrained finite-time optimization problem of model predictive control (MPC), i.e., ESO-DKMPC. In terms of the trajectory tracking of autonomous vehicles, the ESO-DKMPC generates the wheel steering angle to govern lateral motions based on the decoupling control structure. The various conditions under the double-lane change scenarios are built on the CarSim/Simulink co-simulation platform, and extensive comparisons are conducted with the linear MPC (LMPC) and nonlinear MPC (NMPC) informed by the physics-based model. The results indicate that the proposed ESO-DKMPC has better tracking performance and moderate efficacy both within linear and nonlinear regions. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.10145 [pdf]

Deep Koopman Operator-Informed Safety Command Governor for Autonomous Vehicles

Authors: Hao Chen, Xiangkun He, Shuo Cheng, Chen Lv

Abstract: Modeling of nonlinear behaviors with physical-based models poses challenges. However, Koopman operator maps the original nonlinear system into an infinite-dimensional linear space to achieve global linearization of the nonlinear system through input and output data, which derives an absolute equivalent linear representation of the original state space. Due to the impossibility of implementing the… ▽ More Modeling of nonlinear behaviors with physical-based models poses challenges. However, Koopman operator maps the original nonlinear system into an infinite-dimensional linear space to achieve global linearization of the nonlinear system through input and output data, which derives an absolute equivalent linear representation of the original state space. Due to the impossibility of implementing the infinite-dimensional Koopman operator, finite-dimensional kernel functions are selected as an approximation. Given its flexible structure and high accuracy, deep learning is initially employed to extract kernel functions from data and acquire a linear evolution dynamic of the autonomous vehicle in the lifted space. Additionally, the control barrier function (CBF) converts the state constraints to the constraints on the input to render safety property. Then, in terms of the lateral stability of the in-wheel motor driven vehicle, the CBF conditions are incorporated with the learned deep Koopman model. Because of the linear fashion of the deep Koopman model, the quadratic programming problem is formulated to generate the applied driving torque with minimal perturbation to the original driving torque as a safety command governor. In the end, to validate the fidelity of the deep Koopman model compared to other mainstream approaches and demonstrate the lateral improvement achieved by the proposed safety command governor, data collection and safety testing scenarios are conducted on a hardware-in-the-loop platform. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09472 [pdf, other]

Perception- and Fidelity-aware Reduced-Reference Super-Resolution Image Quality Assessment

Authors: Xinying Lin, Xuyang Liu, Hong Yang, Xiaohai He, Honggang Chen

Abstract: With the advent of image super-resolution (SR) algorithms, how to evaluate the quality of generated SR images has become an urgent task. Although full-reference methods perform well in SR image quality assessment (SR-IQA), their reliance on high-resolution (HR) images limits their practical applicability. Leveraging available reconstruction information as much as possible for SR-IQA, such as low-r… ▽ More With the advent of image super-resolution (SR) algorithms, how to evaluate the quality of generated SR images has become an urgent task. Although full-reference methods perform well in SR image quality assessment (SR-IQA), their reliance on high-resolution (HR) images limits their practical applicability. Leveraging available reconstruction information as much as possible for SR-IQA, such as low-resolution (LR) images and the scale factors, is a promising way to enhance assessment performance for SR-IQA without HR for reference. In this letter, we attempt to evaluate the perceptual quality and reconstruction fidelity of SR images considering LR images and scale factors. Specifically, we propose a novel dual-branch reduced-reference SR-IQA network, \ie, Perception- and Fidelity-aware SR-IQA (PFIQA). The perception-aware branch evaluates the perceptual quality of SR images by leveraging the merits of global modeling of Vision Transformer (ViT) and local relation of ResNet, and incorporating the scale factor to enable comprehensive visual perception. Meanwhile, the fidelity-aware branch assesses the reconstruction fidelity between LR and SR images through their visual perception. The combination of the two branches substantially aligns with the human visual system, enabling a comprehensive SR image evaluation. Experimental results indicate that our PFIQA outperforms current state-of-the-art models across three widely-used SR-IQA benchmarks. Notably, PFIQA excels in assessing the quality of real-world SR images. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2405.09079 [pdf, other]

Integrated Monostatic Sensing and Full-Duplex Multiuser Communication for mmWave Systems

Authors: Murat Bayraktar, Nuria González-Prelcic, Mikko Valkama, Hao Chen, Charlie Jianzhong Zhang

Abstract: In this paper, we propose a hybrid precoding/combining framework for communication-centric integrated sensing and full-duplex (FD) communication operating at mmWave bands. The designed precoders and combiners enable multiuser (MU) FD communication while simultaneously supporting monostatic sensing in a frequency-selective setting. The joint design of precoders and combiners involves the mitigation… ▽ More In this paper, we propose a hybrid precoding/combining framework for communication-centric integrated sensing and full-duplex (FD) communication operating at mmWave bands. The designed precoders and combiners enable multiuser (MU) FD communication while simultaneously supporting monostatic sensing in a frequency-selective setting. The joint design of precoders and combiners involves the mitigation of self-interference (SI) caused by simultaneous transmission and reception at the FD base station (BS). Additionally, MU interference needs to be handled by the precoder/combiner design. The resulting optimization problem involves non-convex constraints since hybrid analog/digital architectures utilize networks of phase shifters. To solve the proposed problem, we separate the optimization of each precoder/combiner, and design each one of them while fixing the others. The precoders at the FD BS are designed by reformulating the communication and sensing constraints as signal-to-leakage-plus-noise ratio (SLNR) maximization problems that consider SI and MU interference as leakage. Furthermore, we design the frequency-flat analog combiner such that the residual SI at the FD BS is minimized under communication and sensing gain constraints. Finally, we design an interference-aware digital combining stage that separates MU signals and target reflections. The communication performance and sensing results show that the proposed framework efficiently supports both functionalities simultaneously. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: 13 pages, 7 figures

arXiv:2405.05579 [pdf]

Intelligent EC Rearview Mirror: Enhancing Driver Safety with Dynamic Glare Mitigation via Cloud Edge Collaboration

Authors: Junyi Yang, Zefei Xu, Huayi Lai, Hongjian Chen, Sifan Kong, Yutong Wu, Huan Yang

Abstract: Sudden glare from trailing vehicles significantly increases driving safety risks. Existing anti-glare technologies such as electronic, manually-adjusted, and electrochromic rearview mirrors, are expensive and lack effective adaptability in different lighting conditions. To address these issues, our research introduces an intelligent rearview mirror system utilizing novel all-liquid electrochromic… ▽ More Sudden glare from trailing vehicles significantly increases driving safety risks. Existing anti-glare technologies such as electronic, manually-adjusted, and electrochromic rearview mirrors, are expensive and lack effective adaptability in different lighting conditions. To address these issues, our research introduces an intelligent rearview mirror system utilizing novel all-liquid electrochromic technology. This system integrates IoT with ensemble and federated learning within a cloud edge collaboration framework, dynamically controlling voltage to effectively eliminate glare and maintain clear visibility. Utilizing an ensemble learning model, it automatically adjusts mirror transmittance based on light intensity, achieving a low RMSE of 0.109 on the test set. Furthermore, the system leverages federated learning for distributed data training across devices, which enhances privacy and updates the cloud model continuously. Distinct from conventional methods, our experiment utilizes the Schmidt-Clausen and Bindels de Boer 9-point scale with TOPSIS for comprehensive evaluation of rearview mirror glare. Designed to be convenient and costeffective, this system demonstrates how IoT and AI can significantly enhance rearview mirror anti-glare performance. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05239 [pdf, other]

Cellular Traffic Prediction Using Online Prediction Algorithms

Authors: Hossein Mehri, Hao Chen, Hani Mehrpouyan

Abstract: The advent of 5G technology promises a paradigm shift in the realm of telecommunications, offering unprecedented speeds and connectivity. However, the efficient management of traffic in 5G networks remains a critical challenge. It is due to the dynamic and heterogeneous nature of network traffic, varying user behaviors, extended network size, and diverse applications, all of which demand highly ac… ▽ More The advent of 5G technology promises a paradigm shift in the realm of telecommunications, offering unprecedented speeds and connectivity. However, the efficient management of traffic in 5G networks remains a critical challenge. It is due to the dynamic and heterogeneous nature of network traffic, varying user behaviors, extended network size, and diverse applications, all of which demand highly accurate and adaptable prediction models to optimize network resource allocation and management. This paper investigates the efficacy of live prediction algorithms for forecasting cellular network traffic in real-time scenarios. We apply two live prediction algorithms on machine learning models, one of which is recently proposed Fast LiveStream Prediction (FLSP) algorithm. We examine the performance of these algorithms under two distinct data gathering methodologies: synchronous, where all network cells report statistics simultaneously, and asynchronous, where reporting occurs across consecutive time slots. Our study delves into the impact of these gathering scenarios on the predictive performance of traffic models. Our study reveals that the FLSP algorithm can halve the required bandwidth for asynchronous data reporting compared to conventional online prediction algorithms, while simultaneously enhancing prediction accuracy and reducing processing load. Additionally, we conduct a thorough analysis of algorithmic complexity and memory requirements across various machine learning models. Through empirical evaluation, we provide insights into the trade-offs inherent in different prediction strategies, offering valuable guidance for network optimization and resource allocation in dynamic environments. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.05235 [pdf, other]

RACH Traffic Prediction in Massive Machine Type Communications

Authors: Hossein Mehri, Hao Chen, Hani Mehrpouyan

Abstract: Traffic pattern prediction has emerged as a promising approach for efficiently managing and mitigating the impacts of event-driven bursty traffic in massive machine-type communication (mMTC) networks. However, achieving accurate predictions of bursty traffic remains a non-trivial task due to the inherent randomness of events, and these challenges intensify within live network environments. Consequ… ▽ More Traffic pattern prediction has emerged as a promising approach for efficiently managing and mitigating the impacts of event-driven bursty traffic in massive machine-type communication (mMTC) networks. However, achieving accurate predictions of bursty traffic remains a non-trivial task due to the inherent randomness of events, and these challenges intensify within live network environments. Consequently, there is a compelling imperative to design a lightweight and agile framework capable of assimilating continuously collected data from the network and accurately forecasting bursty traffic in mMTC networks. This paper addresses these challenges by presenting a machine learning-based framework tailored for forecasting bursty traffic in multi-channel slotted ALOHA networks. The proposed machine learning network comprises long-term short-term memory (LSTM) and a DenseNet with feed-forward neural network (FFNN) layers, where the residual connections enhance the training ability of the machine learning network in capturing complicated patterns. Furthermore, we develop a new low-complexity online prediction algorithm that updates the states of the LSTM network by leveraging frequently collected data from the mMTC network. Simulation results and complexity analysis demonstrate the superiority of our proposed algorithm in terms of both accuracy and complexity, making it well-suited for time-critical live scenarios. We evaluate the performance of the proposed framework in a network with a single base station and thousands of devices organized into groups with distinct traffic-generating characteristics. Comprehensive evaluations and simulations indicate that our proposed machine learning approach achieves a remarkable $52\%$ higher accuracy in long-term predictions compared to traditional methods, without imposing additional processing load on the system. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Report number: TMLCN-10-23-0189

arXiv:2405.02788 [pdf, other]

Antenna Failure Resilience: Deep Learning-Enabled Robust DOA Estimation with Single Snapshot Sparse Arrays

Authors: Ruxin Zheng, Shunqiao Sun, Hongshan Liu, Honglei Chen, Mojtaba Soltanalian, Jian Li

Abstract: Recent advancements in Deep Learning (DL) for Direction of Arrival (DOA) estimation have highlighted its superiority over traditional methods, offering faster inference, enhanced super-resolution, and robust performance in low Signal-to-Noise Ratio (SNR) environments. Despite these advancements, existing research predominantly focuses on multi-snapshot scenarios, a limitation in the context of aut… ▽ More Recent advancements in Deep Learning (DL) for Direction of Arrival (DOA) estimation have highlighted its superiority over traditional methods, offering faster inference, enhanced super-resolution, and robust performance in low Signal-to-Noise Ratio (SNR) environments. Despite these advancements, existing research predominantly focuses on multi-snapshot scenarios, a limitation in the context of automotive radar systems which demand high angular resolution and often rely on limited snapshots, sometimes as scarce as a single snapshot. Furthermore, the increasing interest in sparse arrays for automotive radar, owing to their cost-effectiveness and reduced antenna element coupling, presents additional challenges including susceptibility to random sensor failures. This paper introduces a pioneering DL framework featuring a sparse signal augmentation layer, meticulously crafted to bolster single snapshot DOA estimation across diverse sparse array setups and amidst antenna failures. To our best knowledge, this is the first work to tackle this issue. Our approach improves the adaptability of deep learning techniques to overcome the unique difficulties posed by sparse arrays with single snapshot. We conduct thorough evaluations of our network's performance using simulated and real-world data, showcasing the efficacy and real-world viability of our proposed solution. The code and real-world dataset employed in this study are available at https://github.com/ruxinzh/Deep_RSA_DOA. △ Less

Submitted 4 May, 2024; originally announced May 2024.

Comments: Invited paper for IEEE Asilomar conference 2024

arXiv:2405.00391 [pdf, ps, other]

Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays

Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Ahmed Alhammadi, Hui Chen, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

Abstract: The beamforming technology with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, which can significantly improve the spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial… ▽ More The beamforming technology with large holographic antenna arrays is one of the key enablers for the next generation of wireless systems, which can significantly improve the spectral efficiency. However, the deployment of large antenna arrays implies high algorithm complexity and resource overhead at both receiver and transmitter ends. To address this issue, advanced technologies such as artificial intelligence have been developed to reduce beamforming overhead. Intuitively, if we can implement the near-optimal beamforming only using a tiny subset of the all channel information, the overhead for channel estimation and beamforming would be reduced significantly compared with the traditional beamforming methods that usually need full channel information and the inversion of large dimensional matrix. In light of this idea, we propose a novel scheme that utilizes Wasserstein generative adversarial network with gradient penalty to infer the full beamforming matrices based on very little of channel information. Simulation results confirm that it can accomplish comparable performance with the weighted minimum mean-square error algorithm, while reducing the overhead by over 50%. △ Less

Submitted 15 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.17089 [pdf, other]

Auto-Calibration and 2D-DOA Estimation in UCAs via an Integrated Wideband Dictionary

Authors: Zavareh Bozorgasl, Hao Chen, Mohammad J. Dehghani

Abstract: In this paper, we present a novel auto-calibration scheme for the joint estimation of the two-dimensional (2-D) direction-of-arrival (DOA) and the mutual coupling matrix (MCM) for a signal measured using uniform circular arrays. The method employs an integrated wideband dictionary to mitigate the detrimental effects of the discretization of the continuous parameter space over the considered azimut… ▽ More In this paper, we present a novel auto-calibration scheme for the joint estimation of the two-dimensional (2-D) direction-of-arrival (DOA) and the mutual coupling matrix (MCM) for a signal measured using uniform circular arrays. The method employs an integrated wideband dictionary to mitigate the detrimental effects of the discretization of the continuous parameter space over the considered azimuth and elevation angles. This leads to a reduction of the computational complexity and obtaining of more accurate DOA estimates. Given the more reliable DOA estimates, the method also allows for the estimation of more accurate mutual coupling coefficients. The method utilizes an integrated dictionary in order to iteratively refine the active parameter space, thereby reducing the required computational complexity without reducing the overall performance. The complexity is further reduced by employing only the dominant subspace of the measured signal. Furthermore, the proposed method does not require a constraint on the prior knowledge of the number of nonzero coupling coefficients nor suffer from ambiguity problems. Moreover, a simple formulation for 2-D non-numerical integration is presented. Simulation results show the effectiveness of the proposed method. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: This is a completed version of a work which will be sent to 2024 Asilomar Conference on Signals, Systems, and Computers

arXiv:2404.16484 [pdf, other]

Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: CVPR 2024, AI for Streaming (AIS) Workshop

arXiv:2404.14879 [pdf, other]

Device-Free 3D Drone Localization in RIS-Assisted mmWave MIMO Networks

Authors: Jiguang He, Charles Vanwynsberghe, Hui Chen, Chongwen Huang, Aymen Fakhreddine

Abstract: In this paper, we investigate the potential of reconfigurable intelligent surfaces (RISs) in facilitating passive/device-free three-dimensional (3D) drone localization within existing cellular infrastructure operating at millimeter-wave (mmWave) frequencies and employing multiple antennas at the transceivers. The developed localization system operates in the bi-static mode without requiring direct… ▽ More In this paper, we investigate the potential of reconfigurable intelligent surfaces (RISs) in facilitating passive/device-free three-dimensional (3D) drone localization within existing cellular infrastructure operating at millimeter-wave (mmWave) frequencies and employing multiple antennas at the transceivers. The developed localization system operates in the bi-static mode without requiring direct communication between the drone and the base station. We analyze the theoretical performance limits via Fisher information analysis and Cramér Rao lower bounds (CRLBs). Furthermore, we develop a low-complexity yet effective drone localization algorithm based on coordinate gradient descent and examine the impact of factors such as radar cross section (RCS) of the drone and training overhead on system performance. It is demonstrated that integrating RIS yields significant benefits over its RIS-free counterpart, as evidenced by both theoretical analyses and numerical simulations. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures, submitted to IEEE GLOBECOM 2024

arXiv:2404.10021 [pdf]

Monitoring Based Fatigue Damage Prognosis of Wind Turbine Composite Blades under Uncertain Wind Loads

Authors: Chizhi Zhang, Hua-Peng Chen

Abstract: Lifecycle assessment of wind turbines is essential to improve their design and to optimum maintenance plans for preventing failures during the design life. A critical element of wind turbines is the composite blade due to uncertain cyclic wind loads with relatively high frequency and amplitude in offshore environments. It is critical to detect the wind fatigue damage evolution in composite blades… ▽ More Lifecycle assessment of wind turbines is essential to improve their design and to optimum maintenance plans for preventing failures during the design life. A critical element of wind turbines is the composite blade due to uncertain cyclic wind loads with relatively high frequency and amplitude in offshore environments. It is critical to detect the wind fatigue damage evolution in composite blades before they fail catastrophically and destroy the entire wind turbines. This study presents a methodology for analysing the fatigue failure probability of a wind turbine composite blade by using monitoring stochastic deterioration modelling. On the basis of 5 minutes mean wind speed measurements, the internal stresses can be accurately obtained from finite element analysis, and failure probabilities are predicted by stochastic gamma process fatigue damage model in design service life. A numerical example of a wind turbine composite blade is investigated to show the applicability of the proposed model. The results show that the stochastic fatigue damage model can give reliable results for time-dependent reliability analysis for composite blades of wind turbines. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09436 [pdf]

Image Reconstruction with B0 Inhomogeneity using an Interpretable Deep Unrolled Network on an Open-bore MRI-Linac

Authors: Shanshan Shan, Yang Gao, David E. J. Waddington, Hongli Chen, Brendan Whelan, Paul Z. Y. Liu, Yaohui Wang, Chunyi Liu, Hong** Gan, Mingyuan Gao, Feng Liu

Abstract: MRI-Linac systems require fast image reconstruction with high geometric fidelity to localize and track tumours for radiotherapy treatments. However, B0 field inhomogeneity distortions and slow MR acquisition potentially limit the quality of the image guidance and tumour treatments. In this study, we develop an interpretable unrolled network, referred to as RebinNet, to reconstruct distortion-free… ▽ More MRI-Linac systems require fast image reconstruction with high geometric fidelity to localize and track tumours for radiotherapy treatments. However, B0 field inhomogeneity distortions and slow MR acquisition potentially limit the quality of the image guidance and tumour treatments. In this study, we develop an interpretable unrolled network, referred to as RebinNet, to reconstruct distortion-free images from B0 inhomogeneity-corrupted k-space for fast MRI-guided radiotherapy applications. RebinNet includes convolutional neural network (CNN) blocks to perform image regularizations and nonuniform fast Fourier Transform (NUFFT) modules to incorporate B0 inhomogeneity information. The RebinNet was trained on a publicly available MR dataset from eleven healthy volunteers for both fully sampled and subsampled acquisitions. Grid phantom and human brain images acquired from an open-bore 1T MRI-Linac scanner were used to evaluate the performance of the proposed network. The RebinNet was compared with the conventional regularization algorithm and our recently developed UnUNet method in terms of root mean squared error (RMSE), structural similarity (SSIM), residual distortions, and computation time. Imaging results demonstrated that the RebinNet reconstructed images with lowest RMSE (<0.05) and highest SSIM (>0.92) at four-time acceleration for simulated brain images. The RebinNet could better preserve structural details and substantially improve the computational efficiency (ten-fold faster) compared to the conventional regularization methods, and had better generalization ability than the UnUNet method. The proposed RebinNet can achieve rapid image reconstruction and overcome the B0 inhomogeneity distortions simultaneously, which would facilitate accurate and fast image guidance in radiotherapy treatments. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09179 [pdf]

doi 10.1109/JSTARS.2023.3310208

Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery

Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Jiepan Li, Hongruixuan Chen

Abstract: The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still a lot of space to study for precise detection, especially the edge integrity and internal holes phenomenon of change features. In order to solve these problems, we design the Change Guiding Network (CGNet), to tackle the insufficien… ▽ More The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still a lot of space to study for precise detection, especially the edge integrity and internal holes phenomenon of change features. In order to solve these problems, we design the Change Guiding Network (CGNet), to tackle the insufficient expression problem of change features in the conventional U-Net structure adopted in previous methods, which causes inaccurate edge detection and internal holes. Change maps from deep features with rich semantic information are generated and used as prior information to guide multi-scale feature fusion, which can improve the expression ability of change features. Meanwhile, we propose a self-attention module named Change Guide Module (CGM), which can effectively capture the long-distance dependency among pixels and effectively overcome the problem of the insufficient receptive field of traditional convolutional neural networks. On four major CD datasets, we verify the usefulness and efficiency of the CGNet, and a large number of experiments and ablation studies demonstrate the effectiveness of CGNet. We're going to open-source our code at https://github.com/ChengxiHAN/CGNet-CD. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.07092 [pdf, other]

Net 835-Gb/s/λ Carrier- and LO-Free 100-km Transmission Using Channel-Aware Phase Retrieval Reception

Authors: Hanzi Huang, Haoshuo Chen, Qian Hu, Di Che, Yetian Huang, Brian Stern, Nicolas K. Fontaine, Mikael Mazur, Lauren Dallachiesa, Roland Ryf, Zhengxuan Li, Yingxiong Song

Abstract: We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency. We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: 3 pages, 3 figures

arXiv:2404.04947 [pdf, other]

Gull: A Generative Multifunctional Audio Codec

Authors: Yi Luo, Jianwei Yu, Hangting Chen, Rongzhi Gu, Chao Weng

Abstract: We introduce Gull, a generative multifunctional audio codec. Gull is a general purpose neural audio compression and decompression model which can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolution, and codec language models. The key components of Gull include (1) universal-sample-rate modeling via subband modeling schemes motivated by recen… ▽ More We introduce Gull, a generative multifunctional audio codec. Gull is a general purpose neural audio compression and decompression model which can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolution, and codec language models. The key components of Gull include (1) universal-sample-rate modeling via subband modeling schemes motivated by recent progress in audio source separation, (2) gain-shape representations motivated by traditional audio codecs, (3) improved residual vector quantization modules, (4) elastic decoder network that enables user-defined model size and complexity during inference time, (5) built-in ability for audio super-resolution without the increase of bitrate. We compare Gull with existing traditional and neural audio codecs and show that Gull is able to achieve on par or better performance across various sample rates, bitrates and model complexities in both subjective and objective evaluation metrics. △ Less

Submitted 7 June, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: Demo page: https://yluo42.github.io/Gull/

arXiv:2404.03425 [pdf, other]

doi 10.1109/TGRS.2024.3417253

ChangeMamba: Remote Sensing Change Detection with Spatio-Temporal State Space Model

Authors: Hongruixuan Chen, Jian Song, Chengxi Han, Junshi Xia, Naoto Yokoya

Abstract: Convolutional neural networks (CNN) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNN are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on… ▽ More Convolutional neural networks (CNN) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings: CNN are constrained by a limited receptive field that may hinder their ability to capture broader spatial contexts, while Transformers are computationally intensive, making them costly to train and deploy on large datasets. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attribute to achieve spatio-temporal interaction of multi-temporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, fully demonstrating the potential of the Mamba architecture in CD tasks. Further experiments show that our architecture is quite robust to degraded data. The source code will be available in https://github.com/ChenHongruixuan/MambaCD △ Less

Submitted 26 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE TGRS

arXiv:2404.02394 [pdf, other]

Cohort-Individual Cooperative Learning for Multimodal Cancer Survival Analysis

Authors: Huajun Zhou, Fengtao Zhou, Hao Chen

Abstract: Recently, we have witnessed impressive achievements in cancer survival analysis by integrating multimodal data, e.g., pathology images and genomic profiles. However, the heterogeneity and high dimensionality of these modalities pose significant challenges for extracting discriminative representations while maintaining good generalization. In this paper, we propose a Cohort-individual Cooperative L… ▽ More Recently, we have witnessed impressive achievements in cancer survival analysis by integrating multimodal data, e.g., pathology images and genomic profiles. However, the heterogeneity and high dimensionality of these modalities pose significant challenges for extracting discriminative representations while maintaining good generalization. In this paper, we propose a Cohort-individual Cooperative Learning (CCL) framework to advance cancer survival analysis by collaborating knowledge decomposition and cohort guidance. Specifically, first, we propose a Multimodal Knowledge Decomposition (MKD) module to explicitly decompose multimodal knowledge into four distinct components: redundancy, synergy and uniqueness of the two modalities. Such a comprehensive decomposition can enlighten the models to perceive easily overlooked yet important information, facilitating an effective multimodal fusion. Second, we propose a Cohort Guidance Modeling (CGM) to mitigate the risk of overfitting task-irrelevant information. It can promote a more comprehensive and robust understanding of the underlying multimodal data, while avoiding the pitfalls of overfitting and enhancing the generalization ability of the model. By cooperating the knowledge decomposition and cohort guidance methods, we develop a robust multimodal survival analysis model with enhanced discrimination and generalization abilities. Extensive experimental results on five cancer datasets demonstrate the effectiveness of our model in integrating multimodal data for survival analysis. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: 10 pages, 9 figures

arXiv:2404.01192 [pdf, other]

iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer

Authors: Fengtao Zhou, Yingxue Xu, Yanfen Cui, Shenyan Zhang, Yun Zhu, Weiyang He, Jiguang Wang, Xin Wang, Ronald Chan, Louis Ho Shing Lau, Chu Han, Dafu Zhang, Zhenhui Li, Hao Chen

Abstract: Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among… ▽ More Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among patients, with a considerable subset displaying treatment resistance. Ineffective NACT not only leads to adverse effects but also misses the optimal therapeutic window, resulting in lower survival rate. However, existing multimodal learning methods assume the availability of all modalities for each patient, which does not align with the reality of clinical practice. The limited availability of modalities for each patient would cause information loss, adversely affecting predictive accuracy. In this study, we propose an incomplete multimodal data integration framework for GC (iMD4GC) to address the challenges posed by incomplete multimodal data, enabling precise response prediction and survival analysis. Specifically, iMD4GC incorporates unimodal attention layers for each modality to capture intra-modal information. Subsequently, the cross-modal interaction layers explore potential inter-modal interactions and capture complementary information across modalities, thereby enabling information compensation for missing modalities. To evaluate iMD4GC, we collected three multimodal datasets for GC study: GastricRes (698 cases) for response prediction, GastricSur (801 cases) for survival analysis, and TCGA-STAD (400 cases) for survival analysis. The scale of our datasets is significantly larger than previous studies. The iMD4GC achieved impressive performance with an 80.2% AUC on GastricRes, 71.4% C-index on GastricSur, and 66.1% C-index on TCGA-STAD, significantly surpassing other compared methods. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: 27 pages, 9 figures, 3 tables (under review)

arXiv:2404.00989 [pdf, other]

360+x: A Panoptic Multi-modal Scene Understanding Dataset

Authors: Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao

Abstract: Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentri… ▽ More Human perception of the world is shaped by a multitude of viewpoints and modalities. While many existing datasets focus on scene understanding from a certain perspective (e.g. egocentric or third-person views), our dataset offers a panoptic perspective (i.e. multiple viewpoints with multiple data modalities). Specifically, we encapsulate third-person panoramic and front views, as well as egocentric monocular/binocular views with rich modalities including video, multi-channel audio, directional binaural delay, location data and textual scene descriptions within each scene captured, presenting comprehensive observation of the world. Figure 1 offers a glimpse of all 28 scene categories of our 360+x dataset. To the best of our knowledge, this is the first database that covers multiple viewpoints with multiple data modalities to mimic how daily information is accessed in the real world. Through our benchmark analysis, we presented 5 different scene understanding tasks on the proposed 360+x dataset to evaluate the impact and benefit of each data modality and perspective in panoptic scene understanding. We hope this unique dataset could broaden the scope of comprehensive scene understanding and encourage the community to approach these problems from more diverse perspectives. △ Less

Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: CVPR 2024 (Oral Presentation), Project page: https://x360dataset.github.io/

Journal ref: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024

arXiv:2403.19785 [pdf, other]

Integrated Communication, Localization, and Sensing in 6G D-MIMO Networks

Authors: Hao Guo, Henk Wymeersch, Behrooz Makki, Hui Chen, Yibo Wu, Giuseppe Durisi, Musa Furkan Keskin, Mohammad H. Moghaddam, Charitha Madapatha, Han Yu, Peter Hammarberg, Hyowon Kim, Tommy Svensson

Abstract: Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple out… ▽ More Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple output (D-MIMO) systems deploy a large number of distributed nodes and efficiently control them, making distributed integrated sensing and communications (ISAC) possible. In this paper, we investigate ISAC in D-MIMO through the lens of different design architectures and deployments, revealing both conflicts and synergies. In addition, simulation and demonstration results reveal both opportunities and challenges towards the implementation of ISAC in D-MIMO. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.17770 [pdf, other]

CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node… ▽ More Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node generation and the nnU-Net model for lymph node segmentation to improve the segmentation performance of abdominal lymph nodes through synthesizing a diversity of realistic abdominal lymph node data. We propose LN-DDPM, a conditional denoising diffusion probabilistic model (DDPM) for lymph node (LN) generation. LN-DDPM utilizes lymph node masks and anatomical structure masks as model conditions. These conditions work in two conditioning mechanisms: global structure conditioning and local detail conditioning, to distinguish between lymph nodes and their surroundings and better capture lymph node characteristics. The obtained paired abdominal lymph node images and masks are used for the downstream segmentation task. Experimental results on the abdominal lymph node datasets demonstrate that LN-DDPM outperforms other generative methods in the abdominal lymph node image synthesis and better assists the downstream abdominal lymph node segmentation task. △ Less

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.16361 [pdf, other]

RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Authors: Ziheng Deng, Hua Chen, Haibo Hu, Zhiyong Xu, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao

Abstract: Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, there is a limited number of cone-beam projections available for image reconstruction. Consequentl… ▽ More Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, there is a limited number of cone-beam projections available for image reconstruction. Consequently, the 4D CBCT images are covered by severe streak artifacts. Although several deep learning-based methods have been proposed to address this issue, most algorithms employ ordinary network models, neglecting the intrinsic structural prior within 4D CBCT images. In this paper, we first explore the origin and appearance of streak artifacts in 4D CBCT images.Specifically, we find that streak artifacts exhibit a periodic rotational motion along with the patient's respiration. This unique motion pattern inspires us to distinguish the artifacts from the desired anatomical structures in the spatiotemporal domain. Thereafter, we propose a spatiotemporal neural network named RSTAR-Net with separable and circular convolutions for Rotational Streak Artifact Reduction. The specially designed model effectively encodes dynamic image features, facilitating the recovery of 4D CBCT images. Moreover, RSTAR-Net is also lightweight and computationally efficient. Extensive experiments substantiate the effectiveness of our proposed method, and RSTAR-Net shows superior performance to comparison methods. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15739 [pdf, other]

Towards Channel-Resilient CSI-Based RF Fingerprinting using Deep Learning

Authors: Ruiqi Kong, He Chen

Abstract: This work introduces DeepCRF, a deep learning framework designed for channel state information-based radio frequency fingerprinting (CSI-RFF). The considered CSI-RFF is built on micro-CSI, a recently discovered radio-frequency (RF) fingerprint that manifests as micro-signals appearing on the channel state information (CSI) curves of commercial WiFi devices. Micro-CSI facilitates CSI-RFF which is m… ▽ More This work introduces DeepCRF, a deep learning framework designed for channel state information-based radio frequency fingerprinting (CSI-RFF). The considered CSI-RFF is built on micro-CSI, a recently discovered radio-frequency (RF) fingerprint that manifests as micro-signals appearing on the channel state information (CSI) curves of commercial WiFi devices. Micro-CSI facilitates CSI-RFF which is more streamlined and easily implementable compared to existing schemes that rely on raw I/Q samples. The primary challenge resides in the precise extraction of micro-CSI from the inherently fluctuating CSI measurements, a process critical for reliable RFF. The construction of a framework that is resilient to channel variability is essential for the practical deployment of CSI-RFF techniques. DeepCRF addresses this challenge with a thoughtfully trained convolutional neural network (CNN). This network's performance is significantly enhanced by employing effective and strategic data augmentation techniques, which bolster its ability to generalize to novel, unseen channel conditions. Furthermore, DeepCRF incorporates supervised contrastive learning to enhance its robustness against noises. Our evaluations demonstrate that DeepCRF significantly enhances the accuracy of device identification across previously unencountered channels. It outperforms both the conventional model-based methods and standard CNN that lack our specialized training and enhancement strategies. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: 6 pages,5 figures, INFOCOM WKSHPS 2024

arXiv:2403.14935 [pdf, ps, other]

Data-Driven Predictive Control with Adaptive Disturbance Attenuation for Constrained Systems

Authors: Nan Li, Ilya Kolmanovsky, Hong Chen

Abstract: In this paper, we propose a novel data-driven predictive control approach for systems subject to time-domain constraints. The approach combines the strengths of H-infinity control for rejecting disturbances and MPC for handling constraints. In particular, the approach can dynamically adapt H-infinity disturbance attenuation performance depending on measured system state and forecasted disturbance… ▽ More In this paper, we propose a novel data-driven predictive control approach for systems subject to time-domain constraints. The approach combines the strengths of H-infinity control for rejecting disturbances and MPC for handling constraints. In particular, the approach can dynamically adapt H-infinity disturbance attenuation performance depending on measured system state and forecasted disturbance level to satisfy constraints. We establish theoretical properties of the approach including robust guarantees of closed-loop stability, disturbance attenuation, constraint satisfaction under noisy data, as well as sufficient conditions for recursive feasibility, and illustrate the approach with a numerical example. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 11 pages, 2 figures

arXiv:2403.13225 [pdf, other]

Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation

Authors: Linshan Wu, Zhun Zhong, Jiayi Ma, Yunchao Wei, Hao Chen, Leyuan Fang, Shutao Li

Abstract: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models by weak labels, which is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely ignoring to leverage the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each… ▽ More Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models by weak labels, which is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely ignoring to leverage the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each other in the feature space are more likely to share the same class, and those closer to the distribution centers tend to have higher confidence. Motivated by this, we propose to model the underlying label distributions and employ cross-label constraints to generate more accurate pseudo labels. In this paper, we develop a unified WSSS framework named Adaptive Gaussian Mixtures Model, which leverages a GMM to model the label distributions. Specifically, we calculate the feature distribution centers of pseudo-labeled pixels and build the GMM by measuring the distance between the centers and each pseudo-labeled pixel. Then, we introduce an Online Expectation-Maximization (OEM) algorithm and a novel maximization loss to optimize the GMM adaptively, aiming to learn more discriminative decision boundaries between different class-wise Gaussian mixtures. Based on the label distributions, we leverage the GMM to generate high-quality pseudo labels for more reliable supervision. Our framework is capable of solving different forms of weak labels: image-level labels, points, scribbles, blocks, and bounding-boxes. Extensive experiments on PASCAL, COCO, Cityscapes, and ADE20K datasets demonstrate that our framework can effectively provide more reliable supervision and outperform the state-of-the-art methods under all settings. Code will be available at https://github.com/Luffy03/AGMM-SASS. △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11953 [pdf, other]

Advancing COVID-19 Detection in 3D CT Scans

Authors: Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

Abstract: To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior kno… ▽ More To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior knowledge. Our model achieves a Macro F1 Score of 0.94 on the validation set of the 4th COV19D Competition Challenge $\mathrm{I}$, surpassing the baseline by 16%. This indicates its effectiveness in distinguishing between COVID-19 and non-COVID-19 cases, making it a robust method for COVID-19 detection. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11498 [pdf, other]

Domain Adaptation Using Pseudo Labels for COVID-19 Detection

Authors: Runtian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

Abstract: In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent he… ▽ More In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent health crises. The innovative approach of generating pseudo labels enables the model to iteratively refine its learning process, thereby improving its accuracy and adaptability across different hospitals and medical centres. Experimental results on COV19-CT-DB database showcase the model's potential to achieve high diagnostic precision, significantly contributing to efficient patient management and alleviating the strain on healthcare systems. Our method achieves 0.92 Macro F1 Score on the validation set of Covid-19 domain adaptation challenge. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10537 [pdf, ps, other]

Semantic Extraction Model Selection for IoT Devices in Edge-assisted Semantic Communications

Authors: Hong Chen, Fang Fang, Xianbin Wang

Abstract: Semantic communications offer the potential to alleviate communication loads by exchanging meaningful information. However, semantic extraction (SE) is computationally intensive, posing challenges for resource-constrained Internet of Things (IoT) devices. To address this, leveraging computing resources at the edge servers (ESs) is essential. ESs can support multiple SE models for various tasks, ma… ▽ More Semantic communications offer the potential to alleviate communication loads by exchanging meaningful information. However, semantic extraction (SE) is computationally intensive, posing challenges for resource-constrained Internet of Things (IoT) devices. To address this, leveraging computing resources at the edge servers (ESs) is essential. ESs can support multiple SE models for various tasks, making it crucial to select appropriate SE models based on diverse requirements of IoT devices and limited ES computing resources. In this letter, we study an SE model selection problem, where the ES co-located at the access point can provide multiple SE models to execute the uploaded SE tasks of associated IoT devices. We aim to maximize the total semantic rate of all SE tasks by selecting appropriate SE models, while considering SE delay and ES capacity constraints, and SE accuracy requirements. The formulated NP-complete integer programming problem is transformed into a modified Knapsack problem. The proposed efficient approximation algorithm using dynamic programming can yield a guaranteed near-optimum solution. Simulation results demonstrate the superior performance of proposed solution. △ Less

Submitted 26 February, 2024; originally announced March 2024.

Comments: Submitted to IEEE Communications Letters

Showing 1–50 of 511 results for author: Chen, H