Search | arXiv e-print repository

arXiv:2406.19959 [pdf, other]

RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applying in real-world scenarios. To bridge t… ▽ More The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applying in real-world scenarios. To bridge this simulation-to-real gap, this paper presents a new relatively large-scale Real-recorded and annotated Microphone Array speech&Noise (RealMAN) dataset. The proposed dataset is valuable in two aspects: 1) benchmarking speech enhancement and localization algorithms in real scenarios; 2) offering a substantial amount of real-world training data for potentially improving the performance of real-world applications. Specifically, a 32-channel array with high-fidelity microphones is used for recording. A loudspeaker is used for playing source speech signals. A total of 83-hour speech signals (48 hours for static speaker and 35 hours for moving speaker) are recorded in 32 different scenes, and 144 hours of background noise are recorded in 31 different scenes. Both speech and noise recording scenes cover various common indoor, outdoor, semi-outdoor and transportation environments, which enables the training of general-purpose speech enhancement and source localization networks. To obtain the task-specific annotations, the azimuth angle of the loudspeaker is annotated with an omni-direction fisheye camera by automatically detecting the loudspeaker. The direct-path signal is set as the target clean speech for speech enhancement, which is obtained by filtering the source speech signal with an estimated direct-path propagation filter. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18592 [pdf, ps, other]

On the Coexistence of OTFS Modulation with OFDM-based Communication Systems

Authors: Akram Shafie, **hong Yuan, Paul Fitzpatrick, Taka Sakurai, Yuting Fang

Abstract: We investigate the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation of OTFS in the considered coexisting system. In this derivation, we consider (i) the inclusion of multiple cyclic prefixes… ▽ More We investigate the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation of OTFS in the considered coexisting system. In this derivation, we consider (i) the inclusion of multiple cyclic prefixes (CPs) with unequal lengths to the OTFS signal and (ii) edge carrier unloading (ECU), to account for the impacts of CP length, frame structure, and subcarrier arrangement described in 3GPP standards for 4G/5G systems. Our analysis reveals that the inclusion of multiple CPs to the OTFS signal and ECU lead to the channel response exhibiting spreading effects/leakage along the Doppler and delay dimensions, respectively. Consequently, the effective sampled delay-Doppler (DD) domain channel model for OTFS in coexisting systems may exhibit reduced sparsity. We also show that the effective DD domain channel coefficients for OTFS in coexisting systems are influenced by the unequal lengths of CPs. Subsequently, we propose an interference cancellation-based channel estimation (CE) technique for OTFS in coexisting systems. Through numerical results, we validate our analysis, highlight the importance of not ignoring the unequal lengths of CPs during signal detection, and show the significance of the proposed CE technique. △ Less

Submitted 8 June, 2024; originally announced June 2024.

Comments: This paper has been submitted for publication in an IEEE Journal. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: text overlap with arXiv:2311.06850

arXiv:2406.17173 [pdf, other]

Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks

Authors: Zihao **, Yingying Fang, Jiahao Huang, Caiwen Xu, Simon Walsh, Guang Yang

Abstract: The manifestation of symptoms associated with lung diseases can vary in different depths for individual patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformer has shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D dat… ▽ More The manifestation of symptoms associated with lung diseases can vary in different depths for individual patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformer has shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D datasets and they easily encounter overfitting issues on small medical image datasets. To address this limitation, we propose a Diffusion-based 3D Vision Transformer (Diff3Dformer), which utilizes the latent space of the Diffusion model to form the slice sequence for 3D analysis and incorporates clustering attention into ViT to aggregate repetitive information within 3D CT scans, thereby harnessing the power of the advanced transformer in 3D classification tasks on small datasets. Our method exhibits improved performance on two different scales of small datasets of 3D lung CT scans, surpassing the state of the art 3D methods and other transformer-based approaches that emerged during the COVID-19 pandemic, demonstrating its robust and superior performance across different scales of data. Experimental results underscore the superiority of our proposed method, indicating its potential for enhancing medical image classification tasks in real-world scenarios. △ Less

Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: conference

arXiv:2406.16189 [pdf, other]

Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation

Authors: Sheng Zhang, Yang Nan, Yingying Fang, Shiyi Wang, Xiaodan Xing, Zhifan Gao, Guang Yang

Abstract: Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs are easily lost during the recycled down/up-sample procedure, e.g., bronchioles & arterioles, causing severe discontinuity issue… ▽ More Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs are easily lost during the recycled down/up-sample procedure, e.g., bronchioles & arterioles, causing severe discontinuity issue. Inspired by these, this paper introduces an effective lung organ segmentation method called Fuzzy Attention-based Border Rendering (FABR) network. Since fuzzy logic can handle the uncertainty in feature extraction, hence the fusion of deep networks and fuzzy sets should be a viable solution for better performance. Meanwhile, unlike prior top-tier methods that operate on all regular dense points, our FABR depicts lung organ regions as cube-trees, focusing only on recycle-sampled border vulnerable points, rendering the severely discontinuous, false-negative/positive organ regions with a novel Global-Local Cube-tree Fusion (GLCF) module. All experimental results, on four challenging datasets of airway & artery, demonstrate that our method can achieve the favorable performance significantly. △ Less

Submitted 1 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: MICCAI 2024

arXiv:2406.12426 [pdf, other]

Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design

Authors: Yuan Fang, Xianghao Yu, Jie Xu, Ying-Jun Angela Zhang

Abstract: This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to… ▽ More This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to overcome the severe multi-hop-reflection path loss in NLoS sensing. In particular, we consider two sensing setups without and with dedicated sensors equipped at active IRSs. In the first case without dedicated sensors at IRSs, we investigate the cooperative sensing at the BS, where the target's direction-of-arrival (DoA) with respect to each IRS is estimated based on the echo signals received at the BS. In the other case with dedicated sensors at IRSs, we consider that each IRS is able to receive echo signals and estimate the target's DoA with respect to itself. For both sensing setups, we first derive the closed-form Cramér-Rao bound (CRB) for estimating target DoA. Then, the (maximum) CRB is minimized by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum amplification power and the maximum power amplification gain constraints at individual active IRSs. To tackle the resulting highly non-convex (max-)CRB minimization problems, we propose two efficient algorithms to obtain high-quality solutions for the two cases with sensing at the BS and at the IRSs, respectively, based on alternating optimization, successive convex approximation, and semi-definite relaxation. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2404.13536

arXiv:2406.09389 [pdf, other]

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Authors: Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

Abstract: Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color map**, which enhances the visual representation by expanding the image's color range and adjusting the brightness… ▽ More Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color map**, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while kee** the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: https://sagiri0208.github.io

arXiv:2406.07773 [pdf]

Development of Focused X-ray Luminescence Compute Tomography Imaging

Authors: Yile Fang, Yibing Zhang, Changqing Li

Abstract: X-ray luminescence is produced when contrast agents absorb energy from X-ray photons and release a portion of that energy by emitting photons in the visible and near-infrared range. X-ray luminescence computed tomography (XLCT) was introduced in the past decade as a hybrid molecular imaging modality combining the merits of both X-ray imaging (high spatial resolution) and optical imaging (high sens… ▽ More X-ray luminescence is produced when contrast agents absorb energy from X-ray photons and release a portion of that energy by emitting photons in the visible and near-infrared range. X-ray luminescence computed tomography (XLCT) was introduced in the past decade as a hybrid molecular imaging modality combining the merits of both X-ray imaging (high spatial resolution) and optical imaging (high sensitivity to tracer nanophosphors). △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.19298 [pdf, other]

Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA)… ▽ More While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA) model, which is capable of producing qualitatively comparative responses and effectively translating these discrete comparative levels into a continuous quality score. Specifically, during training, we present to generate scaled-up comparative instructions by comparing images from the same IQA dataset, allowing for more flexible integration of diverse IQA datasets. Utilizing the established large-scale training corpus, we develop a human-like visual quality comparator. During inference, moving beyond binary choices, we propose a soft comparison method that calculates the likelihood of the test image being preferred over multiple predefined anchor images. The quality score is further optimized by maximum a posteriori estimation with the resulting probability matrix. Extensive experiments on nine IQA datasets validate that the Compare2Score effectively bridges text-defined comparative levels during training with converted single image quality score for inference, surpassing state-of-the-art IQA models across diverse scenarios. Moreover, we verify that the probability-matrix-based inference conversion not only improves the rating accuracy of Compare2Score but also zero-shot general-purpose LMMs, suggesting its intrinsic effectiveness. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.06178 [pdf, other]

ACTION: Augmentation and Computation Toolbox for Brain Network Analysis with Functional MRI

Authors: Yuqi Fang, Junhao Zhang, Linmin Wang, Qianqian Wang, Mingxia Liu

Abstract: Functional magnetic resonance imaging (fMRI) has been increasingly employed to investigate functional brain activity. Many fMRI-related software/toolboxes have been developed, providing specialized algorithms for fMRI analysis. However, existing toolboxes seldom consider fMRI data augmentation, which is quite useful, especially in studies with limited or imbalanced data. Moreover, current studies… ▽ More Functional magnetic resonance imaging (fMRI) has been increasingly employed to investigate functional brain activity. Many fMRI-related software/toolboxes have been developed, providing specialized algorithms for fMRI analysis. However, existing toolboxes seldom consider fMRI data augmentation, which is quite useful, especially in studies with limited or imbalanced data. Moreover, current studies usually focus on analyzing fMRI using conventional machine learning models that rely on human-engineered fMRI features, without investigating deep learning models that can automatically learn data-driven fMRI representations. In this work, we develop an open-source toolbox, called Augmentation and Computation Toolbox for braIn netwOrk aNalysis (ACTION), offering comprehensive functions to streamline fMRI analysis. The ACTION is a Python-based and cross-platform toolbox with graphical user-friendly interfaces. It enables automatic fMRI augmentation, covering blood-oxygen-level-dependent (BOLD) signal augmentation and brain network augmentation. Many popular methods for brain network construction and network feature extraction are included. In particular, it supports constructing deep learning models, which leverage large-scale auxiliary unlabeled data (3,800+ resting-state fMRI scans) for model pretraining to enhance model performance for downstream tasks. To facilitate multi-site fMRI studies, it is also equipped with several popular federated learning strategies. Furthermore, it enables users to design and test custom algorithms through scripting, greatly improving its utility and extensibility. We demonstrate the effectiveness and user-friendliness of ACTION on real fMRI data and present the experimental results. The software, along with its source code and manual, can be accessed online. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures, 5 tables

arXiv:2404.13536 [pdf, other]

Joint Transmit and Reflective Beamforming for Multi-Active-IRS-Assisted Cooperative Sensing

Authors: Yuan Fang, Xianghao Yu, Jie Xu

Abstract: This paper studies multi-active intelligent-reflecting-surface (IRS) cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to help the base station (BS) provide multi-view sensing. We focus on the scenario where the sensing target is located in the non-line-of-sight (NLoS) area of the BS. Based on the received echo signal, the BS aims to estimate the target's dire… ▽ More This paper studies multi-active intelligent-reflecting-surface (IRS) cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to help the base station (BS) provide multi-view sensing. We focus on the scenario where the sensing target is located in the non-line-of-sight (NLoS) area of the BS. Based on the received echo signal, the BS aims to estimate the target's direction-of-arrival (DoA) with respect to each IRS. In addition, we leverage active IRSs to overcome the severe path loss induced by multi-hop reflections. Under this setup, we minimize the maximum Cramér-Rao bound (CRB) among all IRSs by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum transmit power and the maximum power amplification gain at individual IRSs. To tackle the resulting highly non-convex max-CRB minimization problem, we propose an efficient algorithm based on alternating optimization, successive convex approximation, and semi-definite relaxation, to obtain a high-quality solution. Finally, numerical results are provided to verify the effectiveness of our proposed design and the benefits of active IRS-assisted sensing compared to the counterpart with passive IRSs. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.08566 [pdf, other]

Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation

Authors: Liu Yang, Qiang Li, Xiaoyang Ren, Yi Fang, Shafei Wang

Abstract: Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in develo** state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, whe… ▽ More Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in develo** state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, where the RFFI model is trained and deployed on different receivers. Due to altered receiver characteristics, direct deployment of RFFI model on a new receiver leads to significant performance degradation. To address this issue, we formulate the cross-receiver RFFI as a model adaptation problem, which adapts the trained model to unlabeled signals from a new receiver. We first develop a theoretical generalization error bound for the adaptation model. Motivated by the bound, we propose a novel method to solve the cross-receiver RFFI problem, which includes domain alignment and adaptive pseudo-labeling. The former aims at finding a feature space where both domains exhibit similar distributions, effectively reducing the domain discrepancy. Meanwhile, the latter employs a dynamic pseudo-labeling scheme to implicitly transfer the label information from the labeled receiver to the new receiver. Experimental results indicate that the proposed method can effectively mitigate the receiver impact and improve the cross-receiver RFFI performance. △ Less

Submitted 12 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE Internet of Things Journal

arXiv:2403.16353 [pdf, other]

Energy-Efficient Hybrid Beamforming with Dynamic On-off Control for Integrated Sensing, Communications, and Powering

Authors: Zeyu Hao, Yuan Fang, Xianghao Yu, Jie Xu, Ling Qiu, Lexi Xu, Shuguang Cui

Abstract: This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy… ▽ More This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy receivers (ERs) at the same time. To facilitate the energy-efficient design, we present a novel HAD architecture for the BS transmitter, which allows dynamic on-off control of its radio frequency (RF) chains and analog phase shifters (PSs) through a switch network. We also consider a practical and comprehensive power consumption model for the BS, by taking into account the power-dependent non-linear power amplifier (PA) efficiency, and the on-off non-transmission power consumption model of RF chains and PSs. We jointly design the hybrid beamforming and dynamic on-off control at the BS, aiming to minimize its total power consumption, while guaranteeing the performance requirements on communication rates, sensing Cramér-Rao bound (CRB), and harvested power levels. The formulation also takes into consideration the per-antenna transmit power constraint and the constant modulus constraints for the analog beamformer at the BS. The resulting optimization problem for ISCAP is highly non-convex. Please refer to the paper for a complete abstract. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: 13 pages, 6 figures, submitted to IEEE Transactions on Communications

arXiv:2403.09096 [pdf, other]

Deep unfolding Network for Hyperspectral Image Super-Resolution with Automatic Exposure Correction

Authors: Yuan Fang, Yipeng Liu, Jie Chen, Zhen Long, Ao Li, Chong-Yung Chi, Ce Zhu

Abstract: In recent years, the fusion of high spatial resolution multispectral image (HR-MSI) and low spatial resolution hyperspectral image (LR-HSI) has been recognized as an effective method for HSI super-resolution (HSI-SR). However, both HSI and MSI may be acquired under extreme conditions such as night or poorly illuminating scenarios, which may cause different exposure levels, thereby seriously downgr… ▽ More In recent years, the fusion of high spatial resolution multispectral image (HR-MSI) and low spatial resolution hyperspectral image (LR-HSI) has been recognized as an effective method for HSI super-resolution (HSI-SR). However, both HSI and MSI may be acquired under extreme conditions such as night or poorly illuminating scenarios, which may cause different exposure levels, thereby seriously downgrading the yielded HSISR. In contrast to most existing methods based on respective low-light enhancements (LLIE) of MSI and HSI followed by their fusion, a deep Unfolding HSI Super-Resolution with Automatic Exposure Correction (UHSR-AEC) is proposed, that can effectively generate a high-quality fused HSI-SR (in texture and features) even under very imbalanced exposures, thanks to the correlation between LLIE and HSI-SR taken into account. Extensive experiments are provided to demonstrate the state-of-the-art overall performance of the proposed UHSR-AEC, including comparison with some benchmark peer methods. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.13511 [pdf, other]

Mel-FullSubNet: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

Authors: Rui Zhou, Xian Li, Ying Fang, Xiaofei Li

Abstract: In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. Mel-FullSubNet takes as input the noisy and reverberant Mel-spectrogram and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can be either transformed to speech waveform wi… ▽ More In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. Mel-FullSubNet takes as input the noisy and reverberant Mel-spectrogram and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can be either transformed to speech waveform with a neural vocoder or directly used for ASR. Mel-FullSubNet encapsulates interleaved full-band and sub-band networks, for learning the full-band spectral pattern of signals and the sub-band/narrow-band properties of signals, respectively. Compared to linear-frequency domain or time-domain speech enhancement, the major advantage of Mel-spectrogram enhancement is that Mel-frequency presents speech in a more compact way and thus is easier to learn, which will benefit both speech quality and ASR. Experimental results demonstrate a significant improvement in both speech quality and ASR performance achieved by the proposed model. △ Less

Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.09752 [pdf]

Vector spectrometer with Hertz-level resolution and super-recognition capability

Authors: Ting Qing, Shupeng Li, Huashan Yang, Lihan Wang, Yijie Fang, Xiaohu Tang, Meihui Cao, Jianming Lu, Jijun He, Junqiu Liu, Yueguang Lyu, Shilong Pan

Abstract: High-resolution optical spectrometers are crucial in revealing intricate characteristics of signals, determining laser frequencies, measuring physical constants, identifying substances, and advancing biosensing applications. Conventional spectrometers, however, often grapple with inherent trade-offs among spectral resolution, wavelength range, and accuracy. Furthermore, even at high resolution, re… ▽ More High-resolution optical spectrometers are crucial in revealing intricate characteristics of signals, determining laser frequencies, measuring physical constants, identifying substances, and advancing biosensing applications. Conventional spectrometers, however, often grapple with inherent trade-offs among spectral resolution, wavelength range, and accuracy. Furthermore, even at high resolution, resolving overlap** spectral lines during spectroscopic analyses remains a huge challenge. Here, we propose a vector spectrometer with ultrahigh resolution, combining broadband optical frequency hop**, ultrafine microwave-photonic scanning, and vector detection. A programmable frequency-hop** laser was developed, facilitating a sub-Hz linewidth and Hz-level frequency stability, an improvement of four and six orders of magnitude, respectively, compared to those of state-of-the-art tunable lasers. We also designed an asymmetric optical transmitter and receiver to eliminate measurement errors arising from modulation nonlinearity and multi-channel crosstalk. The resultant vector spectrometer exhibits an unprecedented frequency resolution of 2 Hz, surpassing the state-of-the-art by four orders of magnitude, over a 33-nm range. Through high-resolution vector analysis, we observed that group delay information enhances the separation capability of overlap** spectral lines by over 47%, significantly streamlining the real-time identification of diverse substances. Our technique fills the gap in optical spectrometers with resolutions below 10 kHz and enables vector measurement to embrace revolution in functionality. △ Less

Submitted 6 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 21 pages, 6 figures

arXiv:2402.05378 [pdf, other]

Graph Neural Networks for Physical-Layer Security in Multi-User Flexible-Duplex Networks

Authors: Tharaka Perera, Saman Atapattu, Yuting Fang, Jamie Evans

Abstract: This paper explores Physical-Layer Security (PLS) in Flexible Duplex (FlexD) networks, considering scenarios involving eavesdroppers. Our investigation revolves around the intricacies of the sum secrecy rate maximization problem, particularly when faced with coordinated and distributed eavesdroppers employing a Minimum Mean Square Error (MMSE) receiver. Our contributions include an iterative class… ▽ More This paper explores Physical-Layer Security (PLS) in Flexible Duplex (FlexD) networks, considering scenarios involving eavesdroppers. Our investigation revolves around the intricacies of the sum secrecy rate maximization problem, particularly when faced with coordinated and distributed eavesdroppers employing a Minimum Mean Square Error (MMSE) receiver. Our contributions include an iterative classical optimization solution and an unsupervised learning strategy based on Graph Neural Networks (GNNs). To the best of our knowledge, this work marks the initial exploration of GNNs for PLS applications. Additionally, we extend the GNN approach to address the absence of eavesdroppers' channel knowledge. Extensive numerical simulations highlight FlexD's superiority over Half-Duplex (HD) communications and the GNN approach's superiority over the classical method in both performance and time complexity. △ Less

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.03473 [pdf, other]

Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images

Authors: Xiaodan Xing, Huiyu Zhou, Yingying Fang, Guang Yang

Abstract: AI-generated medical images are gaining growing popularity due to their potential to address the data scarcity challenge in the real world. However, the issue of accurate identification of these synthetic images, particularly when they exhibit remarkable realism with their real copies, remains a concern. To mitigate this challenge, image generators such as DALLE and Imagen, have integrated digital… ▽ More AI-generated medical images are gaining growing popularity due to their potential to address the data scarcity challenge in the real world. However, the issue of accurate identification of these synthetic images, particularly when they exhibit remarkable realism with their real copies, remains a concern. To mitigate this challenge, image generators such as DALLE and Imagen, have integrated digital watermarks aimed at facilitating the discernment of synthetic images' authenticity. These watermarks are embedded within the image pixels and are invisible to the human eye while remains their detectability. Nevertheless, a comprehensive investigation into the potential impact of these invisible watermarks on the utility of synthetic medical images has been lacking. In this study, we propose the incorporation of invisible watermarks into synthetic medical images and seek to evaluate their efficacy in the context of downstream classification tasks. Our goal is to pave the way for discussions on the viability of such watermarks in boosting the detectability of synthetic medical images, fortifying ethical standards, and safeguarding against data pollution and potential scams. △ Less

Submitted 21 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 5 pages

Journal ref: ISBI 2024

arXiv:2401.16564 [pdf]

Data and Physics driven Deep Learning Models for Fast MRI Reconstruction: Fundamentals and Methodologies

Authors: Jiahao Huang, Yinzhe Wu, Fanwen Wang, Yingying Fang, Yang Nan, Cagan Alkan, Lei Xu, Zhifan Gao, Weiwen Wu, Lei Zhu, Zhaolin Chen, Peter Lally, Neal Bangerter, Kawin Setsompop, Yike Guo, Daniel Rueckert, Ge Wang, Guang Yang

Abstract: Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, leveraging techniques from algorithm unrolling models, enhancement-based models, and plug-… ▽ More Magnetic Resonance Imaging (MRI) is a pivotal clinical diagnostic tool, yet its extended scanning times often compromise patient comfort and image quality, especially in volumetric, temporal and quantitative scans. This review elucidates recent advances in MRI acceleration via data and physics-driven models, leveraging techniques from algorithm unrolling models, enhancement-based models, and plug-and-play models to emergent full spectrum of generative models. We also explore the synergistic integration of data models with physics-based insights, encompassing the advancements in multi-coil hardware accelerations like parallel imaging and simultaneous multi-slice imaging, and the optimization of sampling patterns. We then focus on domain-specific challenges and opportunities, including image redundancy exploitation, image integrity, evaluation metrics, data heterogeneity, and model generalization. This work also discusses potential solutions and future research directions, emphasizing the role of data harmonization, and federated learning for further improving the general applicability and performance of these methods in MRI reconstruction. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.13442 [pdf, other]

Finite-Precision Arithmetic Transceiver for Massive MIMO Systems

Authors: Yiming Fang, Li Chen, Yunfei Chen, Huarui Yin

Abstract: Efficient implementation of massive multiple-input-multiple-output (MIMO) transceivers is essential for the next-generation wireless networks. To reduce the high computational complexity of the massive MIMO transceiver, in this paper, we propose a new massive MIMO architecture using finite-precision arithmetic. First, we conduct the rounding error analysis and derive the lower bound of the achieva… ▽ More Efficient implementation of massive multiple-input-multiple-output (MIMO) transceivers is essential for the next-generation wireless networks. To reduce the high computational complexity of the massive MIMO transceiver, in this paper, we propose a new massive MIMO architecture using finite-precision arithmetic. First, we conduct the rounding error analysis and derive the lower bound of the achievable rate for single-input-multiple-output (SIMO) using maximal ratio combining (MRC) and multiple-input-single-output (MISO) systems using maximal ratio transmission (MRT) with finite-precision arithmetic. Then, considering the multi-user scenario, the rounding error analysis of zero-forcing (ZF) detection and precoding is derived by using the normal equations (NE) method. The corresponding lower bounds of the achievable sum rate are also derived and asymptotic analyses are presented. Built upon insights from these analyses and lower bounds, we propose a mixed-precision architecture for massive MIMO systems to offset performance gaps due to finite-precision arithmetic. The corresponding analysis of rounding errors and computational costs is obtained. Simulation results validate the derived bounds and underscore the superiority of the proposed mixed-precision architecture to the conventional structure. △ Less

Submitted 25 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: 16 pages, 8 figures. Submitted to IEEE JSAC for possible publication

arXiv:2401.05431 [pdf, other]

TRLS: A Time Series Representation Learning Framework via Spectrogram for Medical Signal Processing

Authors: Luyuan Xie, Cong Li, Xin Zhang, Shengfang Zhai, Yuejian Fang, Qingni Shen, Zhonghai Wu

Abstract: Representation learning frameworks in unlabeled time series have been proposed for medical signal processing. Despite the numerous excellent progresses have been made in previous works, we observe the representation extracted for the time series still does not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to get m… ▽ More Representation learning frameworks in unlabeled time series have been proposed for medical signal processing. Despite the numerous excellent progresses have been made in previous works, we observe the representation extracted for the time series still does not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to get more informative representations. We transform the input time-domain medical signals into spectrograms and design a time-frequency encoder named Time Frequency RNN (TFRNN) to capture more robust multi-scale representations from the augmented spectrograms. Our TRLS takes spectrogram as input with two types of different data augmentations and maximizes the similarity between positive ones, which effectively circumvents the problem of designing negative samples. Our evaluation of four real-world medical signal datasets focusing on medical signal classification shows that TRLS is superior to the existing frameworks. △ Less

Submitted 5 January, 2024; originally announced January 2024.

Comments: This paper is accept by ICASSP 2024. This is a more detailed version

arXiv:2401.01544 [pdf, other]

Collaborative Perception for Connected and Autonomous Driving: Challenges, Possible Solutions and Opportunities

Authors: Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang

Abstract: Autonomous driving has attracted significant attention from both academia and industries, which is expected to offer a safer and more efficient driving system. However, current autonomous driving systems are mostly based on a single vehicle, which has significant limitations which still poses threats to driving safety. Collaborative perception with connected and autonomous vehicles (CAVs) shows a… ▽ More Autonomous driving has attracted significant attention from both academia and industries, which is expected to offer a safer and more efficient driving system. However, current autonomous driving systems are mostly based on a single vehicle, which has significant limitations which still poses threats to driving safety. Collaborative perception with connected and autonomous vehicles (CAVs) shows a promising solution to overcoming these limitations. In this article, we first identify the challenges of collaborative perception, such as data sharing asynchrony, data volume, and pose errors. Then, we discuss the possible solutions to address these challenges with various technologies, where the research opportunities are also elaborated. Furthermore, we propose a scheme to deal with communication efficiency and latency problems, which is a channel-aware collaborative perception framework to dynamically adjust the communication graph and minimize latency, thereby improving perception performance while increasing communication efficiency. Finally, we conduct experiments to demonstrate the effectiveness of our proposed scheme. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.14383 [pdf, other]

Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal

Authors: Yicheng Leng, Chaowei Fang, Gen Li, Yixiang Fang, Guanbin Li

Abstract: Visible watermarks, while instrumental in protecting image copyrights, frequently distort the underlying content, complicating tasks like scene interpretation and image editing. Visible watermark removal aims to eliminate the interference of watermarks and restore the background content. However, existing methods often implement watermark component removal and background restoration tasks within a… ▽ More Visible watermarks, while instrumental in protecting image copyrights, frequently distort the underlying content, complicating tasks like scene interpretation and image editing. Visible watermark removal aims to eliminate the interference of watermarks and restore the background content. However, existing methods often implement watermark component removal and background restoration tasks within a singular branch, leading to residual watermarks in the predictions and ignoring cases where watermarks heavily obscure the background. To address these limitations, this study introduces the Removing Interference and Recovering Content Imaginatively (RIRCI) framework. RIRCI embodies a two-stage approach: the initial phase centers on discerning and segregating the watermark component, while the subsequent phase focuses on background content restoration. To achieve meticulous background restoration, our proposed model employs a dual-path network capable of fully exploring the intrinsic background information beneath semi-transparent watermarks and peripheral contextual information from unaffected regions. Moreover, a Global and Local Context Interaction module is built upon multi-layer perceptrons and bidirectional feature transformation for comprehensive representation modeling in the background restoration phase. The efficacy of our approach is empirically validated across two large-scale datasets, and our findings reveal a marked enhancement over existing watermark removal techniques. △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2312.05557 [pdf, ps, other]

Long-Term Rate-Fairness-Aware Beamforming Based Massive MIMO Systems

Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, Y. Fang, H. V. Poor, L. Hanzo

Abstract: This is the first treatise on multi-user (MU) beamforming designed for achieving long-term rate-fairness in fulldimensional MU massive multi-input multi-output (m-MIMO) systems. Explicitly, based on the channel covariances, which can be assumed to be known beforehand, we address this problem by optimizing the following objective functions: the users' signal-toleakage-noise ratios (SLNRs) using SLN… ▽ More This is the first treatise on multi-user (MU) beamforming designed for achieving long-term rate-fairness in fulldimensional MU massive multi-input multi-output (m-MIMO) systems. Explicitly, based on the channel covariances, which can be assumed to be known beforehand, we address this problem by optimizing the following objective functions: the users' signal-toleakage-noise ratios (SLNRs) using SLNR max-min optimization, geometric mean of SLNRs (GM-SLNR) based optimization, and SLNR soft max-min optimization. We develop a convex-solver based algorithm, which invokes a convex subproblem of cubic time-complexity at each iteration for solving the SLNR maxmin problem. We then develop closed-form expression based algorithms of scalable complexity for the solution of the GMSLNR and of the SLNR soft max-min problem. The simulations provided confirm the users' improved-fairness ergodic rate distributions. △ Less

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2311.06850 [pdf, ps, other]

Coexistence of OTFS Modulation With OFDM-based Communication Systems

Authors: Akram Shafie, **hong Yuan, Yuting Fang, Paul Fitzpatrick, Taka Sakurai

Abstract: This study examines the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) wireless communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation (IOR) of OTFS when it coexists with an OFDM system while considering the impact of unequal lengths of the cy… ▽ More This study examines the coexistence of orthogonal time-frequency space (OTFS) modulation with current fourth- and fifth-generation (4G/5G) wireless communication systems that primarily use orthogonal frequency-division multiplexing (OFDM) waveforms. We first derive the input-output-relation (IOR) of OTFS when it coexists with an OFDM system while considering the impact of unequal lengths of the cyclic prefixes (CPs) in the OTFS signal. We show analytically that the inclusion of multiple CPs to the OTFS signal results in the effective sampled delay-Doppler (DD) domain channel response to be less sparse. We also show that the effective DD domain channel coefficients for OTFS in coexisting systems are influenced by the unequal lengths of the CPs. Subsequently, we propose an embedded pilot-aided channel estimation (CE) technique for OTFS in coexisting systems that leverages the derived IOR for accurate channel characterization. Using numerical results, we show that ignoring the impact of unequal lengths of the CPs during signal detection can degrade the bit error rate performance of OTFS in coexisting systems. We also show that the proposed CE technique for OTFS in coexisting systems outperforms the state-of-the-art threshold-based CE technique. △ Less

Submitted 12 November, 2023; originally announced November 2023.

Comments: This paper has been accepted for publication in IEEE Global Communications Conferences (GLOBECOM) 2023. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.05372 [pdf, other]

Joint Angle and Delay Cramér-Rao Bound Optimization for Integrated Sensing and Communications

Authors: Chao Hu, Yuan Fang, Ling Qiu

Abstract: In this paper, we study a multi-input multi-output (MIMO) beamforming design in an integrated sensing and communication (ISAC) system, in which an ISAC base station (BS) is used to communicate with multiple downlink users and simultaneously the communication signals are reused for sensing multiple targets. Our interested sensing parameters are the angle and delay information of the targets, which… ▽ More In this paper, we study a multi-input multi-output (MIMO) beamforming design in an integrated sensing and communication (ISAC) system, in which an ISAC base station (BS) is used to communicate with multiple downlink users and simultaneously the communication signals are reused for sensing multiple targets. Our interested sensing parameters are the angle and delay information of the targets, which can be used to locate these targets. Under this consideration, we first derive the Cramér-Rao bound (CRB) for angle and delay estimation. Then, we optimize the transmit beamforming at the BS to minimize the CRB, subject to communication rate and power constraints. In particular, we obtain the optimal solution in closed-form in the case of single-target and single-user, and in the case of multi-target and multi-user scenario, the sparsity of the optimal solution is proven, leading to a reduction in computational complexity during optimization. The numerical results demonstrate that the optimized beamforming yields excellent positioning performance and effectively reduces the requirement for a large number of antennas at the BS. △ Less

Submitted 13 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: This paper has been submitted to IEEE TVT

arXiv:2311.01066 [pdf, other]

Dynamic Multimodal Information Bottleneck for Multimodality Classification

Authors: Yingying Fang, Shuang Wu, Sheng Zhang, Chaoyan Huang, Tieyong Zeng, Xiaodan Xing, Simon Walsh, Guang Yang

Abstract: Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing feature across different modalities. These appro… ▽ More Effectively leveraging multimodal data such as various images, laboratory tests and clinical information is gaining traction in a variety of AI-based medical diagnosis and prognosis tasks. Most existing multi-modal techniques only focus on enhancing their performance by leveraging the differences or shared features from various modalities and fusing feature across different modalities. These approaches are generally not optimal for clinical settings, which pose the additional challenges of limited training data, as well as being rife with redundant data or noisy modality channels, leading to subpar performance. To address this gap, we study the robustness of existing methods to data redundancy and noise and propose a generalized dynamic multimodal information bottleneck framework for attaining a robust fused feature representation. Specifically, our information bottleneck module serves to filter out the task-irrelevant information and noises in the fused feature, and we further introduce a sufficiency loss to prevent drop** of task-relevant information, thus explicitly preserving the sufficiency of prediction information in the distilled feature. We validate our model on an in-house and a public COVID19 dataset for mortality prediction as well as two public biomedical datasets for diagnostic tasks. Extensive experiments show that our method surpasses the state-of-the-art and is significantly more robust, being the only method to remain performance when large-scale noisy channels exist. Our code is publicly available at https://github.com/ayanglab/DMIB. △ Less

Submitted 25 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: WACV 2024

arXiv:2311.01003 [pdf, other]

Minimum Snap Trajectory Generation and Control for an Under-actuated Flap** Wing Aerial Vehicle

Authors: Chen Qian, Rui Chen, Peiyao Shen, Yongchun Fang, Jifu Yan, Tiefeng Li

Abstract: Minimum Snap Trajectory Generation and Control for an Under-actuated Flap** Wing Aerial VehicleThis paper presents both the trajectory generation and tracking control strategies for an underactuated flap** wing aerial vehicle (FWAV). First, the FWAV dynamics is analyzed in a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system, and d… ▽ More Minimum Snap Trajectory Generation and Control for an Under-actuated Flap** Wing Aerial VehicleThis paper presents both the trajectory generation and tracking control strategies for an underactuated flap** wing aerial vehicle (FWAV). First, the FWAV dynamics is analyzed in a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system, and develop a general-purpose trajectory generation strategy. Subsequently, the trajectory tracking controller is developed with the help of robust control and switch control techniques. After that, the overall system asymptotic stability is guaranteed by Lyapunov stability analysis. To make the controller applicable in real flight, we also provide several instructions. Finally, a series of experiment results manifest the successful implementation of the proposed trajectory generation strategy and tracking control strategy. This work firstly achieves the closed-loop integration of trajectory generation and control for real 3-dimensional flight of an underactuated FWAV to a practical level. △ Less

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2309.08150 [pdf, other]

Unimodal Aggregation for CTC-based Speech Recognition

Authors: Ying Fang, Xiaofei Li

Abstract: This paper works on non-autoregressive automatic speech recognition. A unimodal aggregation (UMA) is proposed to segment and integrate the feature frames that belong to the same text token, and thus to learn better feature representations for text tokens. The frame-wise features and weights are both derived from an encoder. Then, the feature frames with unimodal weights are integrated and further… ▽ More This paper works on non-autoregressive automatic speech recognition. A unimodal aggregation (UMA) is proposed to segment and integrate the feature frames that belong to the same text token, and thus to learn better feature representations for text tokens. The frame-wise features and weights are both derived from an encoder. Then, the feature frames with unimodal weights are integrated and further processed by a decoder. Connectionist temporal classification (CTC) loss is applied for training. Compared to the regular CTC, the proposed method learns better feature representations and shortens the sequence length, resulting in lower recognition error and computational complexity. Experiments on three Mandarin datasets show that UMA demonstrates superior or comparable performance to other advanced non-autoregressive methods, such as self-conditioned CTC. Moreover, by integrating self-conditioned CTC into the proposed framework, the performance can be further noticeably improved. △ Less

Submitted 19 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: Accepted by ICASSP 2024

arXiv:2309.03472 [pdf, other]

Perceptual Quality Assessment of 360$^\circ$ Images Based on Generative Scanpath Representation

Authors: Xiangjie Sui, Hanwei Zhu, Xuelin Liu, Yuming Fang, Shiqi Wang, Zhou Wang

Abstract: Despite substantial efforts dedicated to the design of heuristic models for omnidirectional (i.e., 360$^\circ$) image quality assessment (OIQA), a conspicuous gap remains due to the lack of consideration for the diversity of viewing behaviors that leads to the varying perceptual quality of 360$^\circ$ images. Two critical aspects underline this oversight: the neglect of viewing conditions that sig… ▽ More Despite substantial efforts dedicated to the design of heuristic models for omnidirectional (i.e., 360$^\circ$) image quality assessment (OIQA), a conspicuous gap remains due to the lack of consideration for the diversity of viewing behaviors that leads to the varying perceptual quality of 360$^\circ$ images. Two critical aspects underline this oversight: the neglect of viewing conditions that significantly sway user gaze patterns and the overreliance on a single viewport sequence from the 360$^\circ$ image for quality inference. To address these issues, we introduce a unique generative scanpath representation (GSR) for effective quality inference of 360$^\circ$ images, which aggregates varied perceptual experiences of multi-hypothesis users under a predefined viewing condition. More specifically, given a viewing condition characterized by the starting point of viewing and exploration time, a set of scanpaths consisting of dynamic visual fixations can be produced using an apt scanpath generator. Following this vein, we use the scanpaths to convert the 360$^\circ$ image into the unique GSR, which provides a global overview of gazed-focused contents derived from scanpaths. As such, the quality inference of the 360$^\circ$ image is swiftly transformed to that of GSR. We then propose an efficient OIQA computational framework by learning the quality maps of GSR. Comprehensive experimental results validate that the predictions of the proposed framework are highly consistent with human perception in the spatiotemporal domain, especially in the challenging context of locally distorted 360$^\circ$ images under varied viewing conditions. The code will be released at https://github.com/xiangjieSui/GSR △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: 12 pages, 5 figures

arXiv:2309.02259 [pdf, ps, other]

Design of a New CIM-DCSK-Based Ambient Backscatter Communication System

Authors: Ruipeng Yang, Yi Fang, **** Chen, Huan Ma

Abstract: To improve the data rate in differential chaos shift keying (DCSK) based ambient backscatter communication (AmBC) system, we propose a new AmBC system based on code index modulation (CIM), referred to as CIM-DCSK-AmBC system. In the proposed system, the CIM-DCSK signal transmitted in the direct link is used as the radio frequency source of the backscatter link. The signal format in the backscatter… ▽ More To improve the data rate in differential chaos shift keying (DCSK) based ambient backscatter communication (AmBC) system, we propose a new AmBC system based on code index modulation (CIM), referred to as CIM-DCSK-AmBC system. In the proposed system, the CIM-DCSK signal transmitted in the direct link is used as the radio frequency source of the backscatter link. The signal format in the backscatter link is designed to increase the data rate as well as eliminate the interference of the direct link signal. As such, the direct link signal and the backscatter link signal can be received and demodulated simultaneously. Moreover, we derive and validate the theoretical bit error rate (BER) expressions of the CIM-DCSK-AmBC system over multipath Rayleigh fading channels. Regarding the short reference DCSK-based AmBC (SR-DCSK-AmBC) system as a benchmark system, numerical results reveal that the CIM-DCSK-AmBC system can achieve better BER performance in the direct link and higher throughput in the backscatter link than the benchmark system. △ Less

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.16551 [pdf]

Object Detection for Caries or Pit and Fissure Sealing Requirement in Children's First Permanent Molars

Authors: Chenyao Jiang, Shiyao Zhai, Hengrui Song, Yuqing Ma, Yachen Fan, Yancheng Fang, Dongmei Yu, Canyang Zhang, Sanyang Han, Runming Wang, Yong Liu, Jianbo Li, Peiwu Qin

Abstract: Dental caries is one of the most common oral diseases that, if left untreated, can lead to a variety of oral problems. It mainly occurs inside the pits and fissures on the occlusal/buccal/palatal surfaces of molars and children are a high-risk group for pit and fissure caries in permanent molars. Pit and fissure sealing is one of the most effective methods that is widely used in prevention of pit… ▽ More Dental caries is one of the most common oral diseases that, if left untreated, can lead to a variety of oral problems. It mainly occurs inside the pits and fissures on the occlusal/buccal/palatal surfaces of molars and children are a high-risk group for pit and fissure caries in permanent molars. Pit and fissure sealing is one of the most effective methods that is widely used in prevention of pit and fissure caries. However, current detection of pits and fissures or caries depends primarily on the experienced dentists, which ordinary parents do not have, and children may miss the remedial treatment without timely detection. To address this issue, we present a method to autodetect caries and pit and fissure sealing requirements using oral photos taken by smartphones. We use the YOLOv5 and YOLOX models and adopt a tiling strategy to reduce information loss during image pre-processing. The best result for YOLOXs model with tiling strategy is 72.3 mAP.5, while the best result without tiling strategy is 71.2. YOLOv5s6 model with/without tiling attains 70.9/67.9 mAP.5, respectively. We deploy the pre-trained network to mobile devices as a WeChat applet, allowing in-home detection by parents or children guardian. △ Less

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.09359 [pdf, other]

Training-Free Energy Beamforming Assisted by Wireless Sensing

Authors: Li Zhang, Yuan Fang, Zixiang Ren, Ling Qiu, Jie Xu

Abstract: This paper studies the transmit energy beamforming in a multi-antenna wireless power transfer (WPT) system, in which an access point (AP) equipped with a uniform linear array (ULA) sends radio signals to wirelessly charge multiple single-antenna energy receivers (ERs). Different from conventional energy beamforming designs that require the AP to acquire the channel state information (CSI) via trai… ▽ More This paper studies the transmit energy beamforming in a multi-antenna wireless power transfer (WPT) system, in which an access point (AP) equipped with a uniform linear array (ULA) sends radio signals to wirelessly charge multiple single-antenna energy receivers (ERs). Different from conventional energy beamforming designs that require the AP to acquire the channel state information (CSI) via training and feedback, we propose a new training-free energy beamforming approach assisted by wireless radar sensing, which is implemented based on the following two-stage protocol. In the first stage, the AP performs wireless radar sensing to estimate the path gain and angle parameters of the ERs for constructing the corresponding CSI. In the second stage, the AP implements the transmit energy beamforming based on the constructed CSI to efficiently charge these ERs in a fair manner. Under this setup, first, we jointly optimize the sensing beamformers and duration in the first stage to minimize the sensing duration, while ensuring a given accuracy threshold for parameters estimation subject to the maximum transmit power constraint at the AP. Next, we optimize the energy beamformers in the second stage to maximize the minimum harvested energy by all ERs. In this approach, the estimation accuracy threshold for the first stage is properly designed to balance the resource allocation between the two stages for optimizing the ultimate energy harvesting performance. Finally, numerical results show that the proposed training-free energy beamforming design performs close to the performance upper bound with perfect CSI, and outperforms the benchmark schemes without such joint optimization and that with isotropic transmission. △ Less

Submitted 18 August, 2023; originally announced August 2023.

Comments: 6 pages, 5 figures, submitted to globecom workshop

arXiv:2308.04126 [pdf, other]

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

Authors: Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An

Abstract: This paper presents OmniDataComposer, an innovative approach for multimodal data fusion and unlimited data generation with an intent to refine and uncomplicate interplay among diverse data modalities. Coming to the core breakthrough, it introduces a cohesive data structure proficient in processing and merging multimodal data inputs, which include video, audio, and text. Our crafted algorithm lev… ▽ More This paper presents OmniDataComposer, an innovative approach for multimodal data fusion and unlimited data generation with an intent to refine and uncomplicate interplay among diverse data modalities. Coming to the core breakthrough, it introduces a cohesive data structure proficient in processing and merging multimodal data inputs, which include video, audio, and text. Our crafted algorithm leverages advancements across multiple operations such as video/image caption extraction, dense caption extraction, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), Recognize Anything Model(RAM), and object tracking. OmniDataComposer is capable of identifying over 6400 categories of objects, substantially broadening the spectrum of visual information. It amalgamates these diverse modalities, promoting reciprocal enhancement among modalities and facilitating cross-modal data correction. \textbf{The final output metamorphoses each video input into an elaborate sequential document}, virtually transmuting videos into thorough narratives, making them easier to be processed by large language models. Future prospects include optimizing datasets for each modality to encourage unlimited data generation. This robust base will offer priceless insights to models like ChatGPT, enabling them to create higher quality datasets for video captioning and easing question-answering tasks based on video content. OmniDataComposer inaugurates a new stage in multimodal learning, imparting enormous potential for augmenting AI's understanding and generation of complex, real-world data. △ Less

Submitted 17 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.12845 [pdf, other]

Multi-View Vertebra Localization and Identification from CT Images

Authors: Han Wu, Jiadong Zhang, Yu Fang, Zhentao Liu, Nizhuan Wang, Zhiming Cui, Dinggang Shen

Abstract: Accurately localizing and identifying vertebrae from CT images is crucial for various clinical applications. However, most existing efforts are performed on 3D with crop** patch operation, suffering from the large computation costs and limited global information. In this paper, we propose a multi-view vertebra localization and identification from CT images, converting the 3D problem into a 2D lo… ▽ More Accurately localizing and identifying vertebrae from CT images is crucial for various clinical applications. However, most existing efforts are performed on 3D with crop** patch operation, suffering from the large computation costs and limited global information. In this paper, we propose a multi-view vertebra localization and identification from CT images, converting the 3D problem into a 2D localization and identification task on different views. Without the limitation of the 3D cropped patch, our method can learn the multi-view global information naturally. Moreover, to better capture the anatomical structure information from different view perspectives, a multi-view contrastive learning strategy is developed to pre-train the backbone. Additionally, we further propose a Sequence Loss to maintain the sequential structure embedded along the vertebrae. Evaluation results demonstrate that, with only two 2D networks, our method can localize and identify vertebrae in CT images accurately, and outperforms the state-of-the-art methods consistently. Our code is available at https://github.com/ShanghaiTech-IMPACT/Multi-View-Vertebra-Localization-and-Identification-from-CT-Images. △ Less

Submitted 24 July, 2023; originally announced July 2023.

Comments: MICCAI 2023

arXiv:2307.11337 [pdf, other]

Fundamental CRB-Rate Tradeoff in Multi-Antenna ISAC Systems with Information Multicasting and Multi-Target Sensing

Authors: Zixiang Ren, Yunfei Peng, Xianxin Song, Yuan Fang, Ling Qiu, Liang Liu, Derrick Wing Kwan Ng, Jie Xu

Abstract: This paper investigates the performance tradeoff for a multi-antenna integrated sensing and communication (ISAC) system with simultaneous information multicasting and multi-target sensing, in which a multi-antenna base station (BS) sends the common information messages to a set of single-antenna communication users (CUs) and estimates the parameters of multiple sensing targets based on the echo si… ▽ More This paper investigates the performance tradeoff for a multi-antenna integrated sensing and communication (ISAC) system with simultaneous information multicasting and multi-target sensing, in which a multi-antenna base station (BS) sends the common information messages to a set of single-antenna communication users (CUs) and estimates the parameters of multiple sensing targets based on the echo signals concurrently. We consider two target sensing scenarios without and with prior target knowledge at the BS, in which the BS is interested in estimating the complete multi-target response matrix and the target reflection coefficients/angles, respectively. First, we consider the capacity-achieving transmission and characterize the fundamental tradeoff between the achievable rate and the multi-target estimation Cramér-Rao bound (CRB) accordingly. △ Less

Submitted 21 July, 2023; originally announced July 2023.

Comments: 32 pages

arXiv:2307.05382 [pdf, other]

Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling

Authors: Ziyue Li, Yuchen Fang, You Li, Kan Ren, Yansen Wang, Xufang Luo, Juanyong Duan, Congrui Huang, Dongsheng Li, Lili Qiu

Abstract: A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human efforts for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusing on adult epilepsy monitoring often fail due to… ▽ More A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human efforts for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address the exclusive challenges with exquisite designs at the temporal, spatial and model levels. The experiments over the real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance. △ Less

Submitted 2 July, 2023; originally announced July 2023.

Comments: Accepted in IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2023

arXiv:2307.05127 [pdf, ps, other]

Optimal Coordinated Transmit Beamforming for Networked Integrated Sensing and Communications

Authors: Gaoyuan Cheng, Yuan Fang, Jie Xu, Derrick Wing Kwan Ng

Abstract: This paper studies a multi-antenna networked integrated sensing and communications (ISAC) system, in which a set of multi-antenna base stations (BSs) employ the coordinated transmit beamforming to serve multiple single-antenna communication users (CUs) and perform joint target detection by exploiting the reflected signals simultaneously. To facilitate target sensing, the BSs transmit dedicated sen… ▽ More This paper studies a multi-antenna networked integrated sensing and communications (ISAC) system, in which a set of multi-antenna base stations (BSs) employ the coordinated transmit beamforming to serve multiple single-antenna communication users (CUs) and perform joint target detection by exploiting the reflected signals simultaneously. To facilitate target sensing, the BSs transmit dedicated sensing signals combined with their information signals. Accordingly, we consider two types of CU receivers with and without the capability of canceling the interference from the dedicated sensing signals, respectively. In addition, we investigate two scenarios with and without time synchronization among the BSs. For the scenario with synchronization, the BSs can exploit the target-reflected signals over both the direct links (BS-to-target-to-originated BS links) and the cross-links (BS-to-target-to-other BSs links) for joint detection, while in the unsynchronized scenario, the BSs can only utilize the target-reflected signals over the direct links. For each scenario under different types of CU receivers, we optimize the coordinated transmit beamforming at the BSs to maximize the minimum detection probability over a particular targeted area, while guaranteeing the required minimum signal-to-interference-plus-noise ratio (SINR) constraints at the CUs. These SINR-constrained detection probability maximization problems are recast as non-convex quadratically constrained quadratic programs (QCQPs), which are then optimally solved via the semi-definite relaxation (SDR) technique. △ Less

Submitted 11 July, 2023; originally announced July 2023.

Comments: arXiv admin note: text overlap with arXiv:2211.01085

arXiv:2307.02242 [pdf, other]

Multi-IRS-Enabled Integrated Sensing and Communications

Authors: Yuan Fang, Siyao Zhang, Xinmin Li, Xianghao Yu, Jie Xu, Shuguang Cui

Abstract: This paper studies a multi-intelligent-reflecting-surface-(IRS)-enabled integrated sensing and communications (ISAC) system, in which multiple IRSs are installed to help the base station (BS) provide ISAC services at separate line-of-sight (LoS) blocked areas. We focus on the scenario with semi-passive uniform linear array (ULA) IRSs for sensing, in which each IRS is integrated with dedicated sens… ▽ More This paper studies a multi-intelligent-reflecting-surface-(IRS)-enabled integrated sensing and communications (ISAC) system, in which multiple IRSs are installed to help the base station (BS) provide ISAC services at separate line-of-sight (LoS) blocked areas. We focus on the scenario with semi-passive uniform linear array (ULA) IRSs for sensing, in which each IRS is integrated with dedicated sensors for processing echo signals, and each IRS simultaneously serves one sensing target and multiple communication users (CUs) in its coverage area. In particular, we suppose that the BS sends combined information and dedicated sensing signals for ISAC. Two cases with point and extended targets are considered, in which each IRS aims to estimate the direction-of-arrival (DoA) of the corresponding target and the complete target response matrix, respectively. Under this setup, we first derive the closed-form Cram{é}r-Rao bounds (CRBs) for parameters estimation under the two target models. For the point target case, the CRB for DoA estimation is shown to be inversely proportional to the cubic of the number of sensors at each IRS, while for the extended target case, the CRB for target response matrix estimation is proportional to the number of IRS sensors. Next, we consider two different types of CU receivers that can and cannot cancel the interference from dedicated sensing signals prior to information decoding. To achieve fair and optimized sensing performance, we minimize the maximum CRB at all IRSs for the two target cases, via jointly optimizing the transmit beamformers at the BS and the reflective beamformers at the multiple IRSs, subject to the minimum signal-to-interference-plus-noise ratio (SINR) constraints at individual CUs, the maximum transmit power constraint at the BS, and the unit-modulus constraints at the multiple IRSs. △ Less

Submitted 17 December, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2307.00327 [pdf]

SDRCNN: A single-scale dense residual connected convolutional neural network for pansharpening

Authors: Yuan Fang, Yuanzhi Cai, Lei Fan

Abstract: Pansharpening is a process of fusing a high spatial resolution panchromatic image and a low spatial resolution multispectral image to create a high-resolution multispectral image. A novel single-branch, single-scale lightweight convolutional neural network, named SDRCNN, is developed in this study. By using a novel dense residual connected structure and convolution block, SDRCNN achieved a better… ▽ More Pansharpening is a process of fusing a high spatial resolution panchromatic image and a low spatial resolution multispectral image to create a high-resolution multispectral image. A novel single-branch, single-scale lightweight convolutional neural network, named SDRCNN, is developed in this study. By using a novel dense residual connected structure and convolution block, SDRCNN achieved a better trade-off between accuracy and efficiency. The performance of SDRCNN was tested using four datasets from the WorldView-3, WorldView-2 and QuickBird satellites. The compared methods include eight traditional methods (i.e., GS, GSA, PRACS, BDSD, SFIM, GLP-CBD, CDIF and LRTCFPan) and five lightweight deep learning methods (i.e., PNN, PanNet, BayesianNet, DMDNet and FusionNet). Based on a visual inspection of the pansharpened images created and the associated absolute residual maps, SDRCNN exhibited least spatial detail blurring and spectral distortion, amongst all the methods considered. The values of the quantitative evaluation metrics were closest to their ideal values when SDRCNN was used. The processing time of SDRCNN was also the shortest among all methods tested. Finally, the effectiveness of each component in the SDRCNN was demonstrated in ablation experiments. All of these confirmed the superiority of SDRCNN. △ Less

Submitted 1 July, 2023; originally announced July 2023.

Comments: This paper has been accepted for publication in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

arXiv:2306.15303 [pdf, other]

Energy-Efficient MIMO Integrated Sensing and Communications with On-off Non-transmission Power

Authors: Guanlin Wu, Yuan Fang, Jie Xu, Zhiyong Feng, Shuguang Cui

Abstract: This paper investigates the energy efficiency of a multiple-input multiple-output (MIMO) integrated sensing and communications (ISAC) system, in which one multi-antenna base station (BS) transmits unified ISAC signals to a multi-antenna communication user (CU) and at the same time use the echo signals to estimate an extended target. We focus on one particular ISAC transmission block and take into… ▽ More This paper investigates the energy efficiency of a multiple-input multiple-output (MIMO) integrated sensing and communications (ISAC) system, in which one multi-antenna base station (BS) transmits unified ISAC signals to a multi-antenna communication user (CU) and at the same time use the echo signals to estimate an extended target. We focus on one particular ISAC transmission block and take into account the practical on-off non-transmission power at the BS. Under this setup, we minimize the energy consumption at the BS while ensuring a minimum average data rate requirement for communication and a maximum Cramér-Rao bound (CRB) requirement for target estimation, by jointly optimizing the transmit covariance matrix and the ``on'' duration for active transmission. We obtain the optimal solution to the rate-and-CRB-constrained energy minimization problem in a semi-closed form. Interestingly, the obtained optimal solution is shown to unify the spectrum-efficient and energy-efficient communications and sensing designs. In particular, for the special MIMO sensing case with rate constraint inactive, the optimal solution follows the isotropic transmission with shortest ``on'' duration, in which the BS radiates the required sensing energy by using sufficiently high power over the shortest duration. For the general ISAC case, the optimal transmit covariance solution is of full rank and follows the eigenmode transmission based on the communication channel, while the optimal ``on'' duration is determined based on both the rate and CRB constraints. Numerical results show that the proposed ISAC design achieves significantly reduced energy consumption as compared to the benchmark schemes based on isotropic transmission, always-on transmission, and sensing or communications only designs, especially when the rate and CRB constraints become stringent. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: 13 pages, 9 figures

arXiv:2306.11977 [pdf]

Encoding Enhanced Complex CNN for Accurate and Highly Accelerated MRI

Authors: Zimeng Li, Sa Xiao, Cheng Wang, Haidong Li, Xiuchao Zhao, Caohui Duan, Qian Zhou, Qiuchen Rao, Yuan Fang, Junshuai Xie, Lei Shi, Fumin Guo, Chaohui Ye, Xin Zhou

Abstract: Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) direc… ▽ More Magnetic resonance imaging (MRI) using hyperpolarized noble gases provides a way to visualize the structure and function of human lung, but the long imaging time limits its broad research and clinical applications. Deep learning has demonstrated great potential for accelerating MRI by reconstructing images from undersampled data. However, most existing deep conventional neural networks (CNN) directly apply square convolution to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. In this work, we propose an encoding enhanced (EN2) complex CNN for highly undersampled pulmonary MRI reconstruction. EN2 employs convolution along either the frequency or phase-encoding direction, resembling the mechanisms of k-space sampling, to maximize the utilization of the encoding correlation and integrity within a row or column of k-space. We also employ complex convolution to learn rich representations from the complex k-space data. In addition, we develop a feature-strengthened modularized unit to further boost the reconstruction performance. Experiments demonstrate that our approach can accurately reconstruct hyperpolarized 129Xe and 1H lung MRI from 6-fold undersampled k-space data and provide lung function measurements with minimal biases compared with fully-sampled image. These results demonstrate the effectiveness of the proposed algorithmic components and indicate that the proposed approach could be used for accelerated pulmonary MRI in research and clinical lung disease patient care. △ Less

Submitted 13 November, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2305.12311 [pdf, other]

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

Authors: Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Abstract: The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is a… ▽ More The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals. △ Less

Submitted 20 May, 2023; originally announced May 2023.

arXiv:2305.09331 [pdf, other]

Energy-Efficient WiFi Backscatter Communication for Green IoTs

Authors: Yimeng Huang, Lijie Liu, Jihong Yu, Yuguang Fang, Wei Gong

Abstract: The boom of the Internet of Things has revolutionized people's lives, but it has also resulted in massive resource consumption and environmental pollution. Recently, Green IoT (GIoT) has become a worldwide consensus to address this issue. In this paper, we propose EEWScatter, an energy-efficient WiFi backscatter communication system to pursue the goal of GIoT. Unlike previous backscatter systems t… ▽ More The boom of the Internet of Things has revolutionized people's lives, but it has also resulted in massive resource consumption and environmental pollution. Recently, Green IoT (GIoT) has become a worldwide consensus to address this issue. In this paper, we propose EEWScatter, an energy-efficient WiFi backscatter communication system to pursue the goal of GIoT. Unlike previous backscatter systems that solely focus on tags, our approach offers a comprehensive system-wide view on energy conservation. Specifically, we reuse ambient signals as carriers and utilize an ultra-low-power and battery-free design for tag nodes by backscatter. Further, we design a new CRC-based algorithm that enables the demodulation of both ambient and tag data by only a single receiver while using ambient carriers. Such a design eliminates system reliance on redundant transceivers with high power consumption. Results demonstrate that EEWScatter achieves the lowest overall system power consumption and saves at least half of the energy. What's more, the power consumption of our tag is only 1/1000 of that of active radio. We believe that EEWScatter is a critical step towards a sustainable future. △ Less

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2303.14739 [pdf, other]

Geometry-Aware Attenuation Field Learning for Sparse-View CBCT Reconstruction

Authors: Zhentao Liu, Yu Fang, Changjian Li, Han Wu, Yuan Liu, Zhiming Cui, Dinggang Shen

Abstract: Cone Beam Computed Tomography (CBCT) is the most widely used imaging method in dentistry. As hundreds of X-ray projections are needed to reconstruct a high-quality CBCT image (i.e., the attenuation field) in traditional algorithms, sparse-view CBCT reconstruction has become a main focus to reduce radiation dose. Several attempts have been made to solve it while still suffering from insufficient da… ▽ More Cone Beam Computed Tomography (CBCT) is the most widely used imaging method in dentistry. As hundreds of X-ray projections are needed to reconstruct a high-quality CBCT image (i.e., the attenuation field) in traditional algorithms, sparse-view CBCT reconstruction has become a main focus to reduce radiation dose. Several attempts have been made to solve it while still suffering from insufficient data or poor generalization ability for novel patients. This paper proposes a novel attenuation field encoder-decoder framework by first encoding the volumetric feature from multi-view X-ray projections, then decoding it into the desired attenuation field. The key insight is when building the volumetric feature, we comply with the multi-view CBCT reconstruction nature and emphasize the view consistency property by geometry-aware spatial feature querying and adaptive feature fusing. Moreover, the prior knowledge information learned from data population guarantees our generalization ability when dealing with sparse view input. Comprehensive evaluations have demonstrated the superiority in terms of reconstruction quality, and the downstream application further validates the feasibility of our method in real-world clinics. △ Less

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.08015 [pdf, ps, other]

Molecular Communication for Quorum Sensing Inspired Cooperative Drug Delivery

Authors: Yuting Fang, Stuart T. Johnston, Matt Faria, Xinyu Huang, Andrew W. Eckford, Jamie Evans

Abstract: A cooperative drug delivery system is proposed, where quorum sensing (QS), a density-dependent bacterial behavior coordination mechanism, is employed by synthetic bacterium-based nanomachines (B-NMs) for controllable drug delivery. In our proposed system, drug delivery is only triggered when there are enough QS molecules, which in turn only happens when there are enough B-NMs. This makes the propo… ▽ More A cooperative drug delivery system is proposed, where quorum sensing (QS), a density-dependent bacterial behavior coordination mechanism, is employed by synthetic bacterium-based nanomachines (B-NMs) for controllable drug delivery. In our proposed system, drug delivery is only triggered when there are enough QS molecules, which in turn only happens when there are enough B-NMs. This makes the proposed system can be used to achieve a high release rate of drug molecules from a high number of B-NMs when the population density of B-NMs may not be known. Analytical expressions for i) the expected activation probability of the B-NM due to randomly-distributed B-NMs and ii) the expected aggregate absorption rate of drug molecules due to randomly-distributed QS activated B-NMs are derived. Analytical results are verified by particle-based simulations. The derived results can help to predict and control the impact of environmental factors (e.g. diffusion coefficient and degradation rate) on the absorption rate of drug molecules since rigorous diffusion-based molecular channels are considered. Our results show that the activation probability at the B-NM increases as this B-NM is located closer to the center of the B-NM population and the aggregate absorption rate of the drug molecules non-linearly increases as the population density increases. △ Less

Submitted 14 February, 2023; originally announced March 2023.

Comments: 9 pages; 9 figures

arXiv:2302.09274 [pdf, ps, other]

doi 10.1109/TVT.2023.3245539

Low-Complexity Pareto-Optimal 3D Beamforming for the Full-Dimensional Multi-User Massive MIMO Downlink

Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, Y. Fang, L. Hanzo

Abstract: Full-dimensional (FD) multi-user massive multiple input multiple output (m-MIMO) systems employ large two-dimensional (2D) rectangular antenna arrays to control both the azimuth and elevation angles of signal transmission. We introduce the sum of two outer products of the azimuth and elevation beamforming vectors having moderate dimensions as a new class of FD beamforming. We show that this low-co… ▽ More Full-dimensional (FD) multi-user massive multiple input multiple output (m-MIMO) systems employ large two-dimensional (2D) rectangular antenna arrays to control both the azimuth and elevation angles of signal transmission. We introduce the sum of two outer products of the azimuth and elevation beamforming vectors having moderate dimensions as a new class of FD beamforming. We show that this low-complexity class is capable of outperforming 2D beamforming relying on the single outer product of the azimuth and elevation beamforming vectors. It is also capable of performing close to its FD counterpart of massive dimensions in terms of either the users minimum rate or their geometric mean rate (GM-rate), or sum rate (SR). Furthermore, we also show that even FD beamforming may be outperformed by our outer product-based improper Gaussian signaling solution. Explicitly, our design is based on low-complexity algorithms relying on convex problems of moderate dimensions for max-min rate optimization or on closed-form expressions for GM-rate and SR maximization. △ Less

Submitted 18 February, 2023; originally announced February 2023.

arXiv:2301.11166 [pdf, ps, other]

Flex-Net: A Graph Neural Network Approach to Resource Management in Flexible Duplex Networks

Authors: Tharaka Perera, Saman Atapattu, Yuting Fang, Prathapasinghe Dharmawansa, Jamie Evans

Abstract: Flexible duplex networks allow users to dynamically employ uplink and downlink channels without static time scheduling, thereby utilizing the network resources efficiently. This work investigates the sum-rate maximization of flexible duplex networks. In particular, we consider a network with pairwise-fixed communication links. Corresponding combinatorial optimization is a non-deterministic polynom… ▽ More Flexible duplex networks allow users to dynamically employ uplink and downlink channels without static time scheduling, thereby utilizing the network resources efficiently. This work investigates the sum-rate maximization of flexible duplex networks. In particular, we consider a network with pairwise-fixed communication links. Corresponding combinatorial optimization is a non-deterministic polynomial (NP)-hard without a closed-form solution. In this respect, the existing heuristics entail high computational complexity, raising a scalability issue in large networks. Motivated by the recent success of Graph Neural Networks (GNNs) in solving NP-hard wireless resource management problems, we propose a novel GNN architecture, named Flex-Net, to jointly optimize the communication direction and transmission power. The proposed GNN produces near-optimal performance meanwhile maintaining a low computational complexity compared to the most commonly used techniques. Furthermore, our numerical results shed light on the advantages of using GNNs in terms of sample complexity, scalability, and generalization capability. △ Less

Submitted 15 March, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

arXiv:2211.17048 [pdf, other]

SNAF: Sparse-view CBCT Reconstruction with Neural Attenuation Fields

Authors: Yu Fang, Lanzhuju Mei, Changjian Li, Yuan Liu, Wen** Wang, Zhiming Cui, Dinggang Shen

Abstract: Cone beam computed tomography (CBCT) has been widely used in clinical practice, especially in dental clinics, while the radiation dose of X-rays when capturing has been a long concern in CBCT imaging. Several research works have been proposed to reconstruct high-quality CBCT images from sparse-view 2D projections, but the current state-of-the-arts suffer from artifacts and the lack of fine details… ▽ More Cone beam computed tomography (CBCT) has been widely used in clinical practice, especially in dental clinics, while the radiation dose of X-rays when capturing has been a long concern in CBCT imaging. Several research works have been proposed to reconstruct high-quality CBCT images from sparse-view 2D projections, but the current state-of-the-arts suffer from artifacts and the lack of fine details. In this paper, we propose SNAF for sparse-view CBCT reconstruction by learning the neural attenuation fields, where we have invented a novel view augmentation strategy to overcome the challenges introduced by insufficient data from sparse input views. Our approach achieves superior performance in terms of high reconstruction quality (30+ PSNR) with only 20 input views (25 times fewer than clinical collections), which outperforms the state-of-the-arts. We have further conducted comprehensive experiments and ablation analysis to validate the effectiveness of our approach. △ Less

Submitted 30 November, 2022; originally announced November 2022.

arXiv:2211.14576 [pdf, other]

CFNet: Conditional Filter Learning with Dynamic Noise Estimation for Real Image Denoising

Authors: Yifan Zuo, Jiacheng Xie, Yuming Fang, Yan Huang, Wenhui Jiang

Abstract: A mainstream type of the state of the arts (SOTAs) based on convolutional neural network (CNN) for real image denoising contains two sub-problems, i.e., noise estimation and non-blind denoising. This paper considers real noise approximated by heteroscedastic Gaussian/Poisson Gaussian distributions with in-camera signal processing pipelines. The related works always exploit the estimated noise prio… ▽ More A mainstream type of the state of the arts (SOTAs) based on convolutional neural network (CNN) for real image denoising contains two sub-problems, i.e., noise estimation and non-blind denoising. This paper considers real noise approximated by heteroscedastic Gaussian/Poisson Gaussian distributions with in-camera signal processing pipelines. The related works always exploit the estimated noise prior via channel-wise concatenation followed by a convolutional layer with spatially sharing kernels. Due to the variable modes of noise strength and frequency details of all feature positions, this design cannot adaptively tune the corresponding denoising patterns. To address this problem, we propose a novel conditional filter in which the optimal kernels for different feature positions can be adaptively inferred by local features from the image and the noise map. Also, we bring the thought that alternatively performs noise estimation and non-blind denoising into CNN structure, which continuously updates noise prior to guide the iterative feature denoising. In addition, according to the property of heteroscedastic Gaussian distribution, a novel affine transform block is designed to predict the stationary noise component and the signal-dependent noise component. Compared with SOTAs, extensive experiments are conducted on five synthetic datasets and three real datasets, which shows the improvement of the proposed CFNet. △ Less

Submitted 26 November, 2022; originally announced November 2022.

arXiv:2210.02416 [pdf, other]

A deep learning model for brain vessel segmentation in 3DRA with arteriovenous malformations

Authors: Camila García, Yibin Fang, Jianmin Liu, Ana Paula Narata, José Ignacio Orlando, Ignacio Larrabide

Abstract: Segmentation of brain arterio-venous malformations (bAVMs) in 3D rotational angiographies (3DRA) is still an open problem in the literature, with high relevance for clinical practice. While deep learning models have been applied for segmenting the brain vasculature in these images, they have never been used in cases with bAVMs. This is likely caused by the difficulty to obtain sufficiently annotat… ▽ More Segmentation of brain arterio-venous malformations (bAVMs) in 3D rotational angiographies (3DRA) is still an open problem in the literature, with high relevance for clinical practice. While deep learning models have been applied for segmenting the brain vasculature in these images, they have never been used in cases with bAVMs. This is likely caused by the difficulty to obtain sufficiently annotated data to train these approaches. In this paper we introduce a first deep learning model for blood vessel segmentation in 3DRA images of patients with bAVMs. To this end, we densely annotated 5 3DRA volumes of bAVM cases and used these to train two alternative 3DUNet-based architectures with different segmentation objectives. Our results show that the networks reach a comprehensive coverage of relevant structures for bAVM analysis, much better than what is obtained using standard methods. This is promising for achieving a better topological and morphological characterisation of the bAVM structures of interest. Furthermore, the models have the ability to segment venous structures even when missing in the ground truth labelling, which is relevant for planning interventional treatments. Ultimately, these results could be used as more reliable first initial guesses, alleviating the cumbersome task of creating manual labels. △ Less

Submitted 5 October, 2022; originally announced October 2022.

Comments: 9 pages, 4 figures, submitted to SIPAIM 2022, to be published in the SPIE Digital Library

Showing 1–50 of 99 results for author: Fang, Y