Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model

Authors: Haojun Jiang, Zhenguo Sun, Ning Jia, Meng Li, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

Abstract: Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe movement guidance to assist less experienced sonographers in conducting freehand echocardiography. This system can enable non-experts, especially in primary departments and medically underserved areas, to perform cardiac ultrasound examinations, potentially improving global healthcare delivery. The core innovation lies in proposing a data-driven world model, named Cardiac Dreamer, for representing cardiac spatial structures. This world model can provide structural features of any cardiac plane around the current probe position in the latent space, serving as a precise navigation map for autonomous plane localization. We train our model with real-world ultrasound data and corresponding probe motion from 110 routine clinical scans (151K sample pairs) performed by three certified sonographers. Evaluations on three standard planes with 37K sample pairs demonstrate that the world model can reduce navigation errors by up to 33% and exhibits more stable performance.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Early Accepted by MICCAI 2024

arXiv:2405.14802 [pdf, other]

Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation

Authors: Hongxu Jiang, Muhammad Imran, Linhai Ma, Teng Zhang, Yuyin Zhou, Muxuan Liang, Kuang Gong, Wei Shao

Abstract: Denoising diffusion probabilistic models (DDPMs) have achieved unprecedented success in computer vision. However, they remain underutilized in medical imaging, a field crucial for disease diagnosis and treatment planning. This is primarily due to the high computational cost associated with (1) the use of a large number of time steps (e.g., 1,000) in diffusion processes and (2) the increased dimensionality of medical images, which are often 3D or 4D. Training a diffusion model on medical images typically takes days to weeks, while sampling each image volume takes minutes to hours. To address this challenge, we introduce Fast-DDPM, a simple yet effective approach capable of improving training speed, sampling speed, and generation quality simultaneously. Unlike DDPM, which trains the image denoiser across 1,000 time steps, Fast-DDPM trains and samples using only 10 time steps. The key to our method lies in aligning the training and sampling procedures to optimize time-step utilization. Specifically, we introduce two efficient noise schedulers with 10 time steps: one with uniform time-step sampling and another with non-uniform sampling. We evaluated Fast-DDPM across three medical image-to-image generation tasks: multi-image super-resolution, image denoising, and image-to-image translation. Fast-DDPM outperformed DDPM and current state-of-the-art methods based on convolutional networks and generative adversarial networks in all tasks. Additionally, Fast-DDPM reduced the training time to 0.2x and the sampling time to 0.01x compared to DDPM.
Our code is publicly available at: https://github.com/mirthAI/Fast-DDPM.
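To make the "10 time steps instead of 1,000" idea concrete, here is a minimal sketch, assuming the standard DDPM linear beta schedule (1e-4 to 0.02 over T = 1000) and the usual forward noising x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps. The uniform schedule picks 10 evenly spaced indices; the non-uniform one (`nonuniform_steps`, a hypothetical quadratic spacing denser near t = 0) is an illustration only, not the schedule defined in the paper. Training then draws t only from the chosen set, so training and sampling use the same steps.

```python
import math
import random

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Standard DDPM linear variance schedule over T steps."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def uniform_steps(T=1000, k=10):
    """k evenly spaced time-step indices in [0, T-1]."""
    return [round(i * (T - 1) / (k - 1)) for i in range(k)]

def nonuniform_steps(T=1000, k=10, power=2.0):
    """Hypothetical non-uniform schedule: power > 1 puts more steps near t = 0."""
    return [round((i / (k - 1)) ** power * (T - 1)) for i in range(k)]

def q_sample(x0, eps, ab_t):
    """Forward diffusion: noisy sample x_t given clean x0, noise eps, alpha_bar_t."""
    return [math.sqrt(ab_t) * a + math.sqrt(1.0 - ab_t) * e for a, e in zip(x0, eps)]

def training_example(x0, steps, ab, rng):
    """Draw t from the reduced step set only, so training matches sampling."""
    t = rng.choice(steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return t, q_sample(x0, eps, ab[t]), eps
```

For example, `training_example([0.2, -0.1], uniform_steps(), alpha_bars(linear_betas()), random.Random(0))` returns a (t, x_t, eps) triple whose t is one of the 10 selected indices; a denoiser would be trained to predict eps from (x_t, t).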

Submitted 23 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.00542 [pdf, other]

UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement

Authors: Ruiquan Ge, Zhaojie Fang, Pengxue Wei, Zhanghao Chen, Hongyang Jiang, Ahmed Elazab, Wangting Li, Xiang Wan, Shaochong Zhang, Changmiao Wang

Abstract: Fundus photography, in combination with ultra-wide-angle fundus (UWF) techniques, has become an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, unlike UWF scanning laser ophthalmoscopy (UWF-SLO), UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow. To mitigate potential adverse effects associated with injections, researchers have proposed cross-modality medical image generation algorithms capable of converting UWF-SLO images into their UWF-FA counterparts. Current image generation techniques applied to fundus photography have difficulty producing high-resolution retinal images, particularly in capturing minute vascular lesions. To address these issues, we introduce a novel conditional generative adversarial network (UWAFA-GAN) to synthesize UWF-FA from UWF-SLO. This approach employs multi-scale generators and an attention transmit module to efficiently extract both global structures and local lesions. Additionally, to counteract the image blurriness that arises from training with misaligned data, a registration module is integrated within this framework. Our method performs well in terms of inception score and detail generation. Clinical user studies further indicate that the UWF-FA images generated by UWAFA-GAN are clinically comparable to authentic images in terms of diagnostic reliability.
Empirical evaluations on our proprietary UWF image datasets show that UWAFA-GAN outperforms existing methods. The code is accessible at https://github.com/Tinysqua/UWAFA-GAN.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00130 [pdf, other]

A Flexible 2.5D Medical Image Segmentation Approach with In-Slice and Cross-Slice Attention

Authors: Amarjeet Kumar, Hongxu Jiang, Muhammad Imran, Cyndi Valdes, Gabriela Leon, Dahyun Kang, Parvathi Nataraj, Yuyin Zhou, Michael D. Weiss, Wei Shao

Abstract: Deep learning has become the de facto method for medical image segmentation, with 3D segmentation models excelling in capturing complex 3D structures and 2D models offering high computational efficiency. However, segmenting 2.5D images, which have high in-plane but low through-plane resolution, is a relatively unexplored challenge. While applying 2D models to individual slices of a 2.5D image is feasible, it fails to capture the spatial relationships between slices. On the other hand, 3D models face challenges such as resolution inconsistencies in 2.5D images, along with computational complexity and susceptibility to overfitting when trained with limited data. In this context, 2.5D models, which capture inter-slice correlations using only 2D neural networks, emerge as a promising solution due to their reduced computational demand and simplicity in implementation. In this paper, we introduce CSA-Net, a flexible 2.5D segmentation model capable of processing 2.5D images with an arbitrary number of slices through an innovative Cross-Slice Attention (CSA) module. This module uses the cross-slice attention mechanism to effectively capture 3D spatial information by learning long-range dependencies between the center slice (for segmentation) and its neighboring slices. Moreover, CSA-Net utilizes the self-attention mechanism to understand correlations among pixels within the center slice.
We evaluated CSA-Net on three 2.5D segmentation tasks: (1) multi-class brain MRI segmentation, (2) binary prostate MRI segmentation, and (3) multi-class prostate MRI segmentation. CSA-Net outperformed leading 2D and 2.5D segmentation methods across all three tasks, demonstrating its efficacy and superiority. Our code is publicly available at https://github.com/mirthAI/CSA-Net.
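The cross-slice attention idea can be sketched as scaled dot-product attention in which queries come from center-slice tokens and keys/values come from the tokens of however many neighboring slices are supplied, which is what lets the module accept an arbitrary slice count. This is a minimal sketch, assuming identity Q/K/V projections and plain feature vectors as tokens; the real CSA module learns its projections and its exact layout is not given in the abstract. Passing the center slice as its own "neighbor" gives the in-slice self-attention case.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_slice_attention(center_tokens, neighbor_slices):
    """Each center-slice token (query) attends over all tokens of all neighbor slices.

    center_tokens: list of d-dim feature vectors from the center slice.
    neighbor_slices: list of slices, each a list of d-dim vectors (any number of slices).
    Identity projections are assumed, so keys and values are the neighbor tokens themselves.
    """
    keys = [tok for sl in neighbor_slices for tok in sl]
    d = len(center_tokens[0])
    out = []
    for q in center_tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)])
    return out
```

Because the softmax weights sum to one, each output token is a convex combination of neighbor-slice features, weighted by similarity to the center-slice token.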

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2403.12781 [pdf, other]

Large-Scale RIS Enabled Air-Ground Channels: Near-Field Modeling and Analysis

Authors: Hao Jiang, Wangqi Shi, Zaichen Zhang, Cunhua Pan, Qingqing Wu, Feng Shu, Ruiqi Liu, Jiangzhou Wang

Abstract: Existing works mainly rely on the far-field planar-wave-based channel model to assess the performance of reconfigurable intelligent surface (RIS)-enabled wireless communication systems. However, when the transmitter and receiver are within near-field range, this model yields relatively low accuracy. To tackle this challenge, we first develop an analytical framework for sub-array partitioning. This framework divides the large-scale RIS array into multiple sub-arrays, effectively reducing modeling complexity while maintaining acceptable accuracy. Then, we develop a beam domain channel model based on the proposed sub-array partition framework for large-scale RIS-enabled UAV-to-vehicle communication systems, which can be used to efficiently capture the sparse features in RIS-enabled UAV-to-vehicle channels in both near-field and far-field ranges. Furthermore, some important propagation characteristics of the proposed channel model, including the spatial cross-correlation functions (CCFs), temporal auto-correlation functions (ACFs), frequency correlation functions (CFs), and channel capacities with respect to the different physical features of the RIS and non-stationary properties of the channel model, are derived and analyzed. Finally, simulation results demonstrate that the proposed framework helps achieve a good tradeoff between model complexity and accuracy for investigating the channel propagation characteristics, thereby enabling highly efficient communications in RIS-enabled UAV-to-vehicle wireless networks.
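The abstract does not state the partitioning rule. A common heuristic, assumed here, is to make each sub-array small enough that the link distance exceeds the sub-array's own Fraunhofer distance 2D²/λ (D = sub-array aperture), so a planar-wave model is acceptable inside each sub-array while spherical-wave effects are kept across sub-arrays. The helper names (`partition`, `max_square_subarray`) and the diagonal-aperture choice are illustrative assumptions, not the authors' framework.

```python
import math

def fraunhofer_distance(aperture_m, wavelength_m):
    """Classical far-field boundary 2 D^2 / lambda."""
    return 2.0 * aperture_m ** 2 / wavelength_m

def partition(n_rows, n_cols, sub_rows, sub_cols, spacing_m):
    """Split an n_rows x n_cols RIS into sub-arrays; return each sub-array's centre (x, y) in metres."""
    centres = []
    for r0 in range(0, n_rows, sub_rows):
        for c0 in range(0, n_cols, sub_cols):
            r1 = min(r0 + sub_rows, n_rows)
            c1 = min(c0 + sub_cols, n_cols)
            x = (c0 + c1 - 1) / 2 * spacing_m
            y = (r0 + r1 - 1) / 2 * spacing_m
            centres.append((x, y))
    return centres

def max_square_subarray(link_distance_m, wavelength_m, spacing_m, n):
    """Largest s <= n such that an s x s sub-array is in its own far field (hypothetical criterion).

    The aperture is taken as the sub-array diagonal between outermost element centres.
    """
    best = 1
    for s in range(1, n + 1):
        diag = math.hypot((s - 1) * spacing_m, (s - 1) * spacing_m)
        if fraunhofer_distance(diag, wavelength_m) <= link_distance_m:
            best = s
    return best
```

For instance, at λ = 1 cm with half-wavelength spacing and a 10 m link, `max_square_subarray(10.0, 0.01, 0.005, 100)` gives 32, so a 100 × 100 RIS would be split into blocks of at most 32 × 32 elements under this assumed rule.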

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.02632 [pdf, other]

doi 10.1109/JIOT.2023.3243944

Human Activity Recognition with Low-Resolution Infrared Array Sensor Using Semi-supervised Cross-domain Neural Networks for Indoor Environment

Authors: Cunyi Yin, Xiren Miao, **g Chen, Hao Jiang, Deying Chen, Yixuan Tong, Shaocong Zheng

Abstract: Low-resolution infrared-based human activity recognition (HAR) has attracted enormous interest due to its low cost and privacy preservation. In this paper, a novel semi-supervised cross-domain neural network (SCDNN) based on an 8 × 8 low-resolution infrared sensor is proposed to identify human activity accurately and at low cost despite changes in the environment. The SCDNN consists of a feature extractor, a domain discriminator, and a label classifier. In the feature extractor, the unlabeled and minimally labeled target-domain data are trained for domain adaptation to achieve a mapping between the source-domain and target-domain data. The domain discriminator employs unsupervised learning to migrate data from the source domain to the target domain. The label classifier, obtained by training on the source-domain data, improves the recognition of target-domain activities owing to the semi-supervised learning used in training on the target-domain data. Experimental results show that the proposed method achieves 92.12% accuracy in recognizing activities in the target domain by migrating between the source and target domains. The proposed approach adapts better to cross-domain scenarios than existing deep learning methods, and it provides a low-cost yet highly adaptable solution for such scenarios.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01913 [pdf, other]

doi 10.1109/JIOT.2024.3369856

PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station

Authors: Cunyi Yin, Xiren Miao, **g Chen, Hao Jiang, Jianfei Yang, Yunjiao Zhou, Min Wu, Zhenghua Chen

Abstract: Safety monitoring of power operations in power stations is crucial for preventing accidents and ensuring a stable power supply. However, conventional methods such as wearable devices and video surveillance have limitations such as high cost, dependence on light, and visual blind spots. WiFi-based human pose estimation is a suitable method for monitoring power operations due to its low cost, device-free operation, and robustness to various illumination conditions. In this paper, a novel Channel State Information (CSI)-based pose estimation framework, namely PowerSkel, is developed to address these challenges. PowerSkel utilizes self-developed CSI sensors to form a mutual sensing network and constructs a CSI acquisition scheme specialized for power scenarios. It significantly reduces the deployment cost and complexity compared to existing solutions. To reduce interference with CSI in the electricity scenario, a sparse adaptive filtering algorithm is designed to preprocess the CSI. CKDformer, a knowledge distillation network based on collaborative learning and self-attention, is proposed to extract features from CSI and establish the mapping relationship between CSI and keypoints. The experiments are conducted in a real-world power station, and the results show that PowerSkel achieves high performance with a PCK@50 of 96.27% and produces clear pose visualizations, even in dark environments. Our work provides a novel low-cost and high-precision pose estimation solution for power operation.
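PCK@50 ("percentage of correct keypoints" at threshold 0.5) counts a predicted keypoint as correct when its distance to the ground truth is at most 50% of a per-sample reference length. The abstract does not say which reference PowerSkel normalizes by (head size and torso size are common choices), so the sketch below simply takes a reference length per frame as an input; that choice is an assumption.

```python
import math

def pck(pred, gt, ref_lengths, alpha=0.5):
    """Percentage of correct keypoints.

    pred, gt: lists of frames; each frame is a list of (x, y) keypoints in the same order.
    ref_lengths: one normalisation length per frame (assumed, e.g. torso size).
    A keypoint is correct if its Euclidean error <= alpha * reference length.
    """
    correct = total = 0
    for p_frame, g_frame, ref in zip(pred, gt, ref_lengths):
        for (px, py), (gx, gy) in zip(p_frame, g_frame):
            total += 1
            if math.hypot(px - gx, py - gy) <= alpha * ref:
                correct += 1
    return 100.0 * correct / total
```

With a reference length of 10, a keypoint 4 units off counts as correct at PCK@50 while one 10 units off does not.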

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.15634 [pdf, other]

Sense-Then-Train: A Novel Beam Training Design for Near-Field MIMO Systems

Authors: Hao Jiang, Zhaolin Wang, Yuanwei Liu

Abstract: A novel sense-then-train (STT) scheme is proposed for beam training in near-field multiple-input multiple-output (MIMO) systems. Compared to conventional codebook-based schemes, the proposed STT scheme can not only address the complex spherical-wave propagation but also effectively exploit the additional degrees of freedom (DoFs). The STT scheme is tailored for both single-beam and multi-beam cases. 1) For the single-beam case, the STT scheme first utilizes a sensing phase to estimate a low-dimensional representation of the near-field MIMO channel in the wavenumber domain. Then, in the subsequent training phase, an online learning algorithm is proposed to obtain the optimal beam pair without predefined codebooks or training datasets. 2) For the multi-beam case, based on the single-beam STT, a Gram-Schmidt method is further utilized to guarantee the orthogonality between beams in the training phase. Numerical results show that 1) the proposed STT scheme can significantly enhance beam training performance in the near field compared to conventional far-field codebook-based schemes, and 2) the proposed STT scheme can perform fast and low-complexity beam training while achieving near-optimal performance without full channel state information in both cases.
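The multi-beam step relies on Gram-Schmidt orthogonalization. A minimal sketch for complex beamforming vectors, using the Hermitian inner product <a, b> = Σ conj(a_i) b_i, is shown below; how STT interleaves this with its online learning is not described in the abstract, so only the orthogonalization itself is illustrated. Beams that are (numerically) linear combinations of earlier ones are dropped.

```python
def inner(a, b):
    """Hermitian inner product <a, b> = sum(conj(a_i) * b_i)."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def gram_schmidt(beams, tol=1e-12):
    """Return an orthonormal set spanning the given complex beam vectors."""
    basis = []
    for v in beams:
        w = list(v)
        for u in basis:
            c = inner(u, w)                      # projection coefficient onto unit vector u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = abs(inner(w, w)) ** 0.5
        if norm > tol:                           # skip linearly dependent beams
            basis.append([wi / norm for wi in w])
    return basis
```

For two beams [1, 1j] and [1, 0], the result is two unit-norm beams whose inner product is zero, so they can be used simultaneously without mutual interference in the ideal case.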

Submitted 23 February, 2024; originally announced February 2024.

Comments: Submitted to IEEE

arXiv:2401.05363 [pdf, other]

Generalizable Sleep Staging via Multi-Level Domain Alignment

Authors: Jiquan Wang, Sha Zhao, Haiteng Jiang, Shijian Li, Tao Li, Gang Pan

Abstract: Automatic sleep staging is essential for sleep assessment and disorder diagnosis. Most existing methods depend on one specific dataset, with training and testing data drawn from the same dataset, and therefore generalize poorly to other, unseen datasets. In this paper, we introduce domain generalization into automatic sleep staging and propose the task of generalizable sleep staging, which aims to improve the model's ability to generalize to unseen datasets. Inspired by existing domain generalization methods, we adopt the feature alignment idea and propose a framework called SleepDG to solve it. Since both local salient features and sequential features are important for sleep staging, we propose a Multi-level Feature Alignment combining epoch-level and sequence-level feature alignment to learn domain-invariant feature representations. Specifically, we design an Epoch-level Feature Alignment to align the feature distribution of each single sleep epoch among different domains, and a Sequence-level Feature Alignment to minimize the discrepancy of sequential features among different domains. SleepDG is validated on five public datasets, achieving state-of-the-art performance.
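The abstract does not specify the discrepancy measures used for alignment. As an illustration only, the sketch below substitutes a simple moment-matching discrepancy (squared differences of per-dimension means and variances) and averages it over all pairs of training domains, once for epoch-level features and once for sequence-level features; `lam` is a hypothetical weighting between the two levels. Minimizing such a term pushes the encoder toward domain-invariant features.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def variance(vectors, mu):
    n = len(vectors)
    return [sum((v[j] - mu[j]) ** 2 for v in vectors) / n for j in range(len(mu))]

def moment_discrepancy(feats_a, feats_b):
    """Squared distance between first and second moments of two feature sets (assumed measure)."""
    mu_a, mu_b = mean(feats_a), mean(feats_b)
    var_a, var_b = variance(feats_a, mu_a), variance(feats_b, mu_b)
    return (sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
            + sum((x - y) ** 2 for x, y in zip(var_a, var_b)))

def pairwise_discrepancy(feats_by_domain):
    """Average discrepancy over all pairs of source domains."""
    pairs = [(i, j) for i in range(len(feats_by_domain)) for j in range(i + 1, len(feats_by_domain))]
    return sum(moment_discrepancy(feats_by_domain[i], feats_by_domain[j]) for i, j in pairs) / len(pairs)

def multi_level_alignment_loss(epoch_feats_by_domain, seq_feats_by_domain, lam=1.0):
    """Epoch-level plus lam times sequence-level alignment term (hypothetical combination)."""
    return pairwise_discrepancy(epoch_feats_by_domain) + lam * pairwise_discrepancy(seq_feats_by_domain)
```

In training, this term would be added to the usual sleep-stage classification loss.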

Submitted 27 January, 2024; v1 submitted 13 December, 2023; originally announced January 2024.

Comments: Accepted by the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

arXiv:2401.03626 [pdf, other]

Hybrid Vector Message Passing for Generalized Bilinear Factorization

Authors: Hao Jiang, Xiaojun Yuan, Qinghua Guo

Abstract: In this paper, we propose a new message passing algorithm that utilizes hybrid vector message passing (HVMP) to solve the generalized bilinear factorization (GBF) problem. The proposed GBF-HVMP algorithm integrates expectation propagation (EP) and variational message passing (VMP) via variational free energy minimization, yielding tractable Gaussian messages. Furthermore, GBF-HVMP enables vector/matrix variables rather than scalar ones in message passing, resulting in a loop-free Bayesian network that improves convergence. Numerical results show that GBF-HVMP significantly outperforms state-of-the-art methods in terms of NMSE performance and computational complexity.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.07911 [pdf]

Projective Parallel Single-Pixel Imaging: 3D Structured Light Scanning Under Global Illumination

Authors: Yuxi Li, Hongzhi Jiang, Huijie Zhao, Xudong Li

Abstract: We present projective parallel single-pixel imaging (pPSI), a 3D photography method that provides a robust and efficient way to analyze light transport behavior and to separate the light effects caused by global illumination, thereby achieving 3D structured light scanning under global illumination. The light transport behavior is described by the light transport coefficients (LTC), a 4D data set that contains complete information for a projector-camera pair. However, capturing the LTC is generally time-consuming. In pPSI, the 4D LTC are reduced to projection functions, enabling a highly efficient data capture process. We introduce the local maximum constraint, which constrains the location of candidate correspondence matching points when projections are captured. A local slice extension (LSE) method is introduced to accelerate the capture of projection functions. pPSI is optimized for several situations: the number of projection functions it requires is optimized, and the influence of the capture ratio in LSE on the accuracy of the correspondence matching points is investigated. Discussions and experiments cover two typical kinds of global illumination: inter-reflections and subsurface scattering. The proposed method is validated in several challenging scenarios and outperforms the state-of-the-art methods.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 21 pages, 13 figures

arXiv:2312.05256 [pdf, other]

Holistic Evaluation of GPT-4V for Biomedical Imaging

Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, **gyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang , et al. (25 additional authors not shown)

Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications.

Submitted 10 November, 2023; originally announced December 2023.

arXiv:2311.15292 [pdf, other]

Active-Sensing-Based Beam Alignment for Near Field MIMO Communications

Authors: Hao Jiang, Zhaolin Wang, Yuanwei Liu

Abstract: An active-sensing-based learning algorithm is proposed to solve the near-field beam alignment problem with the aid of wavenumber-domain transform matrices (WTMs). Specifically, WTMs can transform the antenna-domain channel into a sparse representation in the wavenumber domain. The dimensions of WTMs can be further reduced by exploiting the dominance of line-of-sight (LoS) links. By employing these lower-dimensional WTMs as mapping functions, the active-sensing-based algorithm is executed in the wavenumber domain, which accelerates convergence. Compared with codebook-based beam alignment methods, the proposed method finds the optimal beam pair in a ping-pong fashion, thus avoiding the high training overhead caused by beam sweeping. Finally, numerical results validate the effectiveness of the proposed method.
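"Ping-pong" beam alignment alternates transmissions between the two ends: the receiver combines what it receives into its next beam, then transmits back so the other side can update its beam in turn. With a reciprocal channel this behaves like power iteration and converges to the dominant singular-vector pair of the channel matrix H. The sketch below simulates that classic iteration with H known, purely for illustration; the paper's learning-based algorithm and its wavenumber-domain WTM mapping are not reproduced here.

```python
def matvec(H, v):
    """H v (forward link)."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def matvec_h(H, u):
    """H^H u (reverse link, assuming channel reciprocity)."""
    return [sum(H[i][j].conjugate() * u[i] for i in range(len(H))) for j in range(len(H[0]))]

def normalize(v):
    n = sum(abs(x) ** 2 for x in v) ** 0.5
    return [x / n for x in v]

def ping_pong(H, v0, iters=50):
    """Alternate forward/backward beam updates; returns (rx beam u, tx beam v, gain |u^H H v|)."""
    v = normalize(v0)
    for _ in range(iters):
        u = normalize(matvec(H, v))      # receiver beam matched to the forward signal
        v = normalize(matvec_h(H, u))    # transmitter beam matched to the reverse signal
    gain = abs(sum(ui.conjugate() * y for ui, y in zip(u, matvec(H, v))))
    return u, v, gain
```

For H = [[3, 0], [0, 1]] and an initial beam [1, 1], the iteration settles on the first antenna at both ends and the gain approaches the largest singular value, 3, without sweeping any codebook.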

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.08075 [pdf, ps, other]

GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy

Authors: Hongyang Jiang, Mengdi Gao, Zirong Liu, Chen Tang, Xiaoqing Zhang, Shuai Jiang, Wu Yuan, Jiang Liu

Abstract: Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microaneurysm lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains largely unexplored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on SAM. GlanceSeg enables real-time segmentation of microaneurysm lesions as ophthalmologists review fundus images. Our human-in-the-loop framework integrates the ophthalmologist's gaze map, allowing for rough localization of minute lesions in fundus images. Subsequently, a saliency map is generated based on the located region of interest, which provides prompt points to assist the foundation model in efficiently segmenting microaneurysm lesions. Finally, a domain knowledge filter refines the segmentation of minute lesions. We conducted experiments on two newly-built public datasets, i.e., IDRiD and Retinal-Lesions, and validated the feasibility and superiority of GlanceSeg through visualized illustrations and quantitative measures. Additionally, we demonstrated that GlanceSeg improves annotation efficiency for clinicians and enhances segmentation performance through fine-tuning using annotations. This study highlights the potential of GlanceSeg-based annotations for self-model optimization, leading to enduring performance advancements through continual learning.
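The gaze-to-prompt step can be illustrated with a simple sketch: build a saliency map by placing a Gaussian at each gaze fixation, then greedily take the highest-saliency pixels that are at least `min_dist` apart as point prompts for a promptable segmenter such as SAM. The Gaussian kernel, `sigma`, and the peak-picking rule are all assumptions for illustration; the abstract does not specify how GlanceSeg builds its saliency map or selects prompt points, and the SAM call itself is omitted.

```python
import math

def gaze_to_saliency(gaze_points, height, width, sigma=2.0):
    """Sum of isotropic Gaussians centred on each (row, col) gaze fixation."""
    sal = [[0.0] * width for _ in range(height)]
    for gy, gx in gaze_points:
        for y in range(height):
            for x in range(width):
                sal[y][x] += math.exp(-((y - gy) ** 2 + (x - gx) ** 2) / (2 * sigma ** 2))
    return sal

def prompt_points(sal, k=3, min_dist=3):
    """Greedily pick up to k highest-saliency cells at least min_dist apart (Chebyshev distance)."""
    cells = sorted(
        ((sal[y][x], y, x) for y in range(len(sal)) for x in range(len(sal[0]))),
        reverse=True,
    )
    picked = []
    for _, y, x in cells:
        if all(max(abs(y - py), abs(x - px)) >= min_dist for py, px in picked):
            picked.append((y, x))
            if len(picked) == k:
                break
    return picked
```

The minimum-distance rule keeps the prompts spread over distinct candidate lesions instead of clustering around the single strongest fixation.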

Submitted 14 November, 2023; originally announced November 2023.

Comments: 12 pages, 10 figures

arXiv:2311.07096 [pdf, other]

Optimal Configuration of Reconfigurable Intelligent Surfaces with Arbitrary Discrete Phase Shifts

Authors: Seyedkhashayar Hashemi, Hai Jiang, Masoud Ardakani

Abstract: We address the reflection optimization problem for a reconfigurable intelligent surface (RIS), where the RIS elements feature a set of non-uniformly spaced discrete phase shifts. This is motivated by the actual behavior of practical RIS elements, for which a uniform phase shift assumption has been shown to be unrealistic. A problem is formulated to find the optimal reflection amplitudes and reflection phase shifts of the RIS elements such that the channel capacity of the target user is maximized. We first prove that in the optimal configuration, each RIS element is either turned off or operates at maximum amplitude. We then develop a method that finds the optimal reflection amplitudes and phases with complexity linear in the number of RIS elements. New insights into the reflection optimization problem are also provided.
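Because capacity log2(1 + SNR·|h0 + Σ a_n e^{jθ_n} h_n|²) grows with the received amplitude, the problem reduces to maximizing |h0 + Σ a_n e^{jθ_n} h_n|. Using the paper's stated result that each element is either off or at maximum amplitude, a small instance can be solved by exhaustive search over {off} ∪ {discrete phases} per element. This brute-force reference is exponential in the number of elements and is not the paper's linear-complexity method; it is only useful for checking a faster algorithm on toy cases. The variable names and unit maximum amplitude are assumptions.

```python
import cmath
import itertools
import math

def best_config(h0, h, phases, amp=1.0):
    """Exhaustive search: each element is off (None) or on at max amplitude with one discrete phase.

    h0: direct-link coefficient; h: cascaded coefficients per RIS element;
    phases: available (possibly non-uniformly spaced) phase shifts in radians.
    Returns (best received amplitude, per-element configuration).
    """
    options = [None] + list(phases)
    best_gain, best_cfg = abs(h0), [None] * len(h)
    for cfg in itertools.product(options, repeat=len(h)):
        s = h0 + sum(amp * hn * cmath.exp(1j * th) for hn, th in zip(h, cfg) if th is not None)
        if abs(s) > best_gain:
            best_gain, best_cfg = abs(s), list(cfg)
    return best_gain, best_cfg

def capacity(gain, snr):
    """Capacity in bit/s/Hz for received amplitude gain and transmit SNR."""
    return math.log2(1.0 + snr * gain ** 2)
```

For example, with h0 = 1 and a single element h = [1j] whose only phases are π/2 and 3π/2, choosing 3π/2 rotates the reflected path into phase with the direct link and doubles the received amplitude.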

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2309.16075 [pdf, other]

A review of variable-pitch propellers and their control strategies in aerospace systems

Authors: Hanjie Jiang, Ye Zhou, Hann Woei Ho

Abstract: The relentless pursuit of aircraft flight efficiency has thrust variable-pitch propeller technology to the forefront of aviation innovation. This technology, rooted in the propeller, one of the oldest aircraft power units, has found renewed significance, particularly in the realms of unmanned aerial vehicles and urban air mobility. This underscores the profound interplay between visionary aviation concepts and the enduring utility of propellers. Variable-pitch propellers are poised to be pivotal in shaping the future of human aviation, offering benefits such as extended endurance, enhanced maneuverability, improved fuel economy, and prolonged engine life. However, with additional capabilities come new technical challenges. Developing online adaptive control of variable-pitch propellers that does not depend on an accurate dynamic model is a critical imperative. Therefore, a comprehensive review and forward-looking analysis of this technology is warranted. This paper introduces the development background of variable-pitch aviation propeller technology, encompassing diverse pitch angle adjustment schemes and their integration with various engine types. It places a central focus on the latest research frontiers and emerging directions in pitch control strategies. Lastly, it delves into the research domain of constant-speed pitch control, articulating the three main challenges confronting this technology: inadequacies in system modeling, the intricacies of propeller-engine compatibility, and the impact of external, time-varying factors.
By shedding light on these multifaceted aspects of variable-pitch propeller technology, this paper serves as a resource for aviation professionals and researchers navigating the intricate landscape of future aircraft development.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.12552 [pdf, other]

Adaptive Model Predictive Control for Engine-Driven Ducted Fan Lift Systems using an Associated Linear Parameter Varying Model

Authors: Hanjie Jiang, Ye Zhou, Hann Woei Ho, Wenjie Hu

Abstract: Ducted fan lift systems (DFLSs) powered by two-stroke aviation piston engines present a challenging control problem due to their complex multivariable dynamics. Current controllers for these systems typically rely on proportional-integral algorithms combined with data tables, which depend on accurate models and cannot adapt to time-varying dynamics or system uncertainties. This paper proposes a novel adaptive model predictive control (AMPC) strategy with an associated linear parameter varying (LPV) model for controlling the engine-driven DFLS. This LPV model is derived from a global network model, which is trained offline with data obtained from a general mean value engine model for two-stroke aviation engines. Different network models, including multi-layer perceptron, Elman, and radial basis function (RBF) networks, are evaluated and compared in this study. The results demonstrate that the RBF model exhibits higher prediction accuracy and robustness in the DFLS application. Based on the trained RBF model, the proposed AMPC approach constructs an associated network that directly outputs the LPV model parameters as an adaptive, robust, and efficient prediction model. The efficiency of the proposed approach is demonstrated through numerical simulations of a vertical take-off thrust preparation process for the DFLS. The simulation results indicate that the proposed AMPC method can effectively control the DFLS thrust with a relative error below 3.5%.

Submitted 21 September, 2023; originally announced September 2023.
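The abstract reports that a radial basis function (RBF) network gave the best prediction accuracy among the candidate models. As a generic illustration only, the forward pass of a single-output Gaussian RBF network is sketched below. The centers, widths, and weights are hypothetical placeholders, not the trained engine model, and the associated network that outputs LPV model parameters is not reproduced.

```python
import math
from typing import Sequence


def rbf_predict(x: Sequence[float],
                centers: Sequence[Sequence[float]],
                widths: Sequence[float],
                weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Single-output Gaussian RBF network:
    y = bias + sum_i w_i * exp(-||x - c_i||^2 / (2 * sigma_i^2))."""
    y = bias
    for c, sigma, w in zip(centers, widths, weights):
        # Squared Euclidean distance from the input to this hidden unit's center
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        y += w * math.exp(-d2 / (2.0 * sigma ** 2))
    return y


# Hypothetical two-input network; all parameter values are illustrative only.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [2.0, -1.0]
print(rbf_predict([0.0, 0.0], centers, widths, weights, bias=0.1))
```

Each hidden unit responds most strongly near its own center, which is what makes RBF networks a natural fit for locally accurate models around operating points.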

arXiv:2308.08313 [pdf, other]

ECPC-IDS: A benchmark endometrial cancer PET/CT image dataset for evaluation of semantic segmentation and detection of hypermetabolic regions

Authors: Dechao Tang, Tianming Du, Deguo Ma, Zhiyu Ma, Hongzan Sun, Marcin Grzegorzek, Huiyan Jiang, Chen Li

Abstract: Endometrial cancer is one of the most common tumors of the female reproductive system and the third most common cause of death from gynecological malignancy, after ovarian and cervical cancer. Early diagnosis can significantly improve patients' 5-year survival rate. With the development of artificial intelligence, computer-assisted diagnosis plays an increasingly important role in improving the accuracy and objectivity of diagnosis and in reducing the workload of doctors. However, the absence of publicly available endometrial cancer image datasets restricts the application of computer-assisted diagnostic techniques. In this paper, a publicly available Endometrial Cancer PET/CT Image Dataset for Evaluation of Semantic Segmentation and Detection of Hypermetabolic Regions (ECPC-IDS) is published. Specifically, the segmentation section includes PET and CT images, with a total of 7159 images in multiple formats. To demonstrate the effectiveness of segmentation methods on ECPC-IDS, five classical deep learning semantic segmentation methods are selected to test the image segmentation task. The object detection section also includes PET and CT images, with a total of 3579 images and XML files with annotation information. Six deep learning methods are selected for experiments on the detection task. This study conducts extensive experiments using deep learning-based semantic segmentation and object detection methods to demonstrate the differences between various methods on ECPC-IDS.
To the best of our knowledge, this is the first publicly available endometrial cancer dataset with a large number of PET and CT images, including much of the information required for segmentation and object detection. ECPC-IDS can aid researchers in exploring new algorithms to enhance computer-assisted technology, greatly benefiting both clinicians and patients.

Submitted 11 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 14 pages, 6 figures

arXiv:2307.16518 [pdf, other]

Continuous-Time Channel Prediction Based on Tensor Neural Ordinary Differential Equation

Authors: Mingyao Cui, Hao Jiang, Yuhao Chen, Yang Du, Linglong Dai

Abstract: Channel prediction is critical for addressing channel aging in mobile scenarios. Existing channel prediction techniques are mainly designed for discrete channel prediction, which can only predict the future channel in a fixed time slot per frame, while the other intra-frame channels are usually recovered by interpolation. However, these approaches suffer from serious interpolation loss, especially in mobile millimeter wave communications. To solve this challenging problem, we propose a tensor neural ordinary differential equation (TN-ODE) based continuous-time channel prediction scheme that directly predicts intra-frame channels. Specifically, inspired by the recently developed continuous mapping model named neural ODE in the field of machine learning, we first utilize the neural ODE model to predict future continuous-time channels. To improve the channel prediction accuracy and reduce computational complexity, we then propose the TN-ODE scheme to learn the structural characteristics of the high-dimensional channel through a low-dimensional learnable transform. Simulation results show that the proposed scheme achieves higher intra-frame channel prediction accuracy than existing schemes.

Submitted 31 July, 2023; originally announced July 2023.

Comments: A tensor neural ODE based method is proposed to predict continuous-time wireless channels
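The appeal of a continuous-time model is that the channel can be queried at any instant by integrating dynamics dh/dt = f(h, t), instead of only at one slot per frame. The sketch below assumes a hand-written single-path Doppler rotation as a stand-in for the learned vector field; a real neural ODE would use a trained network, and TN-ODE additionally learns a low-dimensional tensor transform, neither of which is shown. The dynamics are integrated with classic fourth-order Runge-Kutta.

```python
import cmath
import math
from typing import Callable


def rk4_integrate(f: Callable[[complex, float], complex],
                  h0: complex, t0: float, t1: float, steps: int = 100) -> complex:
    """Integrate dh/dt = f(h, t) from t0 to t1 with classic fourth-order Runge-Kutta."""
    dt = (t1 - t0) / steps
    h, t = h0, t0
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = f(h + dt * k3, t + dt)
        h += dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return h


f_d = 500.0  # hypothetical Doppler shift in Hz


def dynamics(h: complex, t: float) -> complex:
    """Stand-in for a learned vector field: one path rotating at Doppler shift f_d."""
    return 1j * 2 * math.pi * f_d * h


h0 = 1.0 + 0.0j
# Query the channel at arbitrary intra-frame instants (seconds), not just one slot per frame.
for t in (0.1e-3, 0.25e-3, 0.6e-3):
    approx = rk4_integrate(dynamics, h0, 0.0, t)
    exact = cmath.exp(1j * 2 * math.pi * f_d * t)
    print(t, approx, abs(approx - exact))
```

Because the stand-in dynamics have the closed-form solution h(t) = h0·exp(j2πf_d·t), the integrator's output can be checked against it directly.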

arXiv:2307.07995 [pdf, other]

Channel Modeling for Heterogeneous Vehicular ISAC System with Shared Clusters

Authors: Zaichen Zhang, Yingmeng Ge, Haibo Wang, Hao Jiang, Liang Wu, Ziyang Zhang

Abstract: In this paper, we consider the channel modeling of a heterogeneous vehicular integrated sensing and communication (ISAC) system, where a dual-functional multi-antenna base station (BS) intends to communicate with a multi-antenna vehicular receiver (MR) and sense the surrounding environment simultaneously. The time-varying complex channel impulse responses (CIRs) of the sensing and communication channels are derived, with the two channels correlated through shared clusters. The proposed models are highly general: they cover both monostatic and bistatic sensing scenarios and account for both static and mobile clusters/targets. Important channel statistical characteristics, including the time-varying spatial cross-correlation function (CCF) and temporal auto-correlation function (ACF), are derived and analyzed. Numerical results are provided to show the propagation characteristics of the proposed ISAC channel model. Finally, the proposed model is validated by the agreement between theoretical, simulated, and measured results.

Submitted 16 July, 2023; originally announced July 2023.

Comments: 6 pages, 3 figures. This work has been submitted to IEEE for possible publication

arXiv:2307.01665 [pdf]

Multicarrier Modulation-Based Digital Radio-over-Fibre System Achieving Unequal Bit Protection with Over 10 dB SNR Gain

Authors: Yicheng Xu, Yixiao Zhu, Xiaobo Zeng, Mengfan Fu, Hexun Jiang, Lilin Yi, Weisheng Hu, Qunbi Zhuge

Abstract: We propose a multicarrier modulation-based digital radio-over-fibre system achieving unequal bit protection by bit and power allocation for subcarriers. A theoretical SNR gain of 16.1 dB is obtained in the AWGN channel and the simulation results show a 13.5 dB gain in the bandwidth-limited case.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.04202 [pdf, other]

Video Compression with Arbitrary Rescaling Network

Authors: Mengxi Guo, Shijie Zhao, Hao Jiang, Junlin Li, Li Zhang

Abstract: Most video platforms provide video streaming services at different qualities, and the quality of the services is usually adjusted by the resolution of the videos. High-resolution videos therefore need to be downsampled for compression. To solve the problem of video coding at different resolutions, we propose a rate-guided arbitrary rescaling network (RARN) for video resizing before encoding. To help the RARN be compatible with standard codecs and generate compression-friendly results, an iteratively optimized transformer-based virtual codec (TVC) is introduced to simulate the key components of video encoding and perform bitrate estimation. By iteratively training the TVC and the RARN, we achieved a 5%-29% BD-Rate reduction, anchored by linear interpolation, under different encoding configurations and resolutions, exceeding previous methods on most test videos. Furthermore, the lightweight RARN structure can process FHD (1080p) content at real-time speed (91 FPS) and obtain a considerable rate reduction.

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted as a one-page poster by 2023 Data Compression Conference (DCC). This is the full paper

arXiv:2306.02682 [pdf, other]

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

Authors: Yukang Liang, Kaitao Song, Shaoguang Mao, Huiqiang Jiang, Luna Qiu, Yuqing Yang, Dongsheng Li, Linli Xu, Lili Qiu

Abstract: Pronunciation assessment is a major challenge in computer-aided pronunciation training systems, especially at the word (phoneme) level. To obtain word (phoneme)-level scores, current methods usually rely on aligning components to obtain acoustic features of each word (phoneme), which limits assessment performance to the accuracy of the alignments. To address this problem, we propose a simple yet effective method, namely Masked pre-training for Pronunciation Assessment (MPA). Specifically, by incorporating a mask-predict strategy, MPA supports end-to-end training without any aligning components and can largely resolve misalignment issues during prediction. Furthermore, we design two evaluation strategies that enable the model to conduct assessments in both unsupervised and supervised settings. Experimental results on the SpeechOcean762 dataset demonstrate that MPA achieves better performance than previous methods without any explicit alignment. Nevertheless, MPA still has limitations, such as requiring more inference time and reference text, which are expected to be addressed in future work.

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted by InterSpeech 2023

arXiv:2305.19956 [pdf, other]

doi 10.1016/j.compmedimag.2024.102326

MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images

Authors: Hongxu Jiang, Muhammad Imran, Preethika Muralidharan, Anjali Patel, Jake Pensa, Muxuan Liang, Tarik Benidir, Joseph R. Grajo, Jason P. Joseph, Russell Terry, John Michael DiBianco, Li-Ming Su, Yuyin Zhou, Wayne G. Brisbane, Wei Shao

Abstract: Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, potentially enabling low-cost, accurate diagnosis of prostate cancer. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. However, prostate segmentation on micro-US is challenging due to artifacts and indistinct borders between the prostate, bladder, and urethra in the midline. This paper presents MicroSegNet, a multi-scale annotation-guided transformer UNet model designed specifically to tackle these challenges. During the training process, MicroSegNet focuses more on regions that are hard to segment (hard regions), characterized by discrepancies between expert and non-expert annotations. We achieve this by proposing an annotation-guided binary cross entropy (AG-BCE) loss that assigns a larger weight to prediction errors in hard regions and a lower weight to prediction errors in easy regions. The AG-BCE loss was seamlessly integrated into the training process through the utilization of multi-scale deep supervision, enabling MicroSegNet to capture global contextual dependencies and local information at various scales. We trained our model using micro-US images from 55 patients, followed by evaluation on 20 patients. Our MicroSegNet model achieved a Dice coefficient of 0.939 and a Hausdorff distance of 2.02 mm, outperforming several state-of-the-art segmentation methods, as well as three human annotators with different experience levels.
Our code is publicly available at https://github.com/mirthAI/MicroSegNet and our dataset is publicly available at https://zenodo.org/records/10475293.

Submitted 25 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Journal ref: Computerized Medical Imaging and Graphics (2024): 102326
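A minimal sketch of an annotation-guided weighted binary cross-entropy follows. It assumes that hard pixels are exactly those where the expert and non-expert masks disagree, that the expert mask is the training target, and uses illustrative weights of 4 (hard) and 1 (easy). The actual weights, normalization, and multi-scale deep supervision used by MicroSegNet are not reproduced here.

```python
import math
from typing import Sequence


def ag_bce(pred: Sequence[float],
           expert: Sequence[int],
           non_expert: Sequence[int],
           hard_weight: float = 4.0,
           easy_weight: float = 1.0,
           eps: float = 1e-7) -> float:
    """Weighted BCE over flattened pixels; expert labels are the training target."""
    total = 0.0
    for p, y, z in zip(pred, expert, non_expert):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        # Hard pixel: expert and non-expert annotations disagree (assumed rule).
        w = hard_weight if y != z else easy_weight
        total += w * bce
    return total / len(pred)


pred = [0.9, 0.2, 0.6, 0.1]
expert = [1, 0, 1, 0]
non_expert = [1, 0, 0, 0]  # disagreement at pixel 2 marks it as a hard region
print(ag_bce(pred, expert, non_expert))
```

With these weights, the same prediction error costs four times as much in a hard region as in an easy one, which pushes training effort toward the ambiguous borders.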

arXiv:2305.16049 [pdf, other]

CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition

Authors: Lantian Li, Xiaolou Li, Haoyu Jiang, Chen Chen, Ruihai Hou, Dong Wang

Abstract: Audio-visual person recognition (AVPR) has received extensive attention. However, most datasets used for AVPR research so far are collected in constrained environments, and thus cannot reflect the true performance of AVPR systems in real-world scenarios. To meet the need for research on AVPR in unconstrained conditions, this paper presents a multi-genre AVPR dataset collected 'in the wild', named CN-Celeb-AV. This dataset contains more than 419k video segments from 1,136 persons from public media. In particular, we place emphasis on two real-world complexities: (1) data in multiple genres; (2) segments with partial information. A comprehensive study was conducted to compare CN-Celeb-AV with two popular public AVPR benchmark datasets, and the results demonstrated that CN-Celeb-AV is more in line with real-world scenarios and can be regarded as a new benchmark dataset for AVPR research. The dataset also includes a development set that can be used to boost the performance of AVPR systems in real-life situations. The dataset is free for researchers and can be downloaded from http://cnceleb.org/.

Submitted 28 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: INTERSPEECH 2023

arXiv:2305.14997 [pdf, other]

3GPP-Like GBSM THz Channel Modeling for Indoor Office and Urban Microcellular Scenarios

Authors: Zhaowei Chang, Jianhua Zhang, Pan Tang, Lei Tian, Hao Jiang, Ximan Liu, and Guangyi Liu

Abstract: Terahertz (THz) communication is envisioned as one of the possible technologies for the sixth-generation (6G) communication system due to its rich spectrum. To evaluate the performance of THz communication, it is essential to propose THz channel models within the common framework of the geometry-based stochastic model (GBSM) in the 3rd Generation Partnership Project (3GPP). This paper focuses on 3GPP-like GBSM THz channel modeling based on channel measurements. We first present channel measurements at 100 GHz in an indoor office scenario and at 132 GHz in an urban microcellular scenario. Subsequently, channel characteristics such as path loss (PL), delay spread, angle spread, K-factor, cluster characteristics, cross-correlations, and correlation distance are obtained and analyzed using the measurement data. Additionally, statistical values of the channel characteristics are extracted based on the statistical distribution of 3GPP channel models, which can be used to reconstruct the channel impulse response (CIR). Furthermore, these obtained values are compared with the default values in the 3GPP channel model, revealing discrepancies that indicate the default values cannot accurately characterize the THz channel. For instance, for line-of-sight links in the indoor office, the measured cluster number is 4 while the default value is 15. Finally, the channel capacity in the THz band is evaluated using the reconstructed CIRs generated by the GBSM with both the measured statistical values and the 3GPP default values.
It is observed that the 3GPP default values overestimate the THz channel capacity by more than 10 bps/Hz at a signal-to-noise ratio of 30 dB. Overall, these findings are helpful in understanding and modeling the THz channel, facilitating the application of THz communication techniques for 6G.

Submitted 22 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2304.12719 [pdf, ps, other]

Eye tracking guided deep multiple instance learning with dual cross-attention for fundus disease detection

Authors: Hongyang Jiang, **gqi Huang, Chen Tang, Xiaoqing Zhang, Mengdi Gao, Jiang Liu

Abstract: Deep neural networks (DNNs) have promoted the development of computer-aided diagnosis (CAD) systems for fundus diseases, helping ophthalmologists reduce missed-diagnosis and misdiagnosis rates. However, most CAD systems are data-driven and lack medical prior knowledge that could improve performance. In this regard, we propose a human-in-the-loop (HITL) CAD system that leverages ophthalmologists' eye-tracking information, which is more efficient and accurate. Concretely, the HITL CAD system was implemented with multiple instance learning (MIL), where eye-tracking gaze maps helped cherry-pick diagnosis-related instances. Furthermore, a dual-cross-attention MIL (DCAMIL) network was utilized to curb the adverse effects of noisy instances. Meanwhile, a sequence augmentation module and a domain adversarial module were introduced to enrich and standardize instances in the training bag, respectively, thereby enhancing the robustness of our method. We conduct comparative experiments on our newly constructed datasets (namely, AMD-Gaze and DR-Gaze) for AMD and early DR detection, respectively. Rigorous experiments demonstrate the feasibility of our HITL CAD system and the superiority of the proposed DCAMIL, which fully exploits the ophthalmologists' eye-tracking information. These investigations indicate that physicians' gaze maps, as medical prior knowledge, have the potential to contribute to CAD systems for clinical diseases.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 10 pages, 9 figures

MSC Class: none

arXiv:2303.16024 [pdf, other]

Egocentric Auditory Attention Localization in Conversations

Authors: Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu

Abstract: In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others. Recognizing who somebody is listening to in a conversation is essential for developing technologies that can understand social behavior and devices that can augment human hearing by amplifying particular sound sources. The computer vision and audio research communities have made great strides towards recognizing sound sources and speakers in scenes. In this work, we take a step further by focusing on the problem of localizing auditory attention targets in egocentric video, or detecting who in a camera wearer's field of view they are listening to. To tackle the new and challenging Selective Auditory Attention Localization problem, we propose an end-to-end deep learning approach that uses egocentric video and multichannel audio to predict the heatmap of the camera wearer's auditory attention. Our approach leverages spatiotemporal audiovisual features and holistic reasoning about the scene to make predictions, and outperforms a set of baselines on a challenging multi-speaker conversation dataset. Project page: https://fkryan.github.io/saal

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2302.08650 [pdf, other]

Gaussian-smoothed Imbalance Data Improves Speech Emotion Recognition

Authors: Xuefeng Liang, Hexin Jiang, Wenxin Xu, Ying Zhou

Abstract: In speech emotion recognition tasks, models learn emotional representations from datasets. We find that the data distribution in the IEMOCAP dataset is highly imbalanced, which may hinder models from learning better representations. To address this issue, we propose a novel Pairwise-emotion Data Distribution Smoothing (PDDS) method. PDDS assumes that the distribution of emotional data should be smooth in reality, and applies Gaussian smoothing to emotion pairs to construct a new training set with a smoother distribution. The required new data are generated using mixup augmentation. As PDDS is model- and modality-agnostic, it is evaluated with three SOTA models on the IEMOCAP dataset. The experimental results show that these models are improved by 0.2%-4.8% in WA and 1.5%-5.9% in UA. In addition, an ablation study demonstrates that the key advantage of PDDS is the reasonable data distribution rather than simple data augmentation.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 5 pages
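The two ingredients named in the abstract, Gaussian smoothing of a data distribution and mixup, can be sketched generically as follows. This is not the authors' PDDS code. Treating the distribution as a 1-D histogram of sample counts over ordered bins (for example, mixing-ratio bins for one emotion pair) is an assumption, and the kernel width, radius, and example counts are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_smooth(counts: Sequence[float], sigma: float = 1.0, radius: int = 2) -> List[float]:
    """Convolve a 1-D histogram with a Gaussian kernel, renormalizing at the edges."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(counts)):
        acc = norm = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(counts):  # skip taps that fall outside the histogram
                acc += w * counts[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def mixup(x1: Sequence[float], y1: Sequence[float],
          x2: Sequence[float], y2: Sequence[float],
          lam: float) -> Tuple[List[float], List[float]]:
    """Standard mixup: convex combination of two feature vectors and their soft labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Hypothetical, imbalanced counts over five ordered bins.
counts = [40, 3, 1, 2, 30]
smoothed = gaussian_smooth(counts)
deficits = [max(0, round(s - c)) for s, c in zip(smoothed, counts)]
print(deficits)  # bins that would be topped up with mixup samples
```

Under this assumed scheme, the gap between the smoothed and original count in each bin indicates how many mixup samples would need to be synthesized for that bin.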

arXiv:2301.10624 [pdf, ps, other]

Energy-Delay Tradeoff in Helper-Assisted NOMA-MEC Systems: A Four-Sided Matching Algorithm

Authors: Mengmeng Ren, Long Yang, Hai Jiang, Jian Chen, Yuchen Zhou

Abstract: This paper designs a helper-assisted resource allocation strategy in non-orthogonal multiple access (NOMA)-enabled mobile edge computing (MEC) systems, in order to guarantee the quality of service (QoS) of the energy/delay-sensitive user equipments (UEs). To achieve a tradeoff between the energy consumption and the delay, we introduce a novel performance metric, called energy-delay tradeoff, which is defined as the weighted sum of energy consumption and delay. The joint optimization of user association, resource block (RB) assignment, power allocation, task assignment, and computation resource allocation is formulated as a mixed-integer nonlinear programming problem with the aim of minimizing the maximal energy-delay tradeoff. Due to the non-convexity of the formulated problem with coupled and 0-1 variables, this problem cannot be directly solved with polynomial complexity. To tackle this challenge, we first decouple the formulated problem into a power allocation, task assignment and computation resource allocation (PATACRA) subproblem. Then, with the solution obtained from the PATACRA subproblem, we equivalently reformulate the original problem as a discrete user association and RB assignment (DUARA) problem. For the PATACRA subproblem, an iterative parametric convex approximation (IPCA) algorithm is proposed. Then, based on the solution obtained from the PATACRA subproblem, we first model the DUARA problem as a four-sided matching problem, and then propose a low-complexity four-sided UE-RB-helper-server matching (FS-URHSM) algorithm.
Theoretical analysis demonstrates that the proposed algorithms are guaranteed to converge to stable solutions with polynomial complexity. Finally, simulation results are provided to show the superior performance of our proposed algorithm in terms of the energy consumption and the delay.

Submitted 25 January, 2023; originally announced January 2023.
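To make the min-max objective concrete, the sketch below computes each UE's energy-delay tradeoff as a weighted sum of energy and delay, then brute-forces the one-to-one UE-to-helper assignment that minimizes the worst UE's tradeoff. The cost table, weights, and one-helper-per-UE constraint are hypothetical simplifications; the paper's IPCA and four-sided matching (FS-URHSM) algorithms exist precisely to avoid this kind of exponential enumeration.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple

Cost = Tuple[float, float]  # (energy, delay) in hypothetical units


def tradeoff(energy: float, delay: float, w_e: float, w_d: float) -> float:
    """Energy-delay tradeoff: weighted sum of energy consumption and delay."""
    return w_e * energy + w_d * delay


def best_min_max_assignment(cost: Sequence[Sequence[Cost]],
                            w_e: float, w_d: float) -> Tuple[float, Tuple[int, ...]]:
    """Brute-force the one-to-one UE -> helper assignment minimizing the largest tradeoff.

    cost[u][h] is the (energy, delay) pair if UE u is served by helper h.
    Exponential in the number of UEs: only for tiny illustrative instances.
    """
    n_ue, n_helper = len(cost), len(cost[0])
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for perm in permutations(range(n_helper), n_ue):
        worst = max(tradeoff(*cost[u][h], w_e, w_d) for u, h in enumerate(perm))
        if best is None or worst < best[0]:
            best = (worst, perm)
    if best is None:
        raise ValueError("need at least as many helpers as UEs")
    return best


cost = [[(1.0, 0.1), (2.0, 0.05)],
        [(1.5, 0.2), (0.5, 0.1)]]
print(best_min_max_assignment(cost, w_e=1.0, w_d=10.0))
```

Minimizing the maximum, rather than the sum, protects the worst-off UE, which matches the abstract's goal of guaranteeing QoS for every energy/delay-sensitive user.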

arXiv:2301.02184 [pdf, other]

Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

Authors: Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Krishna Ithapu

Abstract: Can conversational videos captured from multiple egocentric viewpoints reveal the map of a scene in a cost-efficient way? We seek to answer this question by proposing a new problem: efficiently building the map of a previously unseen 3D environment by exploiting shared information in the egocentric audio-visual observations of participants in a natural conversation. Our hypothesis is that as multiple people ("egos") move in a scene and talk among themselves, they receive rich audio-visual cues that can help uncover the unseen areas of the scene. Given the high cost of continuously processing egocentric visual streams, we further explore how to actively coordinate the sampling of visual information, so as to minimize redundancy and reduce power use. To that end, we present an audio-visual deep reinforcement learning approach that works with our shared scene mapper to selectively turn on the camera to efficiently chart out the space. We evaluate the approach using a state-of-the-art audio-visual simulator for 3D scenes as well as real-world video. Our model outperforms previous state-of-the-art mapping methods, and achieves an excellent cost-accuracy tradeoff. Project: http://vision.cs.utexas.edu/projects/chat2map.

Submitted 20 April, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: Accepted to CVPR 2023

arXiv:2212.08653 [pdf, other]

Attentive Mask CLIP

Authors: Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang

Abstract: Image token removal is an efficient augmentation strategy for reducing the cost of computing image features. However, this efficient augmentation strategy has been found to adversely affect the accuracy of CLIP-based training. We hypothesize that removing a large portion of image tokens may improperly discard the semantic content associated with a given text description, thus constituting an incorrect pairing target in CLIP training. To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description. The correlation scores are computed in an online fashion using the EMA version of the visual encoder. Our experiments show that the proposed attentive masking approach performs better than the previous method of random token removal for CLIP training. The approach also makes it efficient to apply multiple augmentation views to the image and to introduce instance contrastive learning tasks between these views into the CLIP framework. Compared to other CLIP improvements that combine different pre-training targets, such as SLIP and MaskCLIP, our method is not only more effective but also much more efficient. Specifically, using ViT-B and the YFCC-15M dataset, our approach achieves $43.9\%$ top-1 accuracy on ImageNet-1K zero-shot classification, as well as $62.7/42.1$ and $38.0/23.2$ I2T/T2I retrieval accuracy on Flickr30K and MS COCO, which are $+1.1\%$, $+5.5/+0.9$, and $+4.4/+1.3$ higher than the SLIP method, while being $2.30\times$ faster.
An efficient version of our approach running $1.16\times$ faster than the plain CLIP model achieves significant gains of $+5.3\%$, $+11.3/+8.0$, and $+9.5/+4.9$ on these benchmarks.
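The abstract's core idea, keeping the image tokens most correlated with the text rather than a random subset, with scores computed by an EMA copy of the visual encoder, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-token scores are taken as given (how the EMA encoder produces them is not shown), and `attentive_keep`, `random_keep`, `ema_update`, and the momentum value are hypothetical names and settings.

```python
import random


def ema_update(ema_params, params, momentum=0.999):
    """Exponential moving average of (flattened) encoder weights.
    The momentum value is a hypothetical choice, not taken from the paper."""
    return [momentum * e + (1.0 - momentum) * p for e, p in zip(ema_params, params)]


def attentive_keep(scores, keep_ratio):
    """Return sorted indices of the highest-scoring tokens.
    `scores` holds one relevance score per image token (assumed precomputed)."""
    k = max(1, int(round(len(scores) * keep_ratio)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def random_keep(num_tokens, keep_ratio, rng):
    """Baseline random token removal: keep a random subset of the same size."""
    k = max(1, int(round(num_tokens * keep_ratio)))
    return sorted(rng.sample(range(num_tokens), k))


scores = [0.1, 0.9, 0.3, 0.7]                       # hypothetical per-token scores
print(attentive_keep(scores, 0.5))                  # [1, 3]
print(random_keep(len(scores), 0.5, random.Random(0)))
```

Both functions keep the same number of tokens, so the compute saving is identical; the only difference is which tokens survive.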

Submitted 9 October, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2771-2781

arXiv:2212.07028 [pdf, other]

doi 10.1109/JSAC.2023.3240788

Rate-Splitting Multiple Access for Uplink Massive MIMO With Electromagnetic Exposure Constraints

Authors: Hanyu Jiang, Li You, Ahmed Elzanaty, Jue Wang, Wen** Wang, Xiqi Gao, Mohamed-Slim Alouini

Abstract: Over the past few years, the proliferation of wireless devices has made them one of the major sources of electromagnetic (EM) radiation to the public. With the swift development of wireless communications, people are concerned about the risks of long-term exposure to EM radiation. EM exposure must be restricted at user terminals, but blindly decreasing the transmit power is inefficient, since it limits both spectral efficiency and energy efficiency (EE). Recently, rate-splitting multiple access (RSMA) has been proposed as an effective way to provide higher wireless transmission performance and is a promising technology for future wireless communications. To this end, we propose using RSMA to increase the EE of the massive MIMO uplink while limiting the EM exposure of users. In particular, we investigate the optimization of the transmit covariance matrices and decoding order using statistical channel state information (CSI). The problem is formulated as a non-convex mixed-integer program, which is in general difficult to handle. We first propose a modified water-filling scheme to obtain the transmit covariance matrices with a fixed decoding order. Then, a greedy approach is proposed to obtain the decoding permutation. Numerical results verify the effectiveness of the proposed EM exposure-aware EE maximization scheme for uplink RSMA.
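For readers unfamiliar with water-filling, the textbook version allocates a power budget over parallel eigenmodes with gains g_i as p_i = max(0, mu - 1/g_i), choosing the water level mu so the powers sum to the budget. The sketch below shows that classic scheme plus an optional per-mode cap; the cap is a hypothetical stand-in for an exposure-style limit and is not the paper's modified scheme, whose details the abstract does not give.

```python
def water_filling(gains, total_power, caps=None, iters=100):
    """Classic water-filling: p_i = max(0, mu - 1/g_i), with mu found by bisection
    so that sum(p_i) == total_power. If `caps` is given, each p_i is also clipped
    to caps[i] (a hypothetical per-mode limit, not the paper's EM constraint)."""
    if caps is None:
        caps = [float("inf")] * len(gains)

    def alloc(mu):
        return [min(c, max(0.0, mu - 1.0 / g)) for g, c in zip(gains, caps)]

    # If the caps themselves cannot absorb the budget, every mode sits at its cap.
    if sum(caps) <= total_power:
        return list(caps)

    # At mu = hi every uncapped mode already receives at least total_power.
    lo, hi = 0.0, total_power + max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(alloc(mu)) > total_power:
            hi = mu
        else:
            lo = mu
    return alloc(0.5 * (lo + hi))


print(water_filling([1.0, 0.5], 2.0))                     # ≈ [1.5, 0.5]
print(water_filling([1.0, 0.5], 2.0, caps=[1.0, 10.0]))   # ≈ [1.0, 1.0]
```

Stronger modes (larger g_i) get more power, and very weak modes can receive none; with statistical CSI the gains would come from the channel statistics rather than instantaneous estimates.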

Submitted 13 December, 2022; originally announced December 2022.

Comments: to appear in IEEE Journal on Selected Areas in Communications

Journal ref: IEEE Journal on Selected Areas in Communications, vol. 41, no. 5, pp. 1383-1397, May 2023

arXiv:2211.09332 [pdf]

iNavFIter-M: Matrix Formulation of Functional Iteration for Inertial Navigation Computation

Authors: Hongyan Jiang, Maoran Zhu, Yanyan Fu, Yuanxin Wu

Abstract: The acquisition of attitude, velocity, and position is an essential task in the field of inertial navigation, achieved by integrating the measurements from inertial sensors. Recently, ultra-precision inertial navigation computation has been tackled by the functional iteration approach (iNavFIter), which drives the non-commutativity errors almost down to the level of computer truncation error. This paper proposes a computationally efficient matrix formulation of the functional iteration approach, named iNavFIter-M. The Chebyshev polynomial coefficients in two consecutive iterations are explicitly connected through the matrix formulation, in contrast to the implicit iterative relationship in the original iNavFIter. This allows a straightforward algorithmic implementation, and a number of matrix factors can be pre-calculated for more efficient computation. Numerical results demonstrate that the proposed iNavFIter-M algorithm achieves the same high computational accuracy as the original iNavFIter, at a computational cost comparable to that of the typical two-sample algorithm. The iNavFIter-M algorithm is also implemented on an FPGA board to demonstrate its potential in real-time applications.
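The abstract's point that matrix factors can be pre-calculated once coefficient relations are explicit can be illustrated with a generic fact about Chebyshev series: integrating a series over the normalized interval [-1, 1] is a fixed linear map on its coefficients. The sketch below builds that matrix once and reuses it, for example to integrate a Chebyshev fit of a gyro rate into an angle increment. It is a hypothetical illustration of the idea only, not the iNavFIter-M iteration; all function names are assumptions.

```python
def cheb_eval(c, x):
    """Evaluate sum_k c[k] * T_k(x) with Clenshaw's recurrence."""
    b1 = b2 = 0.0
    for ck in reversed(c[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + ck, b1
    return x * b1 - b2 + c[0]


def cheb_integration_matrix(n):
    """Return the (n+1) x n matrix M such that, for coefficients c of length n,
    M @ c are the coefficients of F(x) = integral_{-1}^{x} sum_k c[k] T_k(t) dt."""
    M = [[0.0] * n for _ in range(n + 1)]
    for j in range(n):
        col = [0.0] * (n + 1)
        if j == 0:
            col[1] = 1.0                          # integral of T0 is T1
        elif j == 1:
            col[2] = 0.25                         # integral of T1 is T2/4 (+ const)
        else:
            col[j + 1] = 1.0 / (2 * (j + 1))      # T_{j+1} / (2(j+1))
            col[j - 1] -= 1.0 / (2 * (j - 1))     # - T_{j-1} / (2(j-1))
        # Choose the constant term so that F(-1) = 0, using T_k(-1) = (-1)**k.
        col[0] -= sum(v * (-1) ** k for k, v in enumerate(col))
        for i in range(n + 1):
            M[i][j] = col[i]
    return M


def mat_vec(M, c):
    """Plain matrix-vector product for nested lists."""
    return [sum(row[j] * c[j] for j in range(len(c))) for row in M]


M = cheb_integration_matrix(3)   # built once, reused for every interval
c = [1.0, 2.0, 3.0]              # f(t) = T0 + 2 T1 + 3 T2 = 6t^2 + 2t - 2
F = mat_vec(M, c)
print(cheb_eval(F, 0.5))         # ≈ -1.5, since the exact integral is 2x^3 + x^2 - 2x - 1
```

Because `M` depends only on the series length, its cost is paid once; each new interval then needs only a matrix-vector product.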

Submitted 16 November, 2022; originally announced November 2022.

Comments: 30 pages, 7 figures

arXiv:2209.15351 [pdf]

Efficient Ambient LoRa Backscatter with On-Off Keying Modulation

Authors: Xiuzhen Guo, Longfei Shangguan, Yuan He, Jia Zhang, Haotian Jiang, Awais Ahmad Siddiqi, Yunhao Liu

Abstract: Backscatter communication holds potential for ubiquitous and low-cost connectivity among low-power IoT devices. To avoid interference between the carrier signal and the backscatter signal, recent works propose a frequency-shifting technique to separate these two signals in the frequency domain. Such proposals, however, have to occupy precious wireless spectrum that is already overcrowded, and they increase the power, cost, and complexity of the backscatter tag. In this paper, we revisit the classic ON-OFF Keying (OOK) modulation and propose Aloba, a backscatter system that takes ambient LoRa transmissions as the excitation and piggybacks in-band OOK-modulated signals over the LoRa transmissions. Our design enables the backscatter signal to work in the same frequency band as the carrier signal, while achieving flexible data rates at different transmission ranges. The key contributions of Aloba include: (1) the design of a low-power backscatter tag that can pick up ambient LoRa signals from among other signals; and (2) a novel decoding algorithm to demodulate both the carrier signal and the backscatter signal from their superposition. We further adopt a link coding mechanism and an interleaving operation to enhance the reliability of backscatter signal decoding. We implement Aloba and conduct a head-to-head comparison with the state-of-the-art LoRa backscatter system PLoRa in various settings. The experimental results show that Aloba can achieve a 199.4 Kbps data rate at various distances, 52.4 times higher than PLoRa.
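As background on the OOK modulation that Aloba revisits, the sketch below shows generic envelope-based OOK: each bit maps to a high or low envelope level for one bit period, and the receiver averages each period and thresholds halfway between the strongest and weakest averages. This is a simplified, hypothetical illustration; it is not Aloba's decoder, which must separate the backscatter signal from its superposition with the LoRa carrier. The function names and levels are assumptions.

```python
def ook_modulate(bits, samples_per_bit, on_level=1.0, off_level=0.0):
    """Map each bit to a run of envelope samples: 1 -> on_level, 0 -> off_level."""
    return [on_level if b else off_level for b in bits for _ in range(samples_per_bit)]


def ook_demodulate(envelope, samples_per_bit):
    """Average the envelope over each bit period, then threshold halfway between
    the largest and smallest averages. Assumes the frame contains both symbols;
    an all-0 or all-1 frame would need a known preamble to set the threshold."""
    n = len(envelope) // samples_per_bit
    means = [sum(envelope[i * samples_per_bit:(i + 1) * samples_per_bit]) / samples_per_bit
             for i in range(n)]
    threshold = (max(means) + min(means)) / 2.0
    return [1 if m > threshold else 0 for m in means]


bits = [1, 0, 1, 1, 0]
env = ook_modulate(bits, 4, on_level=1.0, off_level=0.2)
# A small alternating ripple stands in for residual carrier and noise.
noisy = [v + 0.05 * (-1) ** i for i, v in enumerate(env)]
print(ook_demodulate(noisy, 4))  # [1, 0, 1, 1, 0]
```

Averaging over more samples per bit lowers the data rate but improves robustness, which mirrors the rate/range trade-off the abstract mentions.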

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.15348 [pdf]

Saiyan: Design and Implementation of a Low-power Demodulator for LoRa Backscatter Systems

Authors: Xiuzhen Guo, Longfei Shangguan, Yuan He, Nan **g, Jiacheng Zhang, Haotian Jiang, Yunhao Liu

Abstract: The radio range of backscatter systems continues to grow as new wireless communication primitives are invented. Nevertheless, both the bit error rate and the packet loss rate of backscatter signals increase rapidly with the radio range, thereby necessitating cooperation between the access point and the backscatter tags through a feedback loop. Unfortunately, the low-power nature of backscatter tags limits their ability to demodulate feedback signals from a remote access point, and thus their ability to scale to such circumstances. This paper presents Saiyan, an ultra-low-power demodulator for long-range LoRa backscatter systems. With Saiyan, a backscatter tag can demodulate feedback signals from a remote access point with moderate power consumption and then perform an immediate packet retransmission in the presence of packet loss. Moreover, Saiyan enables rate adaptation and channel hopping, two PHY-layer operations that are important to channel efficiency yet unavailable on long-range backscatter systems. We prototype Saiyan on a two-layer PCB and evaluate its performance in different environments. Results show that Saiyan achieves a 5× gain in demodulation range compared with state-of-the-art systems. Our ASIC simulation shows that the power consumption of Saiyan is around 93.2 μW. Code and hardware schematics can be found at: https://github.com/ZangJac/Saiyan.

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.06779 [pdf, ps, other]

Efficient Planar Pose Estimation via UWB Measurements

Authors: Haodong Jiang, Wentao Wang, Yuan Shen, Xinghan Li, Xiaoqiang Ren, Biqiang Mu, Junfeng Wu

Abstract: State estimation is an essential part of autonomous systems. Integrating the Ultra-Wideband (UWB) technique has been shown to correct long-term estimation drift and bypass the complexity of loop closure detection. However, few works in robotics adopt UWB as a stand-alone state estimation solution. The primary purpose of this work is to investigate planar pose estimation using only UWB range measurements and to study the estimator's statistical efficiency. We prove a key property of a two-step scheme: a consistent estimator can be refined to be asymptotically efficient by one step of Gauss-Newton iteration. Grounded on this result, we design the GN-ULS estimator and evaluate it through simulations and collected datasets. GN-ULS attains millimeter- and sub-degree-level accuracy on our static datasets and centimeter- and degree-level accuracy on our dynamic datasets, demonstrating the possibility of using only UWB for real-time state estimation.
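The two-step scheme described above (an initial estimate refined by a single Gauss-Newton iteration on the range residuals) can be sketched for the simpler case of 2-D position from UWB ranges to known anchors. Assumptions: heading is omitted, and `linear_ls_init` is a plain differenced least-squares initializer standing in for the paper's first-step estimator (GN-ULS), whose exact form the abstract does not give. All names and the anchor layout are hypothetical.

```python
import math


def solve_2x2_sym(a11, a12, a22, b1, b2):
    """Solve [[a11, a12], [a12, a22]] @ [x, y] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def linear_ls_init(anchors, ranges):
    """Initial estimate from differenced squared ranges:
    2 (a_i - a_0) . p = |a_i|^2 - |a_0|^2 - r_i^2 + r_0^2, solved by least squares."""
    (x0, y0), r0 = anchors[0], ranges[0]
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        ax, ay = 2.0 * (xi - x0), 2.0 * (yi - y0)
        b = (xi ** 2 + yi ** 2) - (x0 ** 2 + y0 ** 2) - ri ** 2 + r0 ** 2
        s11 += ax * ax
        s12 += ax * ay
        s22 += ay * ay
        t1 += ax * b
        t2 += ay * b
    return solve_2x2_sym(s11, s12, s22, t1, t2)


def gauss_newton_step(anchors, ranges, est):
    """One Gauss-Newton update on the residuals r_i - |p - a_i|."""
    px, py = est
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), ri in zip(anchors, ranges):
        d = math.hypot(px - xi, py - yi)
        jx, jy = (px - xi) / d, (py - yi) / d  # Jacobian of |p - a_i| w.r.t. p
        e = ri - d
        s11 += jx * jx
        s12 += jx * jy
        s22 += jy * jy
        t1 += jx * e
        t2 += jy * e
    dx, dy = solve_2x2_sym(s11, s12, s22, t1, t2)
    return px + dx, py + dy


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_pos = (3.0, 4.0)
noise = [0.02, -0.01, 0.015, -0.02]  # fixed illustrative range errors (m)
ranges = [math.hypot(true_pos[0] - x, true_pos[1] - y) + n
          for (x, y), n in zip(anchors, noise)]
p0 = linear_ls_init(anchors, ranges)
p1 = gauss_newton_step(anchors, ranges, p0)
print(p0, p1)
```

The first step only needs to land close to the truth; the single Gauss-Newton step then re-weights the problem in terms of the actual range residuals rather than squared-range differences.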

Submitted 27 February, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Update the content and improve consistency with the ICRA version

arXiv:2208.05122 [pdf, other]

Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech

Authors: Kaitao Song, Teng Wan, Bixia Wang, Huiqiang Jiang, Luna Qiu, Jiahang Xu, Li** Jiang, Qun Lou, Yuqing Yang, Dongsheng Li, Xudong Wang, Lili Qiu

Abstract: Hypernasality is an abnormal resonance in human speech production, especially in patients with craniofacial anomalies such as cleft palate. In clinical application, hypernasality estimation is crucial in cleft palate diagnosis, as its results determine the subsequent surgery and additional speech therapy. Therefore, an automatic hypernasality assessment method would help speech-language pathologists make precise diagnoses. Existing methods for hypernasality estimation only conduct acoustic analysis on low-resource cleft palate datasets, using statistical or neural network-based features. In this paper, we propose a novel approach that uses an automatic speech recognition model to improve hypernasality estimation. Specifically, we first pre-train an encoder-decoder framework with an automatic speech recognition (ASR) objective on a speech-to-text dataset, and then fine-tune the ASR encoder on the cleft palate dataset for hypernasality estimation. Benefiting from this design, our model for hypernasality estimation enjoys the advantages of the ASR model: 1) compared with low-resource cleft palate datasets, the ASR task usually includes large-scale speech data in the general domain, which enables better model generalization; 2) the text annotations in the ASR dataset guide the model to extract better acoustic features. Experimental results on two cleft palate datasets demonstrate that our method achieves superior performance compared with previous approaches.

Submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted by InterSpeech 2022

arXiv:2207.05706 [pdf]

doi 10.1109/JLT.2022.3211869

Optical Field Recovery in Jones Space

Authors: Qi Wu, Yixiao Zhu, Hexun Jiang, Qunbi Zhuge, Weisheng Hu

Abstract: Optical full-field recovery makes it possible to compensate for fiber impairments such as chromatic dispersion and polarization mode dispersion (PMD) in digital signal processing. For cost-sensitive short-reach optical networks, several advanced single-polarization (SP) optical field recovery schemes have recently been proposed to avoid the chromatic dispersion-induced power fading effect and to improve spectral efficiency for larger potential capacity. Polarization division multiplexing (PDM) can further double both the spectral efficiency and the system capacity of these SP carrier-assisted direct detection (DD) schemes. However, the so-called polarization fading phenomenon induced by random polarization rotation is a fundamental obstacle that prevents SP carrier-assisted DD systems from achieving polarization diversity. In this paper, we propose a Jones-space field recovery (JSFR) receiver to realize polarization diversity with SP carrier-assisted DD schemes in Jones space. Different receiver structures and simplified recovery procedures for JSFR are explored theoretically. The proposed JSFR pushes the SP DD schemes towards PDM without extra optical signal-to-noise ratio (OSNR) penalty. In addition, JSFR shows good tolerance to PMD since the optical field recovery is conducted before polarization recovery. In a proof-of-concept experiment, we demonstrate 448-Gb/s reception over 80-km single-mode fiber using the proposed JSFR based on 2×2 couplers. Furthermore, we qualitatively compare optical field recovery in Jones space and Stokes space from the perspective of the modulation dimension.

Submitted 13 July, 2022; v1 submitted 22 June, 2022; originally announced July 2022.

Comments: 8 pages and 9 figures

arXiv:2207.00215

Polarized Color Image Denoising using Pocoformer

Authors: Zhuoxiao Li, Haiyang Jiang, Yinqiang Zheng

Abstract: Polarized color photography provides both visual textures and object surface information in a single snapshot. However, the use of a directional polarizing filter array results in a much lower photon count and SNR than conventional color imaging. This leads to noisy images and degrades polarization analysis performance. It is challenging for traditional image processing pipelines because the physical constraints implicitly imposed across the channels are highly complex. To address this issue, we propose a learning-based approach to simultaneously restore clean signals and precise polarization information. A real-world polarized color image dataset of paired raw short-exposure noisy images and long-exposure reference images is captured to support the learning-based pipeline. Moreover, we embrace the development of the vision Transformer and propose a hybrid transformer model for polarized color image denoising, namely PoCoformer, for better restoration performance. Extensive experiments demonstrate the effectiveness of the proposed method, and key factors that affect the results are analyzed.
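To make "polarization information" concrete: with a division-of-focal-plane sensor, the standard linear Stokes parameters and the degree and angle of linear polarization (DoLP, AoLP) follow from the intensities behind differently oriented micro-polarizers. The sketch assumes the common 0°, 45°, 90°, 135° arrangement, which the abstract does not specify; function names are hypothetical. Since S1 and S2 are differences of channel intensities, noise in each channel carries directly into DoLP and AoLP, which is one reason low-SNR capture hurts polarization analysis.

```python
import math


def stokes_from_dofp(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind 0/45/90/135-degree
    polarizers (assumed filter orientations)."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical
    s2 = i45 - i135                      # +45 vs. -45 degrees
    return s0, s1, s2


def dolp_aolp(s0, s1, s2):
    """Degree of linear polarization (0..1) and its angle in radians."""
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp


# Fully linearly polarized light at 45 degrees (Malus's law intensities).
s = stokes_from_dofp(0.5, 1.0, 0.5, 0.0)
print(s)               # (1.0, 0.0, 1.0)
print(dolp_aolp(*s))   # DoLP 1.0, AoLP ≈ pi/4
```

For unpolarized light all four channels are equal, so S1 = S2 = 0 and DoLP is 0.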

Submitted 1 March, 2023; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: New version accepted by CVPR 2023, with substantial modifications

arXiv:2206.04992 [pdf, other]

Artificial Intelligence Enabled NOMA Towards Next Generation Multiple Access

Authors: Xiaoxia Xu, Yuanwei Liu, Xidong Mu, Qimei Chen, Hao Jiang, Zhiguo Ding

Abstract: This article focuses on the application of artificial intelligence (AI) in non-orthogonal multiple access (NOMA), which aims to achieve automated, adaptive, and high-efficiency multi-user communications towards next generation multiple access (NGMA). First, the limitations of current scenario-specific multiple-antenna NOMA schemes are discussed, and the importance of AI for NGMA is highlighted. Then, to achieve the vision of NGMA, a novel cluster-free NOMA framework is proposed for providing scenario-adaptive NOMA communications, and several promising machine learning solutions are identified. To elaborate further, novel centralized and distributed machine learning paradigms are conceived for efficiently employing the proposed cluster-free NOMA framework in single-cell and multi-cell networks, with numerical results demonstrating their effectiveness. Furthermore, the interplay between the proposed cluster-free NOMA and emerging wireless techniques is presented. Finally, several open research issues of AI-enabled NGMA are discussed.

Submitted 13 December, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: This article has been accepted by IEEE Wireless Communications Magazine

arXiv:2206.04186 [pdf, other]

Reinforced Inverse Scattering

Authors: Hanyang Jiang, Yuehaw Khoo, Haizhao Yang

Abstract: Inverse wave scattering aims at determining the properties of an object using data on how the object scatters incoming waves. To collect information, sensors are placed at different locations to send and receive waves from each other. The choice of sensor positions and incident wave frequencies determines the reconstruction quality of scatterer properties. This paper introduces reinforcement learning to develop precision imaging that chooses sensor positions and wave frequencies adaptively for different scatterers, thus obtaining a significant improvement in reconstruction quality with limited imaging resources. Extensive numerical results are provided to demonstrate the superiority of the proposed method over existing methods.

Submitted 2 November, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

MSC Class: 68Txx; 49Mxx; 65N21

arXiv:2206.01408 [pdf, other]

MetaLR: Meta-tuning of Learning Rates for Transfer Learning in Medical Imaging

Authors: Yixiong Chen, Li Liu, **gxian Li, Hua Jiang, Chris Ding, Zongwei Zhou

Abstract: In medical image analysis, transfer learning is a powerful method for deep neural networks (DNNs) to generalize well on limited medical data. Prior efforts have focused on developing pre-training algorithms on domains such as lung ultrasound, chest X-ray, and liver CT to bridge domain gaps. However, we find that model fine-tuning also plays a crucial role in adapting medical knowledge to target tasks. The common fine-tuning method is to manually pick transferable layers (e.g., the last few layers) to update, which is labor-intensive. In this work, we propose a meta-learning-based learning rate (LR) tuner, named MetaLR, to make different layers automatically co-adapt to downstream tasks based on their transferability across domains. MetaLR learns appropriate LRs for different layers in an online manner, preventing highly transferable layers from forgetting their medical representation abilities and driving less transferable layers to adapt actively to new domains. Extensive experiments on various medical applications show that MetaLR outperforms previous state-of-the-art (SOTA) fine-tuning strategies. Code is released.

Submitted 29 May, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: MICCAI 2023

arXiv:2205.04765 [pdf, other]

doi 10.1109/JSTSP.2022.3174701

Hybrid RIS and DMA Assisted Multiuser MIMO Uplink Transmission With Electromagnetic Exposure Constraints

Authors: Hanyu Jiang, Li You, Jue Wang, Wen** Wang, Xiqi Gao

Abstract: In the fifth-generation and beyond era, reconfigurable intelligent surfaces (RISs) and dynamic metasurface antennas (DMAs) are emerging metamaterials keeping up with the demand for high-quality wireless communication services, which promotes the diversification of portable wireless terminals. However, along with the rapid expansion of wireless devices, electromagnetic (EM) radiation keeps increasing and inevitably affects public health, which requires a limited exposure level in the transmission design. To reduce EM radiation while preserving the quality of communication service, we investigate spectral efficiency (SE) maximization with EM constraints for uplink transmission in hybrid RIS and DMA assisted multiuser multiple-input multiple-output systems. Specifically, alternating optimization is adopted to optimize the transmit covariance, RIS phase shift, and DMA weight matrices. We first derive the water-filling solutions of the transmit covariance matrices with given RIS and DMA parameters. Then, the RIS phase shift matrix is optimized via the weighted minimum mean square error, block coordinate descent, and minorization-maximization methods. Furthermore, we solve the unconstrained DMA weight matrix optimization problem in closed form and then design the DMA weight matrix to approach this performance under DMA constraints. Numerical results confirm the effectiveness of the EM-aware SE maximization transmission scheme over conventional baselines.

Submitted 10 May, 2022; originally announced May 2022.

Comments: 14 pages, 6 figures

Journal ref: IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 5, pp. 1055-1069, Aug. 2022

arXiv:2205.04120 [pdf, other]

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

Authors: Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang

Abstract: Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by VAE, CUC-VAE allows sampling from an utterance-specific prior distribution conditioned on cross-utterance information, which relates the prosody features generated by the TTS system to the context and more closely resembles how humans naturally produce prosody. The performance of CUC-VAE is evaluated via qualitative listening tests for naturalness and intelligibility, and via quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity with clear margins.
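The inference-time difference described above, sampling prosody latents from an utterance-specific prior instead of the standard Gaussian, comes down to which mean and variance feed the usual reparameterized draw z = mu + sigma * eps. In the sketch below the conditioning network that would produce `mu` and `log_var` from cross-utterance context is not shown; the values are hypothetical placeholders, and the function names are assumptions.

```python
import math
import random


def sample_prosody_latent(mu, log_var, rng):
    """Reparameterised draw z = mu + exp(0.5 * log_var) * eps, eps ~ N(0, 1),
    one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def standard_prior_sample(dim, rng):
    """Standard VAE inference: sample from N(0, I), ignoring any context."""
    return sample_prosody_latent([0.0] * dim, [0.0] * dim, rng)


rng = random.Random(0)
# Hypothetical prior parameters for one phoneme, as if predicted from context.
mu_ctx, log_var_ctx = [0.8, -0.3], [-2.0, -2.0]
print(sample_prosody_latent(mu_ctx, log_var_ctx, rng))  # centred near mu_ctx
print(standard_prior_sample(2, rng))                    # centred near 0
```

A context-dependent prior shifts and narrows the sampling distribution per phoneme, which is how contextual information can shape the generated prosody.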

Submitted 9 May, 2022; originally announced May 2022.

Comments: ACL 2022 camera ready

arXiv:2204.11840 [pdf, other]

Dynamic Ensemble Bayesian Filter for Robust Control of a Human Brain-machine Interface

Authors: Yu Qi, Xinyun Zhu, Kedi Xu, Feixiao Ren, Hongjie Jiang, Junming Zhu, Jianmin Zhang, Gang Pan, Yueming Wang

Abstract: Objective: Brain-machine interfaces (BMIs) aim to provide direct brain control of devices such as prostheses and computer cursors, and have demonstrated great potential for mobility restoration. One major limitation of current BMIs lies in their unstable performance in online control due to the variability of neural signals, which seriously hinders the clinical availability of BMIs. Method: To deal with neural variability in online BMI control, we propose a dynamic ensemble Bayesian filter (DyEnsemble). DyEnsemble extends Bayesian filters with a dynamic measurement model, which adapts its parameters over time as the neural signals change. This is achieved by learning a pool of candidate functions and dynamically weighting and assembling them according to neural signals. In this way, DyEnsemble copes with variability in signals and improves the robustness of online control. Results: Online BMI experiments with a human participant demonstrate that, compared with the velocity Kalman filter, DyEnsemble significantly improves control accuracy (increasing the success rate by 13.9% and reducing the reach time by 13.5% in the random target pursuit task) and robustness (performing more stably over different experiment days). Conclusion: Our results demonstrate the superiority of DyEnsemble in online BMI control. Significance: DyEnsemble provides a novel and flexible framework for robust neural decoding, which is beneficial to different neural decoding applications.
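The abstract says DyEnsemble dynamically weights a pool of candidate measurement functions according to the neural signals. One plausible form of such a rule is a forgetting-factor Bayesian model-averaging update, sketched below under loud assumptions: Gaussian observation noise, a known scalar state (a real filter estimates the state jointly), and hypothetical candidate functions and forgetting factor `alpha`. It is not presented as the paper's exact update.

```python
import math


def gaussian_loglik(y, y_pred, sigma):
    """Log-density of observation y under N(y_pred, sigma**2)."""
    return -0.5 * ((y - y_pred) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def update_model_weights(weights, logliks, alpha=0.9):
    """w_k <- w_k**alpha * p(y | model k), renormalised in log space.
    alpha < 1 discounts old evidence so the weights can shift when signals drift."""
    logs = [alpha * math.log(max(w, 1e-300)) + ll for w, ll in zip(weights, logliks)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical pool of candidate measurement functions (state -> observation).
candidates = [lambda x: 2.0 * x, lambda x: -1.0 * x]
sigma = 0.5
weights = [0.5, 0.5]

# Observations first follow candidate 0, then switch to candidate 1.
for x, true_k in [(0.5, 0), (1.0, 0), (1.5, 0), (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1)]:
    y = candidates[true_k](x)
    lls = [gaussian_loglik(y, f(x), sigma) for f in candidates]
    weights = update_model_weights(weights, lls)
    # An ensemble prediction would be sum(w * f(x) for w, f in zip(weights, candidates)).

print(weights)  # weight has moved to candidate 1 after the switch
```

Without the forgetting factor (alpha = 1), a model that was wrong for a long stretch would need correspondingly long evidence to recover; discounting old evidence is what lets the ensemble track changing signals.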

Submitted 22 April, 2022; originally announced April 2022.

arXiv:2204.02839 [pdf]

CCAT-NET: A Novel Transformer Based Semi-supervised Framework for Covid-19 Lung Lesion Segmentation

Authors: Mingyang Liu, Li Xiao, Huiqin Jiang, Qing He

Abstract: The spread of the novel coronavirus disease 2019 (COVID-19) has claimed millions of lives. Automatic segmentation of lesions from CT images can assist doctors with screening, treatment, and monitoring. However, accurate segmentation of lesions from CT images can be very challenging due to data and model limitations. Recently, Transformer-based networks have attracted a lot of attention in the area of computer vision, as Transformers outperform CNNs on many tasks. In this work, we propose a novel network structure that combines CNN and Transformer for the segmentation of COVID-19 lesions. We further propose an efficient semi-supervised learning framework to address the shortage of labeled data. Extensive experiments show that our proposed network outperforms most existing networks and that the semi-supervised learning framework outperforms the base network by 3.0% and 8.2% in terms of Dice coefficient and sensitivity, respectively.
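The two metrics quoted above have standard definitions on binary masks: Dice = 2TP / (2TP + FP + FN) and sensitivity (recall) = TP / (TP + FN). A minimal sketch on flattened masks follows; returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
def dice_and_sensitivity(pred, truth):
    """Dice = 2TP / (2TP + FP + FN); sensitivity = TP / (TP + FN).
    Inputs are flattened binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, sensitivity


pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(dice_and_sensitivity(pred, truth))  # both ≈ 0.667 (TP=2, FP=1, FN=1)
```

Sensitivity ignores false positives, so a model can raise it by over-segmenting; Dice penalizes both error types, which is why the two are usually reported together.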

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2202.11279 [pdf]

An End-to-End Cascaded Image Deraining and Object Detection Neural Network

Authors: Kaige Wang, Tianming Wang, Jianchuang Qu, Huatao Jiang, Qing Li, Lin Chang

Abstract: While deep learning-based image deraining methods have made great progress in recent years, there are two major shortcomings in their application to real-world situations. Firstly, the gap between the low-level vision task represented by rain removal and the high-level vision task represented by object detection is significant, and the low-level vision task can hardly contribute to the high-level vision task. Secondly, the quality of deraining datasets needs to be improved. In fact, the rain streaks in many baselines differ greatly from real rain streaks, and the resolution of deraining dataset images is generally not ideal. Meanwhile, there are few common datasets for both the low-level and the high-level vision tasks. In this paper, we explore the combination of the low-level vision task with the high-level vision task. Specifically, we propose an end-to-end object detection network for reducing the impact of rainfall, which consists of two cascaded networks: an improved image deraining network and an object detection network. We also design the components of the loss function to accommodate the characteristics of the different sub-networks. We then propose a dataset based on the KITTI dataset for rainfall removal and object detection, on which our network surpasses the state of the art with a significant improvement in metrics. Besides, our proposed network is evaluated on driving videos collected by self-driving vehicles and shows positive results for rain removal and object detection.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2202.05126 [pdf, other]

doi 10.1016/j.media.2022.102691

Deep Learning for Computational Cytology: A Survey

Authors: Hao Jiang, Yanning Zhou, Yi Lin, Ronald CK Chan, Jiang Liu, Hao Chen

Abstract: Computational cytology is a critical, rapidly developing, yet challenging topic in the field of medical image computing. It analyzes digitized cytology images with computer-aided technologies for cancer screening. Recently, an increasing number of deep learning (DL) algorithms have made significant progress in medical image analysis, leading to a surge in publications on cytological studies. To investigate the advanced methods and comprehensive applications, we survey more than 120 publications on DL-based cytology image analysis in this article. We first introduce various deep learning methods, including fully supervised, weakly supervised, unsupervised, and transfer learning. Then, we systematically summarize the public datasets, evaluation metrics, and versatile cytology image analysis applications, including classification, detection, segmentation, and other related tasks. Finally, we discuss current challenges and potential research directions of computational cytology.

Submitted 16 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Journal ref: Medical Image Analysis, Nov 2022

arXiv:2201.03186 [pdf, other]

MyoPS: A Benchmark of Myocardial Pathology Segmentation Combining Three-Sequence Cardiac Magnetic Resonance Images

Authors: Lei Li, Fu** Wu, Sihan Wang, Xinzhe Luo, Carlos Martin-Isla, Shuwei Zhai, Jianpeng Zhang, Yanfei Liu, Zhen Zhang, Markus J. Ankenbrand, Haochuan Jiang, Xiaoran Zhang, Linhong Wang, Tewodros Weldebirhan Arega, Elif Altunok, Zhou Zhao, Feiyan Li, Jun Ma, ** Yang, Elodie Puybareau, Ilkay Oksuz, Stephanie Bricq, Weisheng Li, Kumaradevan Punithakumar, Sotirios A. Tsaftaris, et al. (7 additional authors not shown)

Abstract: Assessment of myocardial viability is essential in the diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on the myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images. The task was first proposed in the MyoPS challenge, held in conjunction with MICCAI 2020. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants, and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture, and post-processing. In addition, we analyze the results with respect to different factors in order to examine the key obstacles, explore potential solutions, and provide a benchmark for future research. We conclude that while promising results have been reported, the research is still at an early stage, and more in-depth exploration is needed before successful clinical application. Note that the MyoPS data and evaluation tool continue to be publicly available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).

Submitted 10 January, 2022; originally announced January 2022.

Showing 1–50 of 89 results for author: Jiang, H