
Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

Abstract: This paper presents our approach to the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Using Temporal Convolutional Network (TCN) modules, we captured the temporal and spatial correlations between these features. We then employed a Transformer encoder structure to learn long-range dependencies, improving the model's performance and generalization ability. Our method uses multimodal data fusion, integrating pre-trained audio and video backbones for feature extraction, followed by TCN-based spatiotemporal encoding and Transformer-based capture of temporal information. Experimental results demonstrate the effectiveness of our approach, which achieves competitive performance in VA estimation on the AffWild2 dataset.

Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures
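The abstract does not give the TCN configuration. As a minimal sketch of the building block a TCN stacks, assuming a single scalar feature per frame, left zero-padding, and hypothetical kernel sizes and dilations, a dilated causal 1D convolution and the receptive field of a stack of them can be written as:

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Compute y[t] = sum_k w[k] * x[t - k * dilation], treating x[t] as 0 for t < 0."""
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    y: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation  # only current or past frames are used: the filter is causal
            if idx >= 0:
                acc += wk * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in frames) of stacked dilated causal convs, one conv per level."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # An impulse reveals the filter taps spaced `dilation` frames apart.
    print(dilated_causal_conv1d([1, 0, 0, 0, 0, 0], [1, 2, 3], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

With these inputs the script prints `[1.0, 0.0, 2.0, 0.0, 3.0, 0.0]` and `31`: doubling the dilation at each level lets a few layers cover a long window of frames.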

arXiv:2402.18856 [pdf, other]

Anatomy-guided fiber trajectory distribution estimation for cranial nerves tractography

Authors: Lei Xie, Qingrun Zeng, Huajun Zhou, Guoqiang Xie, Mingchu Li, Jiahao Huang, Jianan Cui, Hao Chen, Yuan**g Feng

Abstract: Diffusion MRI tractography is an important tool for identifying and analyzing the intracranial course of cranial nerves (CNs). However, the complex environment of the skull base leads to ambiguous spatial correspondence between diffusion directions and fiber geometry, and existing diffusion tractography methods for CN identification are prone to producing erroneous trajectories and missing true positive connections. To overcome this challenge, we propose a novel CN identification framework with anatomy-guided fiber trajectory distribution, which incorporates anatomical shape prior knowledge during CN tracing to build diffusion tensor vector fields. We introduce higher-order streamline differential equations for continuous flow field representations to directly characterize the fiber trajectory distribution of CNs at the tract level. Experimental results on the in vivo HCP dataset and the clinical MDM dataset demonstrate that the proposed method produces fewer false-positive fibers than competing methods and reconstructs CNs (i.e., CN II, CN III, CN V, and CN VII/VIII) that are judged to correspond better to the known anatomy.

Submitted 29 February, 2024; originally announced February 2024.
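For orientation only, classical streamline tractography traces a fiber by integrating dx/ds = v(x) through a direction field. The sketch below uses fourth-order Runge-Kutta, one standard higher-order integrator; it is not the authors' higher-order streamline formulation, and the analytic field `swirl` is a hypothetical stand-in for directions that would in practice come from diffusion tensors (eigenvector sign handling and stopping rules are omitted).

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Field = Callable[[Vec3], Vec3]


def _add_scaled(x: Vec3, a: float, u: Vec3) -> Vec3:
    """Return x + a * u component-wise."""
    return (x[0] + a * u[0], x[1] + a * u[1], x[2] + a * u[2])


def rk4_streamline(field: Field, seed: Vec3, step: float, n_steps: int) -> List[Vec3]:
    """Trace dx/ds = field(x) from `seed` with classical fourth-order Runge-Kutta."""
    points = [seed]
    x = seed
    for _ in range(n_steps):
        k1 = field(x)
        k2 = field(_add_scaled(x, step / 2, k1))
        k3 = field(_add_scaled(x, step / 2, k2))
        k4 = field(_add_scaled(x, step, k3))
        # Weighted average of the four slope estimates.
        x = tuple(xi + step / 6 * (a + 2 * b + 2 * c + d)
                  for xi, a, b, c, d in zip(x, k1, k2, k3, k4))
        points.append(x)
    return points


def swirl(p: Vec3) -> Vec3:
    """Hypothetical field whose streamlines are circles around the z-axis."""
    return (-p[1], p[0], 0.0)


if __name__ == "__main__":
    path = rk4_streamline(swirl, (1.0, 0.0, 0.0), step=0.01, n_steps=628)
    # RK4 keeps the traced point on the unit circle to high accuracy.
    print(max(abs(math.hypot(px, py) - 1.0) for px, py, _ in path))
```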

arXiv:2310.13250 [pdf, other]

Diagnosis-oriented Medical Image Compression with Efficient Transfer Learning

Authors: Guangqi Xie, Xin Li, Xiaohan Pan, Zhibo Chen

Abstract: Remote medical diagnosis has emerged as a critical and indispensable technique in practical medical systems, in which medical data must be efficiently compressed and transmitted for diagnosis by either professional doctors or intelligent diagnosis devices. In this process, a large amount of redundant content irrelevant to the diagnosis is coded with high fidelity, leading to unnecessary transmission costs. To mitigate this, we propose diagnosis-oriented medical image compression, a semantic compression task designed for medical scenarios that aims to reduce the compression cost without compromising diagnosis accuracy. However, collecting sufficient medical data to optimize such a compression system is expensive and challenging because of privacy issues and the lack of professional annotation. In this study, we propose DMIC, the first efficient transfer learning-based codec for diagnosis-oriented medical image compression, which can be optimized effectively with only a few annotated medical examples by reusing the knowledge in an existing reinforcement learning-based task-driven semantic coding framework, HRLVSC [1]. Concretely, we tune only part of the parameters of the policy network for bit allocation within HRLVSC, which enables it to adapt to medical images. In this work, we validate DMIC on a typical medical task, coronary artery segmentation. Extensive experiments demonstrate that DMIC achieves 47.594% BD-rate savings compared to the HEVC anchor by tuning only the A2C module (2.7% of the parameters) of the policy network with only one medical sample.

Submitted 19 October, 2023; originally announced October 2023.

Comments: Accepted by IEEE VCIP
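The BD-rate figure compares bitrate at equal quality. A minimal sketch of the classic Bjøntegaard computation follows: fit log-rate as a cubic in quality for each codec, integrate both fits over the overlapping quality range, and convert the mean log difference to a percentage. Using PSNR as the quality axis is an assumption (the abstract does not say which quality measure the diagnosis-oriented BD-rate uses), and the rate points in the usage lines are made up.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_fit(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients c0..c3 of y ~ c0 + c1 x + c2 x^2 + c3 x^3."""
    ata = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    return _solve(ata, aty)


def _integral(c: List[float], lo: float, hi: float) -> float:
    """Exact integral of the cubic with coefficients c over [lo, hi]."""
    prim = lambda x: sum(cj * x ** (j + 1) / (j + 1) for j, cj in enumerate(c))
    return prim(hi) - prim(lo)


def bd_rate(anchor_rate: Sequence[float], anchor_psnr: Sequence[float],
            test_rate: Sequence[float], test_psnr: Sequence[float]) -> float:
    """Bjontegaard delta rate in percent (negative means bitrate saving vs. anchor)."""
    # Center the quality axis so the normal equations stay well conditioned.
    mid = (sum(anchor_psnr) + sum(test_psnr)) / (len(anchor_psnr) + len(test_psnr))
    pa = [p - mid for p in anchor_psnr]
    pt = [p - mid for p in test_psnr]
    ca = _cubic_fit(pa, [math.log(r) for r in anchor_rate])
    ct = _cubic_fit(pt, [math.log(r) for r in test_rate])
    lo, hi = max(min(pa), min(pt)), min(max(pa), max(pt))  # overlapping quality range
    if hi <= lo:
        raise ValueError("rate-distortion curves do not overlap in quality")
    avg_log_diff = (_integral(ct, lo, hi) - _integral(ca, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    psnr = [32.0, 35.0, 38.0, 41.0]            # made-up operating points
    anchor = [1000.0, 1800.0, 3500.0, 7200.0]  # made-up anchor bitrates
    print(bd_rate(anchor, psnr, [0.8 * r for r in anchor], psnr))
```

Because the hypothetical test codec spends 80% of the anchor's bits at every quality level, the result is approximately -20, i.e., a 20% saving.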

arXiv:2307.04296 [pdf, other]

K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment

Authors: Guoyang Xie, **bao Wang, Yawen Huang, Jiayi Lyu, Feng Zheng, Yefeng Zheng, Yaochu **

Abstract: How to assess cross-modality medical image synthesis has been largely unexplored. The most widely used measures, such as PSNR and SSIM, focus on structural features but neglect the crucial lesion location and the fundamental k-space characteristics of medical images. To address this problem, we propose a new metric, K-CROSS, to spur progress on this challenging problem. Specifically, K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location, together with a tumor encoder for representing features such as texture details and brightness intensities. To further reflect frequency-specific information from the principles of magnetic resonance imaging, both k-space features and vision features are obtained and used in our comprehensive encoders with a frequency reconstruction penalty. The structure-shared encoders are designed and constrained with a similarity loss to capture the intrinsic structural information common to both modalities. As a result, the features learned from lesion regions, k-space, and anatomical structures are all captured and serve as our quality evaluators. We evaluate performance by constructing a large-scale cross-modality neuroimaging perceptual similarity (NIRPS) dataset with 6,000 radiologist judgments. Extensive experiments demonstrate that the proposed method outperforms other metrics, especially in its agreement with the radiologists on NIRPS.

Submitted 9 February, 2024; v1 submitted 9 July, 2023; originally announced July 2023.
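The abstract mentions k-space features and a frequency reconstruction penalty but does not define them. As an illustrative assumption only, k-space for a 2D slice can be computed with a discrete Fourier transform, and two images can be compared by the mean absolute difference of their k-space magnitudes:

```python
import cmath
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def kspace(img: Image) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform; O(H^2 W^2), fine for tiny examples."""
    h, w = len(img), len(img[0])
    out: List[List[complex]] = []
    for u in range(h):
        row = []
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            row.append(s)
        out.append(row)
    return out


def kspace_l1(a: Image, b: Image) -> float:
    """Mean absolute difference between the k-space magnitudes of two same-sized images."""
    ka, kb = kspace(a), kspace(b)
    diffs = [abs(abs(pa) - abs(pb)) for ra, rb in zip(ka, kb) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    img = [[1.0, 2.0], [3.0, 4.0]]
    shifted = [[2.0, 1.0], [4.0, 3.0]]  # circular shift along x
    print(kspace(img)[0][0])            # DC term equals the pixel sum
    print(kspace_l1(img, shifted))
```

A magnitude-only comparison ignores phase, so a circularly shifted copy of an image scores (numerically) zero; this is one reason such a term would be combined with image-domain features rather than used alone.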

arXiv:2304.04760 [pdf, other]

SAR2EO: A High-resolution Image Translation Framework with Denoising Enhancement

Authors: Jun Yu, Shenshen Du, Guochen Xie, Renjie Lu, Pengwei Li, Zhongpeng Cai, Keda Lu

Abstract: Synthetic Aperture Radar (SAR) to electro-optical (EO) image translation is a fundamental task in remote sensing that can enrich datasets by fusing information from different sources. Many methods have recently been proposed for this task, but they still struggle to convert low-resolution images into high-resolution ones. We therefore propose a framework, SAR2EO, aimed at addressing this challenge. First, to generate high-quality EO images, we adopt the coarse-to-fine generator, multi-scale discriminators, and improved adversarial loss of the pix2pixHD model to increase synthesis quality. Second, we introduce a denoising module that suppresses the noise in SAR images while preserving their structural information. To validate the effectiveness of the proposed framework, we conduct experiments on the dataset of the Multi-modal Aerial View Imagery Challenge (MAVIC), which consists of large-scale SAR and EO image pairs. The experimental results demonstrate the superiority of our proposed framework, and we won first place in the MAVIC held at CVPR PBVS 2023.

Submitted 25 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.
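The abstract does not describe the denoising module. As a simple, clearly substituted stand-in, a 3x3 median filter removes isolated noisy pixels while preserving edges better than a mean filter; SAR speckle is multiplicative, so dedicated speckle filters (e.g., Lee or Frost) are the more usual choice in practice.

```python
from statistics import median
from typing import List, Sequence


def median_filter3x3(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """3x3 median filter; border pixels use only the neighbors that lie inside the image."""
    h, w = len(img), len(img[0])
    out: List[List[float]] = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median(window))
        out.append(row)
    return out


if __name__ == "__main__":
    noisy = [[10.0] * 5 for _ in range(5)]
    noisy[2][2] = 255.0  # a single bright outlier pixel
    for r in median_filter3x3(noisy):
        print(r)
```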

arXiv:2208.11529 [pdf, other]

Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

Authors: Guangqi Xie, Xin Li, Shiqi Lin, Li Zhang, Kai Zhang, Yue Li, Zhibo Chen

Abstract: The rapid development of intelligent tasks, e.g., segmentation, detection, and classification, has brought an urgent need for semantic compression, which aims to reduce the compression cost while maintaining the original semantic information. However, it is impractical to integrate the semantic metric directly into traditional codecs because they cannot be optimized end to end. To solve this problem, some pioneering works have applied reinforcement learning to implement image-wise semantic compression. Nevertheless, video semantic compression has not been explored because of its complex reference architectures and compression modes. In this paper, we take a step toward video semantic compression and propose Hierarchical Reinforcement Learning based task-driven Video Semantic Coding, named HRLVSC. Specifically, to simplify the complex mode decision of video semantic coding, we divide the action space into frame-level and CTU-level spaces in a hierarchical manner, and then explore the best mode selection for them progressively through the cooperation of frame-level and CTU-level agents. Moreover, since the number of mode combinations in video semantic coding grows exponentially with the number of frames in a Group of Pictures (GOP), we carefully investigate the effects of different mode selections and design a simple but effective mode simplification strategy. We have validated HRLVSC on the video segmentation task with the HEVC reference software HM16.19. Extensive experimental results demonstrate that HRLVSC achieves over 39% BD-rate savings for video semantic coding under the Low Delay P configuration.

Submitted 24 August, 2022; originally announced August 2022.

Comments: Accepted by VCIP2022
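To make the exponential growth concrete, a back-of-the-envelope count compares choosing a frame mode and per-CTU modes jointly for a whole GOP against deciding level by level, one small menu at a time. All mode counts are hypothetical, and the sequential count only measures menu sizes; it does not model the paper's RL agents or its mode simplification strategy.

```python
def joint_assignments(frame_modes: int, ctu_modes: int, ctus_per_frame: int, frames_per_gop: int) -> int:
    """Distinct (frame mode, per-CTU modes) assignments for a whole GOP chosen jointly."""
    per_frame = frame_modes * ctu_modes ** ctus_per_frame
    return per_frame ** frames_per_gop


def hierarchical_choices(frame_modes: int, ctu_modes: int, ctus_per_frame: int, frames_per_gop: int) -> int:
    """Options scored if each frame first picks a frame mode, then each CTU picks its own mode."""
    return frames_per_gop * (frame_modes + ctus_per_frame * ctu_modes)


if __name__ == "__main__":
    # Hypothetical sizes: 2 frame-level modes, 3 CTU-level modes, 4 CTUs per frame.
    for gop in (1, 2, 4, 8):
        print(gop, joint_assignments(2, 3, 4, gop), hierarchical_choices(2, 3, 4, gop))
```

The joint count grows exponentially with the GOP length, while the level-by-level count grows only linearly.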

arXiv:2203.05847 [pdf]

Automatic Fine-grained Glomerular Lesion Recognition in Kidney Pathology

Authors: Yang Nan, Fengyi Li, Peng Tang, Guyue Zhang, Caihong Zeng, Guotong Xie, Zhihong Liu, Guang Yang

Abstract: Recognition of glomerular lesions is key to diagnosis and treatment planning in kidney pathology; however, coexisting glomerular structures such as mesangial regions exacerbate the difficulty of this task. In this paper, we introduce a scheme to recognize fine-grained glomerular lesions from whole slide images. First, a focal instance structural similarity loss is proposed to drive the model to locate all types of glomeruli precisely. Then, an Uncertainty Aided Apportionment Network is designed to carry out fine-grained visual classification without bounding-box annotations. This double-branch structure extracts common features of the child class from the parent class and produces an uncertainty factor for reconstituting the training dataset. Results of slide-wise evaluation illustrate the effectiveness of the entire scheme, with an 8-22% improvement in mean Average Precision compared with leading detection methods. The comprehensive results clearly demonstrate the effectiveness of the proposed method.

Submitted 11 March, 2022; originally announced March 2022.

Comments: 33 pages, 6 figures, accepted by the Pattern Recognition journal
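The improvement is reported in mean Average Precision. A minimal sketch of all-point interpolated AP (VOC-style), assuming detections have already been matched to ground truth (for example by an IoU threshold, which the abstract does not state), then averaged over classes:

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """All-point interpolated AP from scored detections already matched to ground truth."""
    if len(scores) != len(is_tp) or num_gt <= 0:
        raise ValueError("need matching scores/labels and at least one ground-truth object")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p  # area of each recall step
        prev_r = r
    return ap


def mean_ap(per_class_ap: Sequence[float]) -> float:
    """Mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7], [False, True, True], num_gt=2))
```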

arXiv:2202.13522 [pdf, other]

Pursuit-evasion differential games of players with different speeds in spaces of different dimensions

Authors: Shuai Li, Chen Wang, Guangming Xie

Abstract: We study pursuit-evasion differential games between a faster pursuer moving in 3D space and an evader moving in a plane. We first extend the well-known Apollonius circle to 3D space and use it to construct the isochron for the two players. Then both cases, with and without a static target, are considered, and the corresponding optimal strategies are derived using the concept of isochron. To guarantee the optimality of the proposed strategies, the value functions are given and further proved to be solutions of the Hamilton-Jacobi-Isaacs equation. Simulations comparing the proposed strategies with other classical strategies are carried out, and the results show the optimality of the proposed strategies.

Submitted 27 February, 2022; originally announced February 2022.
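For constant speeds and straight-line motion, a point x is reached by both players at the same time when |x - E| / v_E = |x - P| / v_P, i.e., |x - E| = k |x - P| with k = v_E / v_P < 1. In 3D this locus is a sphere (the 3D counterpart of the Apollonius circle) with center (E - k^2 P) / (1 - k^2) and radius k |E - P| / (1 - k^2), and its intersection with the evader's plane is a circle. A sketch under those assumptions, taking the evader's plane to be z = 0; whether this matches the paper's isochron construction in every detail is an assumption:

```python
import math
from typing import Optional, Tuple

Point3 = Tuple[float, float, float]


def apollonius_sphere(evader: Point3, pursuer: Point3, k: float) -> Tuple[Point3, float]:
    """Sphere of points x with |x - E| = k |x - P|, for speed ratio k = v_E / v_P < 1."""
    if not 0.0 < k < 1.0:
        raise ValueError("requires a strictly faster pursuer: 0 < k < 1")
    d = 1.0 - k * k
    center = tuple((e - k * k * p) / d for e, p in zip(evader, pursuer))
    radius = k * math.dist(evader, pursuer) / d
    return center, radius


def isochron_in_plane(evader: Point3, pursuer: Point3, k: float) -> Optional[Tuple[Tuple[float, float], float]]:
    """Circle in the plane z = 0 whose points both players reach at the same time."""
    (cx, cy, cz), r = apollonius_sphere(evader, pursuer, k)
    if r <= abs(cz):
        return None  # sphere misses the plane; impossible when the evader lies in z = 0
    return (cx, cy), math.sqrt(r * r - cz * cz)


if __name__ == "__main__":
    # Pursuer hovering 3 units above an evader at the origin, half the evader's... speed ratio 0.5.
    print(isochron_in_plane((0.0, 0.0, 0.0), (0.0, 0.0, 3.0), 0.5))
```

Points inside the returned circle are reached by the evader first, which is what makes the isochron useful for deriving strategies.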

arXiv:2202.06997 [pdf, other]

Cross-Modality Neuroimage Synthesis: A Survey

Authors: Guoyang Xie, Yawen Huang, **bao Wang, Jiayi Lyu, Feng Zheng, Yefeng Zheng, Yaochu **

Abstract: Multi-modality imaging improves disease diagnosis and reveals distinct deviations in tissues with anatomical properties. Completely aligned and paired multi-modality neuroimaging data have proved effective in brain research. However, collecting fully aligned and paired data is expensive or even impractical because of many difficulties, including high cost, long acquisition time, image corruption, and privacy issues. An alternative solution is to explore unsupervised or weakly supervised learning methods to synthesize the missing neuroimaging data. In this paper, we provide a comprehensive review of cross-modality synthesis for neuroimages from the perspectives of weakly supervised and unsupervised settings, loss functions, evaluation metrics, imaging modalities, datasets, and downstream applications based on synthesis. We begin by highlighting several open challenges for cross-modality neuroimage synthesis. Then, we discuss representative architectures of cross-modality synthesis methods under different supervisions. This is followed by a stepwise in-depth analysis of how cross-modality neuroimage synthesis improves the performance of its downstream tasks. Finally, we summarize the existing research findings and point out future research directions. All resources are available at https://github.com/M-3LAB/awesome-multimodal-brain-image-systhesis

Submitted 21 September, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

arXiv:2201.12589 [pdf, other]

FedMed-ATL: Misaligned Unpaired Brain Image Synthesis via Affine Transform Loss

Authors: **bao Wang, Guoyang Xie, Yawen Huang, Yefeng Zheng, Yaochu **, Feng Zheng

Abstract: Completely aligned and paired multi-modal neuroimaging data have proved effective in the diagnosis of brain diseases. However, collecting a full set of well-aligned and paired data is impractical, since the practical difficulties may include high cost, long acquisition time, image corruption, and privacy issues. Previously, misaligned unpaired neuroimaging data (termed MUD) have generally been treated as noisy labels. However, such noisy-label-based methods perform poorly when the misaligned data are severely distorted, for example, when the rotation angles differ. In this paper, we propose a novel federated self-supervised learning method (FedMed) for brain image synthesis. An affine transform loss (ATL) is formulated to make use of severely distorted images without violating privacy legislation for hospitals. We then introduce a new data augmentation procedure for self-supervised training and feed its outputs into three auxiliary heads, namely auxiliary rotation, auxiliary translation, and auxiliary scaling heads. The proposed method achieves advanced performance in terms of both the quality of the synthesized results under a severely misaligned and unpaired data setting and stability compared with other GAN-based algorithms. The proposed method also reduces the demand for deformable registration while encouraging the use of misaligned and unpaired data. Experimental results verify the outstanding performance of our learning paradigm compared with other state-of-the-art approaches.

Submitted 16 July, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

Comments: arXiv admin note: text overlap with arXiv:2201.08953
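The abstract says augmented images are fed to auxiliary rotation, translation and scaling heads. A minimal sketch of how such self-supervised targets could be generated, assuming hypothetical sampling ranges and warping point coordinates rather than resampling images; the affine transform loss itself is not reproduced:

```python
import math
import random
from typing import Dict, List, Tuple

Matrix2x3 = List[List[float]]
Point2 = Tuple[float, float]


def affine_matrix(angle_deg: float, scale: float, tx: float, ty: float) -> Matrix2x3:
    """2x3 matrix: rotate about the origin, scale isotropically, then translate."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx], [s, c, ty]]


def apply_affine(m: Matrix2x3, pts: List[Point2]) -> List[Point2]:
    """Map each (x, y) through the affine matrix."""
    return [(m[0][0] * x + m[0][1] * y + m[0][2],
             m[1][0] * x + m[1][1] * y + m[1][2]) for x, y in pts]


def sample_augmentation(rng: random.Random) -> Dict[str, float]:
    """Hypothetical ranges; the sampled values would be the regression targets of the
    auxiliary rotation / scaling / translation heads."""
    return {
        "angle_deg": rng.uniform(-30.0, 30.0),
        "scale": rng.uniform(0.8, 1.2),
        "tx": rng.uniform(-10.0, 10.0),
        "ty": rng.uniform(-10.0, 10.0),
    }


if __name__ == "__main__":
    params = sample_augmentation(random.Random(0))
    m = affine_matrix(params["angle_deg"], params["scale"], params["tx"], params["ty"])
    print(params)
    print(apply_affine(m, [(0.0, 0.0), (1.0, 0.0)]))
```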

arXiv:2112.04744 [pdf, other]

Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks

Authors: Jun Wang, Zhou**g Li, Yixuan Qiao, Qiming Qin, Peng Gao, Guotong Xie

Abstract: Building damage detection after natural disasters such as earthquakes is crucial for initiating effective emergency response actions. Remotely sensed very high spatial resolution (VHR) imagery can provide vital information because of its ability to map the affected buildings with high geometric precision. Many approaches have been developed to detect buildings damaged by earthquakes. However, little attention has been paid to exploiting the rich features represented in VHR images using Deep Neural Networks (DNN). This paper presents a novel superpixel-based approach that combines DNN and a modified segmentation method to detect damaged buildings from VHR imagery. First, a modified Fast Scanning and Adaptive Merging method is extended to create an initial over-segmentation. Second, the segments are merged based on the Region Adjacency Graph (RAG), using an improved semantic similarity criterion composed of Local Binary Patterns (LBP) texture, spectral, and shape features. Third, a pre-trained DNN using Stacked Denoising Auto-Encoders, called SDAE-DNN, is presented to exploit the rich semantic features for building damage detection. Deep-layer feature abstraction in SDAE-DNN could boost detection accuracy by learning more intrinsic and discriminative features, and it outperformed other methods using state-of-the-art alternative classifiers. We demonstrate the feasibility and effectiveness of our method using a subset of WorldView-2 imagery of the complex urban areas of Bhaktapur, Nepal, which was affected by the Nepal Earthquake of April 25, 2015.

Submitted 30 September, 2022; v1 submitted 9 December, 2021; originally announced December 2021.
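The merging criterion uses LBP texture features. The basic 8-neighbor, 3x3 LBP operator and a normalized per-region histogram can be sketched as follows; the exact LBP variant (radius, uniform or rotation-invariant codes) used in the paper is not stated, so this choice is an assumption:

```python
from typing import Iterable, List, Sequence, Tuple

# Neighbor offsets (dy, dx), clockwise from the top-left pixel; index = bit position.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Sequence[Sequence[float]], y: int, x: int) -> int:
    """Basic LBP code of an interior pixel: bit set when a neighbor is >= the center."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(_OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Sequence[Sequence[float]], region: Iterable[Tuple[int, int]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes over interior pixels (y, x) of a region."""
    hist = [0] * 256
    for y, x in region:
        hist[lbp_code(img, y, x)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


if __name__ == "__main__":
    patch = [[0, 0, 9], [0, 5, 9], [0, 0, 9]]  # brighter column on the right
    print(lbp_code(patch, 1, 1))
```

For this patch only the three right-hand neighbors (bits 2, 3 and 4) are set, giving code 28.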

arXiv:2109.11572 [pdf, other]

SAME: Deformable Image Registration based on Self-supervised Anatomical Embeddings

Authors: Fengze Liu, Ke Yan, Adam Harrison, Dazhou Guo, Le Lu, Alan Yuille, Lingyun Huang, Guotong Xie, **g Xiao, Xianghua Ye, Dakai **

Abstract: In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. This work builds on a recent algorithm, SAM, which can compute dense anatomical/semantic correspondences between two images at the pixel level. Our method, named SAME, breaks down image registration into three steps: affine transformation, coarse deformation, and deep deformable registration. Using SAM embeddings, we enhance these steps by finding more coherent correspondences and by providing features and a loss function with better semantic guidance. We collect a multi-phase chest computed tomography dataset with 35 annotated organs for each patient and conduct inter-subject registration for quantitative evaluation. Results show that SAME outperforms widely used traditional registration techniques (Elastix FFD, ANTs SyN) and the learning-based VoxelMorph method by at least 4.7% and 2.7% in Dice scores for two separate tasks, within-contrast-phase and across-contrast-phase registration, respectively. SAME achieves performance comparable to the best traditional registration method, DEEDS (from our evaluation), while being orders of magnitude faster (from 45 seconds to 1.2 seconds).

Submitted 23 September, 2021; originally announced September 2021.
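Registration quality is reported as Dice overlap between warped and fixed organ labels. A minimal sketch, representing each mask as a set of voxel coordinates; treating two empty masks as a perfect match is a convention chosen here, not something the abstract specifies:

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Voxel = Tuple[int, ...]


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice = 2 * |pred intersect truth| / (|pred| + |truth|)."""
    if not pred and not truth:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def mean_dice_over_labels(pred_labels: Dict[Voxel, Hashable],
                          truth_labels: Dict[Voxel, Hashable],
                          labels: Iterable[Hashable]) -> float:
    """Average per-organ Dice; each dict maps a voxel coordinate to its organ label."""
    scores = []
    for lbl in labels:
        p = {v for v, a in pred_labels.items() if a == lbl}
        t = {v for v, a in truth_labels.items() if a == lbl}
        scores.append(dice(p, t))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(dice({(0, 0), (0, 1)}, {(0, 1), (0, 2)}))
```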

arXiv:2109.09271 [pdf, ps, other]

DeepStationing: Thoracic Lymph Node Station Parsing in CT Scans using Anatomical Context Encoding and Key Organ Auto-Search

Authors: Dazhou Guo, Xianghua Ye, Jia Ge, Xing Di, Le Lu, Lingyun Huang, Guotong Xie, **g Xiao, Zhongjie Liu, Ling Peng, Senxiang Yan, Dakai **

Abstract: Lymph node station (LNS) delineation from computed tomography (CT) scans is an indispensable step in the radiation oncology workflow. High inter-user variability across oncologists and prohibitive labor costs motivate an automated approach. Previous works exploit anatomical priors to infer LNS based on predefined ad-hoc margins. However, without voxel-level supervision, their performance is severely limited. Because LNS is highly context-dependent (LNS boundaries are constrained by anatomical organs), we formulate it as a deep spatial and contextual parsing problem via encoded anatomical organs. This permits the deep network to better learn from both CT appearance and organ context. We develop a stratified referencing organ segmentation protocol that divides the organs into anchor and non-anchor categories and uses the former's predictions to guide the latter's segmentation. We further develop an auto-search module to identify the key organs that yield the optimal LNS parsing performance. Extensive four-fold cross-validation experiments are conducted on a dataset of 98 esophageal cancer patients (with the most comprehensive set of 12 LNSs + 22 organs in the thoracic region to date). Our LNS parsing model produces significant performance improvements, with an average Dice score of 81.1% +/- 6.1%, which is 5.0% and 19.2% higher than the pure CT-based deep model and the previous representative approach, respectively.

Submitted 19 September, 2021; originally announced September 2021.
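For the four-fold cross-validation, a common safeguard is to split at the patient level so that no patient appears in both training and test data. A minimal sketch, assuming hypothetical patient IDs and a fixed shuffle seed (the abstract does not describe how the folds were formed):

```python
import random
from typing import Iterator, List, Sequence, Tuple


def patient_folds(patient_ids: Sequence[str], k: int = 4, seed: int = 0) -> List[List[str]]:
    """Shuffle patients once, then deal them round-robin into k near-equal folds."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def cross_validation_splits(folds: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) patient lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test


if __name__ == "__main__":
    folds = patient_folds([f"patient_{n:03d}" for n in range(98)], k=4)
    print([len(f) for f in folds])
```

With 98 patients and four folds, the fold sizes come out as 25, 25, 24 and 24.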

arXiv:2108.11623 [pdf, other]

Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian

Authors: Baiyu Peng, **gliang Duan, Jianyu Chen, Shengbo Eben Li, Gen** Xie, Congsheng Zhang, Yang Guan, Yao Mu, Enxin Sun

Abstract: Safety is essential for reinforcement learning (RL) applied in the real world. Adding chance constraints (or probabilistic constraints) is a suitable way to enhance RL safety under uncertainty. Existing chance-constrained RL methods, such as the penalty methods and the Lagrangian methods, either exhibit periodic oscillations or learn an over-conservative or unsafe policy. In this paper, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. We first review the constrained policy optimization process from a feedback control perspective, which regards the penalty weight as the control input and the safe probability as the control output. On this basis, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller. We then unify them into a proportional-integral Lagrangian method that combines their merits, with an integral separation technique that keeps the integral value within a reasonable range. To accelerate training, the gradient of the safe probability is computed in a model-based manner. We demonstrate that our method can reduce the oscillations and conservatism of the RL policy in a car-following simulation. To demonstrate its practicality, we also apply our method to a real-world mobile robot navigation task, in which our robot successfully avoids a moving obstacle with highly uncertain or even aggressive behaviors.

Submitted 26 August, 2021; originally announced August 2021.
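The feedback-control view in the abstract can be sketched as a PI update of the penalty weight driven by the gap between a target and the observed safe probability. The gains, the clamp range, and the rule of accumulating the integral only while the error is small (one classic form of integral separation) are assumptions for illustration, not the authors' implementation:

```python
from dataclasses import dataclass


@dataclass
class SeparatedPILagrangian:
    """Penalty weight = P term (penalty-method-like) + I term (Lagrangian-like)."""
    kp: float                    # hypothetical proportional gain
    ki: float                    # hypothetical integral gain
    target_safe_prob: float      # required probability of staying safe
    separation_threshold: float  # integrate only while |error| is below this
    integral_max: float          # upper clamp for the accumulated integral
    integral: float = 0.0

    def update(self, safe_prob: float) -> float:
        """Return the penalty weight for the next policy update."""
        error = self.target_safe_prob - safe_prob  # > 0: policy is not safe enough
        if abs(error) < self.separation_threshold:
            # Accumulate, keeping the integral within [0, integral_max].
            self.integral = min(max(self.integral + error, 0.0), self.integral_max)
        # The penalty weight (control input) must be non-negative.
        return max(0.0, self.kp * error + self.ki * self.integral)


if __name__ == "__main__":
    ctrl = SeparatedPILagrangian(kp=2.0, ki=0.5, target_safe_prob=0.99,
                                 separation_threshold=0.1, integral_max=1.0)
    for p in (0.5, 0.95, 1.0):
        print(p, ctrl.update(p), ctrl.integral)
```

Large errors are handled by the proportional term alone, so the integral cannot wind up during early, very unsafe training.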

arXiv:2105.01828 [pdf, other]

Lesion Segmentation and RECIST Diameter Prediction via Click-driven Attention and Dual-path Connection

Authors: Youbao Tang, Ke Yan, **zheng Cai, Lingyun Huang, Guotong Xie, **g Xiao, **g**g Lu, Gigin Lin, Le Lu

Abstract: Measuring lesion size is an important step in assessing tumor growth and monitoring disease progression and therapy response in oncology image analysis. Although the task is tedious and highly time-consuming, radiologists routinely perform it manually using the RECIST criteria (Response Evaluation Criteria In Solid Tumors). Even though lesion segmentation may be a more accurate and clinically more valuable means, physicians currently cannot segment lesions manually because far more labor would be required. In this paper, we present a prior-guided dual-path network (PDNet) to segment common types of lesions throughout the whole body and predict their RECIST diameters accurately and automatically. Similar to [1], a click guidance from radiologists is the only requirement. PDNet has two key characteristics: 1) learning lesion-specific attention matrices in parallel from the click prior information with the proposed prior encoder, named click-driven attention; 2) aggregating the extracted multi-scale features comprehensively by introducing top-down and bottom-up connections in the proposed decoder, named dual-path connection. Experiments on the DeepLesion dataset and an external test set show the superiority of the proposed PDNet in lesion segmentation and RECIST diameter prediction. PDNet learns comprehensive and representative deep image features for our tasks and produces more accurate results on both lesion segmentation and RECIST diameter prediction.

Submitted 4 May, 2021; originally announced May 2021.
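Under RECIST, the long axis is the longest diameter of a lesion on an axial slice. Given a binary mask, a brute-force sketch measures it between pixel centers, assuming isotropic in-plane spacing; the short axis (the longest diameter perpendicular to the long axis) is omitted, and this is not how PDNet predicts diameters:

```python
import math
from itertools import combinations
from typing import Sequence


def recist_long_axis(mask: Sequence[Sequence[int]], spacing_mm: float = 1.0) -> float:
    """Longest distance between any two foreground pixel centers, in millimetres."""
    pts = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if len(pts) < 2:
        return 0.0
    # O(n^2) over foreground pixels; restricting to boundary pixels would be faster.
    return spacing_mm * max(math.dist(a, b) for a, b in combinations(pts, 2))


if __name__ == "__main__":
    lesion = [
        [0, 1, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 1, 0],
    ]
    print(recist_long_axis(lesion, spacing_mm=0.8))
```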

arXiv:2105.01218 [pdf, other]

Weakly-Supervised Universal Lesion Segmentation with Regional Level Set Loss

Authors: Youbao Tang, **zheng Cai, Ke Yan, Lingyun Huang, Guotong Xie, **g Xiao, **g**g Lu, Gigin Lin, Le Lu

Abstract: Accurately segmenting a variety of clinically significant lesions from whole-body computed tomography (CT) scans, denoted universal lesion segmentation (ULS), is a critical task in precision oncology imaging. Manual annotation is the current clinical practice, but it is highly time-consuming and inconsistent for longitudinal tumor assessment. Effectively training an automatic segmentation model is desirable but relies heavily on a large amount of pixel-wise labelled data. Existing weakly-supervised segmentation approaches often struggle with regions near the lesion boundaries. In this paper, we present a novel weakly-supervised universal lesion segmentation method that builds an attention-enhanced model based on the High-Resolution Network (HRNet), named AHRNet, and we propose a regional level set (RLS) loss for optimizing lesion boundary delineation. AHRNet provides advanced high-resolution deep image features by involving a decoder and dual-attention and scale attention mechanisms, which are crucial to performing accurate lesion segmentation. RLS can optimize the model reliably and effectively in a weakly-supervised fashion, pushing the segmentation close to the lesion boundary. Extensive experimental results demonstrate that our method achieves the best performance on the large-scale public DeepLesion dataset and a hold-out test set.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2103.13482 [pdf, other]

Semi-Supervised Learning for Bone Mineral Density Estimation in Hip X-ray Images

Authors: Kang Zheng, Yirui Wang, Xiaoyun Zhou, Fakai Wang, Le Lu, Chihung Lin, Lingyun Huang, Guotong Xie, Jing Xiao, Chang-Fu Kuo, Shun Miao

Abstract: Bone mineral density (BMD) is a clinically critical indicator of osteoporosis, usually measured by dual-energy X-ray absorptiometry (DEXA). Due to the limited accessibility of DEXA machines and examinations, osteoporosis is often under-diagnosed and under-treated, leading to increased fragility fracture risks. Thus, it is highly desirable to obtain BMDs from alternative, cost-effective, and more accessible imaging examinations such as plain X-ray films. In this work, we formulate BMD estimation from plain hip X-ray images as a regression problem. Specifically, we propose a new semi-supervised self-training algorithm to train the BMD regression model using images coupled with DEXA-measured BMDs and unlabeled images with pseudo BMDs. Pseudo BMDs are generated and refined iteratively for unlabeled images during self-training. We also present a novel adaptive triplet loss to improve the model's regression accuracy. On an in-house dataset of 1,090 images (819 unique patients), our BMD estimation method achieves a high Pearson correlation coefficient of 0.8805 with ground-truth BMDs. This demonstrates the feasibility of using more accessible and cheaper X-ray imaging for opportunistic osteoporosis screening.
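Two ideas from the abstract can be sketched in a few lines: the self-training loop (fit on labeled data, pseudo-label the unlabeled inputs, refit on both, repeat) and the Pearson correlation coefficient used as the evaluation metric. The 1-D least-squares line is a hypothetical stand-in for the paper's image regression network. The adaptive triplet loss and the pseudo-label refinement details are not modeled. All names are illustrative.

```python
# Minimal self-training sketch; fit_line is a hypothetical stand-in for the
# paper's BMD regression network, which is not reproduced here.
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares fit y = a*x + b; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def self_train(labeled: List[Tuple[float, float]], unlabeled: List[float],
               rounds: int = 3) -> Model:
    """Fit on labeled pairs, then repeatedly pseudo-label the unlabeled inputs
    with the latest model and refit on labeled + pseudo-labeled data."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    model = fit_line(xs, ys)
    for _ in range(rounds):
        pseudo = [model(x) for x in unlabeled]   # regenerate pseudo-labels
        model = fit_line(xs + unlabeled, ys + pseudo)
    return model


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

With a linear stand-in, the pseudo-labels lie on the fitted line, so refitting reproduces it and the loop only shows the control flow. A nonlinear network trained with augmentation or extra losses would actually change between rounds.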

Submitted 19 May, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

arXiv:2006.14421 [pdf, other]

doi 10.1088/1748-3190/abb86c

Artificial Lateral Line Based Relative State Estimation for Two Adjacent Robotic Fish

Authors: Xingwen Zheng, Wei Wang, Liang Li, Guangming Xie

Abstract: The lateral line enables fish to efficiently sense the surrounding environment, thus assisting flow-related fish behaviours. Inspired by this phenomenon, a variety of artificial lateral line systems (ALLSs) have been developed and applied to underwater robots. This article focuses on using hydrodynamic pressure variations (HPVs) measured by the ALLS pressure sensor arrays to estimate the relative state between two adjacent robotic fish in a leader-follower formation. The relative states include the relative oscillating frequency, amplitude, and offset of the upstream robotic fish with respect to the downstream robotic fish, as well as the relative vertical distance, relative yaw angle, relative pitch angle, and relative roll angle between the two adjacent robotic fish. Regression models between the ALLS-measured HPVs and these relative states are investigated, and regression-model-based relative state estimation is conducted. Specifically, two criteria are first proposed to assess not only the sensitivity of each pressure sensor to variations in the relative state but also the insufficiency and redundancy of the pressure sensors, and the pressure sensors used for regression analysis are selected accordingly. Then four typical regression methods, namely random forest, support vector regression, back-propagation neural network, and multiple linear regression, are used to establish regression models between the ALLS-measured HPVs and the relative states. The regression performance of the four methods is then compared and discussed.
Finally, the random forest-based method, which achieves the best regression performance, is used to estimate the relative yaw angle and oscillating amplitude from the ALLS-measured HPVs and exhibits excellent estimation performance. This work contributes to local relative state estimation for groups of underwater robots, a long-standing challenge.
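As an illustration of one of the four baselines named above, the sketch below fits a multiple linear regression from a vector of pressure-sensor features to a single relative state, solving the normal equations with the standard library. The feature values are hypothetical, since the paper's sensor layout and data are not given here. The random forest that the authors found best would typically come from a library such as scikit-learn's `RandomForestRegressor` and is not reimplemented.

```python
# Hypothetical sketch of the multiple-linear-regression baseline: map HPV
# features to one relative state by solving (X^T X) w = X^T y.
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_mlr(features: Sequence[Sequence[float]],
            target: Sequence[float]) -> List[float]:
    """Return [intercept, w1, ..., wk] minimizing the squared error."""
    x = [[1.0] + list(row) for row in features]  # prepend a bias column
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))
```

Because the model is linear in the sensor readings, it cannot capture nonlinear flow effects. That limitation is one plausible reason a tree ensemble could outperform it on this task.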

Submitted 22 May, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: Accepted by Bioinspiration & Biomimetics

arXiv:2006.14420 [pdf, other]

Three-Dimensional Dynamic Modeling and Motion Analysis for an Active-Tail-Actuated Robotic Fish with Barycentre Regulating Mechanism

Authors: Xingwen Zheng, Minglei Xiong, Junzheng Zheng, Manyi Wang, Runyu Tian, Guangming Xie

Abstract: Dynamic modeling has attracted attention because it is fundamental to precise locomotion analysis and control of underwater robots. However, existing research has mainly focused on the two-dimensional motion of underwater robots, and little attention has been paid to three-dimensional dynamic modeling, which is the focus of this work. In this article, a three-dimensional dynamic model of an active-tail-actuated robotic fish with a barycentre regulating mechanism is built by combining Newton's second law for linear motion and Euler's equation for angular motion. The model parameters are determined using the three-dimensional computer-aided design (CAD) software SolidWorks, HyperFlow-based computational fluid dynamics (CFD) simulation, and a grey-box model estimation method. Kinematic experiments with a prototype and numerical simulations are used to cross-validate the accuracy of the dynamic model. Based on the dynamic model, multiple three-dimensional motions, including rectilinear motion, turning motion, gliding motion, and spiral motion, are analyzed. The experimental and simulation results demonstrate the effectiveness of the proposed model in evaluating the trajectory, attitude, and motion parameters of the robotic fish, including the velocity, turning radius, and angular velocity.
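The combination of Newton's second law and Euler's equation mentioned above takes the standard rigid-body form in the body frame: m(dv/dt + ω × v) = F and I dω/dt + ω × (Iω) = τ. The sketch below integrates these equations with explicit Euler steps. It assumes a diagonal inertia tensor (principal body axes). It omits added mass, hydrodynamic forces, and the barycentre mechanism, all of which the paper's model presumably includes. The function names are hypothetical.

```python
# Hypothetical sketch: body-frame Newton-Euler rigid-body equations,
#   m (dv/dt + w x v) = F,   I dw/dt + w x (I w) = tau,
# integrated with explicit Euler. No hydrodynamic or added-mass terms.
from typing import Tuple

Vec = Tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    """3-D cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def step(v: Vec, w: Vec, force: Vec, torque: Vec, mass: float,
         inertia: Vec, dt: float) -> Tuple[Vec, Vec]:
    """Advance linear velocity v and angular velocity w by one time step.

    `inertia` holds the principal moments (Ixx, Iyy, Izz), i.e. the inertia
    tensor is assumed diagonal in the body frame.
    """
    wxv = cross(w, v)
    dv = tuple(force[i] / mass - wxv[i] for i in range(3))         # Newton
    iw = tuple(inertia[i] * w[i] for i in range(3))
    wxiw = cross(w, iw)
    dw = tuple((torque[i] - wxiw[i]) / inertia[i] for i in range(3))  # Euler
    v_new = tuple(v[i] + dt * dv[i] for i in range(3))
    w_new = tuple(w[i] + dt * dw[i] for i in range(3))
    return v_new, w_new


def simulate(v: Vec, w: Vec, force: Vec, torque: Vec, mass: float,
             inertia: Vec, dt: float, steps: int) -> Tuple[Vec, Vec]:
    """Apply constant force and torque for `steps` explicit Euler steps."""
    for _ in range(steps):
        v, w = step(v, w, force, torque, mass, inertia, dt)
    return v, w
```

Two sanity checks follow from the equations. A constant force on a non-rotating body changes the velocity by F·t/m. A torque-free spin about a principal axis keeps ω constant, because ω × (Iω) vanishes there.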

Submitted 22 May, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

arXiv:2006.09808 [pdf]

Positive Contrast Susceptibility MR Imaging Using GPU-based Primal-Dual Algorithm

Authors: Haifeng Wang, Fang Cai, Caiyun Shi, Jing Cheng, Shi Su, Zhilang Qiu, Guoxi Xie, Hanwei Chen, Xin Liu, Dong Liang

Abstract: The susceptibility-based positive contrast MR technique was applied to estimate arbitrary magnetic susceptibility distributions of metallic devices using a kernel deconvolution algorithm with regularized L1 minimization. Previously, the first-order primal-dual (PD) algorithm provided faster reconstruction than other methods for solving the L1 minimization. Here, we propose to accelerate the PD algorithm for positive contrast imaging using the multi-core, multi-thread capabilities of graphics processing units (GPUs). Experimental results showed that the GPU-based PD algorithm achieved comparable accuracy in positive contrast imaging of metallic interventional devices with less computation time, and that the GPU-based PD approach was 4 to 15 times faster than the previous CPU-based scheme.
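For readers unfamiliar with first-order primal-dual methods, here is a minimal CPU sketch of the Chambolle-Pock iteration for an L1-regularized deconvolution problem of the form min_x ½‖k ∗ x − b‖² + λ‖x‖₁. It assumes a 1-D circular convolution kernel in place of the paper's susceptibility (dipole) kernel. It does not reproduce the GPU parallelization, which in practice would apply the per-element dual and primal updates in parallel. All names are hypothetical.

```python
# Hypothetical sketch of a first-order primal-dual (Chambolle-Pock) solver for
#   min_x 0.5 * ||k * x - b||^2 + lam * ||x||_1,  k a circular 1-D kernel.
# Pure-Python CPU code; not the paper's GPU implementation or dipole kernel.
from typing import List, Sequence


def conv(k: Sequence[float], x: Sequence[float]) -> List[float]:
    """Circular convolution: (Kx)[i] = sum_j k[j] * x[(i - j) mod n]."""
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(len(k))) for i in range(n)]


def conv_adjoint(k: Sequence[float], y: Sequence[float]) -> List[float]:
    """Adjoint of conv: (K^T y)[i] = sum_j k[j] * y[(i + j) mod n]."""
    n = len(y)
    return [sum(k[j] * y[(i + j) % n] for j in range(len(k))) for i in range(n)]


def soft(v: float, t: float) -> float:
    """Soft-thresholding, the proximal operator of t * |.|."""
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)


def primal_dual_l1(k: Sequence[float], b: Sequence[float], lam: float,
                   iters: int = 500) -> List[float]:
    """Chambolle-Pock iterations with theta = 1 and tau = sigma."""
    bound = sum(abs(c) for c in k)       # upper bound on the operator norm ||K||
    tau = sigma = 0.99 / bound           # guarantees tau * sigma * ||K||^2 < 1
    n = len(b)
    x = [0.0] * n
    x_bar = x[:]
    y = [0.0] * n
    for _ in range(iters):
        kx = conv(k, x_bar)
        # Dual step: prox of sigma * F*, with F(z) = 0.5 * ||z - b||^2.
        y = [(y[i] + sigma * kx[i] - sigma * b[i]) / (1.0 + sigma) for i in range(n)]
        kty = conv_adjoint(k, y)
        # Primal step: prox of tau * lam * ||.||_1 is soft-thresholding.
        x_new = [soft(x[i] - tau * kty[i], tau * lam) for i in range(n)]
        # Over-relaxation (theta = 1) feeds the next dual step.
        x_bar = [2.0 * x_new[i] - x[i] for i in range(n)]
        x = x_new
    return x
```

With the identity kernel `[1.0]`, the problem reduces to soft-thresholding b by λ, which gives an easy correctness check. Each iteration uses only the forward and adjoint operators plus element-wise steps, which is why the method maps well onto many GPU threads.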

Submitted 17 June, 2020; originally announced June 2020.

Comments: 4 pages, 6 figures, Accepted at the 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2020)

arXiv:1905.06773 [pdf, other]

Input Modeling and Uncertainty Quantification for Improving Volatile Residential Load Forecasting

Authors: Guangrui Xie, Xi Chen, Yang Weng

Abstract: Load forecasting has long been recognized as an important building block for all utility operational planning efforts. In recent years, making accurate forecasts has become ever more challenging due to the proliferation of distributed energy resources, despite the abundance of existing load forecasting methods. In this paper, we identify one drawback shared by most load forecasting methods: they neglect to thoroughly address the impact of input errors on load forecasts. As a potential solution, we propose to incorporate input modeling and uncertainty quantification to improve load forecasting performance via a two-stage approach. The proposed two-stage approach has the following merits. (1) It provides input modeling and quantifies the impact of input errors, rather than neglecting or mitigating that impact, as is prevalent in existing methods. (2) It propagates the impact of input errors into the ultimate point and interval predictions for the target customer's load to improve predictive performance. (3) A variance-based global sensitivity analysis method is further proposed for input-space dimensionality reduction in both stages to enhance computational efficiency. Numerical experiments show that the proposed two-stage approach outperforms competing load forecasting methods in terms of both point predictive accuracy and coverage ability of the predictive intervals.
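A widely used variance-based global sensitivity measure is the first-order Sobol index. It is the fraction of output variance explained by one input alone. The sketch below estimates it by Monte Carlo using the Saltelli-style estimator. It is not claimed to be the authors' method. The test functions, the sample size, and the assumption of independent Uniform(0, 1) inputs are illustrative only. Inputs whose index is near zero are natural candidates for removal, which is how such indices can support input-space dimensionality reduction.

```python
# Hypothetical sketch: Monte Carlo first-order Sobol indices, assuming
# independent Uniform(0, 1) inputs. Not the paper's specific procedure.
import random
from typing import Callable, List, Sequence


def first_order_sobol(f: Callable[[Sequence[float]], float], dim: int,
                      n: int = 20000, seed: int = 0) -> List[float]:
    """Estimate S_i = mean(f(B) * (f(A_B^i) - f(A))) / Var(Y) for each input i."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dim)] for _ in range(n)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n)]
    fa = [f(row) for row in a]
    fb = [f(row) for row in b]
    ys = fa + fb                               # pooled output sample
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    indices = []
    for i in range(dim):
        # A_B^i: rows of A with the i-th coordinate taken from B.
        fab = [f(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        s = sum(fb[j] * (fab[j] - fa[j]) for j in range(n)) / n
        indices.append(s / var)
    return indices
```

For f(x) = x₀ + 2x₁ with uniform inputs, the analytic indices are 1/5 and 4/5, since Var(x₀) = 1/12 and Var(2x₁) = 4/12. The estimates should land close to those values. An input that f ignores gets an index of exactly zero.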

Submitted 16 May, 2019; originally announced May 2019.

Comments: 9 pages, 4 figures, journal

Showing 1–21 of 21 results for author: Xie, G