Search | arXiv e-print repository

Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, Qiguang Miao

Abstract: The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable reports generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, \textbf{S}tructural \textbf{E}ntities extraction a… ▽ More The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable reports generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, \textbf{S}tructural \textbf{E}ntities extraction and patient indications \textbf{I}ncorporation (SEI) for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces the noise in the following cross-modal alignment module by aligning X-ray images with factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, assists in triggering the text decoder to produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: The code is available at https://github.com/mk-runner/SEI-Temp or https://github.com/mk-runner/SEI

arXiv:2405.09586 [pdf, other]

Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Mengmeng Liu, Zhicheng Jiao, Xiaolu Kang, Qiguang Miao, Kun Xie

Abstract: The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal a… ▽ More The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal alignment. Additionally, existing methods for similar historical cases retrieval face suboptimal performance owing to the modal gap issue. In response, this paper introduces a novel method, named Factual Serialization Enhancement (FSE), for chest X-ray report generation. FSE begins with the structural entities approach to eliminate presentation-style vocabulary in reports, providing specific input for our model. Then, uni-modal features are learned through cross-modal alignment between images and factual serialization in reports. Subsequently, we present a novel approach to retrieve similar historical cases from the training set, leveraging aligned image features. These features implicitly preserve semantic similarity with their corresponding reference reports, enabling us to calculate similarity solely among aligned features. This effectively eliminates the modal gap issue for knowledge retrieval without the requirement for disease labels. Finally, the cross-modal fusion network is employed to query valuable information from these cases, enriching image features and aiding the text decoder in generating high-quality reports. Experiments on MIMIC-CXR and IU X-ray datasets from both specific and general scenarios demonstrate the superiority of FSE over state-of-the-art approaches in both natural language generation and clinical efficacy metrics. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2311.04942 [pdf, other]

CSAM: A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical Image Segmentation

Authors: Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Xiaoxi Du, Kaifeng Pang, Qi Miao, Steven S. Raman, Demetri Terzopoulos, Kyunghyun Sung

Abstract: A large portion of volumetric medical data, especially magnetic resonance imaging (MRI) data, is anisotropic, as the through-plane resolution is typically much lower than the in-plane resolution. Both 3D and purely 2D deep learning-based segmentation methods are deficient in dealing with such volumetric data since the performance of 3D methods suffers when confronting anisotropic data, and 2D meth… ▽ More A large portion of volumetric medical data, especially magnetic resonance imaging (MRI) data, is anisotropic, as the through-plane resolution is typically much lower than the in-plane resolution. Both 3D and purely 2D deep learning-based segmentation methods are deficient in dealing with such volumetric data since the performance of 3D methods suffers when confronting anisotropic data, and 2D methods disregard crucial volumetric information. Insufficient work has been done on 2.5D methods, in which 2D convolution is mainly used in concert with volumetric information. These models focus on learning the relationship across slices, but typically have many parameters to train. We offer a Cross-Slice Attention Module (CSAM) with minimal trainable parameters, which captures information across all the slices in the volume by applying semantic, positional, and slice attention on deep feature maps at different scales. Our extensive experiments using different network architectures and tasks demonstrate the usefulness and generalizability of CSAM. Associated code is available at https://github.com/aL3x-O-o-Hung/CSAM. △ Less

Submitted 26 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2302.12434 [pdf, other]

Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

Authors: Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu

Abstract: Voice conversion (VC) techniques can be abused by malicious parties to transform their audios to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audios synthesized by voice conversion methods with high credit. However, unveiling t… ▽ More Voice conversion (VC) techniques can be abused by malicious parties to transform their audios to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audios synthesized by voice conversion methods with high credit. However, unveiling the features of the source speaker from a converted audio is challenging since the voice conversion operation intends to disentangle the original features and infuse the features of the target speaker. To fulfill our goal, we develop Revelio, a representation learning model, which learns to effectively extract the voiceprint of the source speaker from converted audio samples. We equip Revelio with a carefully-designed differential rectification algorithm to eliminate the influence of the target speaker by removing the representation component that is parallel to the voiceprint of the target speaker. We have conducted extensive experiments to evaluate the capability of Revelio in restoring voiceprint from audios converted by VQVC, VQVC+, AGAIN, and BNE. The experiments verify that Revelio is able to rebuild voiceprints that can be traced to the source speaker by speaker verification and identification systems. Revelio also exhibits robust performance under inter-gender conversion, unseen languages, and telephony networks. △ Less

Submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted by USENIX Security Symposium 2023. Please cite this paper as "Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu. Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion. In 32nd USENIX Security Symposium (USENIX Security 23)."

arXiv:2203.15163 [pdf, other]

CAT-Net: A Cross-Slice Attention Transformer Model for Prostate Zonal Segmentation in MRI

Authors: Alex Ling Yu Hung, Haoxin Zheng, Qi Miao, Steven S. Raman, Demetri Terzopoulos, Kyunghyun Sung

Abstract: Prostate cancer is the second leading cause of cancer death among men in the United States. The diagnosis of prostate MRI often relies on the accurate prostate zonal segmentation. However, state-of-the-art automatic segmentation methods often fail to produce well-contained volumetric segmentation of the prostate zones since certain slices of prostate MRI, such as base and apex slices, are harder t… ▽ More Prostate cancer is the second leading cause of cancer death among men in the United States. The diagnosis of prostate MRI often relies on the accurate prostate zonal segmentation. However, state-of-the-art automatic segmentation methods often fail to produce well-contained volumetric segmentation of the prostate zones since certain slices of prostate MRI, such as base and apex slices, are harder to segment than other slices. This difficulty can be overcome by accounting for the cross-slice relationship of adjacent slices, but current methods do not fully learn and exploit such relationships. In this paper, we propose a novel cross-slice attention mechanism, which we use in a Transformer module to systematically learn the cross-slice relationship at different scales. The module can be utilized in any existing learning-based segmentation framework with skip connections. Experiments show that our cross-slice attention is able to capture the cross-slice information in prostate zonal segmentation and improve the performance of current state-of-the-art methods. Our method improves segmentation accuracy in the peripheral zone, such that the segmentation results are consistent across all the prostate slices (apex, mid-gland, and base). △ Less

Submitted 16 June, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2104.09079 [pdf, other]

doi 10.1016/j.ymssp.2021.108616

A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings

Authors: Yifei Ding, Min** Jia, Qiuhua Miao, Yudong Cao

Abstract: The scope of data-driven fault diagnosis models is greatly extended through deep learning (DL). However, the classical convolution and recurrent structure have their defects in computational efficiency and feature representation, while the latest Transformer architecture based on attention mechanism has not yet been applied in this field. To solve these problems, we propose a novel time-frequency… ▽ More The scope of data-driven fault diagnosis models is greatly extended through deep learning (DL). However, the classical convolution and recurrent structure have their defects in computational efficiency and feature representation, while the latest Transformer architecture based on attention mechanism has not yet been applied in this field. To solve these problems, we propose a novel time-frequency Transformer (TFT) model inspired by the massive success of vanilla Transformer in sequence processing. Specially, we design a fresh tokenizer and encoder module to extract effective abstractions from the time-frequency representation (TFR) of vibration signals. On this basis, a new end-to-end fault diagnosis framework based on time-frequency Transformer is presented in this paper. Through the case studies on bearing experimental datasets, we construct the optimal Transformer structure and verify its fault diagnosis performance. The superiority of the proposed method is demonstrated in comparison with the benchmark models and other state-of-the-art methods. △ Less

Submitted 4 December, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Journal ref: Mech. Syst. Signal Process., vol. 168, p. 108616, Apr. 2022

arXiv:1801.09350 [pdf, ps, other]

Estimating Distances via Received Signal Strength and Connectivity in Wireless Sensor Networks

Authors: Qing Miao, Baoqi Huang, Bing Jia

Abstract: Distance estimation is vital for localization and many other applications in wireless sensor networks (WSNs). Particularly, it is desirable to implement distance estimation as well as localization without using specific hardware in low-cost WSNs. As such, both the received signal strength (RSS) based approach and the connectivity based approach have gained much attention. The RSS based approach is… ▽ More Distance estimation is vital for localization and many other applications in wireless sensor networks (WSNs). Particularly, it is desirable to implement distance estimation as well as localization without using specific hardware in low-cost WSNs. As such, both the received signal strength (RSS) based approach and the connectivity based approach have gained much attention. The RSS based approach is suitable for estimating short distances, whereas the connectivity based approach obtains relatively good performance for estimating long distances. Considering the complementary features of these two approaches, we propose a fusion method based on the maximum-likelihood estimator (MLE) to estimate the distance between any pair of neighboring nodes in a WSN through efficiently fusing the information from the RSS and local connectivity. Additionally, the method is reported under the practical log-normal shadowing model, and the associated Cramer-Rao lower bound (CRLB) is also derived for performance analysis. Both simulations and experiments based on practical measurements are carried out, and demonstrate that the proposed method outperforms any single approach and approaches to the CRLB as well. △ Less

Submitted 28 January, 2018; originally announced January 2018.

Showing 1–7 of 7 results for author: Miao, Q