-
Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation
Authors:
Kang Liu,
Zhuoqi Ma,
Xiaolu Kang,
Zhusi Zhong,
Zhicheng Jiao,
Grayson Baird,
Harrison Bai,
Qiguang Miao
Abstract:
The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces the noise in the following cross-modal alignment module by aligning X-ray images with factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, assists in triggering the text decoder to produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.
Submitted 22 May, 2024;
originally announced May 2024.
-
Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation
Authors:
Kang Liu,
Zhuoqi Ma,
Mengmeng Liu,
Zhicheng Jiao,
Xiaolu Kang,
Qiguang Miao,
Kun Xie
Abstract:
The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal alignment. Additionally, existing methods for retrieving similar historical cases perform suboptimally owing to the modal gap issue. In response, this paper introduces a novel method, named Factual Serialization Enhancement (FSE), for chest X-ray report generation. FSE begins with the structural entities approach to eliminate presentation-style vocabulary in reports, providing specific input for our model. Then, uni-modal features are learned through cross-modal alignment between images and factual serialization in reports. Subsequently, we present a novel approach to retrieve similar historical cases from the training set, leveraging aligned image features. These features implicitly preserve semantic similarity with their corresponding reference reports, enabling us to calculate similarity solely among aligned features. This effectively eliminates the modal gap issue for knowledge retrieval without the requirement for disease labels. Finally, the cross-modal fusion network is employed to query valuable information from these cases, enriching image features and aiding the text decoder in generating high-quality reports. Experiments on MIMIC-CXR and IU X-ray datasets from both specific and general scenarios demonstrate the superiority of FSE over state-of-the-art approaches in both natural language generation and clinical efficacy metrics.
Submitted 15 May, 2024;
originally announced May 2024.
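The FSE abstract above describes retrieving similar historical cases by computing similarity solely among aligned image features. The listing does not give the exact similarity measure or feature layout, so the following is only a minimal sketch that assumes cosine similarity over flat feature vectors. The names `cosine_similarity`, `retrieve_similar_cases`, `bank`, and `query` are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_similar_cases(query, bank, k=2):
    """Return the indices of the k bank features most similar to the query.

    No gradients or disease labels are involved: this is a plain
    nearest-neighbour lookup over precomputed image features.
    """
    scored = [(cosine_similarity(query, feat), idx) for idx, feat in enumerate(bank)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "training set" image features (hypothetical values, for illustration only).
bank = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]
top = retrieve_similar_cases(query, bank, k=2)
print(top)
```

In this toy setting the two bank entries pointing in nearly the same direction as the query are returned first. A real system would likely use batched matrix operations rather than Python loops.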
-
CSAM: A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical Image Segmentation
Authors:
Alex Ling Yu Hung,
Haoxin Zheng,
Kai Zhao,
Xiaoxi Du,
Kaifeng Pang,
Qi Miao,
Steven S. Raman,
Demetri Terzopoulos,
Kyunghyun Sung
Abstract:
A large portion of volumetric medical data, especially magnetic resonance imaging (MRI) data, is anisotropic, as the through-plane resolution is typically much lower than the in-plane resolution. Both 3D and purely 2D deep learning-based segmentation methods are deficient in dealing with such volumetric data since the performance of 3D methods suffers when confronting anisotropic data, and 2D methods disregard crucial volumetric information. Insufficient work has been done on 2.5D methods, in which 2D convolution is mainly used in concert with volumetric information. These models focus on learning the relationship across slices, but typically have many parameters to train. We offer a Cross-Slice Attention Module (CSAM) with minimal trainable parameters, which captures information across all the slices in the volume by applying semantic, positional, and slice attention on deep feature maps at different scales. Our extensive experiments using different network architectures and tasks demonstrate the usefulness and generalizability of CSAM. Associated code is available at https://github.com/aL3x-O-o-Hung/CSAM.
Submitted 26 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion
Authors:
Jiangyi Deng,
Yanjiao Chen,
Yinan Zhong,
Qianhao Miao,
Xueluan Gong,
Wenyuan Xu
Abstract:
Voice conversion (VC) techniques can be abused by malicious parties to transform their audio to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audio synthesized by voice conversion methods with high credit. However, unveiling the features of the source speaker from converted audio is challenging since the voice conversion operation intends to disentangle the original features and infuse the features of the target speaker. To fulfill our goal, we develop Revelio, a representation learning model, which learns to effectively extract the voiceprint of the source speaker from converted audio samples. We equip Revelio with a carefully designed differential rectification algorithm to eliminate the influence of the target speaker by removing the representation component that is parallel to the voiceprint of the target speaker. We have conducted extensive experiments to evaluate the capability of Revelio in restoring voiceprints from audio converted by VQVC, VQVC+, AGAIN, and BNE. The experiments verify that Revelio is able to rebuild voiceprints that can be traced to the source speaker by speaker verification and identification systems. Revelio also exhibits robust performance under inter-gender conversion, unseen languages, and telephony networks.
Submitted 23 February, 2023;
originally announced February 2023.
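The Revelio abstract above mentions removing the representation component that is parallel to the target speaker's voiceprint. The details of the differential rectification algorithm are not given in this listing, so the sketch below only illustrates the underlying linear-algebra idea, an orthogonal projection, on plain Python lists. The function name `remove_parallel_component` and the toy vectors are hypothetical.

```python
def remove_parallel_component(rep, target):
    """Subtract from `rep` its projection onto `target`.

    The result is orthogonal to `target`, i.e. it keeps only the part of
    the representation that the target voiceprint cannot explain.
    """
    denom = sum(t * t for t in target)
    if denom == 0.0:
        return list(rep)  # nothing to remove for a zero target vector
    scale = sum(r * t for r, t in zip(rep, target)) / denom
    return [r - scale * t for r, t in zip(rep, target)]


# Toy 2-D example (hypothetical values): the target points along the x-axis,
# so only the y-component of the representation survives.
rep = [3.0, 4.0]
target = [1.0, 0.0]
residual = remove_parallel_component(rep, target)
print(residual)  # [0.0, 4.0]
```

This is not the paper's algorithm, only the projection step that the abstract's wording suggests.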
-
CAT-Net: A Cross-Slice Attention Transformer Model for Prostate Zonal Segmentation in MRI
Authors:
Alex Ling Yu Hung,
Haoxin Zheng,
Qi Miao,
Steven S. Raman,
Demetri Terzopoulos,
Kyunghyun Sung
Abstract:
Prostate cancer is the second leading cause of cancer death among men in the United States. The diagnosis of prostate MRI often relies on accurate prostate zonal segmentation. However, state-of-the-art automatic segmentation methods often fail to produce well-contained volumetric segmentation of the prostate zones since certain slices of prostate MRI, such as base and apex slices, are harder to segment than other slices. This difficulty can be overcome by accounting for the cross-slice relationship of adjacent slices, but current methods do not fully learn and exploit such relationships. In this paper, we propose a novel cross-slice attention mechanism, which we use in a Transformer module to systematically learn the cross-slice relationship at different scales. The module can be utilized in any existing learning-based segmentation framework with skip connections. Experiments show that our cross-slice attention is able to capture the cross-slice information in prostate zonal segmentation and improve the performance of current state-of-the-art methods. Our method improves segmentation accuracy in the peripheral zone, such that the segmentation results are consistent across all the prostate slices (apex, mid-gland, and base).
Submitted 16 June, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
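The CAT-Net abstract above describes attention computed across slices of a volume. The paper's actual module (learned projections, multiple scales, placement on skip connections) is not specified in this listing. As a rough illustration only, the sketch below applies standard scaled dot-product attention with identity query/key/value projections, treating each slice as a single feature vector. The function names and toy features are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_slice_attention(slices):
    """Let every slice attend to every slice in the volume.

    `slices` is a list of S feature vectors, one per slice. Each output
    vector is a softmax-weighted average of all slice vectors, with weights
    given by scaled dot products. Queries, keys, and values are the raw
    features here (an assumption; a real module would learn projections).
    """
    dim = len(slices[0])
    attended = []
    for query in slices:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in slices
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, slices)) for j in range(dim)]
        )
    return attended


# Three toy slices with 2-D features (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_slice_attention(features)
print(attended)
```

Because each output is a convex combination of the inputs, information from hard slices (such as base and apex) can be blended with information from easier neighbouring slices.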
-
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
Authors:
Yifei Ding,
Min** Jia,
Qiuhua Miao,
Yudong Cao
Abstract:
The scope of data-driven fault diagnosis models is greatly extended through deep learning (DL). However, classical convolutional and recurrent structures have defects in computational efficiency and feature representation, while the latest Transformer architecture based on the attention mechanism has not yet been applied in this field. To solve these problems, we propose a novel time-frequency Transformer (TFT) model inspired by the massive success of the vanilla Transformer in sequence processing. Specifically, we design a new tokenizer and encoder module to extract effective abstractions from the time-frequency representation (TFR) of vibration signals. On this basis, a new end-to-end fault diagnosis framework based on the time-frequency Transformer is presented in this paper. Through case studies on bearing experimental datasets, we construct the optimal Transformer structure and verify its fault diagnosis performance. The superiority of the proposed method is demonstrated in comparison with the benchmark models and other state-of-the-art methods.
Submitted 4 December, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Estimating Distances via Received Signal Strength and Connectivity in Wireless Sensor Networks
Authors:
Qing Miao,
Baoqi Huang,
Bing Jia
Abstract:
Distance estimation is vital for localization and many other applications in wireless sensor networks (WSNs). Particularly, it is desirable to implement distance estimation as well as localization without using specific hardware in low-cost WSNs. As such, both the received signal strength (RSS) based approach and the connectivity based approach have gained much attention. The RSS based approach is suitable for estimating short distances, whereas the connectivity based approach obtains relatively good performance for estimating long distances. Considering the complementary features of these two approaches, we propose a fusion method based on the maximum-likelihood estimator (MLE) to estimate the distance between any pair of neighboring nodes in a WSN through efficiently fusing the information from the RSS and local connectivity. Additionally, the method is reported under the practical log-normal shadowing model, and the associated Cramer-Rao lower bound (CRLB) is also derived for performance analysis. Both simulations and experiments based on practical measurements are carried out, and demonstrate that the proposed method outperforms either single approach and approaches the CRLB.
Submitted 28 January, 2018;
originally announced January 2018.
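The abstract above builds on the log-normal shadowing model. Under that model the received power in dB is commonly written as P(d) = P0 - 10 * eta * log10(d / d0) + X, with X zero-mean Gaussian, and the RSS-only maximum-likelihood distance estimate follows by inverting the noise-free part. The sketch below shows only this RSS-only step, not the paper's fusion with connectivity information. The parameter names (`p0_dbm`, `path_loss_exponent`, `d0_m`) and the numeric values are assumptions for illustration.

```python
def rss_to_distance(rss_dbm, p0_dbm, path_loss_exponent, d0_m=1.0):
    """RSS-only distance estimate under the log-normal shadowing model.

    rss_dbm:            measured received power in dBm
    p0_dbm:             reference power measured at distance d0_m
    path_loss_exponent: environment-dependent exponent (eta)
    d0_m:               reference distance in metres
    """
    if path_loss_exponent <= 0:
        raise ValueError("path_loss_exponent must be positive")
    return d0_m * 10 ** ((p0_dbm - rss_dbm) / (10.0 * path_loss_exponent))


# Hypothetical calibration: -40 dBm at 1 m, path-loss exponent 3.
# A reading of -70 dBm is 30 dB below the reference, i.e. one decade of distance.
estimate = rss_to_distance(rss_dbm=-70.0, p0_dbm=-40.0, path_loss_exponent=3.0)
print(round(estimate, 3))
```

Because the noise is Gaussian in dB, errors in this estimate grow multiplicatively with distance, which is consistent with the abstract's remark that the RSS based approach suits short distances.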