-
XCAT-3.0: A Comprehensive Library of Personalized Digital Twins Derived from CT Scans
Authors:
Lavsen Dahal,
Mobina Ghojoghnejad,
Dhrubajyoti Ghosh,
Yubraj Bhandari,
David Kim,
Fong Chi Ho,
Fakrul Islam Tushar,
Sheng Luoa,
Kyle J. Lafata,
Ehsan Abadi,
Ehsan Samei,
Joseph Y. Lo,
W. Paul Segars
Abstract:
Virtual Imaging Trials (VIT) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VIT. However, the current libraries of computational phantoms face limitations, particularly in terms of sample size and diversity. Insufficient representation of the population hampers…
▽ More
Virtual Imaging Trials (VIT) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VIT. However, the current libraries of computational phantoms face limitations, particularly in terms of sample size and diversity. Insufficient representation of the population hampers accurate assessment of imaging technologies across different patient groups. Traditionally, phantoms were created by manual segmentation, which is a laborious and time-consuming task, impeding the expansion of phantom libraries. This study presents a framework for realistic computational phantom modeling using a suite of four deep learning segmentation models, followed by three forms of automated organ segmentation quality control. Over 2500 computational phantoms with up to 140 structures illustrating a sophisticated approach to detailed anatomical modeling are released. Phantoms are available in both voxelized and surface mesh formats. The framework is aggregated with an in-house CT scanner simulator to produce realistic CT images. The framework can potentially advance virtual imaging trials, facilitating comprehensive and reliable evaluations of medical imaging technologies. Phantoms may be requested at https://cvit.duke.edu/resources/, code, model weights, and sample CT images are available at https://xcat-3.github.io.
△ Less
Submitted 1 June, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation
Authors:
Kaouther Mouheb,
Mobina Ghojogh Nejad,
Lavsen Dahal,
Ehsan Samei,
Kyle J. Lafata,
W. Paul Segars,
Joseph Y. Lo
Abstract:
Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent…
▽ More
Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent advancements in geometric deep learning and denoising diffusion probabilistic models to refine the segmentation results of the large intestine. We begin by representing the organ as point clouds sampled from the surface of the 3D segmentation mask. Subsequently, we employ a hierarchical variational autoencoder to obtain global and local latent representations of the organ's shape. We train two conditional denoising diffusion models in the hierarchical latent space to perform shape refinement. To further enhance our method, we incorporate a state-of-the-art surface reconstruction model, allowing us to generate smooth meshes from the obtained complete point clouds. Experimental results demonstrate the effectiveness of our approach in capturing both the global distribution of the organ's shape and its fine details. Our complete refinement pipeline demonstrates remarkable enhancements in surface representation compared to the initial segmentation, reducing the Chamfer distance by 70%, the Hausdorff distance by 32%, and the Earth Mover's distance by 6%. By combining geometric deep learning, denoising diffusion models, and advanced surface reconstruction techniques, our proposed method offers a promising solution for accurately modeling the large intestine's surface and can easily be extended to other anatomical structures.
△ Less
Submitted 20 May, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Virtual imaging trials improved the transparency and reliability of AI systems in COVID-19 imaging
Authors:
Fakrul Islam Tushar,
Lavsen Dahal,
Saman Sotoudeh-Paima,
Ehsan Abadi,
W. Paul Segars,
Ehsan Samei,
Joseph Y. Lo
Abstract:
The credibility of AI models in medical imaging is often challenged by reproducibility issues and obscured clinical insights, a reality highlighted during the COVID-19 pandemic by many reports of near-perfect artificial intelligence (AI) models that all failed to generalize. To address these concerns, we propose a virtual imaging trial framework, employing a diverse collection of medical images th…
▽ More
The credibility of AI models in medical imaging is often challenged by reproducibility issues and obscured clinical insights, a reality highlighted during the COVID-19 pandemic by many reports of near-perfect artificial intelligence (AI) models that all failed to generalize. To address these concerns, we propose a virtual imaging trial framework, employing a diverse collection of medical images that are both clinical and simulated. In this study, COVID-19 serves as a case example to unveil the intrinsic and extrinsic factors influencing AI performance. Our findings underscore a significant impact of dataset characteristics on AI efficacy. Even when trained on large, diverse clinical datasets with thousands of patients, AI performance plummeted by up to 20% in generalization. However, virtual imaging trials offer a robust platform for objective assessment, unveiling nuanced insights into the relationships between patient- and physics-based factors and AI performance. For instance, disease extent markedly influenced AI efficacy, computed tomography (CT) out-performed chest radiography (CXR), while imaging dose exhibited minimal impact. Using COVID-19 as a case study, this virtual imaging trial study verified that radiology AI models often suffer from a reproducibility crisis. Virtual imaging trials not only offered a solution for objective performance assessment but also extracted several clinical insights. This study illuminates the path for leveraging virtual imaging to augment the reliability, transparency, and clinical relevance of AI in medical imaging.
△ Less
Submitted 31 March, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Virtual vs. Reality: External Validation of COVID-19 Classifiers using XCAT Phantoms for Chest Computed Tomography
Authors:
Fakrul Islam Tushar,
Ehsan Abadi,
Saman Sotoudeh-Paima,
Rafael B. Fricks,
Maciej A. Mazurowski,
W. Paul Segars,
Ehsan Samei,
Joseph Y. Lo
Abstract:
Research studies of artificial intelligence models in medical imaging have been hampered by poor generalization. This problem has been especially concerning over the last year with numerous applications of deep learning for COVID-19 diagnosis. Virtual imaging trials (VITs) could provide a solution for objective evaluation of these models. In this work utilizing the VITs, we created the CVIT-COVID…
▽ More
Research studies of artificial intelligence models in medical imaging have been hampered by poor generalization. This problem has been especially concerning over the last year with numerous applications of deep learning for COVID-19 diagnosis. Virtual imaging trials (VITs) could provide a solution for objective evaluation of these models. In this work utilizing the VITs, we created the CVIT-COVID dataset including 180 virtually imaged computed tomography (CT) images from simulated COVID-19 and normal phantom models under different COVID-19 morphology and imaging properties. We evaluated the performance of an open-source, deep-learning model from the University of Waterloo trained with multi-institutional data and an in-house model trained with the open clinical dataset called MosMed. We further validated the model's performance against open clinical data of 305 CT images to understand virtual vs. real clinical data performance. The open-source model was published with nearly perfect performance on the original Waterloo dataset but showed a consistent performance drop in external testing on another clinical dataset (AUC=0.77) and our simulated CVIT-COVID dataset (AUC=0.55). The in-house model achieved an AUC of 0.87 while testing on the internal test set (MosMed test set). However, performance dropped to an AUC of 0.65 and 0.69 when evaluated on clinical and our simulated CVIT-COVID dataset. The VIT framework offered control over imaging conditions, allowing us to show there was no change in performance as CT exposure was changed from 28.5 to 57 mAs. The VIT framework also provided voxel-level ground truth, revealing that performance of in-house model was much higher at AUC=0.87 for diffuse COVID-19 infection size >2.65% lung volume versus AUC=0.52 for focal disease with <2.65% volume. The virtual imaging framework enabled these uniquely rigorous analyses of model performance.
△ Less
Submitted 6 March, 2022;
originally announced March 2022.
-
Quality or Quantity: Toward a Unified Approach for Multi-organ Segmentation in Body CT
Authors:
Fakrul Islam Tushar,
Husam Nujaim,
Wanyi Fu,
Ehsan Abadi,
Maciej A. Mazurowski,
Ehsan Samei,
William P. Segars,
Joseph Y. Lo
Abstract:
Organ segmentation of medical images is a key step in virtual imaging trials. However, organ segmentation datasets are limited in terms of quality (because labels cover only a few organs) and quantity (since case numbers are limited). In this study, we explored the tradeoffs between quality and quantity. Our goal is to create a unified approach for multi-organ segmentation of body CT, which will f…
▽ More
Organ segmentation of medical images is a key step in virtual imaging trials. However, organ segmentation datasets are limited in terms of quality (because labels cover only a few organs) and quantity (since case numbers are limited). In this study, we explored the tradeoffs between quality and quantity. Our goal is to create a unified approach for multi-organ segmentation of body CT, which will facilitate the creation of large numbers of accurate virtual phantoms. Initially, we compared two segmentation architectures, 3D-Unet and DenseVNet, which were trained using XCAT data that is fully labeled with 22 organs, and chose the 3D-Unet as the better performing model. We used the XCAT-trained model to generate pseudo-labels for the CT-ORG dataset that has only 7 organs segmented. We performed two experiments: First, we trained 3D-UNet model on the XCAT dataset, representing quality data, and tested it on both XCAT and CT-ORG datasets. Second, we trained 3D-UNet after including the CT-ORG dataset into the training set to have more quantity. Performance improved for segmentation in the organs where we have true labels in both datasets and degraded when relying on pseudo-labels. When organs were labeled in both datasets, Exp-2 improved Average DSC in XCAT and CT-ORG by 1. This demonstrates that quality data is the key to improving the model's performance.
△ Less
Submitted 2 March, 2022;
originally announced March 2022.
-
iPhantom: a framework for automated creation of individualized computational phantoms and its application to CT organ dosimetry
Authors:
Wanyi Fu,
Shobhit Sharma,
Ehsan Abadi,
Alexandros-Stavros Iliopoulos,
Qi Wang,
Joseph Y. Lo,
Xiaobai Sun,
William P. Segars,
Ehsan Samei
Abstract:
Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or digital-twins (DT) using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: From patient CT images, iPhantom segments selected anchor organs (e.g. liver, bones, pancreas)…
▽ More
Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or digital-twins (DT) using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: From patient CT images, iPhantom segments selected anchor organs (e.g. liver, bones, pancreas) using a learning-based model developed for multi-organ CT segmentation. Organs challenging to segment (e.g. intestines) are incorporated from a matched phantom template, using a diffeomorphic registration model developed for multi-organ phantom-voxels. The resulting full-patient phantoms are used to assess organ doses during routine CT exams. Result: iPhantom was validated on both the XCAT (n=50) and an independent clinical (n=10) dataset with similar accuracy. iPhantom precisely predicted all organ locations with good accuracy of Dice Similarity Coefficients (DSC) >0.6 for anchor organs and DSC of 0.3-0.9 for all other organs. iPhantom showed less than 10% dose errors for the majority of organs, which was notably superior to the state-of-the-art baseline method (20-35% dose errors). Conclusion: iPhantom enables automated and accurate creation of patient-specific phantoms and, for the first time, provides sufficient and automated patient-specific dose estimates for CT dosimetry. Significance: The new framework brings the creation and application of CHPs to the level of individual CHPs through automation, achieving a wider and precise organ localization, paving the way for clinical monitoring, and personalized optimization, and large-scale research.
△ Less
Submitted 19 August, 2020;
originally announced August 2020.
-
Classification of Multiple Diseases on Body CT Scans using Weakly Supervised Deep Learning
Authors:
Fakrul Islam Tushar,
Vincent M. D'Anniballe,
Rui Hou,
Maciej A. Mazurowski,
Wanyi Fu,
Ehsan Samei,
Geoffrey D. Rubin,
Joseph Y. Lo
Abstract:
Purpose: To design multi-disease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports.Materials & Methods: This retrospective study included a total of 12,092 patients (mean age 57 +- 18; 6,172 women) for model development and testing (from 2012-2017). Rule-based algorithms were used to extract 19,225 disease labels from 1…
▽ More
Purpose: To design multi-disease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports.Materials & Methods: This retrospective study included a total of 12,092 patients (mean age 57 +- 18; 6,172 women) for model development and testing (from 2012-2017). Rule-based algorithms were used to extract 19,225 disease labels from 13,667 body CT scans from 12,092 patients. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura; liver and gallbladder; and kidneys and ureters. For each organ, a three-dimensional convolutional neural network classified no apparent disease versus four common diseases for a total of 15 different labels across all three models. Testing was performed on a subset of 2,158 CT volumes relative to 2,875 manually derived reference labels from 2133 patients (mean age 58 +- 18;1079 women). Performance was reported as receiver operating characteristic area under the curve (AUC) with 95% confidence intervals by the DeLong method. Results: Manual validation of the extracted labels confirmed 91% to 99% accuracy across the 15 different labels. AUCs for lungs and pleura labels were: atelectasis 0.77 (95% CI: 0.74, 0.81), nodule 0.65 (0.61, 0.69), emphysema 0.89 (0.86, 0.92), effusion 0.97 (0.96, 0.98), and no apparent disease 0.89 (0.87, 0.91). AUCs for liver and gallbladder were: hepatobiliary calcification 0.62 (95% CI: 0.56, 0.67), lesion 0.73 (0.69, 0.77), dilation 0.87 (0.84, 0.90), fatty 0.89 (0.86, 0.92), and no apparent disease 0.82 (0.78, 0.85). AUCs for kidneys and ureters were: stone 0.83 (95% CI: 0.79, 0.87), atrophy 0.92 (0.89, 0.94), lesion 0.68 (0.64, 0.72), cyst 0.70 (0.66, 0.73), and no apparent disease 0.79 (0.75, 0.83). Conclusion: Weakly-supervised deep learning models were able to classify diverse diseases in multiple organ systems.
△ Less
Submitted 16 November, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Automatic phantom test pattern classification through transfer learning with deep neural networks
Authors:
Rafael B. Fricks,
Justin Solomon,
Ehsan Samei
Abstract:
Imaging phantoms are test patterns used to measure image quality in computer tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex) provides test patterns for estimating the task transfer function (TTF) or noise power spectrum (NPF) and simulates different patient sizes. Determining which image slices are suitable for analysis currently requires manual annotation of these patter…
▽ More
Imaging phantoms are test patterns used to measure image quality in computer tomography (CT) systems. A new phantom platform (Mercury Phantom, Gammex) provides test patterns for estimating the task transfer function (TTF) or noise power spectrum (NPF) and simulates different patient sizes. Determining which image slices are suitable for analysis currently requires manual annotation of these patterns by an expert, as subtle defects may make an image unsuitable for measurement. We propose a method of automatically classifying these test patterns in a series of phantom images using deep learning techniques. By adapting a convolutional neural network based on the VGG19 architecture with weights trained on ImageNet, we use transfer learning to produce a classifier for this domain. The classifier is trained and evaluated with over 3,500 phantom images acquired at a university medical center. Input channels for color images are successfully adapted to convey contextual information for phantom images. A series of ablation studies are employed to verify design aspects of the classifier and evaluate its performance under varying training conditions. Our solution makes extensive use of image augmentation to produce a classifier that accurately classifies typical phantom images with 98% accuracy, while maintaining as much as 86% accuracy when the phantom is improperly imaged.
△ Less
Submitted 22 January, 2020;
originally announced January 2020.