1 Department of Medical Imaging, Radboudumc, Nijmegen, The Netherlands
2 Fraunhofer MEVIS, Lübeck, Germany
3 Institut für Medizinische Informatik, Universität zu Lübeck, Germany
4 Department of Radiology, University Medical Center Groningen, The Netherlands
5 Department of Radiology, Netherlands Cancer Institute, The Netherlands
6 Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Norway

Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis

Alessa Hering (1,2), Sarah de Boer (1), Anindo Saha (1), Jasper J. Twilt (1), Mattias P. Heinrich (3), Derya Yakar (4,5), Maarten de Rooij (1), Henkjan Huisman (1,6), Joeran S. Bosma (1,3)

Abstract

The PI-CAI (Prostate Imaging: Cancer AI) challenge led to expert-level diagnostic algorithms for clinically significant prostate cancer detection. The algorithms receive biparametric MRI scans as input, which consist of T2-weighted and diffusion-weighted scans. These scans can be misaligned due to multiple factors in the scanning process. Image registration can alleviate this issue by predicting the deformation between the sequences. We investigate the effect of image registration on the diagnostic performance of AI-based prostate cancer diagnosis. First, the image registration algorithm, developed in MeVisLab, is analyzed using a dataset with paired lesion annotations. Second, the effect on diagnosis is evaluated by comparing case-level cancer diagnosis performance between using the original dataset, rigidly aligned diffusion-weighted scans, or deformably aligned diffusion-weighted scans. Rigid registration showed no improvement. Deformable registration demonstrated a substantial improvement in lesion overlap (+10% median Dice score) and a positive yet non-significant improvement in diagnostic performance (+0.3% AUROC, p=0.18). Our investigation shows that a substantial improvement in lesion alignment does not directly lead to a significant improvement in diagnostic performance. Qualitative analysis indicated that jointly developing image registration methods and diagnostic AI algorithms could enhance diagnostic accuracy and patient outcomes.

Keywords: Image Registration · Prostate Cancer · Artificial Intelligence · MRI

1 Introduction

Prostate cancer (PCa) has 1.4 million new cases each year [21], a high incidence-to-mortality ratio, and risks associated with treatment and biopsy, making non-invasive diagnosis of clinically significant prostate cancer (csPCa) crucial to reduce both overtreatment and unnecessary (confirmatory) biopsies [20]. MRI scans provide the best non-invasive diagnosis for prostate cancer [5], for which a 47% increase in demand is expected by 2040 [21]. Due to the worldwide shortage of diagnostic personnel [10], workload efficiency optimization is necessary to maintain healthcare accessibility in high-income countries and improve accessibility in low- and middle-income countries.

Computer-aided diagnosis (CAD) can assist radiologists in diagnosing csPCa and reduce the radiology workload [23], but the observed workload reduction is limited. A larger workload reduction can be achieved through autonomous operation of diagnostic algorithms. Recent advances resulted in expert-level diagnostic performance for csPCa detection algorithms using biparametric MRI [18].

Biparametric MRI (bpMRI) consists of T2-weighted (T2W) imaging and diffusion-weighted imaging (DWI); the DWI is used to calculate the apparent diffusion coefficient (ADC) map and typically also the high b-value (HBV) map. T2W and DWI scans are usually acquired in immediate succession in about 15-30 minutes, but slight patient movement and processes like bladder filling can lead to misalignment between sequences [12]. This misalignment results in lesion image features being misaligned between the sequences. For an accurate csPCa diagnosis, information from both sequences must be considered [22], meaning that csPCa detection algorithms have to combine information from different spatial locations when misalignment occurs. Current state-of-the-art csPCa algorithms use an early fusion strategy to combine the different sequences, which may lead to challenges in accurate lesion detection and characterization when the lesion image features are not well aligned [18].

To address this, misalignment in the Prostate Imaging – Cancer Artificial Intelligence (PI-CAI) dataset was manually corrected (85/1000 (8.5%) of the test cases and 54/9107 (0.6%) of training cases), and algorithms were subsequently trained and evaluated on these manually aligned MRI studies [18]. However, manual alignment is labor-intensive, potentially undermining the efficiency gains offered by automated csPCa diagnostic methods when required during inference. Consequently, the efficacy of these algorithms in scenarios where sequences are not manually aligned remains uncertain and might be limited.

During inference, image registration can address the issue of misaligned sequences by providing a plausible estimate of the patient movement and deformation, thus replacing the manual alignment step. Although the prostate cancer detection research field is vibrant, there has been limited focus on the registration of prostate MRI sequences. To address the issue of global misalignment, [19] proposed an affine registration approach based on prostate gland segmentation and [3] presented a rigid registration based on Mutual Information. To compensate for local deformations, both [15] and [14] employed the SimpleITK non-rigid B-Spline registration using Mutual Information. However, the focus of these studies was not on the evaluation of registration performance: only [14] examined it quantitatively, using the Dice score of automatically generated prostate segmentations, while the other studies assessed registration performance through visual examination of registered ADC images. To the best of our knowledge, only [12] has recently explored the impact of image registration on the prostate cancer detection performance of algorithms using bpMRI. Their results show that the B-Spline registration, which is based on [14], improves the overlap of manually annotated lesions as measured by the Dice score. Additionally, the performance of the downstream task of patient-level csPCa diagnosis, measured by the AUROC, showed a non-significant improvement from 0.76 to 0.79. These results suggest that registration is a useful preprocessing step in an automated prostate cancer diagnosis pipeline. However, the limited sample size (only 46 positive cases in the test set) and the lack of external testing hinder definitive conclusions.

In this study, we conduct a comprehensive analysis of the impact of image registration on the clinical downstream task of case-level csPCa diagnosis, utilizing two extensive evaluation datasets. Registration accuracy is assessed through the measurement of lesion alignment across an independent dataset comprising 473 cases, each annotated with paired lesions per modality. Further, we evaluate the downstream diagnostic efficacy on an external testing set consisting of 546 cases.

Figure 1: Overview of our method. The T2W scan is used as fixed image and the ADC map as moving image to find the displacement field using the registration method. The displacement field is applied to the ADC and HBV maps. The registered and original scans are used as input for the PI-CAI AI system (see Section 2.3) to detect clinically significant prostate cancer. The case-level diagnosis performance of the end-to-end pipeline is evaluated and used as a measure of effectiveness.

2 Materials and methods

2.1 Registration

The aim of the image registration approach is to align the DWI maps (ADC and HBV) with the T2W scan (see Figure 1). Since the csPCa detection algorithms resample the DWI maps to the T2W scan, we chose the T2W scan as the fixed image and the ADC map as the moving image for the registration. The image registration algorithm is developed in the MeVisLab framework using the RegLib. We adopt a two-step approach consisting of a rigid registration followed by a deformable registration. The registration pipeline thus starts with a robust method with few degrees of freedom and moves on to a more precise but less robust method, which requires a better starting point due to its higher number of degrees of freedom. The calculated rigid and deformable transformations are applied to both DWI maps.

Let $\mathcal{F},\mathcal{M}:\mathbb{R}^{3}\to\mathbb{R}$ denote the fixed image and moving image, respectively, and let $\Omega\subset\mathbb{R}^{3}$ be a domain modeling the field of view of $\mathcal{F}$. The registration method aims to compute a deformation $y:\Omega\to\mathbb{R}^{3}$ that aligns the fixed image $\mathcal{F}$ and the moving image $\mathcal{M}$ on the field of view $\Omega$ such that $\mathcal{F}(x)$ and $\mathcal{M}(y(x))$ are similar for $x\in\Omega$.

Rigid Registration

The rigid registration adopts the method of [17]. We use the normalized gradient field distance measure [8],

$$\mathcal{D}_{\text{NGF}}(\mathcal{F},\mathcal{M}(y))=\int_{\Omega}1-\frac{\langle\nabla\mathcal{M}(y(x)),\nabla\mathcal{F}(x)\rangle^{2}_{\epsilon}}{\|\nabla\mathcal{M}(y(x))\|^{2}_{\epsilon}\,\|\nabla\mathcal{F}(x)\|^{2}_{\epsilon}}\,\mathrm{d}x,$$

with $\langle f,g\rangle_{\epsilon}:=\sum_{j=1}^{3}(f_{j}g_{j}+\epsilon^{2})$ and $\|f\|_{\epsilon}:=\sqrt{\langle f,f\rangle_{\epsilon}}$, which focuses on the alignment of image gradients of the fixed image $\mathcal{F}$ and the deformed moving image $\mathcal{M}(y)$. The edge hyper-parameter $\epsilon>0$ is used to suppress small image noise without affecting image edges. The optimization problem is solved using a Gauss-Newton optimization scheme and is embedded into a multi-level scheme with two levels.
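As an illustration of the distance measure above, the following NumPy sketch evaluates a discretized NGF distance for two volumes that are already sampled on the same grid. It is not the MeVisLab/RegLib implementation: the function name `ngf_distance`, the finite-difference gradients, the default value of `eps`, and the `spacing` argument are assumptions made for this example.

```python
import numpy as np


def ngf_distance(fixed, warped_moving, eps=1e-2, spacing=(1.0, 1.0, 1.0)):
    """Discretized NGF distance between two 3D volumes on the same grid.

    Hypothetical helper for illustration; `warped_moving` plays the role of
    M(y(x)), already resampled onto the grid of the fixed image F.
    """
    fixed = np.asarray(fixed, dtype=np.float64)
    warped_moving = np.asarray(warped_moving, dtype=np.float64)

    # Image gradients via finite differences, stacked to shape (3, nx, ny, nz).
    grad_f = np.stack(np.gradient(fixed, *spacing))
    grad_m = np.stack(np.gradient(warped_moving, *spacing))

    # <f, g>_eps = sum_j (f_j g_j + eps^2), as defined in the text.
    offset = grad_f.shape[0] * eps**2
    inner = (grad_m * grad_f).sum(axis=0) + offset
    norm_m_sq = (grad_m * grad_m).sum(axis=0) + offset
    norm_f_sq = (grad_f * grad_f).sum(axis=0) + offset

    # Integrand 1 - <.,.>^2 / (||.||^2 ||.||^2), summed with the voxel volume.
    integrand = 1.0 - inner**2 / (norm_m_sq * norm_f_sq)
    return float(integrand.sum() * np.prod(spacing))
```

Identical volumes give a distance of zero. Because the inner product is squared, the measure is largely insensitive to the sign of the gradients, which is helpful when the two sequences show the same edges with different contrast.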

Deformable Registration

We deploy the matrix-free deformable registration of [11]. The deformation is defined as a minimizer of the cost function

$$\min_{y}\;\mathcal{D}(\mathcal{F},\mathcal{M}(y))+\alpha\,\mathcal{R}(y),$$

with the normalized gradient field distance measure $\mathcal{D}_{\text{NGF}}$. To focus the registration on the inside of the prostate, we restrict $\Omega$ to the support of the prostate segmentation of the fixed image, which is automatically generated with the prostate segmentation algorithm provided by [18]. The second-order curvature regularizer $\mathcal{R}^{\text{curv}}$ [7] enforces smooth deformations by penalizing spatial derivatives. The parameter $\alpha$ is a weighting factor of the regularizer. The optimization problem is solved using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization scheme [13]. Optimization was performed in a multi-level scheme with two levels on images with successively declining levels of smoothing to guide registration from larger structures to smaller refinements. During each registration level, a displacement field grid of size $31\times 31\times 31$ is used to warp the moving image using trilinear interpolation. The deformable registration uses the output of the rigid registration as its starting point. Hyperparameters of the registration method were experimentally set using the first ten cases of the PI-CAI public training dataset.
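To make the role of the regularizer concrete, the sketch below evaluates a discretized curvature energy, assuming the common form $\mathcal{R}^{\text{curv}}(y)=\tfrac{1}{2}\int_{\Omega}\sum_{j}(\Delta u_{j})^{2}\,\mathrm{d}x$ for the displacement $u(x)=y(x)-x$, and combines it with a given distance value into the cost $\mathcal{D}+\alpha\mathcal{R}$. The function names, the interior-only finite-difference Laplacian, and the array layout `(3, nx, ny, nz)` are assumptions for this example; it is not the matrix-free implementation of [11].

```python
import numpy as np


def curvature_energy(displacement, spacing=(1.0, 1.0, 1.0)):
    """0.5 * sum_j integral (Laplacian u_j)^2 for a field of shape (3, nx, ny, nz)."""
    u = np.asarray(displacement, dtype=np.float64)
    hx, hy, hz = spacing
    energy = 0.0
    for comp in u:
        center = comp[1:-1, 1:-1, 1:-1]
        # Second differences along each axis, evaluated on interior voxels only.
        lap = (
            (comp[2:, 1:-1, 1:-1] - 2.0 * center + comp[:-2, 1:-1, 1:-1]) / hx**2
            + (comp[1:-1, 2:, 1:-1] - 2.0 * center + comp[1:-1, :-2, 1:-1]) / hy**2
            + (comp[1:-1, 1:-1, 2:] - 2.0 * center + comp[1:-1, 1:-1, :-2]) / hz**2
        )
        energy += 0.5 * float((lap**2).sum()) * hx * hy * hz
    return energy


def objective(distance_value, displacement, alpha):
    """Cost D + alpha * R for a given distance value (e.g. an NGF distance)."""
    return distance_value + alpha * curvature_energy(displacement)
```

In this discretization an affine displacement has zero curvature energy, so the regularizer only penalizes non-affine bending of the field. A larger `alpha` therefore favors smoother deformations at the cost of a looser fit to the image gradients.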

2.2 Data

Three datasets with bpMRI scans (axial T2W, ADC and HBV ($b \geq 1000$) imaging) for prostate cancer detection were used. For each dataset, the reference standard was set by histopathology, with clinically significant prostate cancer defined as ISUP 2-5 (intermediate to very high risk) [6]. Informed consent was waived, given the retrospective scientific use of deidentified patient data. Scan characteristics are given in the supplementary material.

PI-CAI:

For csPCa detection model development, 10,207 cases of 9129 patients from 10 Dutch hospitals and 1 Norwegian hospital were used [18]. Cases were acquired using 1.5 or 3-Tesla MRI scanners between 2012 and 2021 from patients with suspicion of harboring prostate cancer. Exclusion criteria included prior prostate-specific treatment, prior ISUP $\geq 2$ findings, incomplete studies, and diagnostically insufficient image quality. Manual voxel-level annotations were available for 1175 positive training cases (1323 lesions), and AI-derived voxel-level annotations were provided for an additional 892 positive training cases (1037 lesions).

PCNN:

For testing of the registration algorithm, cases from the PI-CAI training set with manual voxel-level annotations per modality (T2W and ADC) were included. This selected 473 cases of 438 patients from Prostaat Centrum Noord-Nederland (PCNN) (8 hospitals).

PROMIS:

For external testing, 546 cases of 546 patients from 11 United Kingdom hospitals were included [2]. Cases were acquired using 1.5-Tesla MRI scanners between 2012 and 2015 from patients with suspicion of harboring prostate cancer. Exclusion criteria included prior prostate treatment, prior biopsies, incomplete studies, and diagnostically insufficient image quality. No manual voxel-level annotations were available.

2.3 PI-CAI AI system

The PI-CAI AI system was developed in the PI-CAI challenge. The algorithm is the ensemble of the top 5 submissions, selected based on testing with 1000 cases. The models were trained using a dataset of 9107 cases. The algorithm uses the axial T2W, ADC and HBV scans and clinical variables (e.g. PSA density). The U-Net backbone was predominantly used, with early fusion of the scans. For additional details on the data and each of the top 5 submissions, see [18]. No retraining of the AI system was performed in this study.

3 Experiments

The aim of this study is to evaluate the effect of image registration on the clinical downstream task of case-level csPCa diagnosis. To quantify the algorithm’s performance degradation under severe and extreme misalignment conditions, we artificially misaligned the T2W and DWI images in two severity steps.

On the original data, we evaluated the registration performance by measuring lesion alignment and the plausibility of the displacement field. Subsequently, we employed the csPCa detection algorithms developed in the PI-CAI challenge for the diagnostic evaluation. For both experiments, we compare the results on three dataset variants: the original dataset, the dataset with rigidly aligned T2W and DWI scans, and the dataset with deformably aligned T2W and DWI scans.

3.1 Impact of synthetic misalignment

To investigate the impact of MRI sequence misalignment on the performance of a clinically significant prostate cancer detection algorithm, we conducted two synthetic misalignment tests:

Severe misalignment: DWI scans were translated in the z-direction by a random selection from $\{-2,-1,0,1,2\}$ slices and in-plane by a random selection from $\{-5,-4,\dots,5\}$ voxels in both the $x$ and $y$ directions.

Extreme misalignment: DWI scans were translated by a random selection from $-5$ or $+5$ slices in the z-direction and by $-10$ or $+10$ voxels in both the $x$ and $y$ directions.
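The two misalignment levels can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the `(z, y, x)` array order, drawing the $x$ and $y$ shifts independently, and filling voxels that move into the field of view with zeros are all assumptions of this example.

```python
import numpy as np


def sample_shift(rng, level):
    """Draw an integer (z, y, x) translation for the given severity level."""
    if level == "severe":
        dz = int(rng.integers(-2, 3))  # one of {-2, ..., 2} slices
        dy = int(rng.integers(-5, 6))  # one of {-5, ..., 5} voxels
        dx = int(rng.integers(-5, 6))
    elif level == "extreme":
        dz = int(rng.choice([-5, 5]))
        dy = int(rng.choice([-10, 10]))
        dx = int(rng.choice([-10, 10]))
    else:
        raise ValueError(f"unknown level: {level}")
    return dz, dy, dx


def translate_zero_fill(volume, shift):
    """Translate a volume by whole voxels, filling uncovered voxels with zeros."""
    out = np.zeros_like(volume)
    src, dst = [], []
    for s, n in zip(shift, volume.shape):
        if abs(s) >= n:
            return out  # shifted completely out of the field of view
        if s >= 0:
            src.append(slice(0, n - s))
            dst.append(slice(s, n))
        else:
            src.append(slice(-s, n))
            dst.append(slice(0, n + s))
    out[tuple(dst)] = volume[tuple(src)]
    return out
```

In such a setup the same sampled shift would be applied to both the ADC and HBV maps of a case, for example `translate_zero_fill(adc, shift)` and `translate_zero_fill(hbv, shift)`, while the T2W scan stays untouched.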

3.2 Registration performance

The evaluation of registration performance was conducted using the PCNN validation dataset, chosen for its availability of lesion annotations across both T2W and ADC scans. The hyperparameters for the registration method were manually fine-tuned using only the first 10 cases from the PI-CAI Public Training and Development dataset, which did not overlap with this PCNN dataset. Therefore, the PCNN dataset serves as an independent evaluation set for assessing the registration performance.

To quantitatively assess the quality of image registration in the absence of reference displacement fields, we employed two surrogate metrics. First, the Dice coefficient was used to quantify the overlap of lesion segmentations between T2W scans and ADC maps. Although we recognize that the Dice coefficient may not be the perfect metric for assessing registration performance [16], its usage is justified in this context given the critical importance of accurate lesion alignment between T2W scans and ADC maps for the reliable performance of csPCa detection algorithms. The choice of the Dice score therefore aligns with our objective to prioritize lesion alignment in the evaluation of registration effectiveness. Second, because smooth deformations within the prostate are important to preserve diagnostic features, we evaluated the plausibility of the displacement field by examining the percentage of voxels exhibiting folding within the prostate region of the predicted deformation field.
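Both surrogate metrics can be sketched in a few lines of NumPy. The example below assumes the common definitions: Dice as twice the overlap divided by the sum of the mask sizes, and folding as a non-positive Jacobian determinant of $y(x)=x+u(x)$. The function names, the `(3, nx, ny, nz)` displacement layout in voxel units, and the convention of returning 1.0 for two empty masks are assumptions of this sketch, not details taken from the study.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice coefficient of two binary masks (1.0 if both are empty, by convention)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def folding_percentage(displacement, mask):
    """Percentage of masked voxels where det(Jacobian of y(x) = x + u(x)) <= 0."""
    u = np.asarray(displacement, dtype=np.float64)
    jac = np.empty(u.shape[1:] + (3, 3))
    for i in range(3):
        grads = np.gradient(u[i])  # derivatives of u_i along each axis
        for j in range(3):
            jac[..., i, j] = grads[j] + (1.0 if i == j else 0.0)
    det = np.linalg.det(jac)
    inside = np.asarray(mask, dtype=bool)
    return float(100.0 * np.count_nonzero(det[inside] <= 0) / max(inside.sum(), 1))
```

Here `dice` would be called with the T2W lesion mask and the (warped) ADC lesion mask, and `folding_percentage` with the predicted displacement field and the prostate segmentation as `mask`.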

csPCa detection performance

Diagnostic performance is assessed using the area under the receiver operating characteristic curve (AUROC). For case-level risk estimation of significant cancer, we utilized voxel-level detection maps generated by the PI-CAI AI system on the external PROMIS test dataset.
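For reference, the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from this definition; the function name and the pairwise implementation are choices made for this example and may differ from the evaluation code used in the study.

```python
import numpy as np


def auroc(labels, scores):
    """Rank-based AUROC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=np.float64)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```

In this setting, `labels` would mark cases with ISUP 2-5 as positive and `scores` would hold the case-level predictions of the AI system.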

Additionally, we evaluated diagnostic performance using the PCNN dataset, to facilitate an evaluation of diagnostic performance in relation to the registration accuracy. We note that this dataset is not independent for the diagnostic algorithms, since this is a subset of the training data of the algorithms.

Since the PROMIS dataset contains scans with very large fields of view that include anatomical structures not present in the PI-CAI training dataset, we filtered out lesion predictions more than 3 mm away from the prostate segmentation. Following the approach used in the PI-CAI challenge, each algorithm's case-level prediction was its maximum lesion-level prediction, and the AI system's case-level prediction was the equally weighted average of the algorithms' case-level predictions.
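The aggregation described above can be sketched as follows. The data layout (a list of `(confidence, distance_to_prostate_mm)` pairs per algorithm), the function names, and the fallback of 0.0 when no lesion candidate remains are assumptions of this example rather than details from the PI-CAI code base.

```python
from statistics import mean


def case_level_prediction(lesions, max_distance_mm=3.0):
    """Maximum confidence over lesion candidates within `max_distance_mm` of the prostate.

    `lesions` is a list of (confidence, distance_to_prostate_mm) pairs;
    returning 0.0 when no candidate remains is an assumption of this sketch.
    """
    kept = [conf for conf, dist in lesions if dist <= max_distance_mm]
    return max(kept, default=0.0)


def ensemble_prediction(lesions_per_algorithm, max_distance_mm=3.0):
    """Equally weighted average of the per-algorithm case-level predictions."""
    return mean(
        [case_level_prediction(lesions, max_distance_mm) for lesions in lesions_per_algorithm]
    )
```

For example, a candidate with confidence 0.9 that lies 12 mm outside the prostate segmentation is discarded, so an algorithm whose only other candidate has confidence 0.4 contributes 0.4 to the ensemble average.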

Statistical analysis

The diagnostic performance differences on the external testing set were statistically analyzed. The performance with the deformably and rigidly aligned images is compared against the performance with the original dataset. To determine the probability of one configuration outperforming another, we performed DeLong's test [4]. Multiplicity was corrected for using the Holm-Bonferroni method, with a base alpha value of 0.05. For details, see the pre-defined statistical analysis plan online [9].
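The Holm-Bonferroni step-down procedure can be sketched as below. The function name is hypothetical, and the p-values would come from DeLong's test for each comparison against the original dataset, which is not reimplemented here.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # Step-down thresholds: alpha/m, alpha/(m-1), ..., alpha.
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # all larger p-values are retained as well
    return rejected
```

With two comparisons, the smallest p-value is tested against 0.025 and the other against 0.05. A value of p=0.18, as reported for deformable registration, exceeds either threshold and is therefore not significant regardless of the second p-value.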

4 Results

4.1 Impact of misalignment

In Figure 2, the impact of misalignment on the performance of the clinically significant prostate cancer detection algorithm is illustrated. When a severe misalignment is introduced, the AUC decreases from 0.793 to 0.720. In the case of extreme misalignment, the AUC further drops to 0.487.

Figure 2: Impact of synthetic misalignment. (Left) Diagnostic performance of the PI-CAI AI system on the PROMIS dataset. When a severe misalignment is introduced, the AUC decreases from 0.793 to 0.720. In the case of extreme misalignment, the AUC further drops to 0.487. (Right) T2W and misaligned ADC images (top: severe misalignment, bottom: extreme misalignment), shown with the prostate gland contour of the T2W scan. Panel columns: T2W, ADC misaligned.

4.2 Quantitative results

Figure 3: Quantitative registration results. (Left) Distribution of Dice scores between the lesion annotation on the T2W and ADC scans for the original, rigidly aligned, and deformably aligned PCNN datasets. (Right) Model performance for the PI-CAI AI system with the original, rigidly aligned, and deformably aligned PROMIS datasets.

Registration performance

The median Dice score improved to 0.58 with deformable registration, compared to 0.48 for the original dataset and 0.47 with rigid registration (Figure 3).

For one case, 1% of the voxels in the prostate were folded. For all other cases, no folding occurred in the deformation field.

Figure 4: Qualitative registration results showing two exemplary cases (first row: label ISUP 1; second row: label ISUP 2) with the prostate gland contour, the lesion annotated on T2W, and the lesion annotated on ADC. Columns: T2W, ADC original, ADC deformable, PM original, PM deformable. In the last two columns, the prediction maps (PM) generated with the original dataset and the deformably aligned dataset are overlaid on the T2W scan.

csPCa diagnosis performance

For the PROMIS external testing dataset, the PI-CAI AI system showed a positive yet non-significant improvement in diagnostic performance (+0.3% AUROC, p=0.18) with deformably aligned scans compared to the original dataset. A comprehensive qualitative analysis of representative cases is given in Section 4.3.

4.3 Qualitative results

In this section, we present qualitative results of the image registration and the subsequent csPCa detection algorithm. Figure 4 shows results for the PCNN dataset: the case with the largest improvement in Dice score (first row) and the case with the largest decrease in Dice score (second row) for the deformably aligned dataset compared to the original dataset, alongside the clinical interpretation for each case.

The first row shows the images with mild benign prostatic hyperplasia (BPH) in the transition zone. BPH is a benign condition, which grows over time. A typical transition zone with BPH shows so-called ‘organized chaos’, with multiple nodules with variable imaging appearance, often with diffusion restriction and enhancement. In transition zone tumors, the typical encapsulation is lost, and the organized aspect changes to a homogeneous low T2W signal with marked diffusion restriction and vivid enhancement. In the left transition zone of this patient, an encapsulated BPH nodule is annotated in yellow on T2W, with low T2W signal intensity. On the ADC map an area with marked diffusion restriction is annotated in red. A notable discrepancy is observed in the alignment between the T2W and ADC imaging, leading to misalignment between the lesion’s features on the T2W and ADC scans. The encapsulated nature on T2W of this nodule is a non-suspicious sign in the transition zone. Consequently, the PI-CAI AI system with the original scans classifies the lesion in the middle of the transition zone instead of within an encapsulated nodule more laterally due to the misalignment, which suggests a higher risk level (prediction=0.63). The deformable image registration method aligned the two modalities, and the PI-CAI AI system with the deformably aligned scans assigned a lower risk level (prediction=0.47). Targeted biopsies revealed ISUP 1 in the left transition zone, which is an indolent prostate cancer that is often invisible on prostate MRI.

The second row shows the images for a 66-year-old man with a PSA level of 13 ng/mL and a PSA density of 0.11 ng/mL/cc. The images show an area suspicious for tumor, located ventrally in the apex of the prostate close to the anterior fibromuscular stroma (AFMS), ventral to the transition zone of the prostate. The delineation of the lesion mask was guided by the image features observed in the ADC scan and was subsequently adopted for the T2W scan as well. Upon reconsideration of the lesion segmentation with two radiologists, it appears that the extension into the AFMS is due to oversegmentation, rather than the lesion infiltrating the AFMS. As such, the model predictions capture the lesion extent very well. The prediction with the original dataset had a slightly higher confidence (0.62 vs 0.56) for this positive case. Targeted biopsy of this area revealed ISUP 2 prostate cancer.

Figure 5 shows qualitative results on the PROMIS dataset, which are explained in more detail below.

Figure 5: Qualitative results on the PROMIS dataset (first row: label ISUP 2; second row: label ISUP 1; third row: label ISUP 3). Columns: T2W, ADC original, ADC deformable, PM original, PM deformable. The T2W scan, ADC map, and deformably aligned ADC map are shown with the prostate gland contour. In the last two columns, the prediction maps (PM) generated with the original dataset and the deformably aligned dataset are overlaid on the T2W image. The label shows the ISUP grade, where 1 is indolent cancer (negative) and $\geq 2$ is intermediate to high-risk cancer (positive). The first two cases were selected to have the largest prediction increase and decrease, respectively, for the deformably aligned dataset compared to the original dataset, among cases with a case-level prediction above 0.3. The third case was a failure case with the deformably aligned scans.

The first row shows images with a well-defined lesion in the left peripheral zone midprostate, with low signal intensity on T2W and low signal intensity on the ADC map, consistent with a suspicious lesion (PI-RADS 4). The T2W scan and ADC map are misaligned, both in-plane and through-plane, so that the diffusion restriction on the original ADC map is misaligned with the lesion features on the T2W sequence. The PI-CAI AI system identified the lesion with both variants of the dataset. With the deformably aligned dataset, the algorithm's confidence increased from 0.36 to 0.51. Histopathological evaluation confirmed the aggressive nature of this lesion (ISUP 2).

The second row shows a T2W and ADC map that are misaligned in-plane. Consequently, a substantial part of the prostate on the ADC map appears outside of the prostate region of the T2W sequence. The ADC map shows diffusion restriction (low signal intensity; darker appearance) in the right transition zone midprostate. Due to the misalignment, this darker area on the original ADC map appears to be in the right peripheral zone on the T2W scan, and therefore misclassification can occur. After deformable alignment, the darker area on the ADC map aligns with the transition zone instead of peripheral zone. This is reflected in the lesion detection of the PI-CAI AI system, which predicts a lesion with confidence of 0.44 with the original scans and with a confidence of 0.18 with the deformably aligned scans. Targeted biopsies revealed ISUP 1 in the right transition zone, which is an indolent prostate cancer that is often invisible on prostate MRI. No aggressive PCa was detected.

The third row shows images with a small lesion in the right peripheral zone midprostate. The lesion appears as a well-circumscribed area with low signal intensity (dark) on T2W images and the ADC map, suspicious for clinically significant cancer (PI-RADS 4). For this case, the deformable image registration slightly misaligned the T2W and ADC image features of the lesion, which caused the detection algorithm to decrease its lesion prediction from 0.49 to 0.39. Histopathological evaluation confirmed the aggressive nature of this lesion (ISUP 3).

5 Discussion and conclusion

In this study, we investigated the effect of image registration on the clinical downstream task of case-level csPCa diagnosis when integrated at the inference stage. Deformable registration demonstrated a substantial improvement in lesion overlap on the validation dataset (+9% average Dice score), which is slightly more than the improvement reported in [12] (+6% average Dice score). However, since different datasets were used, a direct comparison is not possible. Moreover, the median Dice score achieved with deformable registration on the validation dataset was 0.58. This performance aligns closely with the inter-rater agreement typically observed in prostate tumor segmentation. Specifically, between two radiologists independently segmenting csPCa lesions using the same modalities, the observed Dice score was 0.60 [1].

Additionally, we showed a positive yet non-significant improvement in diagnostic performance on the PROMIS test dataset (+0.3% AUROC, p=0.18) with deformably aligned scans. Our investigation shows that a substantial improvement in lesion alignment does not directly translate into a significant improvement in diagnostic performance. To illustrate the impact of misalignment on the algorithmic results, we present detailed visualizations and analyses of several PCNN and PROMIS cases in Section 4.3. These results showed that the PI-CAI AI system was robust to minor misalignments, particularly when these misalignments did not cause lesions to appear in incorrect zones. Additionally, we anticipate a comparable proportion of misaligned cases in the PROMIS dataset as observed in the PI-CAI dataset, where the incidence was low. Therefore, the expected improvement in AUROC is limited. The positive yet non-significant improvement in diagnostic performance might result from those cases.

Our method had limitations. The deformable registration method potentially introduced unrecognized artifacts into the images, which might result in worse diagnostic performance. Addressing this through retraining the csPCa algorithms to adapt to registration-induced image variations represents a promising strategy. It is crucial to note that the registration method avoided the generation of physiologically unrealistic deformations. This is achieved by applying a high regularization weight to obtain smooth and plausible displacement fields. Another critical aspect is the choice of resampling strategy. This factor considerably impacts the smoothing of ADC values, especially for small lesions, and repeated resampling steps degrade the diagnostic quality of the images. Merging all resampling steps into one would visibly increase the quality, but is only possible in an end-to-end approach.

The relevance of the PROMIS dataset in present-day analyses has been a subject of debate, particularly among radiologists. The diagnostic quality of MRI scans has markedly improved since the trial finished in 2015. Additionally, the PROMIS dataset contains acquired high b-value scans, while contemporary protocols calculate them from acquired lower b-value scans, which results in less noise and better diagnostic quality. Despite these limitations, the relevance of the PROMIS dataset should not be understated. This dataset can serve as a benchmark for scenarios where access to high-end, expensive scanners is limited. This situation is a common reality for many institutions, highlighting the importance of developing algorithms that can perform well across a range of image acquisition methods.
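How a high b-value image can be calculated from lower b-value acquisitions can be sketched as follows. The sketch assumes the common mono-exponential diffusion model S(b) = S0 · exp(−b · ADC); the model choice, the function name, and the b-values in the example are illustrative assumptions and do not describe a specific vendor or study protocol.

```python
import numpy as np


def compute_high_b(s_low, s_mid, b_low, b_mid, b_high, eps=1e-6):
    """Extrapolate a high b-value image from two acquired lower b-value images.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC).
    Returns the voxel-wise ADC estimate and the calculated high b-value signal.
    """
    s_low = np.asarray(s_low, dtype=float)
    s_mid = np.asarray(s_mid, dtype=float)
    # Clip to a small positive value to avoid division by zero and log(0).
    ratio = np.clip(s_low, eps, None) / np.clip(s_mid, eps, None)
    adc = np.log(ratio) / (b_mid - b_low)
    s_high = s_low * np.exp(-(b_high - b_low) * adc)
    return adc, s_high


# Synthetic check with hypothetical values: ADC in mm^2/s, b-values in s/mm^2.
adc_true = np.array([0.7e-3, 1.5e-3])
s0 = 1000.0
b_low, b_mid, b_high = 50.0, 800.0, 1400.0
s_low = s0 * np.exp(-b_low * adc_true)
s_mid = s0 * np.exp(-b_mid * adc_true)

adc_est, s_high = compute_high_b(s_low, s_mid, b_low, b_mid, b_high)
```

Because the calculated image is derived from lower b-value acquisitions with higher signal-to-noise ratio, it tends to be less noisy than a directly acquired high b-value scan, at the cost of relying on the validity of the assumed signal model.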

In conclusion, our study shows that while image registration can substantially improve lesion overlap in csPCa diagnosis, it does not directly lead to a significant improvement in diagnostic performance. However, the qualitative analysis showed promising results and indicates that joint development of image registration methods and diagnostic AI algorithms could enhance diagnostic accuracy and patient outcomes.


5.0.1 Acknowledgements

This research is funded by the European Union under HORIZON-HLTH-2022: COMFORT (101079894), Health Holland (LSHM20103), European Union H2020: ProCAncer-I project (952159), European Union H2020: PANCAIM project (101016851), and Siemens Healthineers (CID: C00225450). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Health and Digital Executive Agency (HADEA). Neither the European Union nor the granting authority can be held responsible for them.

The PROMIS data used in the analysis for this manuscript were provided from the PROMIS study, led by University College London (UCL). PROMIS was funded by the UK Government Department of Health, National Institute of Health Research–Health Technology Assessment Programme (project number 09/22/67). Support was also provided by the National Institute for Health Research (NIHR) UCLH/UCL Biomedical Research Centre, the National Institute for Health Research (NIHR) The Royal Marsden and Institute for Cancer Research Biomedical Research Centre, and the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre. The original PROMIS study was coordinated by the Medical Research Council Clinical Trials Unit (MRC CTU) at UCL and sponsored by UCL. The PROMIS Biobank was funded by Prostate Cancer UK (PG10-17). The PROMIS dataset and the biobank are under the research governance of the ReIMAGINE Risk Trial Management Group (funded by Medical Research Council (UKRI) and Cancer Research UK: MR/R014043/1).

References

  • [1] Adams, L.C., Makowski, M.R., Engel, G., Rattunde, M., Busch, F., Asbach, P., Niehues, S.M., Vinayahalingam, S., van Ginneken, B., Litjens, G., et al.: Prostate158 - an expert-annotated 3T MRI dataset and algorithm for prostate cancer detection. Computers in Biology and Medicine 148, 105817 (2022)
  • [2] Ahmed, H.U., Bosaily, A.E.S., Brown, L.C., Gabe, R., Kaplan, R., Parmar, M.K., Collaco-Moraes, Y., Ward, K., Hindley, R.G., Freeman, A., et al.: Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer (PROMIS): a paired validating confirmatory study. The Lancet 389(10071), 815–822 (2017)
  • [3] De Vente, C., Vos, P., Hosseinzadeh, M., Pluim, J., Veta, M.: Deep learning regression for prostate cancer detection and grading in bi-parametric MRI. IEEE Transactions on Biomedical Engineering 68(2), 374–383 (2020)
  • [4] DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L.: Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics pp. 837–845 (1988)
  • [5] Eldred-Evans, D., Burak, P., Connor, M.J., Day, E., Evans, M., Fiorentino, F., Gammon, M., Hosking-Jervis, F., Klimowska-Nassar, N., McGuire, W., Padhani, A.R., Prevost, A.T., Price, D., Sokhi, H., Tam, H.H., Winkler, M., Ahmed, H.U.: Population-based prostate cancer screening with magnetic resonance imaging or ultrasonography. JAMA Oncology 7, 395–402 (2021)
  • [6] Epstein, J.I., Egevad, L., Amin, M.B., Delahunt, B., Srigley, J.R., Humphrey, P.A.: The 2014 International Society of Urological Pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma: Definition of grading patterns and proposal for a new grading system. The American Journal of Surgical Pathology 40, 244–252 (2016)
  • [7] Fischer, B., Modersitzki, J.: Curvature based image registration. Journal of Mathematical Imaging and Vision 18(1), 81–85 (2003)
  • [8] Haber, E., Modersitzki, J.: Intensity gradient based registration and fusion of multi-modal images. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006. vol. 3216, pp. 591–598 (2006)
  • [9] Hering, A., de Boer, S., Saha, A., Twilt, J.J., Heinrich, M.P., Yakar, D., de Rooij, M., Huisman, H., Bosma, J.S.: Statistical Analysis Plan - Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis (Jun 2024). https://doi.org/10.5281/zenodo.12170878
  • [10] Hricak, H., Abdel-Wahab, M., Atun, R., Lette, M.M., Paez, D., Brink, J.A., Donoso-Bach, L., Frija, G., Hierath, M., Holmberg, O., et al.: Medical imaging and nuclear medicine: a Lancet Oncology Commission. The Lancet Oncology 22(4), e136–e172 (2021)
  • [11] König, L., Rühaak, J., Derksen, A., Lellmann, J.: A matrix-free approach to parallel and memory-efficient deformable image registration. SIAM Journal on Scientific Computing 40(3), B858–B888 (2018)
  • [12] Kovacs, B., Netzer, N., Baumgartner, M., Schrader, A., Isensee, F., Weißer, C., Wolf, I., Görtz, M., Jaeger, P.F., Schütz, V., et al.: Addressing image misalignments in multi-parametric prostate MRI for enhanced computer-aided diagnosis of prostate cancer. Scientific Reports 13(1), 19805 (2023)
  • [13] Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(1-3), 503–528 (1989)
  • [14] Netzer, N., Weißer, C., Schelb, P., Wang, X., Qin, X., Görtz, M., Schütz, V., Radtke, J.P., Hielscher, T., Schwab, C., et al.: Fully automatic deep learning in bi-institutional prostate magnetic resonance imaging: effects of cohort size and heterogeneity. Investigative Radiology 56(12), 799–808 (2021)
  • [15] Pellicer-Valero, O.J., Marenco Jimenez, J.L., Gonzalez-Perez, V., Casanova Ramon-Borja, J.L., Martin Garcia, I., Barrios Benito, M., Pelechano Gomez, P., Rubio-Briones, J., Rupérez, M.J., Martín-Guerrero, J.D.: Deep learning for fully automatic detection, segmentation, and Gleason grade estimation of prostate cancer in multiparametric magnetic resonance images. Scientific Reports 12(1), 2975 (2022)
  • [16] Rohlfing, T.: Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable. IEEE Transactions on Medical Imaging 31(2), 153–163 (2011)
  • [17] Rühaak, J., König, L., Tramnitzke, F., Köstler, H., Modersitzki, J.: A matrix-free approach to efficient affine-linear image registration on CPU and GPU. Journal of Real-Time Image Processing 13, 205–225 (2017)
  • [18] Saha, A., Bosma, J.S., Twilt, J.J., van Ginneken, B., Bjartell, A., Padhani, A.R., Bonekamp, D., Villeirs, G., Salomon, G., Giannarini, G., Kalpathy-Cramer, J., Barentsz, J., Maier-Hein, K.H., Rusu, M., Rouvière, O., van den Bergh, R., Panebianco, V., Kasivisvanathan, V., Obuchowski, N.A., Yakar, D., Elschot, M., Veltman, J., Fütterer, J.J., de Rooij, M., Huisman, H., the PI-CAI consortium: Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study. The Lancet Oncology (2024). https://doi.org/10.1016/S1470-2045(24)00220-1, https://www.sciencedirect.com/science/article/pii/S1470204524002201
  • [19] Sanyal, J., Banerjee, I., Hahn, L., Rubin, D.: An automated two-step pipeline for aggressive prostate lesion detection from multi-parametric MR sequence. AMIA Summits on Translational Science Proceedings 2020, 552 (2020)
  • [20] Stavrinides, V., Giganti, F., Emberton, M., Moore, C.M.: MRI in active surveillance: a critical review. Prostate Cancer and Prostatic Diseases 22(1), 5–15 (2019)
  • [21] Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F.: Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians 71(3), 209–249 (2021)
  • [22] Weinreb, J.C., Barentsz, J.O., Choyke, P.L., Cornud, F., Haider, M.A., Macura, K.J., Margolis, D., Schnall, M.D., Shtern, F., Tempany, C.M., et al.: PI-RADS Prostate Imaging–Reporting and Data System: 2015, version 2. European Urology 69(1), 16–40 (2016)
  • [23] Winkel, D.J., Tong, A., Lou, B., Kamen, A., Comaniciu, D., Disselhorst, J.A., Rodríguez-Ruiz, A., Huisman, H., Szolar, D., Shabunin, I., et al.: A novel deep learning based computer-aided diagnosis system improves the accuracy and efficiency of radiologists in reading biparametric magnetic resonance images of the prostate: results of a multireader, multicase study. Investigative Radiology 56(10), 605–613 (2021)

Appendices

Appendix A Scan characteristics

Table A: Scan characteristics showing the median, (95% confidence interval) and [min, max] in voxels or mm/voxel.
Characteristic | PI-CAI | PCNN | PROMIS
T2W in-plane size | 640 (320, 1024) [256, 1078] | 1024 (296, 1024) [256, 1024] | 512 (256, 512) [256, 640]
T2W number of slices | 21 (19, 29) [15, 45] | 27 (20, 35) [15, 45] | 26 (23, 38) [15, 94]
T2W in-plane resolution | 0.3 (0.3, 0.6) [0.2, 0.8] | 0.3 (0.2, 0.7) [0.2, 0.8] | 0.4 (0.4, 0.8) [0.4, 0.9]
T2W slice thickness | 3.6 (3.0, 3.6) [1.3, 5.0] | 3.0 (3.0, 4.8) [2.2, 4.8] | 3.3 (3.3, 3.6) [0.8, 6.5]
ADC in-plane size | 128 (102, 256) [70, 336] | 240 (114, 256) [108, 336] | 172 (128, 172) [126, 256]
ADC number of slices | 21 (19, 29) [11, 41] | 27 (11, 33) [11, 41] | 13 (11, 19) [11, 24]
ADC in-plane resolution | 2.0 (1.4, 2.0) [0.9, 2.6] | 1.4 (0.9, 1.9) [0.9, 2.0] | 1.5 (1.5, 1.7) [1.1, 2.0]
ADC slice thickness | 3.6 (3.0, 3.6) [3.0, 5.8] | 3.0 (3.0, 5.5) [3.0, 5.8] | 5.0 (5.0, 5.5) [4.0, 6.0]

Appendix B Diagnostic performance

Figure A: (left) Distribution of Dice scores between the lesion annotation on the T2W and ADC scans for the original, rigidly aligned and deformably aligned PCNN dataset. The median Dice score for the original dataset was 0.48, with 2.5% and 97.5% quantiles of the distribution of Dice scores being 0.03 and 0.90. For the rigidly aligned dataset these metrics were 0.47 [0.01, 0.82], and for the deformably aligned dataset 0.58 [0.10, 0.81]. (right) Model performance for the ensemble of 3 PI-CAI algorithms with the original, rigidly aligned and deformably aligned PCNN datasets. AUROC = area under the receiver operating characteristic curve.
Figure B: The prediction distribution for the PCNN dataset of each PI-CAI algorithm, their ensemble, and the ensemble of 3 PI-CAI algorithms (BDAV_Y, DataScientX and HeviAI) “ensemble-subset”.
Figure C: The prediction distribution for the PROMIS dataset of each PI-CAI algorithm, their ensemble, and the ensemble of 3 PI-CAI algorithms (BDAV_Y, DataScientX and HeviAI) “ensemble-subset”.