¹ University of Moratuwa, Moratuwa, Sri Lanka
² University of Peradeniya, Peradeniya, Sri Lanka
³ University of Colombo, Colombo, Sri Lanka
⁴ Zone24x7 Inc., USA

LiverUSRecon: Automatic 3D Reconstruction and Volumetry of the Liver with a Few Partial Ultrasound Scans

Kaushalya Sivayogaraj¹, Sahan T. Guruge², Udari Liyanage², Jeevani Udupihille³, Saroj Jayasinghe², Gerard Fernando⁴, Ranga Rodrigo¹, M. Rukshani Liyanaarachchi¹

Abstract

3D reconstruction of the liver for volumetry is important for qualitative analysis and disease diagnosis. Liver volumetry using ultrasound (US) scans, although advantageous owing to its short acquisition time and safety, is challenging because of the inherent noisiness of US scans, blurry boundaries, and partial liver visibility. We address these challenges by using the segmentation masks of a few incomplete sagittal-plane US scans of the liver in conjunction with a statistical shape model (SSM) built using a set of CT scans of the liver. We compute the shape parameters needed to warp this canonical SSM to fit the US scans through a parametric regression network. The resulting 3D liver reconstruction is accurate and leads to automatic liver volume calculation. We evaluate the accuracy of the estimated liver volumes with respect to CT segmentation volumes using RMSE. Our volume computation is statistically much closer to the volume estimated using CT scans than the volume computed using Childs’ method by radiologists: a p-value of $0.094\;(>0.05)$ indicates that there is no significant difference between the CT segmentation volumes and ours, in contrast to Childs’ method. We validate our method using investigations (ablation studies) on the US image resolution, the number of CT scans used for the SSM, the number of principal components, and the number of input US scans. To the best of our knowledge, this is the first automatic liver volumetry system using a few incomplete US scans given a set of CT scans of livers for the SSM.

Keywords: Liver volumetry · Ultrasound (US) · TransUNet · 3D reconstruction · Statistical Shape Modeling (SSM)

1 Introduction

3D reconstruction of the liver for volume measurement and 3D visual shape analysis using an accessible medical imaging modality like ultrasound (US) imaging is important. It helps clinicians to analyse subject-specific liver morphology and accurately estimate liver volume in real-time. 3D reconstruction from segmentation of 3D scans (slice based 2D image stacks) such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) scans, although still demanding, is generally straightforward [13, 20, 11, 9, 7, 2]. However, the well-known disadvantages of MRI and CT modalities—long acquisition time, cost, and the use of ionizing radiation in CT—make 3D reconstruction using US images attractive.

3D reconstruction of organs using a few 2D US scans acquired at various angles in different planes is possible [14, 16]. However, this technique requires full view of the organ in the scan and uses several input image slices. More crucially, it requires image acquisition location information (pose of the probe) which is difficult to annotate when performing a clinical scan. If liver volume calculation is the only requirement, extracting measurements and training a regression model can lead to an estimate of the volumes from left lobe and right lobe of the liver (called Childs’ method by radiologists [3]). However, measuring lengths from low contrast and noisy US images is subjective, time-consuming, and prone to inter-observer variability; and visual shape analysis is not possible. Moreover, US scans usually do not have the full view of the liver in one image. Thus, 3D reconstruction of the liver using several partial US scans is useful, and current methods are still unable to do so.

3D reconstruction using a few slices where the organ of interest is fully in view is not novel. We can examine CT and MRI liver scans as 3D volumes for a qualitative understanding and volumetry using tools such as 3D Slicer [6] and ITK-SNAP [18]. Reconstructing with CT slices of the left ventricle of the heart has been partially explored by Yuan et al. [17] given an atlas (left ventricle of the heart) and full visibility in CT slices. However, 3D reconstruction of the liver (a large organ in the human body with a complex 3D structure) from US scans is challenging due to the partial visibility of the liver resulting from the limited field of view of the US probe, noisiness, and artefacts.

If the reconstruction must go beyond visualizing CT or MRI volumes, 3D reconstruction from slices is important. There have been approaches that use one or few 2D slices for 3D reconstruction, such as Instantiation-Net [15] for MRI ventricle, liver reconstruction using an X-ray image by Tong et al. [12], and cardiac 3D reconstruction [1] for MRI. Yuan et al. [17] use a few 2D CT slices and combine segmentation and 3D reconstruction to reconstruct the left ventricle using a Statistical Shape Model (SSM). Tong et al. [12] too use an SSM. However, all these methods use X-ray, CT, or MRI images, and the reconstruction is less challenging due to the high contrast and well-defined boundaries as opposed to US. Therefore, there is no method for reconstructing a large organ like the liver using a few US slices without 3D probe coordinates.

In this paper, we create the 3D reconstruction of the liver using just three sagittal plane US slices where the liver is only partially visible with the aid of an SSM. We create the SSM using a population of liver meshes obtained from CT segmentations. The SSM extracts meaningful information and captures the underlying shape variation within the liver population and provides the mean liver model and principal components. Using just three slices is advantageous due to the ability to quickly acquire them. A deep network segments the three slices and a Multi-Layer Perceptron (MLP) regressor generates the shape parameters which, in turn, warp the SSM to create a patient-specific 3D reconstruction of the liver. This enables us to accurately estimate the patient-specific liver volume. Our volume estimates are more accurate, i.e., statistically closer to the ground truth (radiologist-segmented CT liver volumes) than the volumes estimated by radiologists using the Childs’ method. To the best of our knowledge, this is the first automated deep learning method that calculates the liver volume from three incomplete 2D US scans. Further, we introduce a new US liver database with parallel, annotated CT scans comprising 134 scans. Our contributions are

– 3D liver reconstruction and volume estimation using three US scans acquired from the mid-line, mid-clavicular line, and anterior axillary line of the sagittal plane, where the liver is partially visible,
– a database of paired US scans and radiologist-annotated CT scans that comprises 134 such scans, and
– surpassing the volume computation accuracy obtained by radiologists using the Childs’ method on US images.

Our contributions open up an avenue to use less-expensive, noisy, partial US scans of organs for 3D reconstruction and volumetry. This, in our opinion, will make accurate scan-based volume estimation commonplace for better diagnosis.

2 Methodology

The aim of our framework is to accurately reconstruct a 3D model of the organ that matches the noisy, possibly partial US scans, using as few as two or three of them (we say “three” in the subsequent discussion for brevity). The resulting 3D model is useful for visualization and volumetry in the clinic. There are three main modules in our 3D reconstruction framework: Statistical Shape Model (SSM) creation, US segmentation, and the 3D reconstruction itself. The SSM module takes a set of manually segmented 3D CT scans of the same organ from multiple subjects after a registration step and produces the mean mesh and principal components. The segmentation module uses TransUNet [2] to segment the three US images and generates binary masks which guide the final 3D reconstruction module. The 3D reconstruction module is a parametric regression model that warps the average 3D model to match the segmented US images. The average model is the mean of the aligned meshes, and it has the same number of vertices and faces as the other organ models generated from 3D CT segmentation. The final result is a 3D model of the organ that matches the three US scans.

Fig. 1 describes this framework, which calculates the liver volume from the reconstructed liver model. 3D reconstruction of the liver is made possible by the SSM, which uses a 3D liver model atlas generated from manually segmented 3D CT scans. Principal Component Analysis (PCA) constructs the parameter space from the generated liver atlas. Raw US slices and their masks train the TransUNet [2] segmentation network to generate liver masks of the three input US slices. The masks and their shape parameters are the inputs that train the parametric regression MLP. At test time, this MLP generates the shape parameters to reconstruct the 3D liver model by warping the SSM. Finally, we calculate the liver volume from the 3D liver mesh.

Figure 1: The proposed framework: binary masks of the three US slices generate the shape parameters through the parametric regression MLP. These warp the SSM to generate the 3D liver reconstruction.

Statistical Shape Model (SSM): The purpose of this model is to produce the 3D liver model that matches the three US scans of a liver. An SSM describes a set of semantically similar objects (3D liver models in our case) using a small set of parameters. It is a fundamental technique in vision and medical image processing and is invaluable in semantic segmentation and 3D reconstruction [8]. An SSM contains the mean shape of the dataset. This mean shape, combined with principal components that represent the key variations, forms the backbone of the SSM.

We carry out the SSM process introduced in [19]. We use a set of $N$ 3D liver meshes generated from CT liver segmentations done by radiologists as the input population. Each liver model has a different number of vertices and faces. As the first step, we carry out a non-rigid registration to fit each 3D liver mesh to the first 3D liver mesh (the reference model) to obtain 3D models with the same topology (i.e., to make the vertices and faces of each mesh equal in number). Then, we align the fitted meshes rigidly to remove translational and rotational variations. As the final step, we apply PCA to the generated liver atlas $S=\{S_{1},S_{2},\cdots,S_{N}\}$, $S_{i}\in\mathbb{R}^{3\times M}$, $i\in\{1,\cdots,N\}$, where $M$ is the number of vertices in the reference model, to create the principal components. Each liver shape $S_{i}$ is mapped to a vector $s_{i}\in\mathbb{R}^{3M}$. Let $S_{\mathrm{map}}=[s_{1},s_{2},\cdots,s_{N}]^{T}\in\mathbb{R}^{N\times{3M}}$ and the mean mesh

$$\bar{s}=\frac{1}{N}\sum_{i=1}^{N}s_{i} \tag{1}$$

Using Singular Value Decomposition (SVD), $S_{\mathrm{map}}=U\Sigma V^{T}$, we can represent each liver model using the singular vectors $v_{k}$ (the columns of $V\in\mathbb{R}^{3M\times K}$) as

$$L=\mathrm{reshape}\left(\bar{s}+\sum_{k=1}^{K}v_{k}\alpha_{k}\right) \tag{2}$$

where $L\in\mathbb{R}^{3\times M}$ is a reconstructed liver model. The shape parameters $\alpha_{k}$ are the variables that represent the liver parametric model. We choose $K=50$ components. In our system, the combination of the segmentation network and the parametric regression MLP predicts the shape parameters. We train this network using the three US segmentation masks and the ground truth shape parameters. As a result, given a set of US images and CT liver models, and the trained segmentation network and parametric regression MLP, we can generate the 3D liver model that matches the masks obtained from the three US scans.
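The construction in Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the paper: it assumes the meshes are already registered and aligned, it centres the data before the SVD (the usual PCA convention, which the text does not state explicitly), and the function names `build_ssm`, `shape_parameters`, and `reconstruct` are hypothetical.

```python
import numpy as np

def build_ssm(meshes, num_components):
    """Build an SSM from registered meshes of shape (N, 3, M)."""
    n = meshes.shape[0]
    s_map = meshes.reshape(n, -1)               # S_map, shape (N, 3M)
    s_bar = s_map.mean(axis=0)                  # mean mesh, Eq. (1)
    # Assumption: centre the data before the SVD (standard PCA).
    _, sigma, vt = np.linalg.svd(s_map - s_bar, full_matrices=False)
    v = vt[:num_components].T                   # columns v_k, shape (3M, K)
    return s_bar, v, sigma[:num_components]

def shape_parameters(mesh, s_bar, v):
    """Project one mesh of shape (3, M) onto the components to get alpha."""
    return v.T @ (mesh.reshape(-1) - s_bar)

def reconstruct(alpha, s_bar, v, num_vertices):
    """Eq. (2): L = reshape(s_bar + sum_k v_k * alpha_k)."""
    return (s_bar + v @ alpha).reshape(3, num_vertices)

# Synthetic stand-in for a liver atlas: N = 8 meshes with M = 5 vertices each.
rng = np.random.default_rng(0)
demo_meshes = rng.normal(size=(8, 3, 5))
s_bar, v, sigma = build_ssm(demo_meshes, num_components=7)
alpha = shape_parameters(demo_meshes[0], s_bar, v)
recon = reconstruct(alpha, s_bar, v, num_vertices=5)
print(np.abs(recon - demo_meshes[0]).max())     # reconstruction error
```

With $N$ centred training shapes there are at most $N-1$ non-trivial components, so keeping all of them reproduces a training mesh up to rounding; using fewer components (such as $K=50$ in the paper) trades exactness for a compact parameter vector.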

Segmentation Model for US Liver Segmentation: In our 3D liver reconstruction, the binary masks that result from the segmentation of the three US images guide the final 3D reconstruction. We use TransUNet [2], which is built on ResNet50 and ViT (trained on ImageNet [5]) and fine-tuned on the Synapse multi-organ segmentation dataset and the automated cardiac diagnosis challenge dataset [2]. We fine-tune it further using our US liver segmentation dataset. Our dataset comprises three US images each of 134 patients, segmented by radiologists. We augmented the dataset using random operations (rotation, translation, flipping, and cropping) when fine-tuning TransUNet. This step prevents overfitting and improves generalization. We do not alter any other hyper-parameters of TransUNet. The ViT-based TransUNet is important for the segmentation, as the partial views of the livers in our US images benefit from the long-range attention available in ViT. In particular, TransUNet adopts a hybrid architectural approach that fuses the strengths of both CNNs and transformers. This hybrid approach combines the fine-grained, high-resolution spatial information inherent in CNN features with the broader global context captured by the transformers. To establish this point, we also evaluated the standard U-Net [13] and other U-Net variants on this segmentation task for comparison with TransUNet (Table 1). In summary, our TransUNet-based segmenter accurately segments the noisy, partial US scans of the liver. We feed the segmentation masks to the 3D reconstruction model.

3D Reconstruction Model for Liver Model Reconstruction: Generating the shape parameters ( $\alpha_{k}$ ) from the segmented masks described above is the next step following the segmentation. In this study, we approach the challenge of 3D reconstruction of the liver from multiple views of sagittal-plane US images. Our objective is to predict the model parameters by directly using the slice-masks as input data. The 3D reconstruction model uses these masks to generate the shape parameters required for 3D liver model reconstruction. The system uses the shape parameters ( $\alpha_{k}$ s), the average liver model ( $\bar{s}$ ), the normalized principal components ( $v_{k}$ s), and the normalization parameters ( $E(v_{k})$ and $\mathrm{std}(v_{k})$ ) to generate the 3D liver model. To achieve this, we employ a parametric regression MLP that receives the stack of three US slice-masks as its input. So, $\alpha=\text{regression network}(\text{US binary masks})$, where $\alpha=\{\alpha_{1},\dots,\alpha_{k},\dots,\alpha_{K}\}$. Our parametric regression MLP has two layers.
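The mapping from the stack of three masks to $\alpha$ can be illustrated with a forward pass of a two-layer MLP. The sketch below is only indicative: the text specifies two layers and the mask stack as input, while the flattening of the masks, the hidden width, the ReLU activation, the weight initialization, and the small demo resolution are all assumptions, and `init_mlp` and `predict_alpha` are hypothetical names.

```python
import numpy as np

def init_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Randomly initialise a two-layer MLP (hidden width is an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, np.sqrt(2.0 / in_dim), (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, np.sqrt(1.0 / hidden_dim), (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def predict_alpha(params, masks):
    """Map a batch of mask stacks (B, 3, H, W) to shape parameters (B, K)."""
    x = masks.reshape(masks.shape[0], -1).astype(np.float64)  # flatten stack
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU (assumed)
    return hidden @ params["w2"] + params["b2"]

# Demo with tiny 16x16 masks; the paper resizes slices to 192x192 or 384x384.
rng = np.random.default_rng(1)
demo_masks = (rng.random((2, 3, 16, 16)) > 0.5).astype(np.uint8)
params = init_mlp(in_dim=3 * 16 * 16, hidden_dim=32, out_dim=50)
alpha_pred = predict_alpha(params, demo_masks)
print(alpha_pred.shape)
```

In training, the weights would be fitted by regressing these outputs onto the ground truth shape parameters obtained from the SSM.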

Liver Volume Calculation: Following the 3D reconstruction, we are able to estimate the liver volume. We save the 3D reconstruction as an OBJ file and estimate its volume in $\mathrm{cm}^{3}$ using trimesh [4]. We have verified the accuracy of trimesh by comparing its volume against the volume computed by 3D Slicer [6]. Our dataset includes the liver volume estimated by radiologists in two ways: (1) using the CT segmentations, and (2) using Childs’ method on US slices. We compare the volume computed using the proposed method with these two estimates and analyze the differences statistically.
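For intuition, the volume enclosed by a closed, consistently oriented triangle mesh can be computed by summing signed tetrahedron volumes (a consequence of the divergence theorem). The NumPy sketch below illustrates that idea; it is not the trimesh code path used in the paper (trimesh exposes the corresponding quantity as the `volume` attribute of a mesh), and `mesh_volume` and the unit-cube test mesh are illustrative names and data.

```python
import numpy as np

def mesh_volume(vertices, faces):
    """Volume of a closed, consistently oriented triangle mesh.

    vertices: (V, 3) float array; faces: (F, 3) integer vertex indices.
    """
    v0 = vertices[faces[:, 0]]
    v1 = vertices[faces[:, 1]]
    v2 = vertices[faces[:, 2]]
    # Signed volume of the tetrahedron (origin, v0, v1, v2) for each face.
    signed = np.einsum("ij,ij->i", v0, np.cross(v1, v2)) / 6.0
    return abs(signed.sum())

# Unit cube with outward-facing triangles as a sanity check.
cube_vertices = np.array([
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
    [0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1],
], dtype=float)
cube_faces = np.array([
    [0, 3, 2], [0, 2, 1],   # bottom (z = 0)
    [4, 5, 6], [4, 6, 7],   # top (z = 1)
    [0, 1, 5], [0, 5, 4],   # front (y = 0)
    [3, 7, 6], [3, 6, 2],   # back (y = 1)
    [0, 4, 7], [0, 7, 3],   # left (x = 0)
    [1, 2, 6], [1, 6, 5],   # right (x = 1)
])
cube_volume = mesh_volume(cube_vertices, cube_faces)
print(cube_volume)
```

If the mesh coordinates are in millimetres (an assumption about the export, not stated in the text), dividing the result by 1000 gives the volume in $\mathrm{cm}^{3}$.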

3 Experiments and Results

Data Acquisition: We obtained US scans (three per patient) and corresponding CT scans (for the SSM and for the corresponding 3D reconstruction comparison) of 134 healthy patients. (We plan to make this liver dataset of three annotated US slices and liver-annotated CT volumes available for the benefit of the community.) We captured the three US slices ( $1470\times 2316$ ) at the mid-line, mid-clavicular line, and anterior axillary line of the sagittal plane. An experienced radiologist segmented the liver in the US images using ITK-SNAP [18] for training, and in the relevant slices of the abdominal CT using 3D Slicer [6] for the SSM and for volume comparison (considered as ground truth). We stacked the segmented 2D CT slices together to reconstruct the 3D liver mesh used in the SSM and for comparison. We resized the US images to $192\times 192$ or $384\times 384$ (for ablation). Out of the 134 subjects, we allocated 99 for training and 35 for testing.

US Liver Segmentation Results: We evaluated FCN [10], UNet [13], UNet++ [20] with an EfficientNet-B7 encoder, and TransUNet [2] for US liver segmentation (Table 1). TransUNet achieved the best Accuracy (Acc.), Dice Score Coefficient (DSC), Intersection over Union (IoU), and Hausdorff distance (HD) on unseen data. This is because TransUNet uses transformers to encode tokenized image patches from a CNN feature map. Thus, the input sequence captures global context [2]. We used UNet as the decoder to decode the hidden features and generate the final segmentation masks. The 2D liver predictions overlap well with the ground truth liver labels. This, in turn, leads to an accurate liver volume calculation. Ours is the first method that uses a transformer network in US liver segmentation. Following this result, we used TransUNet for all other experiments.
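As a reminder of how two of the overlap metrics above are defined for a single pair of binary masks, the following sketch computes DSC and IoU with NumPy. It is a generic illustration on a toy example, not the evaluation script behind Table 1; the function name `dice_and_iou` and the convention of returning 1.0 for two empty masks are assumptions.

```python
import numpy as np

def dice_and_iou(pred, target):
    """Dice score and IoU for two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    union = total - intersection
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2.0 * intersection / total if total > 0 else 1.0
    iou = intersection / union if union > 0 else 1.0
    return dice, iou

# Toy example: two 2x2 squares on a 4x4 grid, overlapping in one column.
target_mask = np.zeros((4, 4), dtype=np.uint8)
target_mask[0:2, 0:2] = 1
pred_mask = np.zeros((4, 4), dtype=np.uint8)
pred_mask[0:2, 1:3] = 1
dice, iou = dice_and_iou(pred_mask, target_mask)
print(dice, iou)
```

For a single mask pair the two metrics are tied by $\mathrm{IoU}=\mathrm{DSC}/(2-\mathrm{DSC})$; averages over a test set need not satisfy this relation exactly.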

Figure 2: US segmentation and 3D reconstruction results: Three input US sagittal plane images, corresponding segmentations, and 3D liver reconstructions using the shape parameters for three subjects.

Table 1: Segmentation accuracy (top): TransUNet performs best and, hence, was selected for subsequent experiments. $\ast$ denotes the use of EfficientNet-B7 as the encoder. 3D reconstruction accuracy (bottom): CD and MSD are lower when the parametric regression MLP is combined with TransUNet than with UNet.

Segmentation	FCN	UNet	UNet++ $\ast$	TransUNet
Acc. (%) $\uparrow$	93.2	95.4	94.4	97.5
DSC (%) $\uparrow$	38.5	65.6	68.1	91.3
HD (mm) $\downarrow$	5.5	4.8	4.5	3.6
IoU (%) $\uparrow$	24.1	50.2	52.7	84.4

Recon. accuracy	TransUNet + Recon.	UNet + Recon.
MSD (mm) $\downarrow$	6.6	6.8
CD (mm) $\downarrow$	12.8	13.1
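The MSD and CD rows of Table 1 compare a predicted mesh with its ground truth. Definitions of both metrics vary across the literature, and the paper does not spell out the variant it uses; the sketch below therefore assumes one common choice, where each directional term is the mean nearest-neighbour distance between vertex sets, CD is the sum of the two directional terms, and MSD is their average. The function names are hypothetical, and the brute-force pairwise distance matrix is only suitable for small vertex counts.

```python
import numpy as np

def directional_mean_distance(a, b):
    """Mean distance from each point in a (P, 3) to its nearest point in b (Q, 3)."""
    diff = a[:, None, :] - b[None, :, :]          # (P, Q, 3) pairwise offsets
    dists = np.sqrt((diff ** 2).sum(axis=-1))     # (P, Q) Euclidean distances
    return dists.min(axis=1).mean()

def chamfer_and_msd(pred_vertices, gt_vertices):
    """Assumed definitions: CD = sum of directional means, MSD = their average."""
    d_pred_to_gt = directional_mean_distance(pred_vertices, gt_vertices)
    d_gt_to_pred = directional_mean_distance(gt_vertices, pred_vertices)
    chamfer = d_pred_to_gt + d_gt_to_pred
    msd = 0.5 * (d_pred_to_gt + d_gt_to_pred)
    return chamfer, msd

# Toy example: a unit square and a copy shifted by 1 along z.
gt_points = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
pred_points = gt_points + np.array([0.0, 0.0, 1.0])
cd_value, msd_value = chamfer_and_msd(pred_points, gt_points)
print(cd_value, msd_value)
```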

Figure 4: Box plot of liver volumes calculated from Childs’ method, CT segmentation, and the proposed method: Childs’ method has outliers, whereas the proposed method has none, and its liver volume distribution falls within that of CT segmentation.

Table 2: Main result: Statistical analysis: RMSE is lower for the volumes estimated by our method. The paired $t$-test shows that there is no significant difference in volumes between CT and our method ( $p>0.05$ ). Our method is statistically more accurate. $\mu$: mean difference, SEM: Standard Error of the Mean.

Vol. Compar.	RMSE	Pair	$\mu$	std.	SEM	95% CI of $\mu$ diff. (Lower, Upper)	$t$	df	Sig. (2-tailed)
CT & Childs’	306.9	1	-201.5	234.8	39.7	(-282.1, -120.8)	-5.1	34	.000
CT & Ours	275.8	2	78.1	268.4	45.4	(-14.1, 170.3)	1.7	34	.094

3D Reconstruction Results: We pass the liver masks obtained from the above segmentation process to the 3D reconstruction model, which generates the shape parameters used to reconstruct the 3D liver shape model. As the problem at hand is a slice-mask-based shape reconstruction, we use the reconstruction method of Yuan et al. [17]. We use Chamfer Distance (CD) and Mean Surface Distance (MSD) to compare the generated 3D reconstructions with the ground truth 3D liver models on the test data (see Table 1). The combination of TransUNet and the parametric regression MLP obtained lower CD and MSD than UNet with the same setup. Front and back views of the 3D reconstructed liver models are in Fig. 3; our system generates liver models that retain their complex shape. Further, Fig. 3 visualizes the results, where the predicted liver models overlap closely with the ground truth. We calculated the Root Mean Square Error (RMSE) with respect to the ground truth CT volumes for the volumes obtained by radiologists using the Childs’ method and for those obtained by the proposed method; our liver volumes are closer to the ground truth, as shown in Table 2. The box plot in Fig. 4 shows the descriptive statistics of each liver volume calculation method. We performed a patient-wise paired-sample $t$-test, as shown in Table 2. There is a significant difference between the liver volumes calculated from CT segmentation ( $\mathrm{mean}=1162.4,\mathrm{std.}=275.7$ ) and by Childs’ method ( $\mathrm{mean}=960.9,\mathrm{std.}=257.9$ ); $t(34)=-5.1,p<.001$ . In contrast, there is no significant difference between the liver volumes calculated from CT segmentation ( $\mathrm{mean}=1162.4,\mathrm{std.}=275.7$ ) and by our method ( $\mathrm{mean}=1240.6,\mathrm{std.}=133.1$ ); $t(34)=1.7,p=0.094$ .
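The summary statistics reported for the paired $t$-test can be cross-checked against one another with a few lines of arithmetic. The sketch below assumes that the reported standard deviation is the sample standard deviation (denominator $n-1$) of the patient-wise volume differences over the $n=35$ test subjects; under that assumption $\mathrm{SEM}=\mathrm{std}/\sqrt{n}$, $t=\mu/\mathrm{SEM}$, and $\mathrm{RMSE}^{2}=\mu^{2}+\mathrm{std}^{2}(n-1)/n$. The helper name `paired_summary` is hypothetical.

```python
import math

def paired_summary(mean_diff, std_diff, n):
    """Derive SEM, t statistic and RMSE from paired-difference statistics.

    Assumes std_diff is the sample standard deviation (n - 1 denominator).
    """
    sem = std_diff / math.sqrt(n)
    t_stat = mean_diff / sem
    rmse = math.sqrt(mean_diff ** 2 + std_diff ** 2 * (n - 1) / n)
    return sem, t_stat, rmse

N_TEST = 35  # test subjects, giving df = 34

# Mean and std of the paired differences as reported in Table 2.
childs_sem, childs_t, childs_rmse = paired_summary(-201.5, 234.8, N_TEST)
ours_sem, ours_t, ours_rmse = paired_summary(78.1, 268.4, N_TEST)

print(f"CT & Childs': SEM={childs_sem:.1f}, t={childs_t:.1f}, RMSE={childs_rmse:.1f}")
print(f"CT & Ours:    SEM={ours_sem:.1f}, t={ours_t:.1f}, RMSE={ours_rmse:.1f}")
```

The derived values agree with the reported SEM, $t$, and RMSE to the printed precision, which suggests the table entries are internally consistent under the stated assumption.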

Ablation Study: Table 3 shows our ablation studies on the effect of resolution of the three US slices, no. of slices used to compute the shape parameters, the number of principal components used for SSM, and the no. of CT scans used for SSM. We have chosen to use the resolution of $192\times 192$ , 3 slices, 50 principal components, and 100 CT scans for SSM. The choices do not drastically affect the final results.

Table 3: Effect of US resolution, no. of US scans used, no. of principal components used for the SSM, and the no. of CT scans used for the SSM. $\ast$ indicates what we used for the final results. These choices do not affect the final results drastically.

US res.	RMSE
$192^{2}$ $\ast$	275.83
$384^{2}$	271.63

No. slices	RMSE
2	281.65
3 $\ast$	275.83

No. Comp.	RMSE
10	252.35
20	249.90
40	254.53
50 $\ast$	275.83
70	306.82

CT scans in SSM	RMSE
50 (1st)	291.70
50 (2nd)	285.52
60	279.96
80	275.18
100 $\ast$	275.83


Acknowledgements

K. Sivayogaraj acknowledges the support received from the Chancellor’s scholarship donated by Zone24x7 (Pvt.) Ltd. R. Rodrigo acknowledges the support received from the University of Moratuwa Senate Research Committee grant SRC/LT/2021/20.

References

[1] Chang, Q., Yan, Z., Zhou, M., Liu, D., Sawalha, K., Ye, M., Zhangli, Q., Kanski, M., Al’Aref, S., Axel, L., Metaxas, D.: Deeprecon: Joint 2D cardiac segmentation and 3D volume reconstruction via a structure-specific generative method. In: Medical Image Computing and Computer Assisted Intervention. pp. 567–577 (2022)
[2] Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
[3] Childs, J., Esterman, A., Thoirs, K., Turner, R.: Ultrasound in the assessment of hepatomegaly: A simple technique to determine an enlarged liver using reliable and valid measurements. Sonography 3, 47–52 (03 2016)
[4] Dawson-Haggerty et al.: trimesh, https://trimesh.org/
[5] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Li, F.F.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 248–255 (06 2009)
[6] Fedorov, A., Beichel, R., Kalpathy-Cramer, J., Finet, J., Fillion-Robin, J.C., Pujol, S., Bauer, C., Jennings, D., Fennessy, F., Sonka, M., Buatti, J., Aylward, S., Miller, J., Pieper, S., Kikinis, R.: 3D slicer as an image computing platform for the quantitative imaging network. Magnetic Resonance Imaging 30, 1323–41 (07 2012)
[7] Fu, S., Lu, Y., Wang, Y., Zhou, Y., Shen, W., Fishman, E., Yuille, A.: Domain adaptive relational reasoning for 3d multi-organ segmentation. In: Medical Image Computing and Computer Assisted Intervention. pp. 656–666 (2020)
[8] Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis: With Applications in R. Wiley, 2nd edn. (2016)
[9] Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A.: H-DenseUNet: hybrid densely connected unet for liver and tumor segmentation from ct volumes. IEEE Transactions on Medical Imaging 37(12), 2663–2674 (2018)
[10] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440. Boston, MA (2015)
[11] Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: International Conference on 3D Vision (3DV). pp. 565–571 (2016)
[12] Nakao, M., Tong, F., Nakamura, M., Matsuda, T.: Image-to-graph convolutional network for deformable shape reconstruction from a single projection image. In: Medical Image Computing and Computer Assisted Intervention. pp. 259–268 (2021)
[13] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer Assisted Intervention. pp. 234–241 (2015)
[14] Sawdayee, H., Vaxman, A., Bermano, A.H.: OReX: Object reconstruction from planar cross-sections using neural fields. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20854–20862 (2023)
[15] Wang, Z.Y., Zhou, X.Y., Li, P., Theodoreli-Riga, C., Yang, G.Z.: Instantiation-Net: 3D mesh reconstruction from single 2D image for right ventricle. In: Medical Image Computing and Computer Assisted Intervention. pp. 680–691 (2020)
[16] Yeung, P.H., Hesse, L., Aliasi, M., Haak, M., Xie, W., Namburete, A.I., et al.: Implicitvol: Sensorless 3D ultrasound reconstruction with deep implicit representation. arXiv preprint arXiv:2109.12108 (2021)
[17] Yuan, X., Liu, C., Feng, F., Zhu, Y., Wang, Y.: Slice-mask based 3D cardiac shape reconstruction from ct volume. In: Asian Conference on Computer Vision. pp. 1909–1925 (2022)
[18] Yushkevich, P.A., Piven, J., Cody Hazlett, H., Gimpel Smith, R., Ho, S., Gee, J.C., Gerig, G.: User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage 31(3), 1116–1128 (2006)
[19] Zhang, J., Hislop-Jambrich, J., Besier, T.F.: Predictive statistical models of baseline variations in 3-D femoral cortex morphology. Medical Engineering & Physics 38(5), 450–457 (2016)
[20] Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: A nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. pp. 3–11 (2018)