Unsupervised Airway Tree Clustering with Deep Learning: The Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study

Abstract

High-resolution full lung CT scans now enable the detailed segmentation of airway trees up to the 6th branching generation. The airway binary masks display very complex tree structures that may encode biological information relevant to disease risk and yet remain challenging to exploit via traditional methods such as meshing or skeletonization. Recent clinical studies suggest that some variations in shape patterns and caliber of the human airway tree are highly associated with adverse health outcomes, including all-cause mortality and incident COPD. However, quantitative characterization of the variations observed on CT-segmented airway trees remains incomplete, as does our understanding of their clinical and developmental implications. In this work, we present an unsupervised deep-learning pipeline for feature extraction and clustering of human airway trees, learned directly from projections of 3D airway segmentations. We identify four reproducible and clinically distinct airway sub-types in the MESA Lung CT cohort.

Index Terms— Airway structure, Lung CT, Community Detection, Deep Learning

1 Introduction

A growing body of evidence suggests that differences in airway tree structure are associated with adverse health outcomes [1]-[5]. Airway tree caliber assessed on CT is a risk factor for all-cause mortality, in addition to cause-specific mortality from COPD, lung cancer and even atherosclerotic cardiovascular disease [1]. Variations in airway structure are evident at the segmental level in over 25% of the population [2]. The presence of an accessory segmental airway or the absence of a standard segmental airway is associated with higher odds of COPD, and the latter is also associated with the genetic marker FGF10, which regulates embryonic airway budding [2]. Airway structure further impacts the deposition of inhaled particles at bifurcations [6]. All of the above motivates the identification of airway tree subtypes in large populations of segmented CT scans and the analysis of their clinical relevance.
Prior work analyzed airway topology in Billera-Holmes-Vogtmann (BHV) tree space [7] for clustering [8], and classification of COPD subjects based on geodesic distance between tree topologies [9].

However, metrics in BHV tree-space require that the ‘leaf set’ be conserved across all trees in the dataset, thereby limiting the metric to the depth of anatomical labeling. BHV tree-space further considers only topology and not the impact of airway caliber. Improvements in airway segmentation from CT have accelerated in recent years [10, 11] and robust airway segmentations can now be obtained beyond the segmental level. This leads to large-size segmentation masks with fine details, which need to be analyzed on both lungs together (rather than on patches) to compare subjects.

In this work, we propose to use the auto-encoding capacity of deep learning (DL) to learn airway tree structural patterns directly from projections of whole 3D airway segmentation masks. Our method is designed to handle the following challenges: (1) Potential detrimental influence of the trachea, which comprises $\approx$ 40% of voxels in a 3D airway mask. This might give the trachea disproportionate importance in DL encoder training, while at the same time the position of the superior cut-off plane in the CT field of view varies between studies. (2) Lack of co-registration between subjects' CT scans. (3) The tradeoff required between voxel-level segmentation (usually done on patches) and encoding of features at scan level for population clustering and the proposal of new airway shape phenotypes.
We train and evaluate our proposed method on binary airway masks observed on HRCT scans from the Multi-Ethnic Study of Atherosclerosis (MESA) [12] Exam 5 participants to identify airway shape phenotypes.

2 Materials and Methods

2.1 Datasets and airway segmentations

We exploited two CT scan cohorts with airway segmentations: (1) The MESA Lung study [13] is part of a population-based study of 6,814 participants recruited in 2000-2002 across six study centers in the USA. Participants were aged 48-85, were 53% female and 39% white, and were free of cardiovascular disease at baseline [12]. In 2010-2012, the MESA Lung Study performed full-lung high-resolution CT scans for 3,195 participants with in-plane resolution in the range $\left[0.4668,0.9180\right]$ mm and slice spacing of $0.5$ mm, following the SPIROMICS CT protocol [14]. Airway segmentation masks (from VIDA Diagnostics) are available with on average $323\pm 124$ airway segments per tree, each with a maximum depth of $12.7\pm 1.6$ Weibel generations (0 = trachea, 16 = terminal bronchioles). (2) The ATM’22 grand challenge dataset consists of 500 anonymized CT scans, of which n=300 have reference airway segmentations [11], showing on average $202\pm 73$ segments per tree with a maximum depth of $11.1\pm 1.6$ . Lung conditions of the scanned subjects range from healthy status to severe pulmonary disease.

2.2 Airway masks pre-processing

From the MESA Lung cohort, we retained n=2,587 participants based on the following criteria: (1) the airway mask contains a single connected component, with airways at least to the segmental level; (2) each lobe in the lung mask comprises $\geq 5\%$ of the total lung volume; (3) segmental airway branch variant annotations from [2] are available. We rotated the full-size 3D airway masks to align their first three PCA vectors with the image axes and then cropped them to the bounding box around the segmentation. We then computed maximum intensity projections (MIPs) for the axial, coronal, and sagittal planes.
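The alignment, cropping and projection steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it rotates voxel coordinates and re-voxelizes them by rounding (rather than resampling the image), ignores voxel spacing, and leaves the sign of each principal axis unresolved. The function names `pca_align_points`, `rasterize` and `three_view_mips` are hypothetical.

```python
import numpy as np

def pca_align_points(mask):
    """Return foreground voxel coordinates expressed in the mask's PCA frame."""
    coords = np.argwhere(mask).astype(float)          # (N, 3) voxel indices
    centered = coords - coords.mean(axis=0)
    # Eigenvectors of the 3x3 covariance matrix are the principal axes.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]                 # largest variance first
    return centered @ eigvecs[:, order]

def rasterize(points):
    """Re-voxelize aligned points into a volume cropped to their bounding box."""
    idx = np.rint(points - points.min(axis=0)).astype(int)
    vol = np.zeros(tuple(idx.max(axis=0) + 1), dtype=bool)
    vol[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return vol

def three_view_mips(vol):
    """Maximum intensity projection along each of the three volume axes."""
    return [vol.max(axis=a) for a in range(3)]
```

Because the mask is binary, each MIP is simply the logical OR of the voxels along the projection axis. Which of the three projections corresponds to the axial, coronal or sagittal view depends on the axis convention of the input volume.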

Fig. 1: 2D projections of airway masks in axial, coronal, and sagittal planes without and with peripheral airway dilation, which emphasizes shape characteristics of small airways.

We tested a scenario in which we increased the visibility, and hence up-weighted the importance, of peripheral airways in the MIP representations. We generated two separate MIPs, one from airways at Weibel generation $\leq 5$ and one from airways at generation $>5$ . We applied a dilation kernel (4x4 ones) to the MIP of generations above 5 and then merged it with the other MIP. We downsampled each MIP and concatenated the 3 views into an array of size 3 x 256 x 256 per subject. We generated MIPs with (T) and without (NT) the trachea and with (D) or without (ND) dilation.
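A minimal sketch of the peripheral dilation step follows, assuming a per-voxel array of Weibel generation labels is available alongside the binary mask. The `generation` array, the function name `dilated_mip` and its parameters are hypothetical; the dilation uses `scipy.ndimage.binary_dilation` with a 4x4 structuring element of ones. Downsampling to 256 x 256 is not shown.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def dilated_mip(mask, generation, axis, split_gen=5, kernel=4):
    """MIP in which airways beyond `split_gen` are thickened before merging.

    mask       : 3D boolean airway mask
    generation : 3D integer array of Weibel generation per voxel (assumed input)
    axis       : projection axis (0, 1 or 2)
    """
    central = mask & (generation <= split_gen)
    peripheral = mask & (generation > split_gen)
    # Project the two generation ranges separately.
    mip_central = central.max(axis=axis)
    mip_peripheral = peripheral.max(axis=axis)
    # Thicken only the peripheral projection with a kernel x kernel block of ones.
    structure = np.ones((kernel, kernel), dtype=bool)
    mip_peripheral = binary_dilation(mip_peripheral, structure=structure)
    # Merge the two projections into a single binary image.
    return mip_central | mip_peripheral
```

Dilating in 2D after projection, rather than in 3D before it, keeps the central airways at their segmented caliber while making thin distal branches occupy more pixels in the reconstruction loss.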

2.3 Deep encoding of airway MIPs

For the auto-encoder, we modified the baseline UNet architecture [15] by: (1) reducing the number of feature maps per layer to $1/8$ of the baseline; (2) eliminating skip connections, thereby constraining all reconstruction information to a bottleneck feature of size 128x16x16, which we used for downstream clustering. The UNet-noskip model has 537k tunable parameters (Figure 2). We split the 2,587 subjects into 5 train-validation folds, stratified by segmental airway variant [2]. The reconstruction loss function was an equally weighted combination of Binary Cross Entropy and Dice losses. For each fold, we conducted mini-batch gradient descent with a batch size of 12, the Adam optimizer, a learning rate of 0.001, and a cosine annealing learning rate scheduler with warm restarts. The learning rate decayed from 0.001 to 0 over 20 epochs, restarted, and subsequently decayed over 40 epochs. The decay period roughly doubled at each restart. Training was stopped early if there was no improvement in validation loss after 10 epochs. We selected the UNet-noskip model from the highest-performing fold for fine-tuning for 5 epochs.

We reduced the learning rate to 0.0001, with all other parameters as above.
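The reconstruction loss and learning-rate schedule described above can be sketched framework-free as follows. This is an illustration under stated assumptions, not the authors' training code: "equally weighted" is taken to mean weights of 0.5 each, the smoothing constant `eps` is an arbitrary choice, the soft Dice form is assumed, and the decay period is doubled exactly at each restart. The function names are hypothetical.

```python
import math
import numpy as np

def bce_dice_loss(prob, target, eps=1e-7):
    """Equally weighted sum of binary cross-entropy and soft Dice loss."""
    prob = np.clip(prob, eps, 1 - eps)                 # avoid log(0)
    bce = -np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob))
    inter = np.sum(prob * target)
    dice = (2 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)
    return 0.5 * bce + 0.5 * (1 - dice)                # assumed 0.5 / 0.5 weights

def warm_restart_lr(epoch, base_lr=1e-3, first_period=20, mult=2):
    """Cosine annealing from base_lr towards 0 with periods 20, 40, 80, ... epochs."""
    period, start = first_period, 0
    while epoch >= start + period:                     # find the current cycle
        start += period
        period *= mult
    return 0.5 * base_lr * (1 + math.cos(math.pi * (epoch - start) / period))
```

With these defaults the rate starts at 0.001, decays over epochs 0 to 19, jumps back to 0.001 at epoch 20 and then decays over the following 40 epochs. For the fine-tuning stage, the same function could be called with `base_lr=1e-4`.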

We used the ATM grand challenge dataset for external evaluation of the trained autoencoder.

2.4 Airway tree clustering

We applied PCA to reduce the UNet-noskip bottleneck feature from 128x16x16 per tree to a 2,048-D vector (98% variance explained). We computed pairwise L2 distances between vectors to generate a $k$ NN graph with edge weights between two airway trees $E(i,j)=\alpha(i,j)\,\|v_{i}-v_{j}\|_{2}$ , where $\alpha(i,j)=1$ if tree $j$ is one of the $k$ nearest neighbors of tree $i$ and 0 otherwise. We used the Louvain algorithm [16] to perform community detection on the $k$ NN graph by iteratively selecting partitions that optimize a ‘modularity’ score on the graph. Increasing $k$ in the pipeline generally results in fewer clusters. We set the value of $k$ empirically by running Louvain clustering with $k\in[5,2500]$ and selecting the first plateau where the number of clusters remains constant over a range of $\Delta k=100$ . We set $k_{opt}$ as the midpoint of the plateau (Figure 4a).
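The graph construction and the choice of $k_{opt}$ can be sketched as below. This is a hedged illustration: `knn_edge_weights` follows the edge-weight definition literally (so the resulting matrix is not necessarily symmetric), and `select_k_opt` assumes that a "plateau" is a maximal run of consecutive tested $k$ values with an identical cluster count whose span is at least $\Delta k$. Both function names are hypothetical, and the Louvain step itself, which would be run on each graph to obtain the cluster counts, is left to an existing community detection library.

```python
import numpy as np

def knn_edge_weights(features, k):
    """E[i, j] = ||v_i - v_j||_2 if j is among the k nearest neighbours of i, else 0."""
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    dist = np.sqrt(np.maximum(d2, 0.0))        # clamp tiny negative round-off
    np.fill_diagonal(dist, np.inf)             # a tree is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]       # indices of the k nearest trees
    E = np.zeros_like(dist)
    rows = np.arange(len(features))[:, None]
    E[rows, nn] = dist[rows, nn]
    return E

def select_k_opt(ks, n_clusters, min_width=100):
    """Midpoint of the first run of k with a constant cluster count spanning >= min_width."""
    start = 0
    for i in range(1, len(ks) + 1):
        # A run ends when the cluster count changes or the list is exhausted.
        if i == len(ks) or n_clusters[i] != n_clusters[start]:
            if ks[i - 1] - ks[start] >= min_width:
                return (ks[start] + ks[i - 1]) // 2
            start = i
    return None                                # no plateau of the required width
```

Here `ks` is the increasing list of tested neighbourhood sizes and `n_clusters` the number of communities found at each one.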

We also report clustering reproducibility with the following variants: (1) clustering on 80% rather than 100% of the training data; (2) varying $k$ in the $k$ NN graph; (3) using Cosine distance in the $k$ NN graph; (4) clustering on PCA-based 1,024-D vectors (89% variance explained).

Finally, we related airway tree clusters to spirometry-based measures of chronic obstructive pulmonary disease (COPD) and CT-based measures of emphysema severity and dysanapsis by computing the least square mean values adjusted for participant age, sex, height, race-ethnicity, study site, smoking status, and pack-years smoking.

3 Results

3.1 Evaluation metrics

MIP Auto-encoding: We compare the ground truth MIPs $\mathcal{M}$ and the corresponding centerline skeletons $\mathcal{S}$ with their reconstructed versions ( $\hat{\mathcal{M}},\hat{\mathcal{S}}$ ) from the UNet-noskip encoding. We used the following metrics, as in [17]: Tree Length (TL) $=\frac{|\mathcal{S}\cap\hat{\mathcal{M}}|}{|\mathcal{S}|}$ ; Centerline leakage (CL) $=\frac{|\hat{\mathcal{S}}\setminus\mathcal{M}|}{|\mathcal{S}|}$ ; False positive rate (FPR) $=\frac{|\hat{\mathcal{M}}\setminus\mathcal{M}|}{|\mathcal{M}|}$ ; Dice score coefficient (DSC) $=\frac{2|\hat{\mathcal{M}}\cap\mathcal{M}|}{|\mathcal{M}|+|\hat{\mathcal{M}}|}$ . We report the mean and standard deviation of each metric across validation folds, averaged over the three MIP views.
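As a minimal sketch, the four reconstruction metrics can be computed on boolean arrays as follows. The skeletons are assumed to be precomputed binary images of the same shape as the masks (how they are extracted is outside this example), and the function name `reconstruction_metrics` is hypothetical. Note that FPR is normalized by the ground truth foreground $|\mathcal{M}|$ , following the definition above.

```python
import numpy as np

def reconstruction_metrics(M, M_hat, S, S_hat):
    """TL, CL, FPR and DSC for one MIP.

    M, M_hat : ground truth and reconstructed binary MIPs
    S, S_hat : their centerline skeletons (assumed precomputed)
    """
    M, M_hat, S, S_hat = (np.asarray(a, dtype=bool) for a in (M, M_hat, S, S_hat))
    tl = (S & M_hat).sum() / S.sum()                   # true centerline that is recovered
    cl = (S_hat & ~M).sum() / S.sum()                  # predicted centerline outside M
    fpr = (M_hat & ~M).sum() / M.sum()                 # predicted pixels outside M
    dsc = 2 * (M & M_hat).sum() / (M.sum() + M_hat.sum())
    return {"TL": tl, "CL": cl, "FPR": fpr, "DSC": dsc}
```

Per-subject values would then be averaged over the three views before computing the fold-level mean and standard deviation.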

MIP Clustering: Agreement between two clusterings $(\mathcal{C},\hat{\mathcal{C}})$ is evaluated using the Rand Index (RI) and Adjusted Rand Index (ARI). The RI considers the number of pairs of datapoints that are co-clustered in both $\mathcal{C}\text{ and }\hat{\mathcal{C}}$ (denoted true positive (TP) samples), and pairs that belong to different clusters in both $\mathcal{C}$ and $\hat{\mathcal{C}}$ (denoted true negative (TN) samples), and is defined as $RI=\frac{|TP|+|TN|}{n(n-1)/2}$ , where the denominator is the total number of pairs for $n$ data points. RI increases as the number of clusters increases.

ARI tackles this by adjusting RI such that ARI = 0 for uniform random assignment of data points across clusters and 1 for perfect agreement, $ARI=\frac{RI-\mathbb{E}(RI)}{1-\mathbb{E}(RI)}$ .
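Both indices can be computed from pair counts without enumerating pairs, as in the sketch below. The expectation $\mathbb{E}(RI)$ is taken under the usual permutation model with fixed cluster sizes (the Hubert-Arabie form), which is an assumption about the exact null model; the function name `rand_indices` is hypothetical.

```python
from collections import Counter
from math import comb

def rand_indices(labels_a, labels_b):
    """Return (RI, ARI) for two labelings of the same n data points."""
    n = len(labels_a)
    total = comb(n, 2)                                  # number of pairs of points
    # Pairs co-clustered in both partitions (TP), from the contingency table.
    tp = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    a = sum(comb(c, 2) for c in Counter(labels_a).values())   # co-clustered in A
    b = sum(comb(c, 2) for c in Counter(labels_b).values())   # co-clustered in B
    tn = total - a - b + tp                             # separated in both partitions
    ri = (tp + tn) / total
    # Expected RI when labels are permuted with cluster sizes held fixed.
    expected_ri = (total - a - b + 2 * a * b / total) / total
    ari = 1.0 if expected_ri == 1 else (ri - expected_ri) / (1 - expected_ri)
    return ri, ari
```

For example, `rand_indices([0, 0, 1, 1], [0, 1, 0, 1])` gives RI = 1/3 but ARI = -0.5, showing how the adjustment penalizes agreement that is below chance level.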

3.2 MIP Auto-encoding

Table 1 reports the mean and standard deviation of the MIP reconstruction metrics in MESA Exam 5 for different combinations of trachea inclusion and small airway dilation. When training with the trachea (T), all Dice scores are high. However, tree length remains low (TL $<70\%$ ) when validating on ND+T MIPs, even when training on ND data, emphasising the challenge of reconstructing fine airway structures at subject level.

Fine-tuning the best model (D+T/D+T) on dilated MIPs with the trachea masked out maintains performance in TL and FPR. The Dice score falls marginally, likely due to the removal of the large, easy-to-encode trachea.

Evaluation on ND+T MIPs from the ATM’22 cohort demonstrates the robustness of our pre-trained autoencoder. All metrics are in line with MESA Exam 5, with a higher Dice score of 0.835, due in part to the smaller segmented trees.

Table 1: MIP auto-encoding with UNet-noskip. We threshold output probability maps at 0.5 to generate binary masks. *Datatypes: D = MIPs with peripheral airway dilation, ND = MIPs with no dilation, T = MIPs including trachea, NT = MIPs with masked trachea. The best model is indicated with ${}^{\dagger}$ and evaluated on the non-dilated ATM cohort (D+T ${}^{\dagger}$ / ND+T). We fine-tuned the best model using D+NT data for 5 epochs (FT ${}^{\dagger}$ ). Blue=evaluation on dilated images.

Metrics: 5-fold cross-validation ( $\mu\pm\sigma$ )
Train/Eval Data	Dice ( $\uparrow$ )	FPR ( $\downarrow$ )	TL ( $\uparrow$ )	CL ( $\downarrow$ )
MIPs $\rightarrow$ MIPs pre-training on MESA Exam 5
D+T/D+T ${}^{\dagger}$	0.895±0.002	0.023±0.001	0.933±0.003	0.011±0.000
D+T/ND+T	0.793±0.006	0.027±0.002	0.6815±0.02	0.021±0.001
ND+T/D+T	0.887±0.01	0.034±0.003	0.692±0.016	0.021±0.002
ND+T/ND+T	0.798±0.01	0.029±0.003	0.692±0.016	0.022±0.002
Fine-tuning best model ${}^{\dagger}$ on MESA Exam 5 (masked trachea)
D+NT/D+NT (FT ${}^{\dagger}$ )	0.884±0.001	0.024±0.001	0.929±0.002	0.012±0.001
Best model ${}^{\dagger}$ evaluation on ATM
D+T ${}^{\dagger}$ / ND+T	0.835±0.034	0.029±0.007	0.773±0.07	0.02±0.008

3.3 Airway tree clustering

With trachea $\mathcal{C}(D+T)$ : Randomly selected cases from the 4 discovered clusters (W,X,Y,Z) are shown in Figure 3a. Clusters seem to be driven by global shape features such as tree size and orientation of the trachea. Repeated clustering on 5 subsets of MESA Exam 5 (80% each) led to $k_{opt}=575$ (Figure 4a) and 4 clusters for each run. Compared with clustering on 100% of the data, we obtained RI = {0.956, 0.953, 0.944, 0.864, 0.974} and ARI = {0.851, 0.847, 0.834, 0.673, 0.901}. Cluster reproducibility is therefore strong overall, although the ARI varies noticeably across subsets (one run reaches only 0.673). Comparing L2 distance against Cosine distance in the $k$ NN graph (on 100% of the training data) yields RI = 0.960, ARI = 0.865, indicating robustness to the choice of $k$ NN graph distance metric. Clustering with 1,024-D vs the 2,048-D feature vectors yields RI = 0.977, ARI = 0.942.

Without trachea $\mathcal{C}(D+NT)$ : Clusters (0,1,2,3), illustrated in Figure 3b, show less obvious global differences, indicating that more subtle variations in airway trees are being picked up by the auto-encoder feature extractor. Repeated clustering led to $k_{opt}=535$ and 4 clusters, with higher reproducibility metrics than with the trachea: RI = {0.949, 0.934, 0.951, 0.956, 0.944}, ARI = {0.869, 0.828, 0.873, 0.886, 0.855}. Clustering output across the 5 subsets was stable across a wide range of $k$ , indicating strong robustness to $k$ NN graph construction (Figure 4b).

We present the variation in clinical outcomes for $\mathcal{C}(D+NT)$ in Figure 3b. Cluster properties demonstrate the capability of deep MIP encoding to identify stable partitions with significant clinical differences (COPD prevalence in Cluster 0 is 0.011, while that in Cluster 1 is 8x higher) in the general population of MESA Exam 5. Cluster 1 has high COPD prevalence without high percent emphysema, which is an open area of clinical investigation and demonstrates the power of using the airway segmentation rather than the CT intensities as input.

4 Discussion

In this work, we have introduced and validated a framework for deep-learned representations of airway trees segmented on HRCT scans. We demonstrate the efficacy of dilating peripheral airways to improve the depth of reconstruction achievable without patch-based methods. We demonstrate, for the first time, the use of unstructured, deep-learned shape features for the robust unsupervised clustering of airway trees to discover new phenotypes. We demonstrate that removal of the trachea results in more reproducible clusters and reduces the impact of initial tree orientation on cluster assignment (without complex registration). Training on dilated airways was a risk, but our results show that it is sufficient to discover sub-types with significant associations with clinical COPD prevalence. This could drive future efforts in airway segmentation challenges to focus on tree depth rather than voxel-level radius estimation. Our community detection clustering method identifies four subtypes within MESA Exam 5 that are robust to the user-defined hyperparameters in our clustering pipeline and exhibit significant differences in COPD risk.
Although our deep-learned shape features capture airway tree structure well, specific topological variations are not fully separated, owing to the use of lower-resolution 2D MIP representations and the dilation of peripheral airways. Future work will explore multi-scale models to dig deeper into the peripheral airway tree and investigate the clustering of individual lobes of the lung. Our UNet-noskip model performs better on shallow trees and performs best in the coronal view. Future work will focus on hard-example mining to alleviate this imbalance. We further plan to validate our method in independent CT cohorts, e.g., the Canadian Cohort Obstructive Lung Disease (CanCOLD) [18] and the SubPopulations and Intermediate Outcomes in COPD Study (SPIROMICS) [19], and to conduct longitudinal analysis in MESA Exams 6 and 7.

5 Compliance with ethical standards

Each MESA study site was approved by the institutional review board (http://www.mesa-nhlbi.org). Written informed consent was obtained from all participants.

6 Acknowledgments

This research was supported by National Heart, Lung, and Blood Institute grants NIH R01-HL130506, R01-HL155816, R01-HL121270, R01-HL077162 and R01-HL093081 and contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168 and N01-HC-95169, and by grants UL1-TR-000040, UL1-TR-001079, and UL1-TR-001420 from the National Center for Advancing Translational Sciences (NCATS). The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutions can be found at http://www.mesa-nhlbi.org. Author E.A.H. is a shareholder of VIDA Diagnostics, Inc.

References

[1] M. Vameghestahbanati et al., “Association of dysanapsis with mortality among older adults,” European Respiratory Journal, vol. 61, no. 6, 2023.
[2] B. M. Smith et al., “Human airway branch variation and chronic obstructive pulmonary disease,” Proceedings of the National Academy of Sciences, vol. 115, no. 5, pp. E974–E981, 2018.
[3] B. M. Smith et al., “Association of dysanapsis with chronic obstructive pulmonary disease among older adults,” Journal of the American Medical Association, vol. 323, no. 22, pp. 2268–2280, 2020.
[4] S. Bodduluri et al., “Computed tomography–based airway surface area–to-volume ratio for phenotyping airway remodeling in chronic obstructive pulmonary disease,” American Journal of Respiratory and Critical Care Medicine, vol. 203, no. 2, pp. 185–191, 2021.
[5] S. Bodduluri et al., “Airway fractal dimension predicts respiratory morbidity and mortality in COPD,” The Journal of Clinical Investigation, vol. 128, no. 12, pp. 5374–5382, 2018.
[6] A. R. Lambert et al., “Regional deposition of particles in an image-based airway model: large-eddy simulation and left-right lung ventilation asymmetry,” Aerosol Science and Technology, vol. 45, no. 1, pp. 11–25, 2011.
[7] L. J. Billera, S. P. Holmes, and K. Vogtmann, “Geometry of the space of phylogenetic trees,” Advances in Applied Mathematics, vol. 27, no. 4, pp. 733–767, 2001.
[8] A. Wysoczanski et al., “Unsupervised clustering of airway tree structures on high-resolution CT: the MESA lung study,” in Proceedings of International Symposium on Biomedical Imaging, 2021, pp. 1568–1572.
[9] A. Feragen et al., “Tree-space statistics and approximations for large-scale analysis of anatomical trees,” in Proceedings of Information Processing in Medical Imaging, 2013, pp. 74–85.
[10] S. A. Nadeem et al., “A CT-based automated algorithm for airway segmentation using freeze-and-grow propagation and deep learning,” IEEE Transactions on Medical Imaging, vol. 40, no. 1, pp. 405–418, 2020.
[11] M. Zhang et al., “Multi-site, multi-domain airway tree modeling,” Medical Image Analysis, vol. 90, pp. 102957, 2023.
[12] D. E. Bild et al., “Multi-ethnic study of atherosclerosis: objectives and design,” American Journal of Epidemiology, vol. 156, no. 9, pp. 871–881, 2002.
[13] J. Rodriguez et al., “The association of pipe and cigar use with cotinine levels, lung function, and airflow obstruction: a cross-sectional study,” Annals of Internal Medicine, vol. 152, no. 4, pp. 201–210, 2010.
[14] J. P. Sieren et al., “SPIROMICS protocol for multicenter quantitative computed tomography to phenotype the lungs,” American Journal of Respiratory and Critical Care Medicine, vol. 194, no. 7, pp. 794–806, 2016.
[15] O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in Proceedings of Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.
[16] P. De Meo et al., “Generalized Louvain method for community detection in large networks,” in International Conference on Intelligent Systems Design and Applications, 2011, pp. 88–93.
[17] A. Garcia-Uceda et al., “Automatic airway segmentation from computed tomography using robust and efficient 3-d convolutional neural networks,” Scientific Reports, vol. 11, no. 1, pp. 16001, 2021.
[18] J. Bourbeau et al., “Canadian cohort obstructive lung disease (CanCOLD): fulfilling the need for longitudinal observational studies in COPD,” COPD: Journal of Chronic Obstructive Pulmonary Disease, vol. 11, no. 2, pp. 125–132, 2014.
[19] D. Couper et al., “Design of the subpopulations and intermediate outcomes in COPD study (SPIROMICS),” Thorax, vol. 69, no. 5, pp. 492–495, 2014.