These authors contributed equally to this work.
@Correspondence: [email protected]
Unsupervised Latent Stain Adaption
for Digital Pathology
Abstract
In digital pathology, deep learning (DL) models for tasks such as segmentation or tissue classification are known to suffer from domain shifts due to different staining techniques. Stain adaptation aims to reduce the generalization error between different stains by training a model on source stains that generalizes to target stains. Despite the abundance of target stain data, a key challenge is the lack of annotations. To address this, we propose Unsupervised Latent Stain Adaption (ULSA), a joint training on artificially labeled and unlabeled data that includes all available stained images. Our method uses stain translation to enrich labeled source images with synthetic target images in order to increase supervised signals. Moreover, we leverage unlabeled target stain images using stain-invariant feature consistency learning. With ULSA we present a semi-supervised strategy for efficient stain adaption without access to annotated target stain data. Remarkably, ULSA is task agnostic in patch-level analysis for whole slide images (WSIs). Through extensive evaluation on external datasets, we demonstrate that ULSA achieves state-of-the-art (SOTA) performance in kidney tissue segmentation and breast cancer classification across a spectrum of staining variations. Our findings suggest that ULSA is an important framework towards stain adaption in digital pathology.
Keywords: Semi-supervised Learning, Stain Adaption, Whole Slide Image, Transfer Learning, Segmentation, Classification
1 Introduction
Recent advances in DL for digital pathology have shown promising results for a wide range of applications, from cancer and biomarker detection to tissue structure segmentation [3]. However, large-scale studies have indicated that the effectiveness of DL techniques in histology depends heavily on the availability of labeled data [15], and acquiring a sufficient number of expert annotations remains challenging. In digital pathology, image datasets often consist of sequential slides stained with various techniques, each providing distinct insights into the same region of interest. Despite variations in staining protocols, these slides frequently share a significant amount of consistent information. However, expert annotations may be available for one type of staining but lacking for others, which are often accessible in large quantities without labels. Generating expert annotations for multiple staining techniques for the same analysis task would be exceedingly time-consuming. In the era of foundation models [11], generalized DL models that are robust to data shifts are also preferable to domain-specific expert models. In this paper, we ask how to tailor a DL model trained for a specific task to handle staining variations within the distribution of target stains for which no annotations are available. This can be accomplished by incorporating unlabeled data during the training phase. Stain adaptation across different staining techniques (inter-stain adaptation) has not been sufficiently explored so far. Despite efforts to develop stain-to-stain translation techniques, their effectiveness is typically evaluated either visually by experts or through translation metrics [19]. Prior research has used unlabeled target stain images only for translation [2, 7] and has not yet incorporated them directly into the training process.
Here, we present ULSA, a semi-supervised strategy that, for the first time, trains jointly on data from all stainings. We introduce a framework that integrates unlabeled target stain images into supervised training by maintaining the supervised learning signal for synthetic target stainings generated through cycle GAN (cGAN) inference [18]. Feature-wise stain adaption enables the use of unlabeled target data and enforces feature consistency across stains. Combining these key ingredients, we propose a new method for efficient stain adaption that outperforms current SOTA approaches. Our novelties can be summarized as follows:
- (1) Unsupervised stain adaption. ULSA leverages target stain data in a supervised and unsupervised fashion with only annotated source stains. We propose a framework for training stain-invariant models for digital pathology.
- (2) Feature consistency learning. We maximize cosine-similarity between hierarchical features across stains to achieve stain-invariance on feature level.
- (3) Task agnostic framework. ULSA is applicable for classification and segmentation training of stain-invariant models.
- (4) Outperforming SOTA. Our approach outperforms methods from stain-translation, DL based augmentation and semi-supervised learning slightly for source stains and by a large margin for target stains. Our approach only needs 10% of labels to reach the same performance as SOTA trained with all data.
2 Method
Stain adaption aims to minimize the generalization error in task performance between a source staining and a target staining (Fig. 1a). In particular, a parameterized model trained on labeled source staining data should ideally maintain task performance on target stainings for which no labels are available. We propose to address this challenge by incorporating unlabeled target stainings through (i) a cGAN model that augments labeled images into target stains while inheriting the same annotation, (ii) unsupervised stain adaption (USA), which jointly trains on all stains with supervised and unsupervised objectives, and (iii) stain-invariant feature consistency learning (FCL), which matches latent representations between stainings without supervision. The overall method is outlined in Fig. 2.
![Refer to caption](x1.png)
2.0.1 (i) cGAN stain augmentation.
After pretraining, we use cGANs as stain translation functions to synthetically augment source training data into target stainings. This process is structure-preserving [18], so each target stain image inherits the label of the source image used for translation. Fig. 1c shows exemplary translation results. This strategy aims to enlarge the labeled training dataset, and thus the supervisory signal, to achieve stain invariance on prediction level.
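The label inheritance described above can be illustrated with a minimal sketch. The function name `augment_with_target_stains`, the toy translators, and the stain keys are hypothetical; in practice each translator would be a pretrained cGAN generator and each label could be a class index or a segmentation mask.

```python
from typing import Any, Callable, Dict, List, Tuple

Image = List[List[float]]              # placeholder image type for this sketch
Translator = Callable[[Image], Image]  # stands in for a pretrained cGAN generator


def augment_with_target_stains(
    labeled_source: List[Tuple[Image, Any]],
    translators: Dict[str, Translator],
) -> List[Tuple[Image, Any, str]]:
    """Return the source samples plus one synthetic copy per target stain.

    Every synthetic image keeps the label of the source image it came from.
    """
    augmented = [(img, label, "source") for img, label in labeled_source]
    for img, label in labeled_source:
        for stain, translate in translators.items():
            augmented.append((translate(img), label, stain))
    return augmented


# Hypothetical stand-ins for cGAN generators: they only rescale intensities,
# so the image structure (and therefore the label) stays valid.
toy_translators = {
    "TRI": lambda img: [[0.9 * v for v in row] for row in img],
    "SIL": lambda img: [[0.5 * v for v in row] for row in img],
}
source = [([[1.0, 2.0], [3.0, 4.0]], 1), ([[0.0, 1.0], [1.0, 0.0]], 0)]
augmented = augment_with_target_stains(source, toy_translators)
```

With two source samples and two target stains, the labeled set grows from two to six samples without any additional annotation effort.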
![Refer to caption](x2.png)
2.0.2 (ii) Unsupervised stain adaption.
Since labeled data are given only in the source staining, we would ideally desire a mapping from source to target samples. We approximate this mapping with a distinct cGAN augmenter per target stain, which yields additional labeled target samples. Note that every inferred sample inherits the label of the source sample it was translated from. The labeled training data thus form a mixture of source images and their synthetic translations across the set of stains.
In addition, we leverage unlabeled data by unsupervised learning. Given an unlabeled image, we translate it by Reinhard translation followed by noise injection. Note that the Reinhard translation receives another randomly sampled unlabeled image from the target stains as reference for each translation. Fig. 1b includes example inputs. We use Reinhard normalization because Macenko's method has a much higher runtime, leading to computational overhead; see the supplementary material (SM). Finally, we force the model to embed images invariantly to stain translations by maximizing cosine similarity in the unsupervised loss, see (iii). For this reason, we use light Gaussian blurring for noise injection; see SM for other choices. In summary, our objective is a weighted sum of a supervised loss and the unsupervised loss.
By minimizing the supervised loss on labeled images of synthetic target stains, we aim for stain adaption on prediction level. We use multi-class and binary cross-entropy loss for segmentation and classification, respectively. We weight both terms equally, a choice obtained by performing a grid search as described in SM. In each iteration, we compute the supervised loss on a batch of labeled data and the feature consistency loss on a batch of unlabeled data. Next, we detail the unsupervised loss term associated with feature consistency learning.
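The translation of unlabeled images can be sketched as follows. This is a minimal illustration, not the authors' implementation: Reinhard-style translation is reduced to its core step of matching per-channel mean and standard deviation to a reference image, and both arrays are assumed to already be in the colour space used for matching (Reinhard et al. work in l-alpha-beta). The function names and the blur parameters `sigma` and `radius` are assumptions chosen for illustration.

```python
import numpy as np


def match_channel_statistics(image, reference, eps=1e-8):
    """Shift and scale each channel of `image` to the mean/std of `reference`.

    Both arrays have shape (H, W, C) and are assumed to be in the colour
    space used for matching; the colour conversion itself is omitted here.
    """
    img_mean = image.mean(axis=(0, 1), keepdims=True)
    img_std = image.std(axis=(0, 1), keepdims=True)
    ref_mean = reference.mean(axis=(0, 1), keepdims=True)
    ref_std = reference.std(axis=(0, 1), keepdims=True)
    return (image - img_mean) / (img_std + eps) * ref_std + ref_mean


def light_gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur with edge padding, used here as noise injection."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    height, width = image.shape[:2]
    # Blur along the vertical axis first, then along the horizontal axis.
    rows = sum(k * padded[i:i + height, :, :] for i, k in enumerate(kernel))
    return sum(k * rows[:, j:j + width, :] for j, k in enumerate(kernel))


rng = np.random.default_rng(0)
unlabeled = rng.uniform(0.0, 1.0, size=(16, 16, 3))
reference = rng.uniform(0.2, 0.6, size=(16, 16, 3))  # another random unlabeled tile
matched = match_channel_statistics(unlabeled, reference)
translated = light_gaussian_blur(matched)
```

Because the reference tile is drawn at random for each translation, the same unlabeled image is seen under many different colour statistics over the course of training.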
2.0.3 (iii) Stain-invariant feature consistency learning.
For unsupervised learning, we aim to obtain similar feature representations for an input and its stain-translated, noised version. To enforce this for every downsampling block in a model, we apply non-parametric 2D adaptive average pooling. In Einstein notation, this tensor operation is expressed over indices that refer to batch size, channels, height, width and the model block (Fig. 1b). We aim to maximize the cosine similarity between all hierarchical features from a model. In this way, we enforce feature similarity by updating the model such that the extracted image features become similar in latent space. We define features as the model outputs up to a given block after 2D adaptive pooling. For brevity, the block index also refers to parameter optimization from the input layer up to that block. Following the literature [10, 17], we do not backpropagate gradients for the non-augmented input. Hence, we define our unsupervised objective in terms of the cosine similarity between the pooled features of both inputs, with the inner product normalized by the Euclidean norms and aggregated over all downsampling blocks. By minimizing this objective, we enforce stain invariance on feature level.
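A minimal NumPy sketch of feature consistency learning is given below. It rests on several assumptions that are not confirmed by the text: pooling is taken as global average pooling to one value per channel, the loss is written as one minus the mean cosine similarity over samples and blocks, and the names `pool_spatial`, `feature_consistency_loss`, `total_loss` and `weight` are hypothetical. NumPy has no automatic differentiation, so the stop-gradient on the non-augmented branch is not represented; in an autodiff framework that branch would be detached before computing the similarity.

```python
import numpy as np


def pool_spatial(features):
    """Global average pooling written as an einsum: (N, C, H, W) -> (N, C)."""
    n_pixels = features.shape[2] * features.shape[3]
    return np.einsum("nchw->nc", features) / n_pixels


def feature_consistency_loss(feats_clean, feats_translated, eps=1e-8):
    """One minus the mean cosine similarity over samples and blocks.

    Each argument is a list with one (N, C, H, W) array per downsampling
    block. The exact form of the loss is an assumption of this sketch.
    """
    per_block = []
    for clean, translated in zip(feats_clean, feats_translated):
        a, b = pool_spatial(clean), pool_spatial(translated)
        norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        cosine = (a * b).sum(axis=1) / (norms + eps)
        per_block.append(cosine.mean())
    return 1.0 - float(np.mean(per_block))


def total_loss(supervised_loss, unsupervised_loss, weight=1.0):
    """Weighted sum of both objectives; weight=1.0 mirrors equal weighting."""
    return supervised_loss + weight * unsupervised_loss


rng = np.random.default_rng(0)
# Two hypothetical encoder blocks with different channel counts and resolutions.
blocks = [rng.normal(size=(4, 8, 16, 16)), rng.normal(size=(4, 16, 8, 8))]
loss_identical = feature_consistency_loss(blocks, blocks)
loss_opposite = feature_consistency_loss(blocks, [-f for f in blocks])
combined = total_loss(0.5, 0.25)
```

Identical features give a loss close to zero and sign-flipped features give a loss close to two, so minimizing this quantity pulls the representations of an image and its translated version together in every block.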
3 Experiments
We applied our proposed method to kidney tissue segmentation and cancer classification. We measured performance using the Dice score and the area under the receiver operating characteristic curve (AUROC), respectively. In the following, we describe our datasets; further dataset statistics can be found in SM.
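For reference, both metrics can be computed as in the following sketch. It assumes binary masks for the Dice score (a multi-class evaluation would compute it per class) and uses the pairwise ranking definition of AUROC, which is quadratic in the number of samples and therefore only suitable for small examples. The function names are illustrative.

```python
import numpy as np


def dice_score(pred_mask, true_mask, eps=1e-8):
    """Dice score between two binary masks: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


dice_example = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
auroc_example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy example, one of the three foreground pixels across both masks overlaps, giving a Dice score of 2/3, and three of the four positive-negative pairs are ranked correctly, giving an AUROC of 0.75.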
3.1 Datasets and Comparable Methods
For slide tiling, we used a modified version of the CLAM preprocessing pipeline [8] and manually selected color thresholds as different stains require adjustments in tissue detection. All image data were processed with a size of px.
3.1.1 Kidney segmentation datasets.
Our internal annotated training and validation datasets consist of annotated patches extracted from periodic acid-Schiff (PAS) stained WSIs with magnification. Annotation masks contain the classes tubule, glomerulus, glomerular tuft, artery, arterial lumen, and vein. For external testing we included annotated glomerulus images from PAS stained WSIs provided by the HuBMAP consortium [4]. After processing the raw data we obtained samples. Moreover, we included test data from the NEPTUNE [1] study. This dataset contains , , Silver (SIL) and Trichrome (TRI) stain samples. The classes glomerulus, glomerular tuft, artery and tubule were annotated and images were taken from WSIs with magnification, which we rescaled to magnification. We further processed unlabeled WSIs at magnification from the KPMP database [6]. We obtained , , , tiles.
3.1.2 Breast cancer classification datasets.
Our cancer classification datasets contain , images obtained from HE stained WSIs and tiled at magnification level. Our test set contains Cytokeratin (CK5) and Cluster of differentiation (CD8) stains with and samples. Sample sizes and binary label ratios were equalized for test sets after preprocessing, and excess data were held out of experiments to avoid data leakage at patient level. Additionally, we have unlabeled datasets containing , , samples. We split data on slide level such that each patient appears exclusively in one dataset.
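A patient-exclusive split of this kind can be sketched as follows. The function name `split_by_patient`, the training fraction, and the slide and patient identifiers are hypothetical and only illustrate the grouping logic.

```python
import random
from collections import defaultdict


def split_by_patient(slides, train_fraction=0.8, seed=0):
    """Split (slide_id, patient_id) pairs so that no patient spans both subsets."""
    by_patient = defaultdict(list)
    for slide_id, patient_id in slides:
        by_patient[patient_id].append(slide_id)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_train = int(round(train_fraction * len(patients)))
    train = [s for p in patients[:n_train] for s in by_patient[p]]
    held_out = [s for p in patients[n_train:] for s in by_patient[p]]
    return train, held_out


# Five hypothetical patients with two slides each.
slides = [(f"slide_{p}_{i}", f"P{p}") for p in range(5) for i in range(2)]
train_slides, held_out_slides = split_by_patient(slides)
```

Splitting on the patient identifier rather than on individual tiles or slides ensures that tissue from one patient never contributes to both training and evaluation.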
3.1.3 Comparable Methods.
We selected comparable methods from the domains of stain translation, unsupervised augmentation, and semi-supervised consistency training. (1) Baseline. For comparison, we include a naive approach in which we train a model on source stains without access to target stains, and thus obtain a lower bound for stain adaptation. (2) Reinhard. Reinhard’s method is a stain normalization technique that adjusts the color appearance of histopathology images by aligning them with a reference color space [12]. (3) Macenko. Macenko’s method [9] is a stain normalization technique that standardizes the color appearance of images by mapping them to a reference color space. It has been shown that this method provides the best performance across various colorization techniques for downstream tasks [7]. (4) cGAN Augmentation. This method generates synthetic images with diverse staining variations in order to translate between stainings and train stain-invariant models [2]. (5) FixMatch. A semi-supervised learning approach that combines labeled and unlabeled data by enforcing consistency between predictions made on unlabeled samples based on confidence scores [13]. (6) Unsupervised Data Augmentation (UDA). UDA is a semi-supervised learning technique designed for classification tasks that leverages augmentations on unlabeled data to improve model performance by minimizing the divergence between the predicted probability distributions of two versions of an image. Originally proposed for natural images [17], it has been adapted to histology images [5].
3.2 Implementation
3.2.1 cGAN pretraining.
We initially started with hyperparameters following the literature [2] and performed a grid search with details noted in SM. By visually evaluating stain-translation results, we selected a model trained with epochs and a learning rate of . For each stain translation we used unlabeled images from KPMP. For source stains, we used unlabeled HE data for cancer classification, and our labeled PAS stained training set for segmentation. Each stain translation training and inference task took around 8 hours.
3.2.2 Segmentation and Classification.
For all experiments we used a ResNet-50 as classification model and as encoder for the U-Net in segmentation. We tuned hyperparameters on validation sets and set a learning rate decay of and early stopping for 5 and 10 consecutive epochs with no decrease in validation loss. We employed AdamW as optimizer and used a starting learning rate of and weight decay . Images were resized into px scale. We used an overall batch size of 128 for all experiments. For semi-supervised learning we batched data to 32 and 96 for labeled and unlabeled data, respectively. All models were initialized with ImageNet-pretrained weights, and experiments were performed on a single Nvidia A100 GPU. We measured a maximum training time across all methods of 10 and 7 hours for segmentation and classification, respectively (except Macenko, see SM).
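The early stopping criterion on the validation loss can be sketched as below. The class name `EarlyStopping` and the loss values are illustrative, and the patience of 5 epochs is used only as an example setting.

```python
class EarlyStopping:
    """Stop once the validation loss has not decreased for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, validation_loss):
        """Record one epoch and return True when training should stop."""
        if validation_loss < self.best_loss:
            self.best_loss = validation_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=5)
validation_losses = [1.0, 0.8, 0.9, 0.85, 0.8, 0.81, 0.9, 0.7]  # made-up values
stopped_at = None
for epoch, loss in enumerate(validation_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In this made-up run the best loss of 0.8 is reached in epoch 1 and is not improved in the following five epochs, so training stops before the later value of 0.7 is ever seen; a larger patience trades training time against that risk.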
4 Results and Discussion
In this section, we report results and question the necessary number of labels for stain adaption in kidney tissue segmentation and breast cancer classification.
| Method | Segmentation, intra-stain: PAS | Segmentation, inter-stain: TRI | Segmentation, inter-stain: HE | Segmentation, inter-stain: SIL | Segmentation, inter-stain: Overall | Classification, inter-stain: CK5 | Classification, inter-stain: CD8 | Classification, inter-stain: Overall |
|---|---|---|---|---|---|---|---|---|
| Baseline | | | | | | | | |
| Reinhard [12] | | | | | | | | |
| Macenko [7] | | | | | | | | |
| cGAN [2] | | | | | | | | |
| FixMatch [13] | | | | | | | | |
| UDA [5] | | | | | | | | |
| Ours | | | | | | | | |
4.0.1 Measuring stain adaption.
In stain adaption, we aim to at least maintain performance on source stains (Intra-stain, Tab. 1) and to maximize performance on target stains (Inter-stain, Tab. 1). For segmentation on source data, our method is on par with the other methods and even slightly increases performance over the baseline. Note that all other methods decrease performance on source stains compared to the baseline. More importantly, we increase the Dice score for target stains (Inter-stain, Overall) by more than and compared to naive training and Macenko colorization, the best comparable method. However, note that although Macenko provides the second-best stain adaption result, its source performance is decreased by and compared to baseline and ULSA. In the case of classification, we had no further annotated intra-stain data for testing but report results for target data. Our method increases AUROC by 4.5 and 1.2 compared to the naive and the best comparable method. Overall, these findings demonstrate efficient stain adaption by increasing target performance while not only maintaining but increasing source performance (Fig. 1a). Additionally, we trained all methods using different fractions of labeled data and tested performance on targets in segmentation (Fig. 3a). Interestingly, ULSA shows strong stain adaption even with very little annotated data compared to all other methods. Using only 10% of the labeled data, ULSA is on par with the second-best comparable method.
4.0.2 Ablation study.
We measured the influence of different components of our method in segmentation (Fig. 3b). When dropping either the cGAN or the FCL component, overall target performance decreases. Using hierarchical features (ULSA) instead of features from the last block only (LB FCL) increases scores. Apart from that, we also initialized our ULSA method with pretrained weights from a foundation model for histology (FM ULSA), obtained by large-scale learning on HE images [16]. With this setup we demonstrate that ImageNet-pretrained weights yield better performance. This hints that building foundation models should incorporate stain variations to avoid catastrophic forgetting of learned morphologies.
![Refer to caption](extracted/5695564/figures/fig3.png)
5 Conclusion
We proposed ULSA, a novel SOTA strategy to decrease stain generalization errors in digital pathology tasks. Our semi-supervised learning strategy leverages annotated data for both source and artificial target stains. Moreover, we incorporate unlabeled data for stain-invariant feature consistency learning. Finally, joint optimization of supervised and unsupervised objectives enables efficient stain adaption. We empirically demonstrated that ULSA training increases performance on unlabeled target stains in patch level segmentation and classification. This suggests that ULSA is a task agnostic framework. We further showed ULSA achieves efficient stain adaption even in settings with scarce labels. A potential limitation of our approach is that even if the performance for unseen interstains is increased, augmentation strategies may not translate the correct marker information from other stains. A possible example is immunohistochemical (IHC) staining, where specific immune cells are highlighted. This could potentially affect downstream applications in certain scenarios not seen in this study. Future work could compare and replace cGAN with other GAN-based augmentation strategies such as HistAuGAN [14].
5.0.1 Acknowledgements.
This work was supported by Deutsche Forschungsgemeinschaft (DFG Project number 445703531). The authors gratefully acknowledge the computational and data resources provided by the Leibniz Supercomputing Centre (www.lrz.de).
5.0.2 Disclosure of Interests.
The authors declare that they have no conflicts of interest related to this work.
References
- [1] Barisoni, L., Nast, C.C., Jennette, J.C., Hodgin, J.B., Herzenberg, A.M., Lemley, K.V., Conway, C.M., Kopp, J.B., Kretzler, M., Lienczewski, C., Avila-Casado, C., Bagnasco, S., Sethi, S., Tomaszewski, J., Gasim, A.H., Hewitt, S.M.: Digital pathology evaluation in the multicenter nephrotic syndrome study network (neptune). Clinical Journal of the American Society of Nephrology 8(8), 1449–1459 (Aug 2013). https://doi.org/10.2215/cjn.08370812, http://dx.doi.org/10.2215/CJN.08370812
- [2] Bouteldja, N., Hölscher, D.L., Klinkhammer, B.M., Buelow, R.D., Lotz, J., Weiss, N., Daniel, C., Amann, K., Boor, P.: Stain-independent deep learning–based analysis of digital kidney histopathology. The American Journal of Pathology 193(1), 73–83 (Jan 2023). https://doi.org/10.1016/j.ajpath.2022.09.011, http://dx.doi.org/10.1016/j.ajpath.2022.09.011
- [3] Bouteldja, N., Klinkhammer, B.M., Bülow, R.D., Droste, P., Otten, S.W., Freifrau von Stillfried, S., Moellmann, J., Sheehan, S.M., Korstanje, R., Menzel, S., Bankhead, P., Mietsch, M., Drummer, C., Lehrke, M., Kramann, R., Floege, J., Boor, P., Merhof, D.: Deep learning–based segmentation and quantification in experimental kidney histopathology. Journal of the American Society of Nephrology 32(1), 52–68 (Nov 2020). https://doi.org/10.1681/asn.2020050597, http://dx.doi.org/10.1681/ASN.2020050597
- [4] Howard, A., Lawrence, A., Sims, B., Tinsley, E., Kazmierczak, J., Borner, K., Godwin, L., Novaes, M., Culliton, P., Holland, R., Watson, R., Ju, Y.: Hubmap - hacking the kidney (2020), https://kaggle.com/competitions/hubmap-kidney-segmentation
- [5] Jiang, Y., Sui, X., Ding, Y., Xiao, W., Zheng, Y., Zhang, Y.: A semi-supervised learning approach with consistency regularization for tumor histopathological images analysis. Frontiers in Oncology 12 (Jan 2023). https://doi.org/10.3389/fonc.2022.1044026, http://dx.doi.org/10.3389/fonc.2022.1044026
- [6] Kidney Precision Medicine Project: Kidney Precision Medicine Project Data. Accessed September 01, 2023. https://www.kpmp.org, the results here are in whole or part based upon data generated by the Kidney Precision Medicine Project. Funded by the National Institute of Diabetes and Digestive and Kidney Diseases
- [7] Lampert, T., Merveille, O., Schmitz, J., Forestier, G., Feuerhake, F., Wemmert, C.: Strategies for training stain invariant cnns. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE (Apr 2019). https://doi.org/10.1109/isbi.2019.8759266, http://dx.doi.org/10.1109/ISBI.2019.8759266
- [8] Lu, M.Y., Williamson, D.F., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 5(6), 555–570 (2021)
- [9] Macenko, M., Niethammer, M., Marron, J.S., Borland, D., Woosley, J.T., Guan, X., Schmitt, C., Thomas, N.E.: A method for normalizing histology slides for quantitative analysis. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE (Jun 2009). https://doi.org/10.1109/isbi.2009.5193250, http://dx.doi.org/10.1109/ISBI.2009.5193250
- [10] Miyato, T., Maeda, S.i., Koyama, M., Ishii, S.: Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence 41(8), 1979–1993 (2018)
- [11] Moor, M., Banerjee, O., Abad, Z.S.H., Krumholz, H.M., Leskovec, J., Topol, E.J., Rajpurkar, P.: Foundation models for generalist medical artificial intelligence. Nature 616(7956), 259–265 (Apr 2023). https://doi.org/10.1038/s41586-023-05881-4, http://dx.doi.org/10.1038/s41586-023-05881-4
- [12] Reinhard, E., Adhikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Computer graphics and applications 21(5), 34–41 (2001)
- [13] Sohn, K., Berthelot, D., Li, C.L., Zhang, Z., Carlini, N., Cubuk, E.D., Kurakin, A., Zhang, H., Raffel, C.: Fixmatch: Simplifying semi-supervised learning with consistency and confidence (2020)
- [14] Wagner, S.J., Khalili, N., Sharma, R., Boxberg, M., Marr, C., de Back, W., Peng, T.: Structure-preserving multi-domain stain color augmentation using style-transfer with disentangled representations. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (2021)
- [15] Wagner, S.J., Reisenbüchler, D., West, N.P., Niehues, J.M., Zhu, J., Foersch, S., Veldhuizen, G.P., Quirke, P., Grabsch, H.I., van den Brandt, P.A., Hutchins, G.G., Richman, S.D., Yuan, T., Langer, R., Jenniskens, J.C., Offermans, K., Mueller, W., Gray, R., Gruber, S.B., Greenson, J.K., Rennert, G., Bonner, J.D., Schmolze, D., Jonnagaddala, J., Hawkins, N.J., Ward, R.L., Morton, D., Seymour, M., Magill, L., Nowak, M., Hay, J., Koelzer, V.H., Church, D.N., Matek, C., Geppert, C., Peng, C., Zhi, C., Ouyang, X., James, J.A., Loughrey, M.B., Salto-Tellez, M., Brenner, H., Hoffmeister, M., Truhn, D., Schnabel, J.A., Boxberg, M., Peng, T., Kather, J.N., Church, D., Domingo, E., Edwards, J., Glimelius, B., Gogenur, I., Harkin, A., Hay, J., Iveson, T., Jaeger, E., Kelly, C., Kerr, R., Maka, N., Morgan, H., Oien, K., Orange, C., Palles, C., Roxburgh, C., Sansom, O., Saunders, M., Tomlinson, I.: Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer Cell 41(9), 1650–1661.e4 (Sep 2023). https://doi.org/10.1016/j.ccell.2023.08.002, http://dx.doi.org/10.1016/j.ccell.2023.08.002
- [16] Wang, X., Du, Y., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., Huang, J., Han, X.: Retccl: Clustering-guided contrastive learning for whole-slide image retrieval. Medical Image Analysis 83, 102645 (Jan 2023). https://doi.org/10.1016/j.media.2022.102645, http://dx.doi.org/10.1016/j.media.2022.102645
- [17] Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmentation for consistency training (2020)
- [18] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV). pp. 2242–2251 (2017). https://doi.org/10.1109/ICCV.2017.244
- [19] Zingman, I., Frayle, S., Tankoyeu, I., Sukhanov, S., Heinemann, F.: A comparative evaluation of image-to-image translation methods for stain transfer in histopathology (2023)