1 Institute of Image Analysis and Computer Vision, University of Regensburg, Regensburg, Germany
2 Institute of Pathology, Medical School Hannover, Hannover, Germany
3 Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
* These authors contributed equally to this work.
@ Correspondence: [email protected]

Unsupervised Latent Stain Adaption for Digital Pathology

Daniel Reisenbüchler (1,*,@), Lucas Luttner (1,*), Nadine S. Schaadt (2), Friedrich Feuerhake (2), Dorit Merhof (1,3)
Abstract

In digital pathology, deep learning (DL) models for tasks such as segmentation or tissue classification are known to suffer from domain shifts due to different staining techniques. Stain adaptation aims to reduce the generalization error between different stains by training a model on source stains that generalizes to target stains. Despite the abundance of target stain data, a key challenge is the lack of annotations. To address this, we propose a joint training between artificially labeled and unlabeled data including all available stained images called Unsupervised Latent Stain Adaption (ULSA). Our method uses stain translation to enrich labeled source images with synthetic target images in order to increase supervised signals. Moreover, we leverage unlabeled target stain images using stain-invariant feature consistency learning. With ULSA we present a semi-supervised strategy for efficient stain adaption without access to annotated target stain data. Remarkably, ULSA is task agnostic in patch-level analysis for whole slide images (WSIs). Through extensive evaluation on external datasets, we demonstrate that ULSA achieves state-of-the-art (SOTA) performance in kidney tissue segmentation and breast cancer classification across a spectrum of staining variations. Our findings suggest that ULSA is an important framework towards stain adaption in digital pathology.

Keywords: Semi-supervised Learning · Stain Adaption · Whole Slide Image · Transfer Learning · Segmentation · Classification

1 Introduction

Recent advances in DL for digital pathology have shown promising results for a wide range of applications, from cancer and biomarker detection to tissue structure segmentation [3]. However, large-scale studies have indicated that the effectiveness of DL techniques in histology is heavily dependent on the availability of labeled data [15], and acquiring a sufficient number of expert annotations remains challenging. In digital pathology, image datasets often consist of sequential slides stained with various techniques, each providing distinct insights into the same region of interest. Despite variations in staining protocols, these slides frequently share a significant amount of consistent information. However, expert annotations may be available for one type of staining but lacking for others, which are often accessible in large quantities without labels. Generating expert annotations for multiple staining techniques for the same analysis tasks would be exceedingly time-consuming. In the era of foundation models [11], we also prefer generalized DL models that are robust to data shifts over models specialized to a single domain. In this paper, we ask how to tailor a DL model trained for a specific task to handle staining variations within the distribution of target stains, for which no annotations are available. This can be accomplished by incorporating unlabeled data during the training phase. Stain adaptation across different staining techniques (inter-stain) has not been sufficiently explored so far. Despite efforts to develop stain-to-stain translation techniques, their effectiveness is typically evaluated either visually by experts or through translation metrics [19]. Prior research has not yet incorporated unlabeled target stain images directly into the training process, using them only for translation [2, 7].
Here, we present ULSA, a semi-supervised strategy designed for joint training on all staining data for the first time. We introduce a framework that integrates unlabeled target stain images into supervised training by maintaining the supervised learning signal for synthetic target stainings generated through CycleGAN (cGAN) inference [18]. Feature-wise stain adaption enables the use of unlabeled target data and enforces feature consistency across stains. Combining these key ingredients, we propose a new method for efficient stain adaption that outperforms current SOTA approaches. Our contributions can be summarized as follows:

  • (1) Unsupervised stain adaption. ULSA leverages target stain data in a supervised and unsupervised fashion with only annotated source stains. We propose a framework for training stain-invariant models for digital pathology.

  • (2) Feature consistency learning. We maximize cosine-similarity between hierarchical features across stains to achieve stain-invariance on feature level.

  • (3) Task agnostic framework. ULSA is applicable for classification and segmentation training of stain-invariant models.

  • (4) Outperforming SOTA. Our approach outperforms methods from stain translation, DL-based augmentation, and semi-supervised learning slightly on source stains and by a large margin on target stains. It needs only 10% of the labels to reach the same performance as prior SOTA trained with all labeled data.

2 Method

Stain adaption aims to minimize the generalization error in task performance between a source staining $s \in S$ and a target staining $t \in T$ (Fig. 1a). In particular, a parameterized model $m_{\theta}$ trained on labeled source staining data $x_{S}^{L}$ should ideally maintain task performance on other unlabeled target stainings $x_{T}^{U}$ where no labels are available. We propose to address this challenge by incorporating unlabeled target stainings through (i) a cGAN model to augment labeled images into target stains inheriting the same annotation, (ii) unsupervised stain adaption (USA) to jointly train on all stains with supervised and unsupervised objectives, followed by (iii) stain-invariant feature consistency learning (FCL) by unsupervised matching of latent representations between stainings. The overall method is outlined in Fig. 2.

Figure 1: (a). Problem statement of unsupervised stain adaption. (b). Stain-invariant feature consistency learning. (c). Artificial images generated by cGAN.

2.0.1 (i) cGAN stain augmentation.

After pretraining, we used cGANs, which we define as a stain translation function $\mathcal{G}(x_{S}^{L}) = x_{S \cup T}^{L}$, to synthetically augment source training data into target stainings. This process is structure-preserving [18], thus each target stain image inherits the label of the associated source image $x_{S}^{L}$ used for translation. Fig. 1c shows exemplary translation results. This strategy aims to enlarge the labeled training dataset and thus the supervisory signal to achieve stain-invariance on prediction level.

Figure 2: ULSA model. Labeled source stains are translated into synthetic target stain data to obtain supervision for image-wise stain-invariance. We extract features for a real and stain translated noised image of target stained data, where we maximize cosine similarity to achieve unsupervised feature-wise stain-invariance in latent space.

2.0.2 (ii) Unsupervised stain adaption.

Since labeled data are given only in source staining $x^{L} \sim p_{S}(x^{L})$, we would ideally desire a mapping from source to target samples. We approximate this mapping by using $|S| \cdot |T|$ distinct cGAN augmenters to obtain additional labeled target samples $x^{L} \sim \hat{p}_{t \in T}(x^{L} \mid x^{U})$. With $|\cdot|$ we denote the cardinality of a set of stains. Note that all inferred samples $x^{L}$ inherit the label $y$ associated with $x^{L}$. We approximate $p_{S \cup T}$ by a mixture

$$\hat{p}_{S \cup T}\left(x^{L}\right) = \frac{1}{|S|+|T|}\left(\sum_{s \in S} p_{s}\left(x^{L}\right) + \sum_{t \in T} \hat{p}_{t}\left(x^{L}\right)\right).$$
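As a small illustration of the mixture above, the following sketch draws the stain of each labeled training image uniformly from the $|S| + |T|$ components. The function names are hypothetical and the stain lists mirror the kidney setting (PAS as source; HE, SIL and TRI as synthetic targets). It only picks which component an image comes from, it does not generate images.

```python
import random
from collections import Counter

def mixture_components(source_stains, target_stains):
    """List the |S| + |T| components of the labeled mixture distribution."""
    real = [(s, "real") for s in source_stains]
    synthetic = [(t, "synthetic") for t in target_stains]
    return real + synthetic

def sample_component(components, rng):
    """Draw one component uniformly, i.e. with weight 1 / (|S| + |T|)."""
    return rng.choice(components)

# Illustrative use: empirical frequencies approach 1/4 per component.
components = mixture_components(["PAS"], ["HE", "SIL", "TRI"])
rng = random.Random(0)
counts = Counter(sample_component(components, rng) for _ in range(10_000))
frequencies = {comp: n / 10_000 for comp, n in counts.items()}
```

With one source stain and three target stains, each component carries weight 1/4, so three quarters of the labeled images seen during training would be synthetic under this reading.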

In addition, we leverage unlabeled data by unsupervised learning. Given an unlabeled image $x_{t \in T}^{U,1}$, we translate it to $\tilde{x}_{t \in T}^{U,1} = \mathcal{I}\left(\mathcal{R}\left(x_{t \in T}^{U,1} \mid x_{t \in T}^{U,2}\right)\right)$ by Reinhard translation $\mathcal{R}$ and noise injection $\mathcal{I}$. Note that $\mathcal{R}$ receives another unlabeled, randomly sampled image $x_{t \in T}^{U,2}$ from target stains $t \in T$ as reference for subsequent translations. Fig. 1b includes example inputs. We used Reinhard normalization because Macenko’s method has a much higher runtime, leading to computational overhead; see supplementary material (SM). Finally, we enforce the model to embed images invariant to stain translations by maximizing cosine similarity in the unsupervised loss $\mathcal{L}_{U}$, see (iii). For this reason, we used light Gaussian blurring for noise injection; see SM for other choices. In summary, we define our objective

$$\min_{\theta} \mathcal{L}\left(\theta \mid x^{L}, y, x^{U}\right) = \underset{x^{L} \sim \hat{p}_{S \cup T}\left(x^{L}\right)}{\mathbb{E}}\left[\mathcal{L}_{S}\left(\theta \mid x^{L}, y\right)\right] + \lambda \underset{x^{U} \sim p_{S \cup T}\left(x^{U}\right)}{\mathbb{E}}\left[\mathcal{L}_{U}\left(\theta \mid x^{U}\right)\right]$$

where $\mathcal{L}_{S}$ is a supervised loss. By minimizing $\mathcal{L}_{S}$ using labeled images of synthetic target stains, we aim for stain adaption on prediction level. We use multi-class and binary cross entropy loss for segmentation and classification, respectively. We set equal weight $\lambda = 1$, obtained by performing a grid search as described in SM. In each iteration we compute $\mathcal{L}_{S}$ on a batch of labeled data and compute feature consistency with $\mathcal{L}_{U}$ on a batch of unlabeled data. Next, we detail the unsupervised loss term associated with feature consistency learning.
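The preparation of the augmented unlabeled view, Reinhard translation followed by noise injection, can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation. Reinhard's method matches per-channel mean and standard deviation in a decorrelated Lab-type color space, and the sketch assumes the channels are already in such a space. A 3×3 binomial kernel stands in for the light Gaussian blurring. All function names are hypothetical, and an image is represented as a list of channels, each a 2D list of floats.

```python
from statistics import fmean, pstdev

def transfer_statistics(channel, ref_channel, eps=1e-8):
    """Shift and scale one channel so its mean/std match the reference channel."""
    flat = [v for row in channel for v in row]
    ref = [v for row in ref_channel for v in row]
    mu, sd = fmean(flat), pstdev(flat)
    mu_ref, sd_ref = fmean(ref), pstdev(ref)
    scale = sd_ref / (sd + eps)
    return [[(v - mu) * scale + mu_ref for v in row] for row in channel]

def light_blur(channel):
    """3x3 binomial blur with clamped borders (stand-in for light Gaussian blur)."""
    h, w = len(channel), len(channel[0])
    k = (1, 2, 1)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1] * k[dx + 1] * channel[yy][xx]
            row.append(acc / 16.0)
        out.append(row)
    return out

def translate_and_noise(image, reference):
    """Apply I(R(image | reference)) channel by channel."""
    return [light_blur(transfer_statistics(c, r)) for c, r in zip(image, reference)]
```

The reference image would be a second, randomly drawn unlabeled target-stain patch, so each training step sees a different color statistic for the same tissue content.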

2.0.3 (iii) Stain-invariant feature consistency learning.

For unsupervised learning, we aim to calculate similar feature representations for an input $x = x_{t \in T}^{U,1}$ and a stain-translated, noised version $\tilde{x} = \tilde{x}_{t \in T}^{U,1}$. To enforce this for every downsampling block in a model, we apply non-parametric 2D adaptive average pooling. In Einstein notation, this tensor operation leads to $B, C_{i}, H_{i}, W_{i} \rightarrow B, C_{i}$, where $B$, $C$, $H$, $W$ and $i$ refer to batch size, channels, height, width and the model block, respectively (Fig. 1b). We aim to maximize the cosine similarity between all hierarchical features from a model $m_{\theta}$. In this way, we enforce feature similarity by updating the model such that extracted image features in latent space become similar.
We define features as model outputs until block $i$ after 2D adaptive pooling, $f_{\bar{\theta},i} = m_{\bar{\theta},i}(x)$, $\tilde{f}_{\theta,i} = m_{\theta,i}(\tilde{x})$. For brevity, index $\theta, i$ also refers to parameter optimization from the input layer until block $i$. Following the literature [10, 17], we do not backpropagate gradients for the non-augmented input, denoted with $\bar{\theta}$. Hence, we define our unsupervised objective $\mathcal{L}_{U}$ as

$$\mathcal{L}_{U}(\theta) = -\frac{1}{b} \sum_{i \in b} \frac{\mathbf{f}_{\bar{\theta},i} \cdot \tilde{\mathbf{f}}_{\theta,i}}{\lVert \mathbf{f}_{\bar{\theta},i} \rVert_{2} \cdot \lVert \tilde{\mathbf{f}}_{\theta,i} \rVert_{2}},$$

where $\lVert \cdot \rVert_{2}$ denotes the Euclidean norm and $b$ the number of downsampling blocks. By minimizing $\mathcal{L}_{U}$, we enforce stain-invariance on feature level.
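The loss above can be illustrated with a framework-free sketch for a single image. It pools each block's feature map to a channel vector, averages the cosine similarities over blocks and negates the result. The function names are hypothetical. The stop-gradient on the non-augmented branch cannot be expressed without an autograd framework, where that branch would be excluded from backpropagation. The final helper combines the result with a supervised loss value using the weight $\lambda$.

```python
import math

def global_average_pool(feature_map):
    """Pool one (C, H, W) nested list to a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between two vectors, guarded against zero norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def feature_consistency_loss(feats, feats_aug):
    """Negative mean cosine similarity over the b downsampling blocks.

    feats / feats_aug: one (C_i, H_i, W_i) feature map per block, for the
    original and the stain-translated, noised view of the same image.
    """
    sims = [
        cosine_similarity(global_average_pool(f), global_average_pool(g))
        for f, g in zip(feats, feats_aug)
    ]
    return -sum(sims) / len(sims)

def joint_objective(supervised_loss, unsupervised_loss, lam=1.0):
    """Total loss L_S + lambda * L_U; lam = 1 follows the text."""
    return supervised_loss + lam * unsupervised_loss
```

The loss is bounded between -1 (all block features perfectly aligned) and 1, so with $\lambda = 1$ the total objective can become negative even when the supervised term is small and positive.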

3 Experiments

We applied our proposed method to kidney tissue segmentation and cancer classification. We measured performance using the Dice score and the area under the receiver operating characteristic curve (AUROC), respectively. In the following we describe our datasets; more information about dataset statistics can be found in SM.
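For reference, the two evaluation metrics named above can be computed as in the following sketch. It uses the standard definitions: Dice as twice the overlap divided by the summed mask sizes, and AUROC as the fraction of positive/negative pairs ranked correctly, with ties counted as one half. How scores are aggregated over classes and patches is not stated here, so the per-class Dice below is an assumption, and the function names are hypothetical. Both functions return values in [0, 1], whereas the results table reports percentages.

```python
def dice_score(pred, target, cls):
    """Dice for one class on 2D label maps: 2 * |A and B| / (|A| + |B|)."""
    pred_c = [p == cls for row in pred for p in row]
    tgt_c = [t == cls for row in target for t in row]
    intersection = sum(p and t for p, t in zip(pred_c, tgt_c))
    denom = sum(pred_c) + sum(tgt_c)
    # Convention: a class absent from both maps counts as a perfect match.
    return 2.0 * intersection / denom if denom else 1.0

def auroc(scores, labels):
    """AUROC via pairwise ranking of positive against negative samples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC is quadratic in the number of samples, which is acceptable for test sets of about a thousand patches per stain; a sorting-based variant would be preferable for much larger sets.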

3.1 Datasets and Comparable Methods

For slide tiling, we used a modified version of the CLAM preprocessing pipeline [8] and manually selected color thresholds, as different stains require adjustments in tissue detection. All image data were processed with a size of $512 \times 512$ px.

3.1.1 Kidney segmentation datasets.

Our internal annotated training ($n_{train}^{PAS} = 2100$) and validation ($n_{val}^{PAS} = 160$) datasets consist of annotated patches extracted from periodic acid-Schiff (PAS) stained WSIs at 20× magnification. Annotation masks contain the classes tubule, glomerulus, glomerular tuft, artery, arterial lumen, and vein. For external testing we included annotated glomerulus images from PAS stained WSIs provided by the HuBMAP consortium [4]. After processing the raw data we obtained $n_{ext,hubmap}^{PAS} = 2670$ samples. Moreover, we included test data from the NEPTUNE study [1].
This dataset contains $n_{ext,neptune}^{HE} = 402$ hematoxylin and eosin (HE), $n_{ext,neptune}^{PAS} = 1176$ PAS, $n_{ext,neptune}^{SIL} = 688$ Silver (SIL) and $n_{ext,neptune}^{TRI} = 817$ Trichrome (TRI) stain samples. The classes glomerulus, glomerular tuft, artery and tubule were annotated, and images were taken from WSIs at 40× magnification, which we rescaled to 20× magnification. We further processed unlabeled WSIs at 20× magnification from the KPMP database [6].
We obtained $n_{unlabeled}^{HE} = 385670$, $n_{unlabeled}^{PAS} = 409554$, $n_{unlabeled}^{SIL} = 538412$ and $n_{unlabeled}^{TRI} = 415822$ tiles.

3.1.2 Breast cancer classification datasets.

Our cancer classification datasets contain $n_{train}^{HE} = 1950$ and $n_{val}^{HE} = 287$ images obtained from HE stained WSIs and tiled at 40× magnification. Our test set contains cytokeratin 5 (CK5) and cluster of differentiation 8 (CD8) stains with $n_{test}^{CK5} = 1000$ and $n_{test}^{CD8} = 1000$ samples. Sample sizes and binary label ratios were equalized for the test sets after preprocessing, and excess data were held out of experiments to avoid data leakage at patient level.
Additionally, we have unlabeled datasets containing $n_{unlabeled}^{HE} = 117369$, $n_{unlabeled}^{CK5} = 41735$ and $n_{unlabeled}^{CD8} = 52492$ samples. We split data on slide level such that each patient appears exclusively in one dataset.

3.1.3 Comparable Methods.

We selected comparable methods from the domains of stain translation, unsupervised augmentation, and semi-supervised consistency training. (1) Baseline. For comparison, we include a naive approach where we train a model on source stains without access to target stains, thus obtaining a lower bound for stain adaptation. (2) Reinhard. Reinhard’s method is a stain normalization technique that adjusts the color appearance of histopathology images by aligning them with a reference color space [12]. (3) Macenko. Macenko’s method [9] is a stain normalization technique that standardizes the color appearance of images by mapping them to a reference color space. It has been shown that this method provides the best performance across various colorization techniques for downstream tasks [7]. (4) cGAN Augmentation. This method generates synthetic images with diverse staining variations in order to translate between stainings and train stain-invariant models [2]. (5) FixMatch. A semi-supervised learning approach that combines labeled and unlabeled data by enforcing consistency between predictions made on unlabeled samples based on confidence scores [13]. (6) Unsupervised Data Augmentation (UDA). UDA is a semi-supervised learning technique designed for classification tasks that leverages augmentations on unlabeled data to improve model performance by minimizing the divergence between the predicted probability distributions of two versions of an image. Originally proposed for natural images [17], it has been adapted to histology images [5].

3.2 Implementation

3.2.1 cGAN pretraining.

We started with hyperparameters following the literature [2] and performed a grid search, with details noted in SM. By visually evaluating stain-translation results, we selected a model trained for 300 epochs with a learning rate of 1.5e-4. For each stain translation we used 10,000 unlabeled images from KPMP. For source stains, we used unlabeled HE data for cancer classification, and our labeled PAS stained training set for segmentation. Each stain translation training and inference task took around 8 hours.

3.2.2 Segmentation and Classification.

For all experiments we used a ResNet-50 as classification model and as encoder model for U-Net in segmentation. We tuned hyperparameters on validation sets and set a learning rate decay of 1e-10 and early stopping for 5 and 10 consecutive epochs with no decrease in validation loss. We employed AdamW as optimizer and used a starting learning rate of 1e-04 and weight decay of 1e-05. Images were resized to $224 \times 224$ px. We used an overall batch size of 128 for all experiments. For semi-supervised learning we batched data to 32 and 96 for labeled and unlabeled data, respectively. All models were initialized with ImageNet-pretrained weights and experiments were performed on a single Nvidia A100 GPU. We measured a maximum training time across all methods of 10 and 7 hours for segmentation and classification, respectively (except Macenko, see SM).
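The 32/96 split of each 128-sample batch can be sketched as a simple pairing of labeled and unlabeled mini-batches. The generator below is a hypothetical illustration. It assumes that the much smaller labeled set is recycled while one pass is made over the unlabeled set, which is a common choice but is not stated in the text.

```python
from itertools import cycle, islice

def paired_batches(labeled, unlabeled, n_labeled=32, n_unlabeled=96):
    """Yield (labeled_batch, unlabeled_batch) pairs of sizes 32 and 96.

    Assumption: labeled samples are cycled so every unlabeled batch is
    paired with a full labeled batch; a trailing partial unlabeled batch
    is dropped.
    """
    labeled_iter = cycle(labeled)
    last_start = len(unlabeled) - n_unlabeled
    for start in range(0, last_start + 1, n_unlabeled):
        labeled_batch = list(islice(labeled_iter, n_labeled))
        yield labeled_batch, unlabeled[start:start + n_unlabeled]
```

In each step, the supervised loss would be computed on the first element of the pair and the feature consistency loss on the second.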

4 Results and Discussion

In this section, we report results and question the necessary number of labels for stain adaption in kidney tissue segmentation and breast cancer classification.

Table 1: Dice and AUROC scores for segmentation and classification tasks, respectively. We report mean and standard deviation (in parentheses) across three consecutive runs. All methods except for Baseline access additional unlabeled target stains during the training phase.
Columns PAS to Overall (first five) are segmentation (Dice); PAS is intra-stain, while TRI, HE, SIL and Overall are inter-stain. Columns CK5, CD8 and Overall (last three) are classification (AUROC), all inter-stain.

Method | PAS | TRI | HE | SIL | Overall | CK5 | CD8 | Overall
Baseline | 87.4 (0.66) | 59.4 (4.19) | 43.2 (4.01) | 76.4 (1.67) | 62.1 (3.24) | 86.6 (9.61) | 90.5 (7.10) | 88.6 (8.35)
Reinhard [12] | 87.3 (0.50) | 64.8 (6.50) | 40.2 (3.23) | 77.9 (2.57) | 64.3 (4.39) | 89.8 (3.65) | 94.1 (1.48) | 91.9 (2.32)
Macenko [7] | 85.0 (0.80) | 71.6 (2.60) | 48.3 (4.70) | 81.6 (0.50) | 70.3 (2.29) | 89.5 (2.90) | 93.4 (2.06) | 91.4 (2.48)
cGAN [2] | 84.2 (0.92) | 69.9 (1.44) | 46.0 (4.71) | 79.0 (1.63) | 68.1 (2.18) | 87.1 (6.24) | 88.0 (3.32) | 87.5 (3.67)
FixMatch [13] | 87.2 (0.51) | 64.8 (2.67) | 40.3 (7.28) | 78.2 (1.02) | 64.5 (3.05) | 88.3 (2.25) | 94.3 (1.37) | 91.3 (1.78)
UDA [5] | 87.2 (0.54) | 64.9 (1.84) | 45.8 (2.49) | 77.5 (0.60) | 65.4 (1.53) | 89.7 (2.19) | 92.8 (1.97) | 91.3 (1.47)
Ours | 87.9 (0.33) | 74.1 (0.94) | 53.6 (1.46) | 81.8 (0.61) | 72.6 (0.93) | 91.6 (4.63) | 94.6 (1.40) | 93.1 (3.01)

4.0.1 Measuring stain adaption.

In stain adaption, we aim to at least maintain performance on source stains (Intra-stain, Tab. 1) and maximize performance on target stains (Inter-stain, Tab. 1). For segmentation on source data, our method is on par with the other methods and even slightly improves on the baseline. Note that all other methods decrease performance on source stains compared to the baseline. More importantly, we increase the Dice score for target stains (Inter-stain, Overall) by more than 10 and by 2.3 points compared to naive training and Macenko colorization, the best comparable method, respectively. However, although Macenko provides the second-best stain adaption result, its source performance is lower by 2.4 and 2.9 points compared to baseline and ULSA, respectively. In the case of classification, we had no further annotated intra-stain data for testing but report results for target data. Our method increases AUROC by 4.5 and 1.2 points compared to the naive and the best comparable method, respectively. Overall, these findings demonstrate efficient stain adaption by increasing target performance while not only maintaining but increasing source performance (Fig. 1a). Additionally, we trained all methods using different fractions of labeled data and tested performance on targets in segmentation (Fig. 3a). Interestingly, ULSA shows strong stain adaption even with very little annotated data compared to all other methods. ULSA is on par with the second-best comparable method using only 10% of labeled data.
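The differences quoted in this paragraph follow directly from the mean values in Table 1, as the short check below reproduces. The dictionaries transcribe the reported means; the helper names are hypothetical.

```python
# Mean scores transcribed from Table 1 (three runs each).
SEG_OVERALL = {"Baseline": 62.1, "Reinhard": 64.3, "Macenko": 70.3, "cGAN": 68.1,
               "FixMatch": 64.5, "UDA": 65.4, "Ours": 72.6}
SEG_PAS = {"Baseline": 87.4, "Reinhard": 87.3, "Macenko": 85.0, "cGAN": 84.2,
           "FixMatch": 87.2, "UDA": 87.2, "Ours": 87.9}
CLS_OVERALL = {"Baseline": 88.6, "Reinhard": 91.9, "Macenko": 91.4, "cGAN": 87.5,
               "FixMatch": 91.3, "UDA": 91.3, "Ours": 93.1}

def difference(scores, method, reference):
    """Score of `method` minus score of `reference`, rounded to one decimal."""
    return round(scores[method] - scores[reference], 1)

def best_comparable(scores):
    """Best-scoring method other than the baseline and the proposed one."""
    candidates = {m: v for m, v in scores.items() if m not in ("Baseline", "Ours")}
    return max(candidates, key=candidates.get)

seg_gain_vs_baseline = difference(SEG_OVERALL, "Ours", "Baseline")
seg_gain_vs_best = difference(SEG_OVERALL, "Ours", best_comparable(SEG_OVERALL))
macenko_source_drop = difference(SEG_PAS, "Macenko", "Baseline")
cls_gain_vs_baseline = difference(CLS_OVERALL, "Ours", "Baseline")
cls_gain_vs_best = difference(CLS_OVERALL, "Ours", best_comparable(CLS_OVERALL))
```

Under these values the best comparable method is Macenko for segmentation and Reinhard for classification.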

4.0.2 Ablation study.

We measured the influence of the different components of our method in segmentation (Fig. 3b). Dropping either the cGAN or the FCL component decreases overall target performance. Using hierarchical features (ULSA) instead of features from the last block only (LB FCL) increases scores. Apart from that, we also initialized ULSA with pretrained weights from a foundation model for histology (FM ULSA), obtained by large-scale learning on HE images [16]. In this setup, ImageNet-pretrained weights yield better performance. This hints that building foundation models should incorporate stain variations to avoid catastrophic forgetting of learned morphologies.
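The difference between the hierarchical variant and the last-block variant (LB FCL) can be sketched as follows. This is a minimal illustration under stated assumptions, not our implementation: a mean squared distance is assumed as the consistency measure, the feature shapes are arbitrary, and random arrays stand in for encoder features of two differently stained views of the same patch.

```python
import numpy as np


def consistency_loss(feats_a, feats_b):
    """Mean squared distance, averaged over a list of feature maps.

    Assumption: MSE is used here purely for illustration.
    """
    losses = [np.mean((a - b) ** 2) for a, b in zip(feats_a, feats_b)]
    return float(np.mean(losses))


rng = np.random.default_rng(0)

# Hypothetical feature maps from three encoder blocks (channels, height, width).
shapes = [(64, 32, 32), (128, 16, 16), (256, 8, 8)]
feats_view_a = [rng.normal(size=s) for s in shapes]
# Second view: same features plus a small perturbation standing in for a stain change.
feats_view_b = [f + 0.1 * rng.normal(size=f.shape) for f in feats_view_a]

# Hierarchical variant: consistency is enforced on every block.
loss_hierarchical = consistency_loss(feats_view_a, feats_view_b)

# Last-block variant: consistency is enforced on the final block only.
loss_last_block = consistency_loss(feats_view_a[-1:], feats_view_b[-1:])
```

In the hierarchical variant every encoder block contributes a consistency term, whereas the last-block variant constrains only the deepest features.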

Figure 3: (a). Performance for different fractions of labeled data. (b). Ablation study.

5 Conclusion

We proposed ULSA, a novel SOTA strategy to decrease stain generalization errors in digital pathology tasks. Our semi-supervised learning strategy leverages annotated data for both source and artificial target stains. Moreover, we incorporate unlabeled data for stain-invariant feature consistency learning. Finally, joint optimization of supervised and unsupervised objectives enables efficient stain adaption. We empirically demonstrated that ULSA training increases performance on unlabeled target stains in patch-level segmentation and classification. This suggests that ULSA is a task-agnostic framework. We further showed that ULSA achieves efficient stain adaption even in settings with scarce labels. A potential limitation of our approach is that, even though performance on unseen inter-stain data is increased, augmentation strategies may not translate the correct marker information from other stains. A possible example is immunohistochemical (IHC) staining, where specific immune cells are highlighted. This could potentially affect downstream applications in certain scenarios not seen in this study. Future work could compare and replace cGAN with other GAN-based augmentation strategies such as HistAuGAN [14].

5.0.1 Acknowledgements.

This work was supported by Deutsche Forschungsgemeinschaft (DFG Project number 445703531). The authors gratefully acknowledge the computational and data resources provided by the Leibniz Supercomputing Centre (www.lrz.de).

5.0.2 Disclosure of Interests.

The authors declare that they have no conflicts of interest related to this work.

References

  • [1] Barisoni, L., Nast, C.C., Jennette, J.C., Hodgin, J.B., Herzenberg, A.M., Lemley, K.V., Conway, C.M., Kopp, J.B., Kretzler, M., Lienczewski, C., Avila-Casado, C., Bagnasco, S., Sethi, S., Tomaszewski, J., Gasim, A.H., Hewitt, S.M.: Digital pathology evaluation in the multicenter nephrotic syndrome study network (neptune). Clinical Journal of the American Society of Nephrology 8(8), 1449–1459 (Aug 2013). https://doi.org/10.2215/cjn.08370812, http://dx.doi.org/10.2215/CJN.08370812
  • [2] Bouteldja, N., Hölscher, D.L., Klinkhammer, B.M., Buelow, R.D., Lotz, J., Weiss, N., Daniel, C., Amann, K., Boor, P.: Stain-independent deep learning–based analysis of digital kidney histopathology. The American Journal of Pathology 193(1), 73–83 (Jan 2023). https://doi.org/10.1016/j.ajpath.2022.09.011, http://dx.doi.org/10.1016/j.ajpath.2022.09.011
  • [3] Bouteldja, N., Klinkhammer, B.M., Bülow, R.D., Droste, P., Otten, S.W., Freifrau von Stillfried, S., Moellmann, J., Sheehan, S.M., Korstanje, R., Menzel, S., Bankhead, P., Mietsch, M., Drummer, C., Lehrke, M., Kramann, R., Floege, J., Boor, P., Merhof, D.: Deep learning–based segmentation and quantification in experimental kidney histopathology. Journal of the American Society of Nephrology 32(1), 52–68 (Nov 2020). https://doi.org/10.1681/asn.2020050597, http://dx.doi.org/10.1681/ASN.2020050597
  • [4] Howard, A., Lawrence, A., Sims, B., Tinsley, E., Kazmierczak, J., Borner, K., Godwin, L., Novaes, M., Culliton, P., Holland, R., Watson, R., Ju, Y.: Hubmap - hacking the kidney (2020), https://kaggle.com/competitions/hubmap-kidney-segmentation
  • [5] Jiang, Y., Sui, X., Ding, Y., Xiao, W., Zheng, Y., Zhang, Y.: A semi-supervised learning approach with consistency regularization for tumor histopathological images analysis. Frontiers in Oncology 12 (Jan 2023). https://doi.org/10.3389/fonc.2022.1044026, http://dx.doi.org/10.3389/fonc.2022.1044026
  • [6] Kidney Precision Medicine Project: Kidney Precision Medicine Project Data. Accessed September 01, 2023. https://www.kpmp.org, the results here are in whole or part based upon data generated by the Kidney Precision Medicine Project. Funded by the National Institute of Diabetes and Digestive and Kidney Diseases
  • [7] Lampert, T., Merveille, O., Schmitz, J., Forestier, G., Feuerhake, F., Wemmert, C.: Strategies for training stain invariant cnns. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE (Apr 2019). https://doi.org/10.1109/isbi.2019.8759266, http://dx.doi.org/10.1109/ISBI.2019.8759266
  • [8] Lu, M.Y., Williamson, D.F., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 5(6), 555–570 (2021)
  • [9] Macenko, M., Niethammer, M., Marron, J.S., Borland, D., Woosley, J.T., Guan, X., Schmitt, C., Thomas, N.E.: A method for normalizing histology slides for quantitative analysis. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE (Jun 2009). https://doi.org/10.1109/isbi.2009.5193250, http://dx.doi.org/10.1109/ISBI.2009.5193250
  • [10] Miyato, T., Maeda, S.i., Koyama, M., Ishii, S.: Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence 41(8), 1979–1993 (2018)
  • [11] Moor, M., Banerjee, O., Abad, Z.S.H., Krumholz, H.M., Leskovec, J., Topol, E.J., Rajpurkar, P.: Foundation models for generalist medical artificial intelligence. Nature 616(7956), 259–265 (Apr 2023). https://doi.org/10.1038/s41586-023-05881-4, http://dx.doi.org/10.1038/s41586-023-05881-4
  • [12] Reinhard, E., Adhikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Computer graphics and applications 21(5), 34–41 (2001)
  • [13] Sohn, K., Berthelot, D., Li, C.L., Zhang, Z., Carlini, N., Cubuk, E.D., Kurakin, A., Zhang, H., Raffel, C.: Fixmatch: Simplifying semi-supervised learning with consistency and confidence (2020)
  • [14] Wagner, S.J., Khalili, N., Sharma, R., Boxberg, M., Marr, C., de Back, W., Peng, T.: Structure-preserving multi-domain stain color augmentation using style-transfer with disentangled representations. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 (2021)
  • [15] Wagner, S.J., Reisenbüchler, D., West, N.P., Niehues, J.M., Zhu, J., Foersch, S., Veldhuizen, G.P., Quirke, P., Grabsch, H.I., van den Brandt, P.A., Hutchins, G.G., Richman, S.D., Yuan, T., Langer, R., Jenniskens, J.C., Offermans, K., Mueller, W., Gray, R., Gruber, S.B., Greenson, J.K., Rennert, G., Bonner, J.D., Schmolze, D., Jonnagaddala, J., Hawkins, N.J., Ward, R.L., Morton, D., Seymour, M., Magill, L., Nowak, M., Hay, J., Koelzer, V.H., Church, D.N., Matek, C., Geppert, C., Peng, C., Zhi, C., Ouyang, X., James, J.A., Loughrey, M.B., Salto-Tellez, M., Brenner, H., Hoffmeister, M., Truhn, D., Schnabel, J.A., Boxberg, M., Peng, T., Kather, J.N., Church, D., Domingo, E., Edwards, J., Glimelius, B., Gogenur, I., Harkin, A., Hay, J., Iveson, T., Jaeger, E., Kelly, C., Kerr, R., Maka, N., Morgan, H., Oien, K., Orange, C., Palles, C., Roxburgh, C., Sansom, O., Saunders, M., Tomlinson, I.: Transformer-based biomarker prediction from colorectal cancer histology: A large-scale multicentric study. Cancer Cell 41(9), 1650–1661.e4 (Sep 2023). https://doi.org/10.1016/j.ccell.2023.08.002, http://dx.doi.org/10.1016/j.ccell.2023.08.002
  • [16] Wang, X., Du, Y., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., Huang, J., Han, X.: Retccl: Clustering-guided contrastive learning for whole-slide image retrieval. Medical Image Analysis 83, 102645 (Jan 2023). https://doi.org/10.1016/j.media.2022.102645, http://dx.doi.org/10.1016/j.media.2022.102645
  • [17] Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmentation for consistency training (2020)
  • [18] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV). pp. 2242–2251 (2017). https://doi.org/10.1109/ICCV.2017.244
  • [19] Zingman, I., Frayle, S., Tankoyeu, I., Sukhanov, S., Heinemann, F.: A comparative evaluation of image-to-image translation methods for stain transfer in histopathology (2023)