Deep Learning for Automated Detection of Breast Cancer in Deep Ultraviolet Fluorescence Images with Diffusion Probabilistic Model

Abstract

Data limitation is a significant challenge in applying deep learning to medical images. Recently, the diffusion probabilistic model (DPM) has shown the potential to generate high-quality images by converting Gaussian random noise into realistic images. In this paper, we apply the DPM to augment the deep ultraviolet fluorescence (DUV) image dataset with an aim to improve breast cancer classification for intra-operative margin assessment. For classification, we divide the whole surface DUV image into small patches and extract convolutional features for each patch by utilizing the pre-trained ResNet. Then, we feed them into an XGBoost classifier for patch-level decisions and then fuse them with a regional importance map computed by Grad-CAM++ for whole surface-level prediction. Our experimental results show that augmenting the training dataset with the DPM significantly improves breast cancer detection performance in DUV images, increasing accuracy from 93% to 97%, compared to using Affine transformations and ProGAN.

Index Terms—  Diffusion Probabilistic Model, Data Augmentation, Breast Cancer Classification

1 Introduction

Deep ultraviolet fluorescence scanning microscopy (DUV-FSM) provides rapid whole-surface imaging of dissected tissue during breast-conserving surgery without the need for invasive techniques or excessive sectioning. DUV images are particularly helpful in identifying cancer cells at the surgical specimen’s edge (margin), thanks to their clear color and texture differences from healthy tissue. Automated breast cancer detection in DUV images is therefore required for intra-operative margin assessment. Deep learning-based methods have shown potential in medical image classification, but they often face limitations due to their reliance on extensive training data [1],[2]. This challenge is especially notable in the classification of DUV images, for which only a limited number of subjects is available given the novelty of the modality [3].

Data augmentation techniques are used to boost medical image datasets. A widely used augmentation technique is the Generative Adversarial Network (GAN) introduced by Goodfellow et al. [4], with its later variants being the most common models for creating synthetic images. For example, Gheshlaghi et al. [5] employed an Auxiliary Classifier Generative Adversarial Network (ACGAN) to augment a small dataset with realistic images and class labels, specifically focusing on breast cancer histopathological image classification. Nevertheless, GANs tend to capture a lower degree of diversity in generated content than contemporary likelihood-based models [6],[7],[8]. Additionally, GANs can be challenging to train and are often susceptible to issues such as mode collapse, which can only be mitigated through meticulous hyperparameter selection and the application of suitable regularizers [9],[10].

To tackle these challenges, we apply the diffusion probabilistic model (DPM) in data augmentation to generate realistic and diverse DUV images. The DPM has recently emerged as a potent generative model, positioning itself as a potential substitute for GANs [11]. The DPM harnesses cross-attention and adaptable conditioning to facilitate the creation of desired images. Starting with Ho et al. [12], a series of studies have demonstrated that DPMs can produce high-fidelity images akin to those produced by GANs [7],[13]. These models offer a range of advantageous qualities for image synthesis, including stable training. Moghadam et al. [14] utilized a DPM for the synthesis of histopathology images and compared it with ProGAN. Generative metrics demonstrated the superiority of the diffusion model with respect to data augmentation [15],[16]. In this paper, we adopt the DPM to enhance deep learning-based breast cancer detection in DUV images. The key contributions of this paper can be summarized as follows:

  • DPMs are employed to generate realistic DUV images, which represents the first use of DPMs for this application.

  • DUV breast cancer detection is enhanced through DPM-augmented training, surpassing GAN-based augmentation.

Fig. 1: Overview of the proposed method: The proposed method starts by extracting patches from Whole Surface Images (WSI-DUV). A two-step diffusion process, involving noise addition and removal with probabilistic models, is applied to generate patch images. Utilizing a generated dataset alongside an existing training dataset, deep convolutional features are extracted using a pre-trained ResNet50 network. Patch-level classification is then performed using XGBoost, and a regional importance map is computed with Grad-CAM++ on a pre-trained DenseNet169 model for DUV-WSI. The final prediction at the WSI level is achieved through a decision fusion approach, combining patch-level results with the regional importance map.

2 Method

This section outlines our utilization of the DPM in the context of breast cancer classification in DUV images, as described in Figure 1. The pivotal role of the DPM lies in its capacity to generate synthetic DUV patch images, addressing the challenge of limited training data. Subsequently, the augmented dataset is employed to extract convolutional features using a pre-trained ResNet50 network. Ultimately, the patch-level classification is executed with the aid of the XGBoost classifier and fused into a whole-surface-level decision with a regional importance map. The ensuing sections will delve into each of these steps in greater detail.

2.1 Diffusion Probabilistic Model

Diffusion probabilistic models (DPMs) are generative models that aim to produce data resembling their training data. They operate by iteratively adding random noise to the training data (forward diffusion) and then learning to remove this noise (reverse diffusion). Once trained, a DPM can generate new data by passing random noise through the learned denoising process. We apply the DPM to augment the training dataset, taking DUV patch images, denoted as $x$, as the input. Specifically, the DUV WSI for a sample $i$ is divided into multiple DUV patches, where each sample’s field of view $\Omega_i$ is the union of non-overlapping patches $\Omega_i^j$ such that $\Omega_i = \cup_{j=1}^{N} \Omega_i^j$ and $\Omega_i^k \cap \Omega_i^l = \emptyset$ for all $k \neq l$.
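As a concrete illustration of this partition, the following minimal Python sketch tiles an image into non-overlapping patches. The function name `tile_patches`, the nested-list image representation, and the choice to discard incomplete border patches are assumptions made for illustration; the patch size and border handling used in this work are not specified here.

```python
def tile_patches(image, patch_size):
    """Split a 2-D image (list of rows) into non-overlapping square patches.

    Returns a dict mapping the top-left corner (row, col) of each patch to
    the patch itself. Incomplete border patches are dropped (an assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    patches = {}
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches[(top, left)] = [
                row[left:left + patch_size]
                for row in image[top:top + patch_size]
            ]
    return patches


# Toy 4x6 "image" whose pixel values encode their own position.
image = [[10 * r + c for c in range(6)] for r in range(4)]
patches = tile_patches(image, patch_size=2)
```

Because the stride equals the patch size, every pixel belongs to exactly one patch, which mirrors the disjoint-union condition on $\Omega_i^j$.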

The DPM includes two steps: forward and reverse diffusion. Note that our patch images are categorized into two distinct class labels (type-embed) represented by $c$, where $c \in \{\mathrm{Benign}, \mathrm{Malignant}\}$. In the forward diffusion step, we repeatedly add random Gaussian noise to our data for a certain number of steps $T$ (t-embed), until the data approximately follows an isotropic Gaussian distribution. If we denote the data distribution of our input as $q(x_{0,c})$, the forward process is defined by the following steps:

$$q(x_{t,c} \mid x_{t-1,c}) = \mathcal{N}\left(\sqrt{1-\beta_t}\, x_{t-1,c},\ \beta_t I\right), \tag{1}$$

where $\beta_t \in (0,1)$ are the noise scales. Using reparametrization, $x_t$ can be expressed as a linear combination of $x_0$ and a Gaussian noise variable $\varepsilon \sim \mathcal{N}(0, I)$:

$$x_{t,c} = \sqrt{\alpha_t}\, x_{0,c} + \sqrt{1-\alpha_t}\, \varepsilon, \tag{2}$$
$$\alpha_t = \prod_{s=1}^{t} (1-\beta_s). \tag{3}$$
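The closed form in Eq. (2) means a noisy sample at any step can be drawn directly from $x_0$ without simulating every intermediate step. The sketch below illustrates this in plain Python. The linear schedule from 1e-4 to 0.02 is the one commonly used with DDPMs and is an assumption here, as are the helper names; the noise schedule used in this work is not stated.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise scales beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * k for k in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_t as the running product of (1 - beta_s) for s = 1..t."""
    alphas, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas.append(running)
    return alphas


def forward_sample(x0, t, alphas, rng):
    """Draw x_t from x_0 in one shot using Eq. (2); t is 1-indexed."""
    a = alphas[t - 1]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alphas = cumulative_alphas(betas)
x0 = [0.5, -0.25, 1.0, 0.0]  # a flattened toy "patch"
x_mid, _ = forward_sample(x0, 500, alphas, rng)
x_end, _ = forward_sample(x0, 1000, alphas, rng)
```

With this schedule $\alpha_T$ is close to zero, so `x_end` is dominated by the Gaussian noise term and retains almost nothing of `x0`.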

For the reverse diffusion step, we want to generate a sample from $q(x_{t-1,c} \mid x_{t,c})$. Since $q(x_{t-1,c} \mid x_{t,c})$ is an unknown distribution, we train a neural network $p_\theta(x_{t-1,c} \mid x_{t,c}, \alpha_t)$ to approximate it. To generate a random sample in the reverse diffusion, the latent variable $x_{t,c}$ should approximately follow an isotropic Gaussian distribution. This implies that key variables, including $\alpha_t$, need to be very close to zero, and $\beta_t$ should also have a small value, ensuring that $x_{t,c} \sim \mathcal{N}(0, I)$. The network for $p_\theta$ has a similar role to the decoder network in a variational autoencoder (VAE).
Notably, the encoder in the DPM differs from that of a VAE in that it is a fixed forward diffusion process. Within the reverse diffusion, a neural network $\varepsilon_\theta$ with parameters $\theta$ learns to denoise the provided $x_{t,c}$, producing $x_{t-1,c}$ as output. This denoising process iteratively subtracts the noise predicted by the neural network. We used a U-Net architecture with ResNet blocks as its backbone, as introduced in [17]. These blocks are used in both the downsampling and upsampling paths of the U-Net, which has 23 convolutional layers and two residual blocks (1000 denoising diffusion steps and a learning rate of 1e-4). The loss function for the DPM guides the model to generate synthetic images closely matching the desired distribution. It comprises two main parts, $L_{\mathrm{simple}}$ and $L_{\mathrm{vlb}}$, which together minimize the difference between the actual and estimated noise in the generated images [18]:

$$\mathrm{Loss} = L_{\mathrm{simple}} + L_{\mathrm{vlb}}, \tag{4}$$

where $L_{\mathrm{simple}}$ is a mean-squared error between the actual and estimated noise in the synthetic images, and $L_{\mathrm{vlb}}$ is a sum of score-matching losses that helps in learning the standard deviation $\sigma_t$ used during the diffusion process. Each reverse diffusion step computes $x_{t-1,c}$ from $x_{t,c}$, and the final generated image $x_{0,c}$ is obtained at the end of these iterations:

$$x_{t-1,c} = \frac{1}{\sqrt{1-\beta_t}}\left(x_{t,c} - \frac{\beta_t}{\sqrt{1-\alpha_t}}\, \varepsilon_\theta(x_{t,c}, t)\right) + \sigma_t z, \tag{5}$$

where $\sigma_t z$ represents the noise added to the generated data at a particular diffusion step or time step $t$.
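To make the sampling loop concrete, the sketch below runs the reverse chain in plain Python. It follows the standard DDPM sampling update of Ho et al. [12], in which the predicted noise is scaled by $\beta_t / \sqrt{1-\alpha_t}$ with $\alpha_t$ the cumulative product of $(1-\beta_s)$, and no noise is added on the last step. The trained U-Net is replaced by a hypothetical placeholder `zero_noise_predictor`, class conditioning is omitted, and $\sigma_t = \sqrt{\beta_t}$ is an assumed fixed choice rather than the learned standard deviation described above.

```python
import math
import random


def reverse_step(xt, t, betas, alphas, predict_noise, rng):
    """One denoising update x_t -> x_{t-1}; t is 1-indexed.

    betas[t-1] is beta_t and alphas[t-1] is the cumulative product alpha_t.
    """
    beta_t = betas[t - 1]
    alpha_t = alphas[t - 1]
    eps = predict_noise(xt, t)
    scale = beta_t / math.sqrt(1.0 - alpha_t)
    mean = [(x - scale * e) / math.sqrt(1.0 - beta_t) for x, e in zip(xt, eps)]
    if t == 1:
        return mean  # no noise is added on the last step
    sigma_t = math.sqrt(beta_t)  # assumed choice of sigma_t
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]


def sample(num_values, betas, predict_noise, rng):
    """Run the full reverse chain from pure Gaussian noise down to x_0."""
    alphas, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas.append(running)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_values)]
    for t in range(len(betas), 0, -1):
        x = reverse_step(x, t, betas, alphas, predict_noise, rng)
    return x


def zero_noise_predictor(xt, t):
    """Placeholder for the trained U-Net; it always predicts zero noise."""
    return [0.0 for _ in xt]


rng = random.Random(0)
betas = [1e-4 + (0.02 - 1e-4) * k / 49 for k in range(50)]
x0 = sample(4, betas, zero_noise_predictor, rng)
```

In a real pipeline, `predict_noise` would be the trained network $\varepsilon_\theta$, which would additionally receive the class label $c$.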

2.2 Deep learning classification of DUV images

Given the generated DUV patch images for the two labels, represented by $x_{0,c}$, we add them to our training data to improve breast cancer detection. Following the deep learning-based breast cancer classification method for DUV images [19], $N$ DUV patches, consisting of both generated and original DUV patch images (denoted as $p_i^j$, $j = \{1, \ldots, N\}$), are categorized as benign ($-1$) or malignant ($+1$). Features are extracted using the final layer of a pre-trained ResNet50 [17], and an XGBoost classifier [20] assigns a binary output $y_i^j \in \{-1, +1\}$ to each patch $p_i^j$. Additionally, Grad-CAM++ [21] on the pre-trained DenseNet169 model calculates the regional importance map $r_i^j$ for each DUV patch by taking the average relevance value over a patch’s region $\Omega_i^j$ [19].
Finally, a decision fusion method is employed to determine the WSI-level classification label $L_i \in \{-1, +1\}$ based on the patch-level classification labels $y_i^j$ for all patches $j = \{1, \ldots, m\}$. To this end, we define the weight $w_i^j$ for each patch $p_i^j$ as the thresholded regional importance value $r_i^j$:

$$w_i^j = \begin{cases} 0 & \text{if } r_i^j < 0.25 \\ r_i^j & \text{otherwise} \end{cases} \tag{6}$$

This weighting scheme neglects patches with low importance for either malignant or benign conditions in the fused decision for the DUV WSI. Then, weighted majority voting is employed to determine the WSI-level classification label $L_i \in \{-1, +1\}$:

$$L_i = \mathrm{sign}\left(\sum_{j=1}^{m} w_i^j \cdot y_i^j\right), \tag{7}$$

where $\mathrm{sign}(\cdot)$ is the sign function, which maps positive (malignant) and negative (benign) values to $+1$ and $-1$, respectively. It is worth noting that the DPM-augmented dataset mitigates the risk of overfitting.
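The fusion rule in Eqs. (6) and (7) can be sketched in a few lines of Python, using the label convention of this section (benign = -1, malignant = +1). The function names and the toy inputs are illustrative assumptions, and mapping a zero sum to -1 is also an assumption, since the tie case is not specified in the text.

```python
def patch_weight(importance, threshold=0.25):
    """Eq. (6): zero out patches whose regional importance is below threshold."""
    return 0.0 if importance < threshold else importance


def fuse_wsi_label(patch_labels, patch_importances, threshold=0.25):
    """Eq. (7): weighted majority vote over patch labels in {-1, +1}.

    Returns +1 (malignant) or -1 (benign). A zero sum is mapped to -1 here,
    which is an assumption; the tie case is not specified in the text.
    """
    total = sum(
        patch_weight(r, threshold) * y
        for y, r in zip(patch_labels, patch_importances)
    )
    return 1 if total > 0 else -1


labels = [+1, -1, -1, +1]
importances = [0.90, 0.10, 0.30, 0.20]
wsi_label = fuse_wsi_label(labels, importances)
```

In this toy case an unweighted vote would be tied at two patches each, but the two low-importance patches are discarded and the highly relevant malignant patch outweighs the remaining benign one, so the fused label is malignant.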

Fig. 2: Comparison of synthesized patch images using ProGAN and DPM: The images generated by DPM closely mimic real biological features, showcasing characteristics like enlarged cells, dense cellular structures, infiltration, and varied nuclear traits in both malignant and benign types, in contrast to ProGAN (a: malignant, b: benign). DPM produces high-quality, sharp data samples with intricate features, while ProGAN generates images with noticeable blurring artifacts. Additionally, our method stands out for its capability to capture a diverse range of image variations.

3 Experiment

In this study, we employed the Diffusion Probabilistic Model (DPM) to enhance our breast cancer DUV image classification. The training dataset is augmented with 1000 synthesized patches, evenly distributed between benign and malignant labels. These images, along with disease labels, seamlessly augment the training dataset without additional input from pathologists, addressing the limitations of small datasets. For comparison, we used the traditional affine transform (rotation and flip) and ProGAN[22] to boost the DUV patch training dataset. It is important to mention that we trained separate ProGAN networks for the benign-only and malignant-only datasets to ensure the automatic assignment of labels to generated images, as ProGAN does not incorporate label information. We evaluated the results in two parts: Visual Inspection and Classification Performance.
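For reference, the affine baseline (rotation and flip) can be sketched as follows. This minimal example assumes rotations by multiples of 90 degrees and horizontal flips on a patch stored as a nested list; the exact angles and flip axes used for the baseline are not stated, and the function names are illustrative.

```python
def rotate90(patch):
    """Rotate a 2-D patch (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_horizontal(patch):
    """Mirror a 2-D patch left to right."""
    return [row[::-1] for row in patch]


def affine_variants(patch):
    """Return the 8 rotation/flip variants of a patch (original included)."""
    variants = []
    current = [list(row) for row in patch]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


patch = [[1, 2],
         [3, 4]]
augmented = affine_variants(patch)
```

Such transforms only rearrange existing pixels, so every variant inherits the label of its source patch, but no new tissue appearance is introduced, which is the gap the generative approaches aim to fill.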

3.1 Dataset

The breast cancer dataset consists of DUV images from 60 samples (24 normal/benign and 36 malignant). This DUV dataset was collected from the Medical College of Wisconsin (MCW) tissue bank (4) with a custom DUV-FSM system. The DUV-FSM used a deep ultraviolet (DUV) excitation at 285 nm and a low magnification objective (4X), which achieved a small spatial resolution from 2 to 3 mm. To enhance fluorescence contrast, breast tissues are stained with propidium iodide and eosin Y. This technique produces images with microscopic resolution, sharpness, and contrast from fresh tissue stained with multiple fluorescent dyes. After patch extraction from these images, the dataset comprises 25,024 patches from normal/benign cases and 9,444 patches from malignant cases for training our diffusion model.

3.2 Visual Inspection

Our examination involves a visual comparison of the DUV patches generated by the DPM with those produced by ProGAN, as depicted in Figure 2. The DPM generates high-quality, sharply detailed samples with intricate features that closely replicate real biological characteristics. These characteristics include enlarged cells, dense cellular structures, infiltration, and diverse nuclear traits in both malignant and benign types. In contrast, ProGAN exhibits noticeable blurring artifacts in its generated images. Furthermore, our method handles a broad spectrum of image styles. For example, in malignant patch images, a color combination ranging from red to light green is achieved, while in benign patch images, a smooth shift to green tones is observed. Conversely, the two ProGAN networks, each trained on class-specific data, do not produce a wide range of image diversity.

3.3 Classification Performance

For a quantitative assessment, we compared the classification performance of our method with the Affine Transform and ProGAN approaches when integrating synthesized images from each into our original dataset. Table 1 presents the outcomes of the 5-fold cross-validation, underscoring the efficacy of our proposed method (DPM). It raised accuracy to 97%, with a sensitivity of 97% and a specificity of 93%. In contrast, ProGAN failed to improve classification performance over the Affine Transform, with both reaching an accuracy of about 93%. It should be noted that while WSI-level accuracy remains consistent, there are variations in the patch-level classification results for Affine Transform and ProGAN. These findings underscore the resilience of our proposed method, in contrast to the over-fitting issues faced by the Affine Transform and ProGAN. The high sensitivity and specificity highlight the benefits of our approach in intra-operative margin assessment. This suggests its potential to mitigate the risk of cancer recurrence by reducing the probability of surgeons misidentifying breast cancer margins in dissected tissue. Because ProGAN cannot directly incorporate label information, separate ProGAN networks had to be trained for the benign-only and malignant-only datasets, which illustrates its limitations in generating labeled, class-specific images. In contrast, DPM’s flexibility and effectiveness in handling complex data distributions and producing accurately labeled images offer a solution to the challenges of limited data sizes and suboptimal training conditions faced by ProGAN.

Table 1: Results from ten 5-fold cross validations (means and standard deviations) with randomized seeds:

Metric        (1) Affine      (2) ProGAN      (3) DPM
Accuracy      93% ± 0.69      93% ± 0.74      97% ± 1.23
Sensitivity   94% ± 0.58      94% ± 0.61      97% ± 1.13
Specificity   76% ± 4.3       76% ± 5.25      93% ± 7
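For readers reproducing the metrics in Table 1, the sketch below shows how accuracy, sensitivity, and specificity can be computed from predicted and true labels, and how a mean and standard deviation can be aggregated over repeated runs. The labels and per-run values are made up for illustration and are not results from this paper. Treating malignant (+1) as the positive class and using the sample standard deviation are assumptions, and the code assumes both classes are present.

```python
import statistics


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for labels in {-1, +1}.

    +1 (malignant) is treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def summarize(values):
    """Mean and sample standard deviation across repeated runs."""
    return statistics.mean(values), statistics.stdev(values)


# Made-up labels for illustration only; these are not results from the paper.
y_true = [+1, +1, +1, +1, -1, -1, -1, -1]
y_pred = [+1, +1, +1, -1, -1, -1, -1, +1]
metrics = binary_metrics(y_true, y_pred)
mean_acc, std_acc = summarize([0.75, 0.80, 0.70])
```

Sensitivity measures the fraction of malignant cases that are detected, while specificity measures the fraction of benign cases that are correctly cleared, which is why the two can differ widely on an imbalanced dataset.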

4 Conclusion

This study presents a compelling solution to the challenge of limited data in deep learning applications for medical image analysis, particularly in breast cancer classification of DUV images. Utilizing a Diffusion Probabilistic Model, the research effectively generates synthetic DUV patch images, thereby augmenting the dataset and improving the performance of deep learning models. The approach emphasizes diverse morphology levels during image synthesis, resulting in a dataset of 1000 patch images that encompasses both benign and malignant cases. Quantitative results demonstrate a significant enhancement in performance, with accuracy, sensitivity, and specificity reaching 97%, 97%, and 93%, respectively. This highlights the approach’s potential to greatly improve breast cancer detection. The study underscores the efficacy of data augmentation techniques in addressing data limitations in medical image analysis, ultimately contributing to more accurate and robust diagnostic systems. However, due to the unique and original nature of our dataset, which requires extensive review by pathologists for accurate labeling, future work will focus on expanding our dataset with more labeled images to enhance our model’s performance.

References

  • [1] T. Lu, J. M. Jorns, M. Patton, R. Fisher, A. Emmrich, T. Doehring, T. G. Schmidt, D. H. Ye, T. Yen, and B. Yu, “Rapid assessment of breast tumor margins using deep ultraviolet fluorescence scanning microscopy,” Journal of Biomedical Optics, vol. 25, no. 12, pp. 126501–126501, 2020.
  • [2] T. Lu, J. M. Jorns, D. H. Ye, M. Patton, R. Fisher, A. Emmrich, T. G. Schmidt, T. Yen, and B. Yu, “Automated assessment of breast margins in deep ultraviolet fluorescence images using texture analysis,” Biomedical Optics Express, vol. 13, no. 9, pp. 5015–5034, 2022.
  • [3] L. Lan, L. You, Z. Zhang, Z. Fan, W. Zhao, N. Zeng, Y. Chen, and X. Zhou, “Generative adversarial networks and its applications in biomedical informatics,” Frontiers in public health, vol. 8, pp. 164, 2020.
  • [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
  • [5] S. H. Gheshlaghi, C. N. E. Kan, and D. H. Ye, “Breast cancer histopathological image classification with adversarial image synthesis,” in 2021 43rd annual international conference of the IEEE engineering in medicine & biology society (EMBC). IEEE, 2021, pp. 3387–3390.
  • [6] A. Razavi, A. Van den Oord, and O. Vinyals, “Generating diverse high-fidelity images with vq-vae-2,” Advances in neural information processing systems, vol. 32, 2019.
  • [7] A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International Conference on Machine Learning. PMLR, 2021, pp. 8162–8171.
  • [8] C. Nash, J. Menick, S. Dieleman, and P. W. Battaglia, “Generating images with sparse representations,” arXiv preprint arXiv:2103.03841, 2021.
  • [9] A. Brock, J. Donahue, and K. Simonyan, “Large scale gan training for high fidelity natural image synthesis,” arXiv preprint arXiv:1809.11096, 2018.
  • [10] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018.
  • [11] P. Dhariwal and A. Nichol, “Diffusion models beat gans on image synthesis,” Advances in neural information processing systems, vol. 34, pp. 8780–8794, 2021.
  • [12] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
  • [13] J. Choi, J. Lee, C. Shin, S. Kim, H. Kim, and S. Yoon, “Perception prioritized training of diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11472–11481.
  • [14] P. A. Moghadam, S. Van Dalen, K. C. Martin, J. Lennerz, S. Yip, H. Farahani, and A. Bashashati, “A morphology focused diffusion probabilistic model for synthesis of histopathology images,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 2000–2009.
  • [15] Q. Zhou and H. Yin, “A u-net based progressive gan for microscopic image augmentation,” in Annual Conference on Medical Image Understanding and Analysis. Springer, 2022, pp. 458–468.
  • [16] J. Kim and H. Park, “Adaptive latent diffusion model for 3d medical image to image translation: Multi-modal magnetic resonance imaging study,” arXiv preprint arXiv:2311.00265, 2023.
  • [17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [18] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural computation, vol. 23, no. 7, pp. 1661–1674, 2011.
  • [19] T. To, T. Lu, J. M. Jorns, M. Patton, T. G. Schmidt, T. Yen, B. Yu, and D. H. Ye, “Deep learning classification of deep ultraviolet fluorescence images toward intra-operative margin assessment in breast cancer,” Frontiers in Oncology, vol. 13, pp. 1179025, 2023.
  • [20] A. Shokouhmand, N. D. Aranoff, E. Driggin, P. Green, and N. Tavassolian, “Efficient detection of aortic stenosis using morphological characteristics of cardiomechanical signals and heart rate variability parameters,” Scientific reports, vol. 11, no. 1, pp. 23817, 2021.
  • [21] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618–626.
  • [22] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of gans for improved quality, stability, and variation,” arXiv preprint arXiv:1710.10196, 2017.