A Dual Attention-aided DenseNet-121 for Classification of Glaucoma from Fundus Images

Soham Chakraborty
Dept. of Computer Sc. & Engg.
Jadavpur University,
Kolkata, 700032, India
email: [email protected]

Ayush Roy
Dept. of Electrical Engg.
Jadavpur University,
Kolkata, 700032, India
email: [email protected]

Payel Pramanik
Dept. of Computer Sc. & Engg.
Jadavpur University,
Kolkata, 700032, India
email: [email protected]

Daria Valenkova
Dept. of Automation and Cont. Proc.
Saint Petersburg Electrotechnical University "LETI"
Saint Petersburg, Russia
email: [email protected]

Ram Sarkar
Dept. of Computer Sc. & Engg.
Jadavpur University,
Kolkata, 700032, India
email: [email protected]

Abstract

Deep learning and computer vision methods are now widely used in ophthalmology. In this paper, we present an attention-aided DenseNet-121 for classifying normal and glaucomatous eyes from fundus images. It employs the convolutional block attention module to highlight relevant spatial and channel features extracted by DenseNet-121. The channel recalibration module further enriches the features by utilizing edge information along with the statistical features of the spatial dimension. For the experiments, two standard datasets, namely RIM-ONE and ACRIMA, have been used. Our method achieves better results than state-of-the-art models on both datasets. An ablation study has also been conducted to show the effectiveness of each of the components. The code of the proposed work is available at: https://github.com/Soham2004GitHub/DADGC.

Index Terms:
Glaucoma detection, Fundus image, Deep learning, Dual attention, Channel recalibration

I Introduction

Glaucoma, often referred to as the "silent thief of sight," is a progressive vision loss disease affecting approximately 80 million people worldwide. According to the World Health Organization (WHO), 3.54% of individuals aged between 40 and 80 are affected by glaucoma, with a higher susceptibility observed in those under 40 compared to those over 80 [1]. Early detection is crucial to initiate preventive measures and mitigate further vision impairment. Diagnosis typically involves examining fundus images for signs such as a pale optic disc (OD), which is indicative of glaucoma. However, manual assessment by experts is subjective and time-consuming [2]. To address the increasing prevalence of glaucoma and the need for accessible screening, computer-aided diagnosis (CAD) systems are being explored. These systems now use deep learning techniques to detect glaucoma accurately from fundus images. CAD systems help identify patients who require immediate examination by an ophthalmologist, thereby reducing the workload on medical professionals while maintaining high sensitivity in diagnosis.

In recent times, Convolutional Neural Networks (CNNs) have become essential in computer vision for tasks like classification and detection [3, 4, 5]. Pre-trained CNN models, originally trained on large datasets, are now utilized for various tasks, often with feature extraction augmented by attention mechanisms [6, 7, 8]. This enhancement has significantly boosted CNN performance.

Contribution: In this work, we propose a transfer learning-based CNN model for glaucoma classification from fundus images. The highlights of our work are:

1. DenseNet-121 is used to extract deep features from the input fundus images.

2. We introduce an amalgamation of two attention modules, namely the Convolutional Block Attention Module (CBAM) and the Channel Recalibration Module (CRM), on the deep features extracted by the DenseNet-121 model. The CBAM highlights the relevant spatial and channel features extracted by DenseNet-121. The CRM further enriches the features by utilizing edge information along with the statistical features of the spatial dimension.

3. We have evaluated our method on two publicly available datasets, namely RIM-ONE and ACRIMA, and achieved superior results.

II Related Work

In the literature on fundus image classification, numerous deep learning-based approaches have been proposed. For instance, Sonti et al. [9] introduced KR-NET for retinal fundus glaucoma classification, in which the region of interest (ROI) is segmented using variational mode decomposition (VMD) and a 26-layer CNN classifies glaucoma on the ACRIMA, RIM-ONE, and Drishti-GS1 datasets. Diaz-Pinto et al. [10] utilized CNNs pretrained on ImageNet to automatically assess glaucoma, employing five different models and datasets, including their newly collected dataset called ACRIMA, which is publicly available. Claro et al. [11] created a hybrid feature space from various feature descriptors and seven CNN architectures, and validated it using 10-fold cross-validation (CV) with a Random Forest (RF) classifier on various public datasets. Gómez-Valverde et al. [12] proposed a method using six different CNN architectures and validated it on the Drishti-GS1 and RIM-ONE datasets using 10-fold CV. Liu et al. [13] developed a deep neural network (DNN) model and evaluated it on several fundus datasets. Elangovan et al. [14] designed an 18-layer CNN model for the classification of glaucoma and verified it on five public datasets. The authors in [15] used a CNN-based implementation on the ACRIMA dataset and reported a comparative analysis of various CNN models for glaucoma classification. De Sales Carvalho et al. [16] proposed a method of glaucoma classification using a 3-D CNN and verified their results on public datasets with and without segmentation. Elangovan et al. [17] developed a stacking deep ensemble model comprising 13 pre-trained models and five different approaches for classifying glaucoma images, and validated it on publicly available datasets.

Summarizing the literature, a wide range of deep learning approaches has been explored by researchers for glaucoma detection from fundus images. Despite the diversity in methodologies, the primary aim remains constant: to develop reliable models for early glaucoma detection. Our proposed model builds upon this by employing a transfer learning-based CNN approach. We extract deep features using transfer learning and augment them with two attention mechanisms, focusing on relevant spatial and channel features while incorporating edge and statistical features of the spatial dimension.

III Methodology

To develop a glaucoma classification model, we introduce an amalgamation of the CBAM and CRM attention modules on the encoded features of the DenseNet-121 model, which is the backbone of the entire model. Features generated by the DenseNet-121, denoted as $F_{enc}$, are processed by the CBAM and CRM modules successively, and then flattened using a Global Average Pooling (GAP) layer. This flattened feature is subsequently passed to a dense classification layer with Sigmoid activation to predict the class of the inputs. Fig. 1 illustrates the proposed model architecture.
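The final stage described above (GAP followed by a single-unit dense layer with Sigmoid) can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation: the random feature map stands in for the output of the CBAM and CRM modules, the weights `w` and `b` are hypothetical untrained values, and the 0.5 threshold and the convention that label 1 means glaucoma are assumptions not stated in the text. The 8 × 8 × 1024 shape is what DenseNet-121 without its classification head yields for a 256 × 256 input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(features, w, b, threshold=0.5):
    """features: (H, W, C) map after CBAM and CRM; w: (C,); b: scalar.

    Returns the Sigmoid probability and a hard label
    (assumed convention: 1 = glaucoma, 0 = normal).
    """
    pooled = features.mean(axis=(0, 1))   # Global Average Pooling -> (C,)
    prob = sigmoid(pooled @ w + b)        # one-unit dense layer + Sigmoid
    return prob, int(prob >= threshold)

# Stand-in inputs (hypothetical): random features and untrained weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 1024))
w = rng.standard_normal(1024) * 0.01
b = 0.0
prob, label = classify(features, w, b)
```

With trained weights, `prob` would be the predicted probability for the positive class and `label` the thresholded decision.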

Figure 1: Block diagram of the proposed glaucoma classification model.

III-A Convolutional Block Attention Module (CBAM)

The CBAM attention mechanism [18] enhances feature maps from CNNs by integrating channel-wise and spatial attention. It consists of two main modules: the 1D Channel Attention Module (CAM) and the 2D Spatial Attention Module (SAM). CAM assigns weights to channels, emphasizing those most influential to model performance. The input feature map $F_{enc}$ of CAM undergoes Global Average Pooling (GAP) and Global Max Pooling (GMP) to generate $F_{gap}$ and $F_{gmp}$, respectively. $F_{gap}$ and $F_{gmp}$ are each passed through dense layers separately and then added together. A Sigmoid activation is applied to the sum to generate the channel attention weights. The output of CAM is the channel-wise multiplication of the channel attention weights and $F_{enc}$. This output, an enriched version of the input feature map, is then processed by SAM. SAM improves feature representation in the spatial dimension by generating a new feature map of the same dimension as its input. It utilizes a convolutional layer followed by dense layers with rectified linear unit (ReLU) activation to reduce and then restore the feature map dimensions. The output of CBAM, enriched across the channel and spatial dimensions, focuses specifically on the optic cup and disc in fundus images, aiding in better classification results.
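The channel attention step described above can be sketched in NumPy as below. It is a hedged illustration rather than the authors' code: it assumes a channels-last feature map, a two-layer MLP with a hypothetical reduction ratio `r` whose weights are shared between the GAP and GMP branches (as in the original CBAM paper [18]), and random untrained weights `w1` and `w2`. The spatial attention module is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f_enc, w1, w2):
    """CAM sketch. f_enc: (H, W, C); w1: (C, C // r); w2: (C // r, C)."""
    f_gap = f_enc.mean(axis=(0, 1))   # Global Average Pooling -> (C,)
    f_gmp = f_enc.max(axis=(0, 1))    # Global Max Pooling     -> (C,)

    def mlp(v):
        # Dense (ReLU) that reduces the channel dimension, then a dense
        # layer that restores it. Sharing weights is an assumption.
        return np.maximum(v @ w1, 0.0) @ w2

    weights = sigmoid(mlp(f_gap) + mlp(f_gmp))   # channel weights, (C,)
    return f_enc * weights, weights              # broadcast over H and W

# Stand-in inputs (hypothetical sizes and untrained weights).
rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
f_enc = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
f_cam, weights = channel_attention(f_enc, w1, w2)
```

Each channel of `f_cam` is the corresponding channel of `f_enc` scaled by a weight in (0, 1), so the spatial layout is unchanged and only the relative importance of channels is altered.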

III-B Channel Recalibration Module (CRM)

The CRM adaptively recalibrates feature representations based on channel-wise statistics, emphasizing informative channels while suppressing less relevant ones. By incorporating both intensity and edge-related information, the CRM enables CNNs to capture fine-grained patterns essential for accurate glaucoma fundus classification.

The input feature map $F$ and its corresponding edge map, $F_{edge}$, are reshaped to enable efficient channel-wise computations. The mean $\mu$ and standard deviation $\sigma$ are calculated along the channel axis for both the original and edge feature maps. These statistics capture essential information about the spatial feature distribution and gradients, respectively. The $\mu$ and $\sigma$ values are concatenated to form a composite tensor $T$ that encapsulates both intensity and edge-related statistics. A 1D convolutional layer (kernel size = 2) is applied to $T$, followed by batch normalization. This convolutional transformation facilitates the learning of interdependencies between the mean and standard deviation components across channels. A Sigmoid activation function is applied to the output of the convolutional layer, resulting in a gating tensor $G$. This gating tensor modulates the original feature map by adaptively assigning an importance weight to each channel. $F$ is recalibrated by performing a Hadamard product with $G$, producing $F_{CRM}$. As mentioned earlier, such recalibration enhances informative channels and suppresses irrelevant ones, thereby improving discriminative power in subsequent layers. The block diagram representation of CRM is shown in Fig. 2.
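One possible reading of these steps is sketched below in NumPy. Several details are assumptions because the text does not fix them: the edge map is taken as a finite-difference gradient magnitude, the statistics are computed per channel over the spatial positions, $T$ is arranged as a (C, 4) tensor, the 1D convolution slides over the channel axis with a single output filter and one zero of end padding, and batch normalization is shown in inference form with placeholder parameters. The kernel values are random and untrained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_map(f):
    """Gradient-magnitude edge map per channel (assumed operator). f: (H, W, C)."""
    gy, gx = np.gradient(f, axis=(0, 1))
    return np.sqrt(gx ** 2 + gy ** 2)

def channel_stats(f):
    """Per-channel mean and standard deviation over spatial positions -> (C, 2)."""
    flat = f.reshape(-1, f.shape[-1])                 # (H*W, C)
    return np.stack([flat.mean(axis=0), flat.std(axis=0)], axis=1)

def crm(f, kernel, bias=0.0, bn_mean=0.0, bn_var=1.0,
        gamma=1.0, beta=0.0, eps=1e-5):
    """CRM sketch. f: (H, W, C); kernel: (2, 4), i.e. kernel size 2, 4 statistics."""
    # Composite tensor T: [mu, sigma, mu_edge, sigma_edge] for every channel.
    t = np.concatenate([channel_stats(f), channel_stats(edge_map(f))], axis=1)
    # 1D convolution (kernel size 2) along the channel axis, zero-padded at the end.
    t_pad = np.vstack([t, np.zeros((1, t.shape[1]))])            # (C + 1, 4)
    conv = np.array([np.sum(t_pad[c:c + 2] * kernel)
                     for c in range(t.shape[0])]) + bias          # (C,)
    # Inference-style batch normalization with placeholder statistics.
    bn = gamma * (conv - bn_mean) / np.sqrt(bn_var + eps) + beta
    g = sigmoid(bn)                                               # gating tensor G
    return f * g, g                                               # Hadamard product

# Stand-in inputs (hypothetical sizes and untrained kernel).
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8, 16))
kernel = rng.standard_normal((2, 4)) * 0.5
f_crm, g = crm(f, kernel)
```

Here `g` holds one gate per channel and `f_crm` has the same shape as `f`, matching the description of $G$ and $F_{CRM}$.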

Figure 2: Channel Recalibration Module.

IV Results

IV-A Experimental Setup

We have conducted experiments using two standard datasets, ACRIMA [10] and RIM-ONE [19], for glaucoma classification. The ACRIMA dataset comprises 705 images, with 309 healthy and 396 glaucoma images, obtained from a Spanish national project via the IMAGEnet training system. These images, available in .jpg format, range in dimensions from 178 × 178 pixels to 1420 × 1420 pixels. The RIM-ONE dataset, developed in collaboration with three Spanish hospitals, contains 485 images, including 313 healthy and 172 glaucoma images. Similarly centered around the optic disc, these images are available in .png format, with dimensions ranging from 290 × 290 pixels to 1375 × 1654 pixels. For both datasets, we performed 5-fold cross-validation. We augmented the training set and resized all images to 256 × 256 × 3. The model was trained for 50 epochs with binary cross-entropy loss, using the Adam optimizer with a learning rate of 0.001 and a batch size of 16, on an NVIDIA Tesla P100 GPU. We have evaluated its performance using accuracy (Acc), precision (Pre), recall (Rec), and F1-score (F1).
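The evaluation protocol above can be illustrated with a small pure-Python sketch of a 5-fold split and the four reported metrics. The function names are hypothetical, the split is a plain shuffled (not stratified) split, and the metrics treat glaucoma (label 1) as the positive class; the text does not state whether stratification or macro-averaging was used, so these are assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, F1) with label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return acc, pre, rec, f1

# Example: split the 705 ACRIMA images into 5 folds of 141 images each.
splits = list(k_fold_indices(705, k=5))

# Toy labels (not from the paper): 3 TP, 3 TN, 1 FP, 1 FN.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                         [1, 1, 0, 0, 0, 1, 1, 0])
```

In a full experiment, the model would be trained on each `train` list and scored on the matching `test` list, with the per-fold metrics then averaged or reported for the best fold.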

IV-B Ablation Study

We have used RIM-ONE as the dataset for the ablation study. For selecting the backbone, we have trained four commonly used CNN models, namely MobileNetV2, DenseNet-121, ResNet-50, and InceptionV3. Among these, DenseNet-121 gives the best results, as shown in Table I. Using DenseNet-121 as the backbone, we have performed further experiments to determine the best architectural configuration. These experiments are: (i) DenseNet-121 + CBAM, (ii) DenseNet-121 + CRM, (iii) DenseNet-121 + CBAM + CRM.

In Table II, it is seen that the introduction of the CBAM and CRM modules enhances the performance of the baseline, i.e., DenseNet-121. Also, the combined impact of the CBAM and CRM modules produces the best result and further boosts the performance of the baseline.

TABLE I: Ablation study for selecting the best baseline model.
Model           Acc (%)   Pre (%)   Rec (%)   F1 (%)
MobileNetV2     84.54     84.88     83.29     83.43
ResNet-50       87.63     86.61     86.59     86.60
InceptionV3     90.52     89.59     89.22     89.40
DenseNet-121    90.93     90.73     90.24     90.48

TABLE II: Ablation study for selecting the best model configuration.
Model                                Acc (%)   Pre (%)   Rec (%)   F1 (%)
(i) DenseNet-121 + CBAM              90.93     90.29     88.66     89.90
(ii) DenseNet-121 + CRM              91.96     91.98     90.80     91.13
(iii) DenseNet-121 + CBAM + CRM      93.81     93.40     93.59     93.49

Moreover, the focus of CBAM and CRM can be seen in Fig. 3, where the heatmap demonstrates the regions highlighted by these attention modules to streamline the focus of the deep learner.

Figure 3: Heatmap of $F_{CRM}$ for normal and glaucoma fundus images.

IV-C Comparison with SOTA

It can be seen from Table IV that the proposed model outperforms the existing models in terms of accuracy and F1 score for ACRIMA. The proposed model outperforms the existing models in terms of accuracy for RIM-ONE but is comparable in terms of F1 score as shown in Table III.

TABLE III: Performance comparison of the proposed model with SOTA methods on the RIM-ONE dataset.
Model                        Acc (%)   F1-score (%)
KR-NET, 2023 [9]             90.51     89.26
QB-VMD, 2019 [20]            86.13     -
Hybrid PolyNet, 2023 [21]    71.79     -
Modified CNN, 2021 [14]      85.97     -
EyeNet, 2021 [22]            89.00     -
AG-CNN, 2019 [23]            85.20     83.70
DCNN, 2018 [24]              89.40     -
IEMD, 2022 [25]              93.26     92.90
Proposed                     93.81     93.49

TABLE IV: Performance comparison of the proposed model with SOTA methods on the ACRIMA dataset.
Model                        Acc (%)   F1-score (%)
KR-NET, 2023 [9]             96.70     97.05
Hybrid PolyNet, 2023 [21]    96.21     -
Modified CNN, 2021 [14]      96.64     96.89
DNet-201, 2021 [26]          97.00     96.96
DCGAN, 2022 [27]             93.65     94.00
VGG-CapsNet, 2022 [27]       85.85     89.00
Hybrid CNN, 2024 [28]        92.96     93.75
MAS-aided CNN, 2024 [29]     97.18     97.06
Proposed                     98.58     98.55

The confusion matrices in Fig. 4 demonstrate the effective classification of glaucoma by the proposed model for both ACRIMA and RIM-ONE datasets.

Figure 4: Confusion matrices of the proposed model for ACRIMA and RIM-ONE datasets for the best fold among the 5 folds. 'G' and 'N' indicate 'Glaucoma' and 'Normal' classes respectively.

V Conclusion and Future scope

This paper proposes a dual attention-aided DenseNet-121 architecture for classifying glaucoma fundus images. DenseNet-121 acts as a backbone to extract features from the input image. The extracted features are then enriched spatially and channel-wise using the CBAM and CRM successively. The model achieves better performance than the SOTA methods on the RIM-ONE and ACRIMA datasets. In future work, we would like to explore few-shot learning approaches as an alternative to data augmentation. Another plan is to use a lightweight backbone to make the model applicable in resource-constrained environments.

References

  • [1] Mary VS, Rajsingh EB, Naik GR. Retinal fundus image analysis for diagnosis of glaucoma: a comprehensive survey. IEEE Access. 2016.
  • [2] Sánchez CI, Niemeijer M, Dumitrescu AV, Suttorp-Schulten MS, Abramoff MD, van Ginneken B. Evaluation of a computer-aided diagnosis system for diabetic retinopathy screening on public data. Investigative ophthalmology & visual science. 2011;52(7):4866-71.
  • [3] Orlando JI, Prokofyeva E, del Fresno M, Blaschko MB. Convolutional neural network transfer for automated glaucoma identification. In: 12th international symposium on medical information processing and analysis. vol. 10160. SPIE; 2017. p. 241-50.
  • [4] Pramanik P, Mukhopadhyay S, Kaplun D, Sarkar R. A deep feature selection method for tumor classification in breast ultrasound images. In: International conference on mathematics and its applications in new computer systems. Springer; 2021. p. 241-52.
  • [5] Pramanik P, Mukhopadhyay S, Mirjalili S, Sarkar R. Deep feature selection using local search embedded social ski-driver optimization algorithm for breast cancer detection in mammograms. Neural Computing and Applications. 2023;35(7):5479-99.
  • [6] Roy A, Shivakumara P, Pal U, Mokayed H, Liwicki M. Fourier feature-based CBAM and vision transformer for text detection in drone images. In: International Conference on Document Analysis and Recognition. Springer; 2023. p. 257-71.
  • [7] Roy A, Mohiuddin S, Sarkar R. A Similarity-based Positional Attention aided Deep Learning Model for Copy-Move Forgery Detection. IEEE Transactions on Artificial Intelligence. 2024.
  • [8] Alirezazadeh P, Schirrmann M, Stolzenburg F. Improving deep learning-based plant disease classification with attention mechanism. Gesunde Pflanzen. 2023;75(1):49-59.
  • [9] Sonti K, Dhuli R. A new convolution neural network model “KR-NET” for retinal fundus glaucoma classification. Optik. 2023;283:170861.
  • [10] Diaz-Pinto A, Morales S, Naranjo V, Köhler T, Mossi JM, Navea A. CNNs for automatic glaucoma assessment using fundus images: an extensive validation. Biomedical engineering online. 2019;18:1-19.
  • [11] Claro M, Veras R, Santana A, Araujo F, Silva R, Almeida J, et al. An hybrid feature space from texture information and transfer learning for glaucoma classification. Journal of Visual Communication and Image Representation. 2019;64:102597.
  • [12] Gómez-Valverde JJ, Antón A, Fatti G, Liefers B, Herranz A, Santos A, et al. Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning. Biomedical optics express. 2019;10(2):892-913.
  • [13] Liu H, Zhang N, ** S, Xu D, Gao W. Small sample color fundus image quality assessment based on gcforest. Multimedia Tools and Applications. 2021;80:17441-59.
  • [14] Elangovan P, Nath MK. Glaucoma assessment from color fundus images using convolutional neural network. International Journal of Imaging Systems and Technology. 2021;31(2):955-71.
  • [15] Shyamalee T, Meedeniya D. CNN based fundus images classification for glaucoma identification. In: 2022 2nd International Conference on Advanced Research in Computing (ICARC). IEEE; 2022. p. 200-5.
  • [16] de Sales Carvalho NR, Rodrigues MdCLC, de Carvalho Filho AO, Mathew MJ. Automatic method for glaucoma diagnosis using a three-dimensional convoluted neural network. Neurocomputing. 2021;438:72-83.
  • [17] Elangovan P, Nath MK. En-ConvNet: A novel approach for glaucoma detection from color fundus images using ensemble of deep convolutional neural networks. International Journal of Imaging Systems and Technology. 2022;32(6):2034-48.
  • [18] Woo S, Park J, Lee JY, Kweon IS. CBAM: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV); 2018. p. 3-19.
  • [19] Fumero F, Alayón S, Sanchez JL, Sigut J, Gonzalez-Hernandez M. RIM-ONE: An open retinal image database for optic nerve evaluation. In: 2011 24th international symposium on computer-based medical systems (CBMS). IEEE; 2011. p. 1-6.
  • [20] Agrawal DK, Kirar BS, Pachori RB. Automated glaucoma detection using quasi-bivariate variational mode decomposition from fundus images. IET Image Processing. 2019;13(13):2401-8.
  • [21] Sangeethaa S. Presumptive discerning of the severity level of glaucoma through clinical fundus images using hybrid PolyNet. Biomedical Signal Processing and Control. 2023;81:104347.
  • [22] Suguna G, Lavanya R. Performance assessment of EyeNet model in glaucoma diagnosis. Pattern Recognition and Image Analysis. 2021;31(2):334-44.
  • [23] Li L, Xu M, Wang X, Jiang L, Liu H. Attention based glaucoma detection: A large-scale database and CNN model. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2019. p. 10571-80.
  • [24] Perdomo O, Andrearczyk V, Meriaudeau F, Müller H, González FA. Glaucoma diagnosis from eye fundus images based on deep morphometric feature estimation. In: Computational Pathology and Ophthalmic Medical Image Analysis: First International Workshop, COMPAY 2018, and 5th International Workshop, OMIA 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16-20, 2018, Proceedings 5. Springer; 2018. p. 319-27.
  • [25] Parashar D, Agrawal DK. Classification of glaucoma stages using image empirical mode decomposition from fundus images. Journal of Digital Imaging. 2022;35(5):1283-92.
  • [26] Ovreiu S, Paraschiv EA, Ovreiu E. Deep learning & digital fundus images: Glaucoma detection using DenseNet. In: 2021 13th international conference on electronics, computers and artificial intelligence (ECAI). IEEE; 2021. p. 1-4.
  • [27] Singh LK, Khanna M, et al. A novel multimodality based dual fusion integrated approach for efficient and early prediction of glaucoma. Biomedical Signal Processing and Control. 2022;73:103468.
  • [28] Oguz C, Aydin T, Yaganoglu M. A CNN-based hybrid model to detect glaucoma disease. Multimedia Tools and Applications. 2024;83(6):17921-39.
  • [29] Sonti K, Dhuli R. Diagnosis of glaucoma from retinal fundus images using disc localization and sequential DNN model. International Journal of Imaging Systems and Technology. 2024;34(1):e23029.