-
Do not trust what you trust: Miscalibration in Semi-supervised Learning
Authors:
Shambhavi Mishra,
Balamurali Murugesan,
Ismail Ben Ayed,
Marco Pedersoli,
Jose Dolz
Abstract:
State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing…
▽ More
State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing and enhancing the uncertainty of network predictions is of paramount importance in the pseudo-labeling process. In this work, we empirically demonstrate that SSL methods based on pseudo-labels are significantly miscalibrated, and formally demonstrate the minimization of the min-entropy, a lower bound of the Shannon entropy, as a potential cause for miscalibration. To alleviate this issue, we integrate a simple penalty term, which enforces the logit distances of the predictions on unlabeled samples to remain low, preventing the network predictions to become overconfident. Comprehensive experiments on a variety of SSL image classification benchmarks demonstrate that the proposed solution systematically improves the calibration performance of relevant SSL models, while also enhancing their discriminative power, being an appealing addition to tackle SSL tasks.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Class and Region-Adaptive Constraints for Network Calibration
Authors:
Balamurali Murugesan,
Julio Silva-Rodriguez,
Ismail Ben Ayed,
Jose Dolz
Abstract:
In this work, we present a novel approach to calibrate segmentation networks that considers the inherent challenges posed by different categories and object regions. In particular, we present a formulation that integrates class and region-wise constraints into the learning objective, with multiple penalty weights to account for class and region differences. Finding the optimal penalty weights manu…
▽ More
In this work, we present a novel approach to calibrate segmentation networks that considers the inherent challenges posed by different categories and object regions. In particular, we present a formulation that integrates class and region-wise constraints into the learning objective, with multiple penalty weights to account for class and region differences. Finding the optimal penalty weights manually, however, might be unfeasible, and potentially hinder the optimization process. To overcome this limitation, we propose an approach based on Class and Region-Adaptive constraints (CRaC), which allows to learn the class and region-wise penalty weights during training. CRaC is based on a general Augmented Lagrangian method, a well-established technique in constrained optimization. Experimental results on two popular segmentation benchmarks, and two well-known segmentation networks, demonstrate the superiority of CRaC compared to existing approaches. The code is available at: https://github.com/Bala93/CRac/
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
Neighbor-Aware Calibration of Segmentation Networks with Penalty-Based Constraints
Authors:
Balamurali Murugesan,
Sukesh Adiga Vasudeva,
Bingyuan Liu,
Hervé Lombaert,
Ismail Ben Ayed,
Jose Dolz
Abstract:
Ensuring reliable confidence scores from deep neural networks is of paramount significance in critical decision-making systems, particularly in real-world domains such as healthcare. Recent literature on calibrating deep segmentation networks has resulted in substantial progress. Nevertheless, these approaches are strongly inspired by the advancements in classification tasks, and thus their uncert…
▽ More
Ensuring reliable confidence scores from deep neural networks is of paramount significance in critical decision-making systems, particularly in real-world domains such as healthcare. Recent literature on calibrating deep segmentation networks has resulted in substantial progress. Nevertheless, these approaches are strongly inspired by the advancements in classification tasks, and thus their uncertainty is usually modeled by leveraging the information of individual pixels, disregarding the local structure of the object of interest. Indeed, only the recent Spatially Varying Label Smoothing (SVLS) approach considers pixel spatial relationships across classes, by softening the pixel label assignments with a discrete spatial Gaussian kernel. In this work, we first present a constrained optimization perspective of SVLS and demonstrate that it enforces an implicit constraint on soft class proportions of surrounding pixels. Furthermore, our analysis shows that SVLS lacks a mechanism to balance the contribution of the constraint with the primary objective, potentially hindering the optimization process. Based on these observations, we propose NACL (Neighbor Aware CaLibration), a principled and simple solution based on equality constraints on the logit values, which enables to control explicitly both the enforced constraint and the weight of the penalty, offering more flexibility. Comprehensive experiments on a wide variety of well-known segmentation benchmarks demonstrate the superior calibration performance of the proposed approach, without affecting its discriminative power. Furthermore, ablation studies empirically show the model agnostic nature of our approach, which can be used to train a wide span of deep segmentation networks.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation
Authors:
Balamurali Murugesan,
Rukhshanda Hussain,
Rajarshi Bhattacharya,
Ismail Ben Ayed,
Jose Dolz
Abstract:
Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt the pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work w…
▽ More
Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt the pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work we question whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. Our findings reveal two interesting observations that shed light on the impact of prompt tuning on WSSS. First, modifying only the class token of the text prompt results in a greater impact on the Class Activation Map (CAM), compared to arguably more complex strategies that optimize the context. And second, the class token associated with the image ground truth does not necessarily correspond to the category that yields the best CAM. Motivated by these observations, we introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy. Through extensive experiments we demonstrate that our simple, yet efficient approach achieves SOTA performance in a well-known WSSS benchmark. These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.
△ Less
Submitted 13 January, 2024; v1 submitted 30 June, 2023;
originally announced July 2023.
-
Trust your neighbours: Penalty-based constraints for model calibration
Authors:
Balamurali Murugesan,
Sukesh Adiga V,
Bingyuan Liu,
Hervé Lombaert,
Ismail Ben Ayed,
Jose Dolz
Abstract:
Ensuring reliable confidence scores from deep networks is of pivotal importance in critical decision-making systems, notably in the medical domain. While recent literature on calibrating deep segmentation networks has led to significant progress, their uncertainty is usually modeled by leveraging the information of individual pixels, which disregards the local structure of the object of interest.…
▽ More
Ensuring reliable confidence scores from deep networks is of pivotal importance in critical decision-making systems, notably in the medical domain. While recent literature on calibrating deep segmentation networks has led to significant progress, their uncertainty is usually modeled by leveraging the information of individual pixels, which disregards the local structure of the object of interest. In particular, only the recent Spatially Varying Label Smoothing (SVLS) approach addresses this issue by softening the pixel label assignments with a discrete spatial Gaussian kernel. In this work, we first present a constrained optimization perspective of SVLS and demonstrate that it enforces an implicit constraint on soft class proportions of surrounding pixels. Furthermore, our analysis shows that SVLS lacks a mechanism to balance the contribution of the constraint with the primary objective, potentially hindering the optimization process. Based on these observations, we propose a principled and simple solution based on equality constraints on the logit values, which enables to control explicitly both the enforced constraint and the weight of the penalty, offering more flexibility. Comprehensive experiments on a variety of well-known segmentation benchmarks demonstrate the superior performance of the proposed approach.
△ Less
Submitted 13 January, 2024; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Calibrating Segmentation Networks with Margin-based Label Smoothing
Authors:
Balamurali Murugesan,
Bingyuan Liu,
Adrian Galdran,
Ismail Ben Ayed,
Jose Dolz
Abstract:
Despite the undeniable progress in visual recognition tasks fueled by deep neural networks, there exists recent evidence showing that these models are poorly calibrated, resulting in over-confident predictions. The standard practices of minimizing the cross entropy loss during training promote the predicted softmax probabilities to match the one-hot label assignments. Nevertheless, this yields a p…
▽ More
Despite the undeniable progress in visual recognition tasks fueled by deep neural networks, there exists recent evidence showing that these models are poorly calibrated, resulting in over-confident predictions. The standard practices of minimizing the cross entropy loss during training promote the predicted softmax probabilities to match the one-hot label assignments. Nevertheless, this yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations, which exacerbates the miscalibration problem. Recent observations from the classification literature suggest that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performances. Despite these findings, the impact of these losses in the relevant task of calibrating medical image segmentation networks remains unexplored. In this work, we provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses could be viewed as approximations of a linear penalty (or a Lagrangian term) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints, whose ensuing gradients constantly push towards a non-informative solution, which might prevent from reaching the best compromise between the discriminative performance and calibration of the model during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of public medical image segmentation benchmarks demonstrate that our method sets novel state-of-the-art results on these tasks in terms of network calibration, whereas the discriminative performance is also improved.
△ Less
Submitted 30 January, 2024; v1 submitted 9 September, 2022;
originally announced September 2022.
-
Deep learning based non-contact physiological monitoring in Neonatal Intensive Care Unit
Authors:
Nicky Nirlipta Sahoo,
Balamurali Murugesan,
Ayantika Das,
Srinivasa Karthik,
Keerthi Ram,
Steffen Leonhardt,
Jayaraj Joseph,
Mohanasankar Sivaprakasam
Abstract:
Preterm babies in the Neonatal Intensive Care Unit (NICU) have to undergo continuous monitoring of their cardiac health. Conventional monitoring approaches are contact-based, making the neonates prone to various nosocomial infections. Video-based monitoring approaches have opened up potential avenues for contactless measurement. This work presents a pipeline for remote estimation of cardiopulmonar…
▽ More
Preterm babies in the Neonatal Intensive Care Unit (NICU) have to undergo continuous monitoring of their cardiac health. Conventional monitoring approaches are contact-based, making the neonates prone to various nosocomial infections. Video-based monitoring approaches have opened up potential avenues for contactless measurement. This work presents a pipeline for remote estimation of cardiopulmonary signals from videos in NICU setup. We have proposed an end-to-end deep learning (DL) model that integrates a non-learning based approach to generate surrogate ground truth (SGT) labels for supervision, thus refraining from direct dependency on true ground truth labels. We have performed an extended qualitative and quantitative analysis to examine the efficacy of our proposed DL-based pipeline and achieved an overall average mean absolute error of 4.6 beats per minute (bpm) and root mean square error of 6.2 bpm in the estimated heart rate.
△ Less
Submitted 24 July, 2022;
originally announced July 2022.
-
A deep cascade of ensemble of dual domain networks with gradient-based T1 assistance and perceptual refinement for fast MRI reconstruction
Authors:
Balamurali Murugesan,
Sriprabha Ramanarayanan,
Sricharan Vijayarangan,
Keerthi Ram,
Naranamangalam R Jagannathan,
Mohanasankar Sivaprakasam
Abstract:
Deep learning networks have shown promising results in fast magnetic resonance imaging (MRI) reconstruction. In our work, we develop deep networks to further improve the quantitative and the perceptual quality of reconstruction. To begin with, we propose reconsynergynet (RSN), a network that combines the complementary benefits of independently operating on both the image and the Fourier domain. Fo…
▽ More
Deep learning networks have shown promising results in fast magnetic resonance imaging (MRI) reconstruction. In our work, we develop deep networks to further improve the quantitative and the perceptual quality of reconstruction. To begin with, we propose reconsynergynet (RSN), a network that combines the complementary benefits of independently operating on both the image and the Fourier domain. For a single-coil acquisition, we introduce deep cascade RSN (DC-RSN), a cascade of RSN blocks interleaved with data fidelity (DF) units. Secondly, we improve the structure recovery of DC-RSN for T2 weighted Imaging (T2WI) through assistance of T1 weighted imaging (T1WI), a sequence with short acquisition time. T1 assistance is provided to DC-RSN through a gradient of log feature (GOLF) fusion. Furthermore, we propose perceptual refinement network (PRN) to refine the reconstructions for better visual information fidelity (VIF), a metric highly correlated to radiologists opinion on the image quality. Lastly, for multi-coil acquisition, we propose variable splitting RSN (VS-RSN), a deep cascade of blocks, each block containing RSN, multi-coil DF unit, and a weighted average module. We extensively validate our models DC-RSN and VS-RSN for single-coil and multi-coil acquisitions and report the state-of-the-art performance. We obtain a SSIM of 0.768, 0.923, 0.878 for knee single-coil-4x, multi-coil-4x, and multi-coil-8x in fastMRI. We also conduct experiments to demonstrate the efficacy of GOLF based T1 assistance and PRN.
△ Less
Submitted 4 July, 2022;
originally announced July 2022.
-
MAC-ReconNet: A Multiple Acquisition Context based Convolutional Neural Network for MR Image Reconstruction using Dynamic Weight Prediction
Authors:
Sriprabha Ramanarayanan,
Balamurali Murugesan,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Convolutional Neural network-based MR reconstruction methods have shown to provide fast and high quality reconstructions. A primary drawback with a CNN-based model is that it lacks flexibility and can effectively operate only for a specific acquisition context limiting practical applicability. By acquisition context, we mean a specific combination of three input settings considered namely, the ana…
▽ More
Convolutional Neural network-based MR reconstruction methods have shown to provide fast and high quality reconstructions. A primary drawback with a CNN-based model is that it lacks flexibility and can effectively operate only for a specific acquisition context limiting practical applicability. By acquisition context, we mean a specific combination of three input settings considered namely, the anatomy under study, undersampling mask pattern and acceleration factor for undersampling. The model could be trained jointly on images combining multiple contexts. However the model does not meet the performance of context specific models nor extensible to contexts unseen at train time. This necessitates a modification to the existing architecture in generating context specific weights so as to incorporate flexibility to multiple contexts. We propose a multiple acquisition context based network, called MAC-ReconNet for MRI reconstruction, flexible to multiple acquisition contexts and generalizable to unseen contexts for applicability in real scenarios. The proposed network has an MRI reconstruction module and a dynamic weight prediction (DWP) module. The DWP module takes the corresponding acquisition context information as input and learns the context-specific weights of the reconstruction module which changes dynamically with context at run time. We show that the proposed approach can handle multiple contexts based on cardiac and brain datasets, Gaussian and Cartesian undersampling patterns and five acceleration factors. The proposed network outperforms the naive jointly trained model and gives competitive results with the context-specific models both quantitatively and qualitatively. We also demonstrate the generalizability of our model by testing on contexts unseen at train time.
△ Less
Submitted 10 March, 2022; v1 submitted 9 November, 2021;
originally announced November 2021.
-
Style Transfer based Coronary Artery Segmentation in X-ray Angiogram
Authors:
Supriti Mulay,
Keerthi Ram,
Balamurali Murugesan,
Mohanasankar Sivaprakasam
Abstract:
X-ray coronary angiography (XCA) is a principal approach employed for identifying coronary disorders. Deep learning-based networks have recently shown tremendous promise in the diagnosis of coronary disorder from XCA scans. A deep learning-based edge adaptive instance normalization style transfer technique for segmenting the coronary arteries, is presented in this paper. The proposed technique com…
▽ More
X-ray coronary angiography (XCA) is a principal approach employed for identifying coronary disorders. Deep learning-based networks have recently shown tremendous promise in the diagnosis of coronary disorder from XCA scans. A deep learning-based edge adaptive instance normalization style transfer technique for segmenting the coronary arteries, is presented in this paper. The proposed technique combines adaptive instance normalization style transfer with the dense extreme inception network and convolution block attention module to get the best artery segmentation performance. We tested the proposed method on two publicly available XCA datasets, and achieved a segmentation accuracy of 0.9658 and Dice coefficient of 0.71. We believe that the proposed method shows that the prediction can be completed in the fastest time with training on the natural images, and can be reliably used to diagnose and detect coronary disorders.
△ Less
Submitted 3 September, 2021;
originally announced September 2021.
-
RPnet: A Deep Learning approach for robust R Peak detection in noisy ECG
Authors:
Sricharan Vijayarangan,
Vignesh R,
Balamurali Murugesan,
Preejith SP,
Jayaraj Joseph,
Mohansankar Sivaprakasam
Abstract:
Automatic detection of R-peaks in an Electrocardiogram signal is crucial in a multitude of applications including Heart Rate Variability (HRV) analysis and Cardio Vascular Disease(CVD) diagnosis. Although there have been numerous approaches that have successfully addressed the problem, there has been a notable dip in the performance of these existing detectors on ECG episodes that contain noise an…
▽ More
Automatic detection of R-peaks in an Electrocardiogram signal is crucial in a multitude of applications including Heart Rate Variability (HRV) analysis and Cardio Vascular Disease(CVD) diagnosis. Although there have been numerous approaches that have successfully addressed the problem, there has been a notable dip in the performance of these existing detectors on ECG episodes that contain noise and HRV Irregulates. On the other hand, Deep Learning(DL) based methods have shown to be adept at modelling data that contain noise. In image to image translation, Unet is the fundamental block in many of the networks. In this work, a novel application of the Unet combined with Inception and Residual blocks is proposed to perform the extraction of R-peaks from an ECG. Furthermore, the problem formulation also robustly deals with issues of variability and sparsity of ECG R-peaks. The proposed network was trained on a database containing ECG episodes that have CVD and was tested against three traditional ECG detectors on a validation set. The model achieved an F1 score of 0.9837, which is a substantial improvement over the other beat detectors. Furthermore, the model was also evaluated on three other databases. The proposed network achieved high F1 scores across all datasets which established its generalizing capacity. Additionally, a thorough analysis of the model's performance in the presence of different levels of noise was carried out.
△ Less
Submitted 17 April, 2020;
originally announced April 2020.
-
Interpreting Deep Neural Networks for Single-Lead ECG Arrhythmia Classification
Authors:
Sricharan Vijayarangan,
Balamurali Murugesan,
Vignesh R,
Preejith SP,
Jayaraj Joseph,
Mohansankar Sivaprakasam
Abstract:
Cardiac arrhythmia is a prevalent and significant cause of morbidity and mortality among cardiac ailments. Early diagnosis is crucial in providing intervention for patients suffering from cardiac arrhythmia. Traditionally, diagnosis is performed by examination of the Electrocardiogram (ECG) by a cardiologist. This method of diagnosis is hampered by the lack of accessibility to expert cardiologists…
▽ More
Cardiac arrhythmia is a prevalent and significant cause of morbidity and mortality among cardiac ailments. Early diagnosis is crucial in providing intervention for patients suffering from cardiac arrhythmia. Traditionally, diagnosis is performed by examination of the Electrocardiogram (ECG) by a cardiologist. This method of diagnosis is hampered by the lack of accessibility to expert cardiologists. For quite some time, signal processing methods had been used to automate arrhythmia diagnosis. However, these traditional methods require expert knowledge and are unable to model a wide range of arrhythmia. Recently, Deep Learning methods have provided solutions to performing arrhythmia diagnosis at scale. However, the black-box nature of these models prohibit clinical interpretation of cardiac arrhythmia. There is a dire need to correlate the obtained model outputs to the corresponding segments of the ECG. To this end, two methods are proposed to provide interpretability to the models. The first method is a novel application of Gradient-weighted Class Activation Map (Grad-CAM) for visualizing the saliency of the CNN model. In the second approach, saliency is derived by learning the input deletion mask for the LSTM model. The visualizations are provided on a model whose competence is established by comparisons against baselines. The results of model saliency not only provide insight into the prediction capability of the model but also aligns with the medical literature for the classification of cardiac arrhythmia.
△ Less
Submitted 11 April, 2020;
originally announced April 2020.
-
KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
Authors:
Balamurali Murugesan,
Sricharan Vijayarangan,
Kaushik Sarveswaran,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Deep learning networks are being developed in every stage of the MRI workflow and have provided state-of-the-art results. However, this has come at the cost of increased computation requirement and storage. Hence, replacing the networks with compact models at various stages in the MRI workflow can significantly reduce the required storage space and provide considerable speedup. In computer vision,…
▽ More
Deep learning networks are being developed in every stage of the MRI workflow and have provided state-of-the-art results. However, this has come at the cost of increased computation requirement and storage. Hence, replacing the networks with compact models at various stages in the MRI workflow can significantly reduce the required storage space and provide considerable speedup. In computer vision, knowledge distillation is a commonly used method for model compression. In our work, we propose a knowledge distillation (KD) framework for the image to image problems in the MRI workflow in order to develop compact, low-parameter models without a significant drop in performance. We propose a combination of the attention-based feature distillation method and imitation loss and demonstrate its effectiveness on the popular MRI reconstruction architecture, DC-CNN. We conduct extensive experiments using Cardiac, Brain, and Knee MRI datasets for 4x, 5x and 8x accelerations. We observed that the student network trained with the assistance of the teacher using our proposed KD framework provided significant improvement over the student network trained without assistance across all the datasets and acceleration factors. Specifically, for the Knee dataset, the student network achieves $65\%$ parameter reduction, 2x faster CPU running time, and 1.5x faster GPU running time compared to the teacher. Furthermore, we compare our attention-based feature distillation method with other feature distillation methods. We also conduct an ablative study to understand the significance of attention-based distillation and imitation loss. We also extend our KD framework for MRI super-resolution and show encouraging results.
△ Less
Submitted 11 April, 2020;
originally announced April 2020.
-
DC-WCNN: A deep cascade of wavelet based convolutional neural networks for MR Image Reconstruction
Authors:
Sriprabha Ramanarayanan,
Balamurali Murugesan,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Several variants of Convolutional Neural Networks (CNN) have been developed for Magnetic Resonance (MR) image reconstruction. Among them, U-Net has shown to be the baseline architecture for MR image reconstruction. However, sub-sampling is performed by its pooling layers, causing information loss which in turn leads to blur and missing fine details in the reconstructed image. We propose a modifica…
▽ More
Several variants of Convolutional Neural Networks (CNN) have been developed for Magnetic Resonance (MR) image reconstruction. Among them, U-Net has shown to be the baseline architecture for MR image reconstruction. However, sub-sampling is performed by its pooling layers, causing information loss which in turn leads to blur and missing fine details in the reconstructed image. We propose a modification to the U-Net architecture to recover fine structures. The proposed network is a wavelet packet transform based encoder-decoder CNN with residual learning called CNN. The proposed WCNN has discrete wavelet transform instead of pooling and inverse wavelet transform instead of unpooling layers and residual connections. We also propose a deep cascaded framework (DC-WCNN) which consists of cascades of WCNN and k-space data fidelity units to achieve high quality MR reconstruction. Experimental results show that WCNN and DC-WCNN give promising results in terms of evaluation metrics and better recovery of fine details as compared to other methods.
△ Less
Submitted 8 January, 2020;
originally announced January 2020.
-
A context based deep learning approach for unbalanced medical image segmentation
Authors:
Balamurali Murugesan,
Kaushik Sarveswaran,
Vijaya Raghavan S,
Sharath M Shankaranarayana,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Automated medical image segmentation is an important step in many medical procedures. Recently, deep learning networks have been widely used for various medical image segmentation tasks, with U-Net and generative adversarial nets (GANs) being some of the commonly used ones. Foreground-background class imbalance is a common occurrence in medical images, and U-Net has difficulty in handling class im…
▽ More
Automated medical image segmentation is an important step in many medical procedures. Recently, deep learning networks have been widely used for various medical image segmentation tasks, with U-Net and generative adversarial nets (GANs) being some of the commonly used ones. Foreground-background class imbalance is a common occurrence in medical images, and U-Net has difficulty in handling class imbalance because of its cross entropy (CE) objective function. Similarly, GAN also suffers from class imbalance because the discriminator looks at the entire image to classify it as real or fake. Since the discriminator is essentially a deep learning classifier, it is incapable of correctly identifying minor changes in small structures. To address these issues, we propose a novel context based CE loss function for U-Net, and a novel architecture Seg-GLGAN. The context based CE is a linear combination of CE obtained over the entire image and its region of interest (ROI). In Seg-GLGAN, we introduce a novel context discriminator to which the entire image and its ROI are fed as input, thus enforcing local context. We conduct extensive experiments using two challenging unbalanced datasets: PROMISE12 and ACDC. We observe that segmentation results obtained from our methods give better segmentation metrics as compared to various baseline methods.
△ Less
Submitted 8 January, 2020;
originally announced January 2020.
-
REFUGE Challenge: A Unified Framework for Evaluating Automated Methods for Glaucoma Assessment from Fundus Photographs
Authors:
José Ignacio Orlando,
Huazhu Fu,
João Barbossa Breda,
Karel van Keer,
Deepti R. Bathula,
Andrés Diaz-Pinto,
Ruogu Fang,
Pheng-Ann Heng,
Jeyoung Kim,
JoonHo Lee,
Joonseok Lee,
Xiaoxiao Li,
Peng Liu,
Shuai Lu,
Balamurali Murugesan,
Valery Naranjo,
Sai Samarth R. Phaye,
Sharath M. Shankaranarayana,
Apoorva Sikka,
Jaemin Son,
Anton van den Hengel,
Shujun Wang,
Junyan Wu,
Zifeng Wu,
Guanghui Xu
, et al. (7 additional authors not shown)
Abstract:
Glaucoma is one of the leading causes of irreversible but preventable blindness in working age populations. Color fundus photography (CFP) is the most cost-effective imaging modality to screen for retinal disorders. However, its application to glaucoma has been limited to the computation of a few related biomarkers such as the vertical cup-to-disc ratio. Deep learning approaches, although widely a…
▽ More
Glaucoma is one of the leading causes of irreversible but preventable blindness in working age populations. Color fundus photography (CFP) is the most cost-effective imaging modality to screen for retinal disorders. However, its application to glaucoma has been limited to the computation of a few related biomarkers such as the vertical cup-to-disc ratio. Deep learning approaches, although widely applied for medical image analysis, have not been extensively used for glaucoma assessment due to the limited size of the available data sets. Furthermore, the lack of a standardize benchmark strategy makes difficult to compare existing methods in a uniform way. In order to overcome these issues we set up the Retinal Fundus Glaucoma Challenge, REFUGE (\url{https://refuge.grand-challenge.org}), held in conjunction with MICCAI 2018. The challenge consisted of two primary tasks, namely optic disc/cup segmentation and glaucoma classification. As part of REFUGE, we have publicly released a data set of 1200 fundus images with ground truth segmentations and clinical glaucoma labels, currently the largest existing one. We have also built an evaluation framework to ease and ensure fairness in the comparison of different models, encouraging the development of novel techniques in the field. 12 teams qualified and participated in the online challenge. This paper summarizes their methods and analyzes their corresponding results. In particular, we observed that two of the top-ranked teams outperformed two human experts in the glaucoma classification task. Furthermore, the segmentation results were in general consistent with the ground truth annotations, with complementary outcomes that can be further exploited by ensembling the results.
△ Less
Submitted 8 October, 2019;
originally announced October 2019.
-
Recon-GLGAN: A Global-Local context based Generative Adversarial Network for MRI Reconstruction
Authors:
Balamurali Murugesan,
Vijaya Raghavan S,
Kaushik Sarveswaran,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Magnetic resonance imaging (MRI) is one of the best medical imaging modalities as it offers excellent spatial resolution and soft-tissue contrast. But, the usage of MRI is limited by its slow acquisition time, which makes it expensive and causes patient discomfort. In order to accelerate the acquisition, multiple deep learning networks have been proposed. Recently, Generative Adversarial Networks…
▽ More
Magnetic resonance imaging (MRI) is one of the best medical imaging modalities as it offers excellent spatial resolution and soft-tissue contrast. But, the usage of MRI is limited by its slow acquisition time, which makes it expensive and causes patient discomfort. In order to accelerate the acquisition, multiple deep learning networks have been proposed. Recently, Generative Adversarial Networks (GANs) have shown promising results in MRI reconstruction. The drawback with the proposed GAN based methods is it does not incorporate the prior information about the end goal which could help in better reconstruction. For instance, in the case of cardiac MRI, the physician would be interested in the heart region which is of diagnostic relevance while excluding the peripheral regions. In this work, we show that incorporating prior information about a region of interest in the model would offer better performance. Thereby, we propose a novel GAN based architecture, Reconstruction Global-Local GAN (Recon-GLGAN) for MRI reconstruction. The proposed model contains a generator and a context discriminator which incorporates global and local contextual information from images. Our model offers significant performance improvement over the baseline models. Our experiments show that the concept of a context discriminator can be extended to existing GAN based reconstruction models to offer better performance. We also demonstrate that the reconstructions from the proposed method give segmentation results similar to fully sampled images.
△ Less
Submitted 25 August, 2019;
originally announced August 2019.
-
Conv-MCD: A Plug-and-Play Multi-task Module for Medical Image Segmentation
Authors:
Balamurali Murugesan,
Kaushik Sarveswaran,
Sharath M Shankaranarayana,
Keerthi Ram,
Jayaraj Joseph,
Mohanasankar Sivaprakasam
Abstract:
For the task of medical image segmentation, fully convolutional network (FCN) based architectures have been extensively used with various modifications. A rising trend in these architectures is to employ joint-learning of the target region with an auxiliary task, a method commonly known as multi-task learning. These approaches help impose smoothness and shape priors, which vanilla FCN approaches d…
▽ More
For the task of medical image segmentation, fully convolutional network (FCN) based architectures have been extensively used with various modifications. A rising trend in these architectures is to employ joint-learning of the target region with an auxiliary task, a method commonly known as multi-task learning. These approaches help impose smoothness and shape priors, which vanilla FCN approaches do not necessarily incorporate. In this paper, we propose a novel plug-and-play module, which we term as Conv-MCD, which exploits structural information in two ways - i) using the contour map and ii) using the distance map, both of which can be obtained from ground truth segmentation maps with no additional annotation costs. The key benefit of our module is the ease of its addition to any state-of-the-art architecture, resulting in a significant improvement in performance with a minimal increase in parameters. To substantiate the above claim, we conduct extensive experiments using 4 state-of-the-art architectures across various evaluation metrics, and report a significant increase in performance in relation to the base networks. In addition to the aforementioned experiments, we also perform ablative studies and visualization of feature maps to further elucidate our approach.
△ Less
Submitted 14 August, 2019;
originally announced August 2019.
-
Deep Network for Capacitive ECG Denoising
Authors:
Vignesh Ravichandran,
Balamurali Murugesan,
Sharath M Shankaranarayana,
Keerthi Ram,
Preejith S. P,
Jayaraj Joseph,
Mohanasankar Sivaprakasam
Abstract:
Continuous monitoring of cardiac health under free living condition is crucial to provide effective care for patients undergoing post operative recovery and individuals with high cardiac risk like the elderly. Capacitive Electrocardiogram (cECG) is one such technology which allows comfortable and long term monitoring through its ability to measure biopotential in conditions without having skin con…
▽ More
Continuous monitoring of cardiac health under free living condition is crucial to provide effective care for patients undergoing post operative recovery and individuals with high cardiac risk like the elderly. Capacitive Electrocardiogram (cECG) is one such technology which allows comfortable and long term monitoring through its ability to measure biopotential in conditions without having skin contact. cECG monitoring can be done using many household objects like chairs, beds and even car seats allowing for seamless monitoring of individuals. This method is unfortunately highly susceptible to motion artifacts which greatly limits its usage in clinical practice. The current use of cECG systems has been limited to performing rhythmic analysis. In this paper we propose a novel end-to-end deep learning architecture to perform the task of denoising capacitive ECG. The proposed network is trained using motion corrupted three channel cECG and a reference LEAD I ECG collected on individuals while driving a car. Further, we also propose a novel joint loss function to apply loss on both signal and frequency domain. We conduct extensive rhythmic analysis on the model predictions and the ground truth. We further evaluate the signal denoising using Mean Square Error(MSE) and Cross Correlation between model predictions and ground truth. We report MSE of 0.167 and Cross Correlation of 0.476. The reported results highlight the feasibility of performing morphological analysis using the filtered cECG. The proposed approach can allow for continuous and comprehensive monitoring of the individuals in free living conditions.
△ Less
Submitted 29 March, 2019;
originally announced March 2019.
-
RespNet: A deep learning model for extraction of respiration from photoplethysmogram
Authors:
Vignesh Ravichandran,
Balamurali Murugesan,
Vaishali Balakarthikeyan,
Sharath M Shankaranarayana,
Keerthi Ram,
Preejith S. P,
Jayaraj Joseph,
Mohanasankar Sivaprakasam
Abstract:
Respiratory ailments afflict a wide range of people and manifests itself through conditions like asthma and sleep apnea. Continuous monitoring of chronic respiratory ailments is seldom used outside the intensive care ward due to the large size and cost of the monitoring system. While Electrocardiogram (ECG) based respiration extraction is a validated approach, its adoption is limited by access to…
▽ More
Respiratory ailments afflict a wide range of people and manifests itself through conditions like asthma and sleep apnea. Continuous monitoring of chronic respiratory ailments is seldom used outside the intensive care ward due to the large size and cost of the monitoring system. While Electrocardiogram (ECG) based respiration extraction is a validated approach, its adoption is limited by access to a suitable continuous ECG monitor. Recently, due to the widespread adoption of wearable smartwatches with in-built Photoplethysmogram (PPG) sensor, it is being considered as a viable candidate for continuous and unobtrusive respiration monitoring. Research in this domain, however, has been predominantly focussed on estimating respiration rate from PPG. In this work, a novel end-to-end deep learning network called RespNet is proposed to perform the task of extracting the respiration signal from a given input PPG as opposed to extracting respiration rate. The proposed network was trained and tested on two different datasets utilizing different modalities of reference respiration signal recordings. Also, the similarity and performance of the proposed network against two conventional signal processing approaches for extracting respiration signal were studied. The proposed method was tested on two independent datasets with a Mean Squared Error of 0.262 and 0.145. The Cross-Correlation coefficient of the respective datasets were found to be 0.933 and 0.931. The reported errors and similarity was found to be better than conventional approaches. The proposed approach would aid clinicians to provide comprehensive evaluation of sleep-related respiratory conditions and chronic respiratory ailments while being comfortable and inexpensive for the patient.
△ Less
Submitted 20 February, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation
Authors:
Balamurali Murugesan,
Kaushik Sarveswaran,
Sharath M Shankaranarayana,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Image segmentation is a primary task in many medical applications. Recently, many deep networks derived from U-Net have been extensively used in various medical image segmentation tasks. However, in most of the cases, networks similar to U-net produce coarse and non-smooth segmentations with lots of discontinuities. To improve and refine the performance of U-Net like networks, we propose the use o…
▽ More
Image segmentation is a primary task in many medical applications. Recently, many deep networks derived from U-Net have been extensively used in various medical image segmentation tasks. However, in most of the cases, networks similar to U-net produce coarse and non-smooth segmentations with lots of discontinuities. To improve and refine the performance of U-Net like networks, we propose the use of parallel decoders which along with performing the mask predictions also perform contour prediction and distance map estimation. The contour and distance map aid in ensuring smoothness in the segmentation predictions. To facilitate joint training of three tasks, we propose a novel architecture called Psi-Net with a single encoder and three parallel decoders (thus having a shape of $Ψ$), one decoder to learns the segmentation mask prediction and other two decoders to learn the auxiliary tasks of contour detection and distance map estimation. The learning of these auxiliary tasks helps in capturing the shape and the boundary information. We also propose a new joint loss function for the proposed architecture. The loss function consists of a weighted combination of Negative Log likelihood and Mean Square Error loss. We have used two publicly available datasets: 1) Origa dataset for the task of optic cup and disc segmentation and 2) Endovis segment dataset for the task of polyp segmentation to evaluate our model. We have conducted extensive experiments using our network to show our model gives better results in terms of segmentation, boundary and shape metrics.
△ Less
Submitted 14 August, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Joint shape learning and segmentation for medical images using a minimalistic deep network
Authors:
Balamurali Murugesan,
Kaushik Sarveswaran,
Sharath M Shankaranarayana,
Keerthi Ram,
Mohanasankar Sivaprakasam
Abstract:
Recently, state-of-the-art results have been achieved in semantic segmentation using fully convolutional networks (FCNs). Most of these networks employ encoder-decoder style architecture similar to U-Net and are trained with images and the corresponding segmentation maps as a pixel-wise classification task. Such frameworks only exploit class information by using the ground truth segmentation maps.…
▽ More
Recently, state-of-the-art results have been achieved in semantic segmentation using fully convolutional networks (FCNs). Most of these networks employ encoder-decoder style architecture similar to U-Net and are trained with images and the corresponding segmentation maps as a pixel-wise classification task. Such frameworks only exploit class information by using the ground truth segmentation maps. In this paper, we propose a multi-task learning framework with the main aim of exploiting structural and spatial information along with the class information. We modify the decoder part of the FCN to exploit class information and the structural information as well. We intend to do this while also kee** the parameters of the network as low as possible. We obtain the structural information using either of the two ways: i) using the contour map and ii) using the distance map, both of which can be obtained from ground truth segmentation maps with no additional annotation costs. We also explore different ways in which distance maps can be computed and study the effects of different distance maps on the segmentation performance. We also experiment extensively on two different medical image segmentation applications: i.e i) using color fundus images for optic disc and cup segmentation and ii) using endoscopic images for polyp segmentation. Through our experiments, we report results comparable to, and in some cases performing better than the current state-of-the-art architectures and with an order of 2x reduction in the number of parameters.
△ Less
Submitted 25 January, 2019;
originally announced January 2019.