-
Full-resolution Lung Nodule Segmentation from Chest X-ray Images using Residual Encoder-Decoder Networks
Authors:
Michael James Horry,
Subrata Chakraborty,
Biswajeet Pradhan,
Manoranjan Paul,
**g Zhu,
Prabal Datta Barua,
U. Rajendra Acharya,
Fang Chen,
Jianlong Zhou
Abstract:
Lung cancer is the leading cause of cancer death and early diagnosis is associated with a positive prognosis. Chest X-ray (CXR) provides an inexpensive imaging mode for lung cancer diagnosis. Suspicious nodules are difficult to distinguish from vascular and bone structures using CXR. Computer vision has previously been proposed to assist human radiologists in this task, however, leading studies us…
▽ More
Lung cancer is the leading cause of cancer death and early diagnosis is associated with a positive prognosis. Chest X-ray (CXR) provides an inexpensive imaging mode for lung cancer diagnosis. Suspicious nodules are difficult to distinguish from vascular and bone structures using CXR. Computer vision has previously been proposed to assist human radiologists in this task, however, leading studies use down-sampled images and computationally expensive methods with unproven generalization. Instead, this study localizes lung nodules using efficient encoder-decoder neural networks that process full resolution images to avoid any signal loss resulting from down-sampling. Encoder-decoder networks are trained and tested using the JSRT lung nodule dataset. The networks are used to localize lung nodules from an independent external CXR dataset. Sensitivity and false positive rates are measured using an automated framework to eliminate any observer subjectivity. These experiments allow for the determination of the optimal network depth, image resolution and pre-processing pipeline for generalized lung nodule localization. We find that nodule localization is influenced by subtlety, with more subtle nodules being detected in earlier training epochs. Therefore, we propose a novel self-ensemble model from three consecutive epochs centered on the validation optimum. This ensemble achieved a sensitivity of 85% in 10-fold internal testing with false positives of 8 per image. A sensitivity of 81% is achieved at a false positive rate of 6 following morphological false positive reduction. This result is comparable to more computationally complex systems based on linear and spatial filtering, but with a sub-second inference time that is faster than other methods. The proposed algorithm achieved excellent generalization results against an external dataset with sensitivity of 77% at a false positive rate of 7.6.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
A Dynamic Weighted Federated Learning for Android Malware Classification
Authors:
Ayushi Chaudhuri,
Arijit Nandi,
Buddhadeb Pradhan
Abstract:
Android malware attacks are increasing daily at a tremendous volume, making Android users more vulnerable to cyber-attacks. Researchers have developed many machine learning (ML)/ deep learning (DL) techniques to detect and mitigate android malware attacks. However, due to technological advancement, there is a rise in android mobile devices. Furthermore, the devices are geographically dispersed, re…
▽ More
Android malware attacks are increasing daily at a tremendous volume, making Android users more vulnerable to cyber-attacks. Researchers have developed many machine learning (ML)/ deep learning (DL) techniques to detect and mitigate android malware attacks. However, due to technological advancement, there is a rise in android mobile devices. Furthermore, the devices are geographically dispersed, resulting in distributed data. In such scenario, traditional ML/DL techniques are infeasible since all of these approaches require the data to be kept in a central system; this may provide a problem for user privacy because of the massive proliferation of Android mobile devices; putting the data in a central system creates an overhead. Also, the traditional ML/DL-based android malware classification techniques are not scalable. Researchers have proposed federated learning (FL) based android malware classification system to solve the privacy preservation and scalability with high classification performance. In traditional FL, Federated Averaging (FedAvg) is utilized to construct the global model at each round by merging all of the local models obtained from all of the customers that participated in the FL. However, the conventional FedAvg has a disadvantage: if one poor-performing local model is included in global model development for each round, it may result in an under-performing global model. Because FedAvg favors all local models equally when averaging. To address this issue, our main objective in this work is to design a dynamic weighted federated averaging (DW-FedAvg) strategy in which the weights for each local model are automatically updated based on their performance at the client. The DW-FedAvg is evaluated using four popular benchmark datasets, Melgenome, Drebin, Kronodroid and Tuandromd used in android malware classification research.
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
2-speed network ensemble for efficient classification of incremental land-use/land-cover satellite image chips
Authors:
Michael James Horry,
Subrata Chakraborty,
Biswajeet Pradhan,
Nagesh Shukla,
Sanjoy Paul
Abstract:
The ever-growing volume of satellite imagery data presents a challenge for industry and governments making data-driven decisions based on the timely analysis of very large data sets. Commonly used deep learning algorithms for automatic classification of satellite images are time and resource-intensive to train. The cost of retraining in the context of Big Data presents a practical challenge when n…
▽ More
The ever-growing volume of satellite imagery data presents a challenge for industry and governments making data-driven decisions based on the timely analysis of very large data sets. Commonly used deep learning algorithms for automatic classification of satellite images are time and resource-intensive to train. The cost of retraining in the context of Big Data presents a practical challenge when new image data and/or classes are added to a training corpus. Recognizing the need for an adaptable, accurate, and scalable satellite image chip classification scheme, in this research we present an ensemble of: i) a slow to train but high accuracy vision transformer; and ii) a fast to train, low-parameter convolutional neural network. The vision transformer model provides a scalable and accurate foundation model. The high-speed CNN provides an efficient means of incorporating newly labelled data into analysis, at the expense of lower accuracy. To simulate incremental data, the very large (~400,000 images) So2Sat LCZ42 satellite image chip dataset is divided into four intervals, with the high-speed CNN retrained every interval and the vision transformer trained every half interval. This experimental setup mimics an increase in data volume and diversity over time. For the task of automated land-cover/land-use classification, the ensemble models for each data increment outperform each of the component models, with best accuracy of 65% against a holdout test partition of the So2Sat dataset. The proposed ensemble and staggered training schedule provide a scalable and cost-effective satellite image classification scheme that is optimized to process very large volumes of satellite data.
△ Less
Submitted 15 March, 2022;
originally announced March 2022.
-
Debiasing pipeline improves deep learning model generalization for X-ray based lung nodule detection
Authors:
Michael Horry,
Subrata Chakraborty,
Biswajeet Pradhan,
Manoranjan Paul,
**g Zhu,
Hui Wen Loh,
Prabal Datta Barua,
U. Rajendra Arharya
Abstract:
Lung cancer is the leading cause of cancer death worldwide and a good prognosis depends on early diagnosis. Unfortunately, screening programs for the early diagnosis of lung cancer are uncommon. This is in-part due to the at-risk groups being located in rural areas far from medical facilities. Reaching these populations would require a scaled approach that combines mobility, low cost, speed, accur…
▽ More
Lung cancer is the leading cause of cancer death worldwide and a good prognosis depends on early diagnosis. Unfortunately, screening programs for the early diagnosis of lung cancer are uncommon. This is in-part due to the at-risk groups being located in rural areas far from medical facilities. Reaching these populations would require a scaled approach that combines mobility, low cost, speed, accuracy, and privacy. We can resolve these issues by combining the chest X-ray imaging mode with a federated deep-learning approach, provided that the federated model is trained on homogenous data to ensure that no single data source can adversely bias the model at any point in time. In this study we show that an image pre-processing pipeline that homogenizes and debiases chest X-ray images can improve both internal classification and external generalization, paving the way for a low-cost and accessible deep learning-based clinical system for lung cancer screening. An evolutionary pruning mechanism is used to train a nodule detection deep learning model on the most informative images from a publicly available lung nodule X-ray dataset. Histogram equalization is used to remove systematic differences in image brightness and contrast. Model training is performed using all combinations of lung field segmentation, close crop**, and rib suppression operators. We show that this pre-processing pipeline results in deep learning models that successfully generalize an independent lung nodule dataset using ablation studies to assess the contribution of each operator in this pipeline. In strip** chest X-ray images of known confounding variables by lung field segmentation, along with suppression of signal noise from the bone structure we can train a highly accurate deep learning lung nodule detection algorithm with outstanding generalization accuracy of 89% to nodule samples in unseen data.
△ Less
Submitted 24 January, 2022;
originally announced January 2022.
-
Systematic investigation into generalization of COVID-19 CT deep learning models with Gabor ensemble for lung involvement scoring
Authors:
Michael J. Horry,
Subrata Chakraborty,
Biswajeet Pradhan,
Maryam Fallahpoor,
Chegeni Hossein,
Manoranjan Paul
Abstract:
The COVID-19 pandemic has inspired unprecedented data collection and computer vision modelling efforts worldwide, focusing on diagnosis and stratification of COVID-19 from medical images. Despite this large-scale research effort, these models have found limited practical application due in part to unproven generalization of these models beyond their source study. This study investigates the genera…
▽ More
The COVID-19 pandemic has inspired unprecedented data collection and computer vision modelling efforts worldwide, focusing on diagnosis and stratification of COVID-19 from medical images. Despite this large-scale research effort, these models have found limited practical application due in part to unproven generalization of these models beyond their source study. This study investigates the generalizability of key published models using the publicly available COVID-19 Computed Tomography data through cross dataset validation. We then assess the predictive ability of these models for COVID-19 severity using an independent new dataset that is stratified for COVID-19 lung involvement. Each inter-dataset study is performed using histogram equalization, and contrast limited adaptive histogram equalization with and without a learning Gabor filter. The study shows high variability in the generalization of models trained on these datasets due to varied sample image provenances and acquisition processes amongst other factors. We show that under certain conditions, an internally consistent dataset can generalize well to an external dataset despite structural differences between these datasets with f1 scores up to 86%. Our best performing model shows high predictive accuracy for lung involvement score for an independent dataset for which expertly labelled lung involvement stratification is available. Creating an ensemble of our best model for disease positive prediction with our best model for disease negative prediction using a min-max function resulted in a superior model for lung involvement prediction with average predictive accuracy of 75% for zero lung involvement and 96% for 75-100% lung involvement with almost linear relationship between these stratifications.
△ Less
Submitted 19 April, 2021;
originally announced May 2021.
-
Deep Mining Generation of Lung Cancer Malignancy Models from Chest X-ray Images
Authors:
Michael J. Horry,
Subrata Chakraborty,
Biswajeet Pradhan,
Manoranjan Paul,
Douglas P. S. Gomes,
Anwaar Ul-Haq
Abstract:
Lung cancer is the leading cause of cancer death and morbidity worldwide. Many studies have shown machine learning models to be effective at detecting lung nodules from chest X-ray images. However, these techniques have yet to be embraced by the medical community due to several practical, ethical, and regulatory constraints stemming from the black-box nature of deep learning models. Additionally,…
▽ More
Lung cancer is the leading cause of cancer death and morbidity worldwide. Many studies have shown machine learning models to be effective at detecting lung nodules from chest X-ray images. However, these techniques have yet to be embraced by the medical community due to several practical, ethical, and regulatory constraints stemming from the black-box nature of deep learning models. Additionally, most lung nodules visible on chest X-ray are benign; therefore, the narrow task of computer vision-based lung nodule detection cannot be equated to automated lung cancer detection. Addressing both concerns, this study introduces a novel hybrid deep learning and decision tree-based computer vision model which presents lung cancer malignancy predictions as interpretable decision trees. The deep learning component of this process is trained using a large publicly available dataset on pathological biomarkers associated with lung cancer. These models are then used to inference biomarker scores for chest X-ray images from two, independent data sets for which malignancy metadata is available. We mine multi-variate predictive models by fitting shallow decision trees to the malignancy stratified datasets and interrogate a range of metrics to determine the best model. Our best decision tree model achieves sensitivity and specificity of 86.7% and 80.0% respectively with a positive predictive value of 92.9%. Decision trees mined using this method may be considered as a starting point for refinement into clinically useful multi-variate lung cancer malignancy models for implementation as a workflow augmentation tool to improve the efficiency of human radiologists.
△ Less
Submitted 29 August, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.