Search | arXiv e-print repository

BdSLW60: A Word-Level Bangla Sign Language Dataset

Authors: Husne Ara Rubaiyeat, Hasan Mahmud, Ahsan Habib, Md. Kamrul Hasan

Abstract: Sign language discourse is an essential mode of daily communication for deaf and hard-of-hearing people. However, research on Bangla Sign Language (BdSL) faces notable limitations, primarily due to the lack of datasets. Recognizing word-level signs in BdSL (WL-BdSL) presents a multitude of challenges, including the need for well-annotated datasets, capturing the dynamic nature of sign gestures from facial or hand landmarks, developing suitable machine learning or deep learning-based models with substantial video samples, and so on. In this paper, we address these challenges by creating a comprehensive BdSL word-level dataset named BdSLW60 in an unconstrained and natural setting, allowing positional and temporal variations and allowing sign users to change hand dominance freely. The dataset encompasses 60 Bangla sign words, with a significant scale of 9307 video trials provided by 18 signers under the supervision of a sign language professional. The dataset was rigorously annotated and cross-checked by 60 annotators. We also introduced a unique approach of a relative quantization-based key frame encoding technique for landmark-based sign gesture recognition. We report the benchmarking of our BdSLW60 dataset using the Support Vector Machine (SVM) with testing accuracy up to 67.6% and an attention-based bi-LSTM with testing accuracy up to 75.1%. The dataset is available at https://www.kaggle.com/datasets/hasaniut/bdslw60 and the code base is accessible from https://github.com/hasanssl/BdSLW60_Code.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2310.15693 [pdf, other]

Towards Automated Recipe Genre Classification using Semi-Supervised Learning

Authors: Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan, Hasan Mahmud

Abstract: Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions. This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends the size of the Named Entity Recognition (NER) list to address missing named entities like heat, time or process from the recipe directions using two NER extraction tools. The 3A2M+ dataset provides a comprehensive solution to the various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we have demonstrated traditional machine learning, deep learning and pre-trained language models to classify the recipes into their corresponding genre and achieved an overall accuracy of 98.6%. Our investigation indicates that the title feature played a more significant role in classifying the genre.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2309.00831 [pdf, other]

Multi-scale, Data-driven and Anatomically Constrained Deep Learning Image Registration for Adult and Fetal Echocardiography

Authors: Md. Kamrul Hasan, Haobo Zhu, Guang Yang, Choon Hwai Yap

Abstract: Temporal echocardiography image registration is a basis for clinical quantifications such as cardiac motion estimation, myocardial strain assessments, and stroke volume quantifications. In past studies, deep learning image registration (DLIR) has shown promising results and is consistently accurate and precise, requiring less computational time. We propose that a greater focus on the warped moving image's anatomic plausibility and image quality can support robust DLIR performance. Further, past implementations have focused on adult echocardiography, and there is an absence of DLIR implementations for fetal echocardiography. We propose a framework that combines three strategies for DLIR in both fetal and adult echo: (1) an anatomic shape-encoded loss to preserve physiological myocardial and left ventricular anatomical topologies in warped images; (2) a data-driven loss that is trained adversarially to preserve good image texture features in warped images; and (3) a multi-scale training scheme of a data-driven and anatomically constrained algorithm to improve accuracy. Our tests show that good anatomical topology and image textures are strongly linked to shape-encoded and data-driven adversarial losses. They improve different aspects of registration performance in a non-overlapping way, justifying their combination. Despite fundamental distinctions between adult and fetal echo images, we show that these strategies can provide excellent registration results in both adult and fetal echocardiography using the publicly available CAMUS adult echo dataset and our private multi-demographic fetal echo dataset. Our approach outperforms traditional non-DL gold standard registration approaches, including Optical Flow and Elastix. Registration improvements could be translated to more accurate and precise clinical quantification of cardiac ejection fraction, demonstrating a potential for translation.

Submitted 11 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

Comments: Our data-driven and anatomically constrained DLIR method's source code will be publicly available at https://github.com/kamruleee51/DdC-AC-DLIR

arXiv:2306.13899 [pdf, other]

Math Word Problem Solving by Generating Linguistic Variants of Problem Statements

Authors: Syed Rifat Raiyan, Md. Nafis Faiyaz, Shah Md. Jawad Kabir, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan

Abstract: The art of mathematical reasoning stands as a fundamental pillar of intellectual progress and is a central catalyst in cultivating human ingenuity. Researchers have recently published a plethora of works centered around the task of solving Math Word Problems (MWP), a crucial stride towards general AI. These existing models are susceptible to dependency on shallow heuristics and spurious correlations to derive the solution expressions. In order to ameliorate this issue, in this paper, we propose a framework for MWP solvers based on the generation of linguistic variants of the problem text. The approach involves solving each of the variant problems and electing the predicted expression with the majority of the votes. We use DeBERTa (Decoding-enhanced BERT with disentangled attention) as the encoder to leverage its rich textual representations and enhanced mask decoder to construct the solution expressions. Furthermore, we introduce a challenging dataset, ParaMAWPS, consisting of paraphrased, adversarial, and inverse variants of selectively sampled MWPs from the benchmark MAWPS dataset. We extensively experiment on this dataset along with other benchmark datasets using some baseline MWP solver models. We show that training on linguistic variants of problem statements and voting on candidate predictions improve the mathematical reasoning and robustness of the model. We make our code and data publicly available.

Submitted 24 June, 2023; originally announced June 2023.

Comments: Accepted in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL-SRW 2023), 17 pages, 2 figures, 7 tables
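
Illustrative note: the voting step described in the abstract above (solve every linguistic variant, then elect the expression with the most votes) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `elect_expression` and the sample predictions are hypothetical, and the tie-breaking rule (first-seen expression wins) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def elect_expression(predictions):
    """Return the expression predicted most often across problem variants.

    `predictions` holds one predicted solution expression (as a string)
    per linguistic variant. Hypothetical helper, for illustration only.
    """
    if not predictions:
        raise ValueError("need at least one prediction to vote on")
    # most_common orders by count; equal counts keep first-seen order,
    # so ties go to the expression that appeared earliest (an assumption).
    winner, _votes = Counter(predictions).most_common(1)[0]
    return winner


# Made-up predictions for five variants of one word problem.
variant_predictions = ["3 + 4", "3 + 4", "4 - 3", "3 + 4", "3 * 4"]
print(elect_expression(variant_predictions))
```

In practice the predicted expressions would likely be normalized before counting (for example, so that "3 + 4" and "4 + 3" count as the same vote); that step is omitted here.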

arXiv:2305.06595 [pdf]

BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews

Authors: Mohsinul Kabir, Obayed Bin Mahfuz, Syed Rifat Raiyan, Hasan Mahmud, Md Kamrul Hasan

Abstract: The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight regarding the quality of a product. While the study of sentiment analysis has been widely explored in many popular languages, relatively less attention has been given to the Bangla language, mostly due to a lack of relevant data and cross-domain adaptability. To address this limitation, we present BanglaBook, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the necessity for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our code and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.

Submitted 8 June, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted in Findings of the Association for Computational Linguistics: ACL 2023

arXiv:2305.01044

Venn Diagram Multi-label Class Interpretation of Diabetic Foot Ulcer with Color and Sharpness Enhancement

Authors: Md Mahamudul Hasan, Moi Hoon Yap, Md Kamrul Hasan

Abstract: Diabetic foot ulcer (DFU) is a severe complication of diabetes that can lead to amputation of the lower limb if not treated properly. Inspired by the 2021 Diabetic Foot Ulcer Grand Challenge, researchers designed automated multi-class classification of DFU, including infection, ischaemia, both of these conditions, and none of these conditions. However, it remains a challenge as classification accuracy is still not satisfactory. This paper proposes a Venn Diagram interpretation of a multi-label CNN-based method, utilizing different image enhancement strategies, to improve the multi-class DFU classification. We propose to reduce the four classes into two, since "both"-class wounds can be interpreted as the simultaneous occurrence of infection and ischaemia and "none"-class wounds as the absence of infection and ischaemia. We introduce a novel Venn Diagram representation block in the classifier to interpret all four classes from these two classes. To make our model more resilient, we propose enhancing the perceptual quality of DFU images, particularly blurry or inconsistently lit DFU images, by performing color and sharpness enhancements on them. We also employ a fine-tuned optimization technique, adaptive sharpness aware minimization, to improve the CNN model generalization performance. The proposed method is evaluated on the test dataset of DFUC2021, containing 5,734 images, and the results are compared with the top-3 winning entries of DFUC2021. Our proposed approach outperforms these existing approaches and achieves Macro-Average F1, Recall and Precision scores of 0.6592, 0.6593, and 0.6652, respectively. Additionally, we perform ablation studies and image quality measurements to further interpret our proposed method. This proposed method will benefit patients with DFUs since it tackles the inconsistencies in captured images and can be employed for a more robust remote DFU wound classification.

Submitted 5 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: The Paper is not complete, more modifications are needed
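
Illustrative note: the reduction from four classes to two labels described in the abstract above can be sketched as a simple decision rule. This is only an assumed reading of the idea, not the paper's Venn Diagram representation block: the function name `venn_class`, the 0.5 threshold, and the use of per-label probabilities are all hypothetical.

```python
def venn_class(p_infection, p_ischaemia, threshold=0.5):
    """Map two independent label scores to one of the four DFU classes.

    The two inputs are assumed to be probabilities from a multi-label
    classifier; the threshold value is an assumption for illustration.
    """
    infection = p_infection >= threshold
    ischaemia = p_ischaemia >= threshold
    if infection and ischaemia:
        return "both"       # overlap of the two sets
    if infection:
        return "infection"  # infection only
    if ischaemia:
        return "ischaemia"  # ischaemia only
    return "none"           # outside both sets


# Made-up scores covering each region of the diagram.
for scores in [(0.9, 0.8), (0.9, 0.1), (0.2, 0.7), (0.1, 0.3)]:
    print(scores, "->", venn_class(*scores))
```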

arXiv:2303.16778 [pdf, other]

doi 10.1007/978-3-031-34622-4_15

Assorted, Archetypal and Annotated Two Million (3A2M) Cooking Recipes Dataset based on Active Learning

Authors: Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan, Hasan Mahmud

Abstract: Cooking recipes allow individuals to exchange culinary ideas and provide food preparation instructions. Due to a lack of adequate labeled data, categorizing raw recipes found online to the appropriate food genres is a challenging task in this domain. Utilizing the knowledge of domain experts to categorize recipes could be a solution. In this study, we present a novel dataset of two million culinary recipes labeled in respective categories leveraging the knowledge of food experts and an active learning technique. To construct the dataset, we collect the recipes from the RecipeNLG dataset. Then, we employ three human experts whose trustworthiness score is higher than 86.667% to categorize 300K recipes by their Named Entity Recognition (NER) and assign each to one of the nine categories: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides and fusion. Finally, we categorize the remaining 1900K recipes using an Active Learning method with a blend of Query-by-Committee and Human In The Loop (HITL) approaches. There are more than two million recipes in our dataset, each of which is categorized and has a confidence score linked with it. For the 9 genres, the Fleiss Kappa score of this massive dataset is roughly 0.56026. We believe that the research community can use this dataset to perform various machine learning tasks such as recipe genre classification, recipe generation of a specific genre, new recipe creation, etc. The dataset can also be used to train and evaluate the performance of various NLP tasks such as named entity recognition, part-of-speech tagging, semantic role labeling, and so on. The dataset will be available upon publication: https://tinyurl.com/3zu4778y.

Submitted 27 March, 2023; originally announced March 2023.

Journal ref: International Conference on Machine Intelligence and Emerging Technologies. MIET 2022. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 491, pp 188-203, Springer, Cham
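
Illustrative note: the abstract above reports a Fleiss Kappa score as its agreement measure. The standard Fleiss' kappa computation is sketched below in plain Python. The function name and the toy rating table are hypothetical; the table is made-up data with three raters and two categories and is not drawn from the 3A2M dataset.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who put item i in category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement per item, then averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(s * s for s in shares)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy table: 4 items, 3 raters, 2 categories (made-up numbers).
toy = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(round(fleiss_kappa(toy), 4))
```

A value of 1 means perfect agreement, and values near 0 mean agreement no better than chance.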

arXiv:2303.15430 [pdf, other]

TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models

Authors: Md Kamrul Hasan, Md Saiful Islam, Sangwu Lee, Wasifur Rahman, Iftekhar Naim, Mohammed Ibrahim Khan, Ehsan Hoque

Abstract: Pre-trained large language models have recently achieved ground-breaking performance in a wide variety of language understanding tasks. However, the same model cannot be applied to multimodal behavior understanding tasks (e.g., video sentiment/humor detection) unless non-verbal features (e.g., acoustic and visual) can be integrated with language. Jointly modeling multiple modalities significantly increases the model complexity, and makes the training process data-hungry. While an enormous amount of text data is available via the web, collecting large-scale multimodal behavioral video datasets is extremely expensive, both in terms of time and money. In this paper, we investigate whether large language models alone can successfully incorporate non-verbal information when they are presented in textual form. We present a way to convert the acoustic and visual information into corresponding textual descriptions and concatenate them with the spoken text. We feed this augmented input to a pre-trained BERT model and fine-tune it on three downstream multimodal tasks: sentiment, humor, and sarcasm detection. Our approach, TextMI, significantly reduces model complexity, adds interpretability to the model's decision, and can be applied for a diverse set of tasks while achieving superior (multimodal sarcasm detection) or near SOTA (multimodal sentiment analysis and multimodal humor detection) performance. We propose TextMI as a general, competitive baseline for multimodal behavioral analysis tasks, particularly in a low-resource setting.

Submitted 29 March, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2212.11486 [pdf, other]

Over-the-Air Federated Learning with Enhanced Privacy

Authors: Xiaochan Xue, Moh Khalid Hasan, Shucheng Yu, Laxima Niure Kandel, Min Song

Abstract: Federated learning (FL) has emerged as a promising learning paradigm in which only local model parameters (gradients) are shared. Private user data never leaves the local devices, thus preserving data privacy. However, recent research has shown that even when local data is never shared by a user, exchanging model parameters without protection can also leak private information. Moreover, in wireless systems, the frequent transmission of model parameters can cause tremendous bandwidth consumption and network congestion when the model is large. To address this problem, we propose a new FL framework with efficient over-the-air parameter aggregation and strong privacy protection of both user data and models. We achieve this by introducing pairwise cancellable random artificial noises (PCR-ANs) on end devices. As compared to existing over-the-air computation (AirComp) based FL schemes, our design provides stronger privacy protection. We analytically show the secrecy capacity and the convergence rate of the proposed wireless FL aggregation algorithm.

Submitted 22 December, 2022; originally announced December 2022.

Comments: 6 pages
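
Illustrative note: the abstract above relies on pairwise cancellable noise, where each individual update is hidden but the noise terms vanish in the aggregate. The toy sketch below shows only that cancellation property on scalar updates. It is not the paper's PCR-AN design or its over-the-air channel model: the function name `mask_updates`, the Gaussian noise, and the fixed seed are all assumptions made for illustration.

```python
import random
from itertools import combinations


def mask_updates(updates, noise_scale=1.0, seed=0):
    """Add pairwise noise that cancels when all masked updates are summed.

    For every pair of devices (i, j), device i adds a random value and
    device j subtracts the same value. Hypothetical toy example.
    """
    rng = random.Random(seed)
    masked = list(updates)
    for i, j in combinations(range(len(updates)), 2):
        noise = rng.gauss(0.0, noise_scale)
        masked[i] += noise  # device i sends its update plus the noise
        masked[j] -= noise  # device j sends its update minus the same noise
    return masked


updates = [0.2, -0.5, 1.1, 0.4]   # made-up local updates, one per device
masked = mask_updates(updates)
print(masked)                      # individual values are obscured
print(abs(sum(masked) - sum(updates)) < 1e-9)  # the aggregate is preserved
```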

arXiv:2210.04483 [pdf, other]

Auxilio: A Sensor-Based Wireless Head-Mounted Mouse for People with Upper Limb Disability

Authors: Mohammad Ridwan Kabir, Mohammad Ishrak Abedin, Rizvi Ahmed, Saad Bin Ashraf, Hasan Mahmud, Md. Kamrul Hasan

Abstract: Upper limb disability may be caused by accidents, neurological disorders, or even birth defects, imposing limitations and restrictions on the interaction with a computer for the concerned individuals using a generic optical mouse. Our work proposes the design and development of a working prototype of a sensor-based wireless head-mounted Assistive Mouse Controller (AMC), Auxilio, facilitating interaction with a computer for people with upper limb disability. Combining commercially available, low-cost motion and infrared sensors, Auxilio solely utilizes head and cheek movements for mouse control. Its performance has been juxtaposed with that of a generic optical mouse in different pointing tasks as well as in typing tasks, using a virtual keyboard. Furthermore, our work also analyzes the usability of Auxilio, featuring the System Usability Scale. The results of different experiments reveal the practicality and effectiveness of Auxilio as a head-mounted AMC for empowering the upper limb disabled community.

Submitted 10 October, 2022; originally announced October 2022.

Comments: 28 pages, 9 figures, 5 tables

arXiv:2209.08807 [pdf, other]

A Deep Learning Approach for Parallel Imaging and Compressed Sensing MRI Reconstruction

Authors: Farhan Sadik, Md. Kamrul Hasan

Abstract: Parallel imaging accelerates MRI data acquisition by acquiring additional sensitivity information with an array of receiver coils, resulting in fewer phase encoding steps. Because of fewer data requirements than parallel imaging, compressed sensing magnetic resonance imaging (CS-MRI) has gained popularity in the field of medical imaging. Parallel imaging and compressed sensing (CS) both reduce the amount of data captured in the k-space, which speeds up traditional MRI acquisition. As acquisition time is inversely proportional to sample count, forming an image from reduced k-space samples results in faster acquisition but with aliasing artifacts. For de-aliasing the reconstructed image, this paper proposes a novel Generative Adversarial Network (GAN) called RECGAN-GR that is supervised with multi-modal losses. In comparison to existing GAN networks, our proposed method introduces a novel generator network, RemU-Net, which is integrated with dual-domain loss functions such as weighted magnitude and phase loss functions, as well as parallel imaging-based loss, GRAPPA consistency loss. As refinement learning, a k-space correction block is proposed to make the GAN network self-resistant to generating unnecessary data, which speeds up the reconstruction process. Comprehensive results show that the proposed RECGAN-GR not only improves the PSNR by 4 dB over GAN-based methods but also by 2 dB over conventional state-of-the-art CNN methods available in the literature for single-coil data. The proposed work significantly improves image quality for low-retained data, resulting in five to ten times faster acquisition.

Submitted 17 December, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: 13 pages, 11 figures

arXiv:2208.12232 [pdf, other]

doi 10.1016/j.compbiomed.2023.106624

A survey, review, and future trends of skin lesion segmentation and classification

Authors: Md. Kamrul Hasan, Md. Asif Ahamad, Choon Hwai Yap, Guang Yang

Abstract: The Computer-aided Diagnosis or Detection (CAD) approach for skin lesion analysis is an emerging field of research that has the potential to alleviate the burden and cost of skin cancer screening. Researchers have recently indicated increasing interest in developing such CAD systems, with the intention of providing a user-friendly tool to dermatologists to reduce the challenges encountered or associated with manual inspection. This article aims to provide a comprehensive literature survey and review of a total of 594 publications (356 for skin lesion segmentation and 238 for skin lesion classification) published between 2011 and 2022. These articles are analyzed and summarized in a number of different ways to contribute vital information regarding the methods for the development of CAD systems. These ways include relevant and essential definitions and theories, input data (dataset utilization, preprocessing, augmentations, and fixing imbalance problems), method configuration (techniques, architectures, module frameworks, and losses), training tactics (hyperparameter settings), and evaluation criteria. We intend to investigate a variety of performance-enhancing approaches, including ensemble and post-processing. We also discuss these dimensions to reveal their current trends based on utilization frequencies. In addition, we highlight the primary difficulties associated with evaluating skin lesion segmentation and classification systems using minimal datasets, as well as the potential solutions to these difficulties. Findings, recommendations, and trends are disclosed to inform future research on developing an automated and robust CAD system for skin lesion analysis.

Submitted 2 February, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

Comments: This manuscript has been accepted to be published in Computers in Biology and Medicine and has a total of 106 pages (single column and double spacing), 13 figures, and 11 tables

Journal ref: Computers in biology and medicine (2023): 106624

arXiv:2203.08490 [pdf, other]

Learning Audio Representations with MLPs

Authors: Mashrur M. Morshed, Ahmad Omar Ahsan, Hasan Mahmud, Md. Kamrul Hasan

Abstract: In this paper, we propose an efficient MLP-based approach for learning audio representations, namely timestamp and scene-level audio embeddings. We use an encoder consisting of sequentially stacked gated MLP blocks, which accept 2D MFCCs as inputs. In addition, we also provide a simple temporal interpolation-based algorithm for computing scene-level embeddings from timestamp embeddings. The audio representations generated by our method are evaluated across a diverse set of benchmarks at the Holistic Evaluation of Audio Representations (HEAR) challenge, hosted at the NeurIPS 2021 competition track. We achieved first place on the Speech Commands (full), Speech Commands (5 hours), and the Mridingham Tonic benchmarks. Furthermore, our approach is also the most resource-efficient among all the submitted methods, in terms of both the number of model parameters and the time required to compute embeddings.

Submitted 16 March, 2022; originally announced March 2022.

Comments: In submission to Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

arXiv:2202.06128 [pdf, other]

Grasp-and-Lift Detection from EEG Signal Using Convolutional Neural Network

Authors: Md. Kamrul Hasan, Sifat Redwan Wahid, Faria Rahman, Shanjida Khan Maliha, Sauda Binte Rahman

Abstract: People undergoing neuromuscular dysfunctions and amputated limbs require automatic prosthetic appliances. In developing such prostheses, the precise detection of brain motor actions is imperative for the Grasp-and-Lift (GAL) tasks. Because of the low-cost and non-invasive essence of Electroencephalography (EEG), it is widely preferred for detecting motor actions during the controls of prosthetic tools. This article has automated the hand movement activity, viz. GAL, detection method from the 32-channel EEG signals. The proposed pipeline essentially combines preprocessing and end-to-end detection steps, eliminating the requirement of hand-crafted feature engineering. The preprocessing step consists of raw signal denoising, using either Discrete Wavelet Transform (DWT) or highpass or bandpass filtering, and data standardization. The detection step consists of a Convolutional Neural Network (CNN)- or Long Short Term Memory (LSTM)-based model. All the investigations utilize the publicly available WAY-EEG-GAL dataset, having six different GAL events. The best experiment reveals that the proposed framework achieves an average area under the ROC curve of 0.944, employing the DWT-based denoising filter, data standardization, and CNN-based detection model. The obtained outcome designates an excellent achievement of the introduced method in detecting GAL events from the EEG signals, turning it applicable to prosthetic appliances, brain-computer interfaces, robotic arms, etc.

Submitted 12 February, 2022; originally announced February 2022.

Comments: Accepted in https://icaeee2022.com/

arXiv:2202.02587 [pdf]

doi 10.1109/ACCESS.2022.3187969

VIS-iTrack: Visual Intention through Gaze Tracking using Low-Cost Webcam

Authors: Shahed Anzarus Sabab, Mohammad Ridwan Kabir, Sayed Rizban Hussain, Hasan Mahmud, Md. Kamrul Hasan, Husne Ara Rubaiyeat

Abstract: Human intention is an internal, mental characterization for acquiring desired information. From interactive interfaces containing either textual or graphical information, intention to perceive desired information is subjective and strongly connected with eye gaze. In this work, we determine such intention by analyzing real-time eye gaze data with a low-cost regular webcam. We extracted unique features (e.g., Fixation Count, Eye Movement Ratio) from the eye gaze data of 31 participants to generate a dataset containing 124 samples of visual intention for perceiving textual or graphical information, labeled as either TEXT or IMAGE, having 48.39% and 51.61% distribution, respectively. Using this dataset, we analyzed 5 classifiers, including Support Vector Machine (SVM) (Accuracy: 92.19%). Using the trained SVM, we investigated the variation of visual intention among 30 participants, distributed in 3 age groups, and found that young users leaned more towards graphical content whereas older adults were more interested in textual content. This finding suggests that real-time eye gaze data can be a potential source of identifying visual intention, analyzing which intention-aware interactive interfaces can be designed and developed to facilitate human cognition.

Submitted 5 February, 2022; originally announced February 2022.

Comments: 15 pages, 9 figures, 4 tables

ACM Class: I.4; I.5.2

arXiv:2201.00458 [pdf, other]

Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark

Authors: Parnian Afshar, Arash Mohammadi, Konstantinos N. Plataniotis, Keyvan Farahani, Justin Kirby, Anastasia Oikonomou, Amir Asif, Leonard Wee, Andre Dekker, Xin Wu, Mohammad Ariful Haque, Shahruk Hossain, Md. Kamrul Hasan, Uday Kamal, Winston Hsu, Jhih-Yuan Lin, M. Sohel Rahman, Nabil Ibtehaz, Sh. M. Amir Foisol, Kin-Man Lam, Zhong Guang, Runze Zhang, Sumohana S. Channappayya, Shashank Gupta, Chander Dev

Abstract: Lung cancer is one of the deadliest cancers, and in part its effective diagnosis and treatment depend on the accurate delineation of the tumor. Human-centered segmentation, which is currently the most common approach, is subject to inter-observer variability, and is also time-consuming, considering the fact that only experts are capable of providing annotations. Automatic and semi-automatic tumor segmentation methods have recently shown promising results. However, as different researchers have validated their algorithms using various datasets and performance metrics, reliably evaluating these methods is still an open challenge. The goal of the Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark created through 2018 IEEE Video and Image Processing (VIP) Cup competition, is to provide a unique dataset and pre-defined metrics, so that different researchers can develop and evaluate their methods in a unified fashion. The 2018 VIP Cup started with a global engagement from 42 countries to access the competition data. At the registration stage, there were 129 members clustered into 28 teams from 10 countries, out of which 9 teams made it to the final stage and 6 teams successfully completed all the required tasks. In a nutshell, all the algorithms proposed during the competition, are based on deep learning models combined with a false positive reduction technique. Methods developed by the three finalists show promising results in tumor segmentation, however, more effort should be put into reducing the false positive rate.
This competition manuscript presents an overview of the VIP-Cup challenge, along with the proposed algorithms and results.

Submitted 2 January, 2022; originally announced January 2022.

arXiv:2111.10776 [pdf]

A Case Study on the Independence of Speech Emotion Recognition in Bangla and English Languages using Language-Independent Prosodic Features

Authors: Fardin Saad, Hasan Mahmud, Mohammad Ridwan Kabir, Md. Alamin Shaheen, Paresha Farastu, Md. Kamrul Hasan

Abstract: A language agnostic approach to recognizing emotions from speech remains an incomplete and challenging task. In this paper, we performed a step-by-step comparative analysis of Speech Emotion Recognition (SER) using Bangla and English languages to assess whether distinguishing emotions from speech is independent of language. Six emotions were categorized for this study, such as - happy, angry, neutral, sad, disgust, and fear. We employed three Emotional Speech Sets (ESS), of which the first two were developed by native Bengali speakers in Bangla and English languages separately. The third was a subset of the Toronto Emotional Speech Set (TESS), which was developed by native English speakers from Canada. We carefully selected language-independent prosodic features, adopted a Support Vector Machine (SVM) model, and conducted three experiments to carry out our proposition. In the first experiment, we measured the performance of the three speech sets individually, followed by the second experiment, where different ESS pairs were integrated to analyze the impact on SER. Finally, we measured the recognition rate by training and testing the model with different speech sets in the third experiment. Although this study reveals that SER in Bangla and English languages is mostly language-independent, some disparities were observed while recognizing emotional states like disgust and fear in these two languages. Moreover, our investigations revealed that non-native speakers convey emotions through speech, much like expressing themselves in their native tongue.

Submitted 13 May, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

Comments: 13 pages [currently under review]

arXiv:2111.00601 [pdf]

Explainable Artificial Intelligence for Smart City Application: A Secure and Trusted Platform

Authors: M. Humayn Kabir, Khondokar Fida Hasan, Mohammad Kamrul Hasan, Keyvan Ansari

Abstract: Artificial Intelligence (AI) is one of the disruptive technologies that is shaping the future. It has growing applications for data-driven decisions in major smart city solutions, including transportation, education, healthcare, public governance, and power systems. At the same time, it is gaining popularity in protecting critical cyber infrastructure from cyber threats, attacks, damages, or unauthorized access. However, one of the significant issues of those traditional AI technologies (e.g., deep learning) is that the rapid progress in complexity and sophistication propelled and turned out to be uninterpretable black boxes. On many occasions, it is very challenging to understand the decision and bias to control and trust systems' unexpected or seemingly unpredictable outputs. It is acknowledged that the loss of control over interpretability of decision-making becomes a critical issue for many data-driven automated applications. But how may it affect the system's security and trustworthiness? This chapter conducts a comprehensive study of machine learning applications in cybersecurity to indicate the need for explainability to address this question. While doing that, this chapter first discusses the black-box problems of AI technologies for Cybersecurity applications in smart city-based solutions. Later, considering the new technological paradigm, Explainable Artificial Intelligence (XAI), this chapter discusses the transition from black-box to white-box.
This chapter also discusses the transition requirements concerning the interpretability, transparency, understandability, and Explainability of AI-based technologies in applying different autonomous systems in smart cities. Finally, it has presented some commercial XAI platforms that offer explainability over traditional AI technologies before presenting future challenges and opportunities.

Submitted 31 October, 2021; originally announced November 2021.

Comments: Book_Chapter, Springer Nature

arXiv:2109.07702 [pdf, other]

A Multi-Task Cross-Task Learning Architecture for Ad-hoc Uncertainty Estimation in 3D Cardiac MRI Image Segmentation

Authors: S. M. Kamrul Hasan, Cristian A. Linte

Abstract: Medical image segmentation has significantly benefitted thanks to deep learning architectures. Furthermore, semi-supervised learning (SSL) has recently been a growing trend for improving a model's overall performance by leveraging abundant unlabeled data. Moreover, learning multiple tasks within the same model further improves model generalizability. To generate smoother and accurate segmentation masks from 3D cardiac MR images, we present a Multi-task Cross-task learning consistency approach to enforce the correlation between the pixel-level (segmentation) and the geometric-level (distance map) tasks. Our extensive experimentation with varied quantities of labeled data in the training sets justifies the effectiveness of our model for the segmentation of the left atrial cavity from Gadolinium-enhanced magnetic resonance (GE-MR) images. With the incorporation of uncertainty estimates to detect failures in the segmentation masks generated by CNNs, our study further showcases the potential of our model to flag low-quality segmentation from a given model.

Submitted 2 October, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: Accepted to 2021 Computing in Cardiology (CinC); Code is available at https://github.com/SMKamrulHasan/MTCTL

arXiv:2109.04791 [pdf]

doi 10.1109/ACCESS.2022.3151696

ANTASID: A Novel Temporal Adjustment to Shannon's Index of Difficulty for Quantifying the Perceived Difficulty of Uncontrolled Pointing Tasks

Authors: Mohammad Ridwan Kabir, Mohammad Ishrak Abedin, Rizvi Ahmed, Hasan Mahmud, Md. Kamrul Hasan

Abstract: Shannon's Index of Difficulty ($ID$), reputable for quantifying the perceived difficulty of pointing tasks as a logarithmic relationship between movement-amplitude ($A$) and target-width ($W$), is used for modelling the corresponding observed movement-times ($MT_O$) in such tasks in controlled experimental setup. However, real-life pointing tasks are both spatially and temporally uncontrolled, being influenced by factors such as - human aspects, subjective behavior, the context of interaction, the inherent speed-accuracy trade-off where, emphasizing accuracy compromises speed of interaction and vice versa, and so on. Effective target-width ($W_e$) is considered as spatial adjustment for compensating accuracy. However, no significant adjustment exists in the literature for compensating speed in different contexts of interaction in these tasks. As a result, without any temporal adjustment, the true difficulty of an uncontrolled pointing task may be inaccurately quantified using Shannon's ID. To verify this, we propose the ANTASID (A Novel Temporal Adjustment to Shannon's ID) formulation with detailed performance analysis. We hypothesized a temporal adjustment factor ($t$) as a binary logarithm of $MT_O$, compensating for speed due to contextual differences and minimizing the non-linearity between movement-amplitude and target-width.
Considering spatial and/or temporal adjustments to ID, we conducted regression analysis using our own and Benchmark datasets in both controlled and uncontrolled scenarios of pointing tasks with a generic mouse. ANTASID formulation showed significantly superior fitness values and throughput in all the scenarios while reducing the standard error. Furthermore, the quantification of ID with ANTASID varied significantly compared to the classical formulations of Shannon's ID, validating the purpose of this study.

Submitted 29 December, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

Comments: 14 pages, 7 figures, 7 tables

ACM Class: G.3; H.1.2; H.5.2

arXiv:2109.03631 [pdf, other]

Renovo: Prototype of a Low-Cost Sensor-Based Therapeutic System for Upper Limb Rehabilitation

Authors: Mohammad Ridwan Kabir, Mohammad Anas Jawad, Mohaimin Ehsan, Hasan Mahmud, Md. Kamrul Hasan

Abstract: Stroke patients with Upper Limb Disability (ULD) are re-acclimated to their lost motor capability through therapeutic interventions, following assessment by Physiotherapists (PTs) using various qualitative assessment protocols. However, the assessments are often biased and prone to errors. Real-time visualization and quantitative analysis of various Performance Metrics (PMs) of patient's motion data, such as - Range of Motion (RoM), Repetition Rate (RR), Velocity (V), etc., may be vital for proper assessment. In this study, we present Renovo, a wearable inertial sensor-based therapeutic system, which assists PTs with real-time visualization and quantitative patient assessment, while providing patients with progress feedback. We showcase the results of a three-week pilot study on the rehabilitation of ULD patients (N=16), in 3 successive sessions at one-week interval, following evaluation both by Renovo and PTs (N=5). Results suggest that sensor-based quantitative assessment reduces the possibility of human error and bias, enhancing efficiency of rehabilitation.

Submitted 17 October, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 27 pages, 10 figures, 5 tables

arXiv:2107.02543 [pdf, other]

A Deep Learning-based Multimodal Depth-Aware Dynamic Hand Gesture Recognition System

Authors: Hasan Mahmud, Mashrur M. Morshed, Md. Kamrul Hasan

Abstract: The dynamic hand gesture recognition task has seen studies on various unimodal and multimodal methods. Previously, researchers have explored depth and 2D-skeleton-based multimodal fusion CRNNs (Convolutional Recurrent Neural Networks) but have had limitations in getting expected recognition results. In this paper, we revisit this approach to hand gesture recognition and suggest several improvements. We observe that raw depth images possess low contrast in the hand regions of interest (ROI). They do not highlight important fine details, such as finger orientation, overlap between the finger and palm, or overlap between multiple fingers. We thus propose quantizing the depth values into several discrete regions, to create a higher contrast between several key parts of the hand. In addition, we suggest several ways to tackle the high variance problem in existing multimodal fusion CRNN architectures. We evaluate our method on two benchmarks: the DHG-14/28 dataset and the SHREC'17 track dataset. Our approach shows a significant improvement in accuracy and parameter efficiency over previous similar multimodal methods, with a comparable result to the state-of-the-art.

Submitted 5 November, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

arXiv:2105.14875 [pdf, other]

doi 10.1109/ACCESS.2022.3165563

Bangla Natural Language Processing: A Comprehensive Analysis of Classical, Machine Learning, and Deep Learning Based Methods

Authors: Ovishake Sen, Mohtasim Fuad, MD. Nazrul Islam, Jakaria Rabbi, Mehedi Masud, MD. Kamrul Hasan, Md. Abdul Awal, Awal Ahmed Fime, Md. Tahmid Hasan Fuad, Delowar Sikder, MD. Akil Raihan Iftee

Abstract: The Bangla language is the seventh most spoken language, with 265 million native and non-native speakers worldwide. However, English is the predominant language for online resources and technical knowledge, journals, and documentation. Consequently, many Bangla-speaking people, who have limited command of English, face hurdles to utilize English resources. To bridge the gap between limited support and increasing demand, researchers conducted many experiments and developed valuable tools and techniques to create and process Bangla language materials. Many efforts are also ongoing to make it easy to use the Bangla language in the online and technical domains. There are some review papers to understand the past, previous, and future Bangla Natural Language Processing (BNLP) trends. The studies are mainly concentrated on the specific domains of BNLP, such as sentiment analysis, speech recognition, optical character recognition, and text summarization. There is an apparent scarcity of resources that contain a comprehensive review of the recent BNLP tools and methods. Therefore, in this paper, we present a thorough analysis of 75 BNLP research papers and categorize them into 11 categories, namely Information Extraction, Machine Translation, Named Entity Recognition, Parsing, Parts of Speech Tagging, Question Answering System, Sentiment Analysis, Spam and Fake Detection, Text Summarization, Word Sense Disambiguation, and Speech Processing and Recognition. We study articles published between 1999 and 2021, and 50% of the papers were published after 2015.
Furthermore, we discuss Classical, Machine Learning and Deep Learning approaches with different datasets while addressing the limitations and current and future trends of the BNLP.

Submitted 9 April, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: Accepted in IEEE Access; 46 pages. Link: https://ieeexplore.ieee.org/document/9751052 (Early Access - April 10, 2022)

arXiv:2105.03995 [pdf, other]

Acute Lymphoblastic Leukemia Detection from Microscopic Images Using Weighted Ensemble of Convolutional Neural Networks

Authors: Chayan Mondal, Md. Kamrul Hasan, Md. Tasnim Jawad, Aishwariya Dutta, Md. Rabiul Islam, Md. Abdul Awal, Mohiuddin Ahmad

Abstract: Acute Lymphoblastic Leukemia (ALL) is a blood cell cancer characterized by numerous immature lymphocytes. Even though automation in ALL prognosis is an essential aspect of cancer diagnosis, it is challenging due to the morphological correlation between malignant and normal cells. The traditional ALL classification strategy demands experienced pathologists to carefully read the cell images, which is arduous, time-consuming, and often suffers inter-observer variations. This article has automated the ALL detection task from microscopic cell images, employing deep Convolutional Neural Networks (CNNs). We explore the weighted ensemble of different deep CNNs to recommend a better ALL cell classifier. The weights for the ensemble candidate models are estimated from their corresponding metrics, such as accuracy, F1-score, AUC, and kappa values. Various data augmentations and pre-processing are incorporated for achieving a better generalization of the network. We utilize the publicly available C-NMC-2019 ALL dataset to conduct all the comprehensive experiments. Our proposed weighted ensemble model, using the kappa values of the ensemble candidates as their weights, has outputted a weighted F1-score of 88.6 %, a balanced accuracy of 86.2 %, and an AUC of 0.941 in the preliminary test set. The qualitative results displaying the gradient class activation maps confirm that the introduced model has a concentrated learned region.
In contrast, the ensemble candidate models, such as Xception, VGG-16, DenseNet-121, MobileNet, and InceptionResNet-V2, separately produce coarse and scatter learned areas for most example cases. Since the proposed kappa value-based weighted ensemble yields a better result for the aimed task in this article, it can experiment in other domains of medical diagnostic applications.

Submitted 9 May, 2021; originally announced May 2021.

Comments: 31 pages, 9 figures

arXiv:2102.06169 [pdf, other]

COVID-19 identification from volumetric chest CT scans using a progressively resized 3D-CNN incorporating segmentation, augmentation, and class-rebalancing

Authors: Md. Kamrul Hasan, Md. Tasnim Jawad, Kazi Nasim Imtiaz Hasan, Sajal Basak Partha, Md. Masum Al Masba, Shumit Saha

Abstract: The novel COVID-19 is a global pandemic disease overgrowing worldwide. Computer-aided screening tools with greater sensitivity are imperative for disease diagnosis and prognosis as early as possible. It also can be a helpful tool in triage for testing and clinical supervision of COVID-19 patients. However, designing such an automated tool from non-invasive radiographic images is challenging as many manually annotated datasets are not publicly available yet, which is the essential core requirement of supervised learning schemes. This article proposes a 3D Convolutional Neural Network (CNN)-based classification approach considering both the inter- and intra-slice spatial voxel information. The proposed system is trained in an end-to-end manner on the 3D patches from the whole volumetric CT images to enlarge the number of training samples, performing the ablation studies on patch size determination. We integrate progressive resizing, segmentation, augmentations, and class-rebalancing to our 3D network. The segmentation is a critical prerequisite step for COVID-19 diagnosis enabling the classifier to learn prominent lung features while excluding the outer lung regions of the CT scans. We evaluate all the extensive experiments on a publicly available dataset, named MosMed, having binary- and multi-class chest CT image partitions. Our experimental results are very encouraging, yielding areas under the ROC curve of 0.914 and 0.893 for the binary- and multi-class tasks, respectively, applying 5-fold cross-validations.
Our method's promising results delegate it as a favorable aiding tool for clinical practitioners and radiologists to assess COVID-19.

Submitted 14 April, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

Comments: 33 pages

arXiv:2102.01824 [pdf, other]

Dermo-DOCTOR: A framework for concurrent skin lesion detection and recognition using a deep convolutional neural network with end-to-end dual encoders

Authors: Md. Kamrul Hasan, Shidhartho Roy, Chayan Mondal, Md. Ashraful Alam, Md. Toufick E Elahi, Aishwariya Dutta, S. M. Taslim Uddin Raju, Md. Tasnim Jawad, Mohiuddin Ahmad

Abstract: Automated skin lesion analysis for simultaneous detection and recognition is still challenging for inter-class homogeneity and intra-class heterogeneity, leading to low generic capability of a Single Convolutional Neural Network (CNN) with limited datasets. This article proposes an end-to-end deep CNN-based framework for simultaneous detection and recognition of the skin lesions, named Dermo-DOCTOR, consisting of two encoders. The feature maps from two encoders are fused channel-wise, called Fused Feature Map (FFM). The FFM is utilized for decoding in the detection sub-network, concatenating each stage of two encoders' outputs with corresponding decoder layers to retrieve the lost spatial information due to pooling in the encoders. For the recognition sub-network, the outputs of three fully connected layers, utilizing feature maps of two encoders and FFM, are aggregated to obtain a final lesion class. We train and evaluate the proposed Dermo-DOCTOR utilizing two publicly available benchmark datasets, such as ISIC-2016 and ISIC-2017. The achieved segmentation results exhibit mean intersection over unions of 85.0 % and 80.0 % respectively for ISIC-2016 and ISIC-2017 test datasets. The proposed Dermo-DOCTOR also demonstrates praiseworthy success in lesion recognition, providing the areas under the receiver operating characteristic curves of 0.98 and 0.91 respectively for those two datasets.
The experimental results show that the proposed Dermo-DOCTOR outperforms the alternative methods mentioned in the literature, designed for skin lesion detection and recognition. As the Dermo-DOCTOR provides better results on two different test datasets, even with limited training data, it can be an auspicious computer-aided assistive tool for dermatologists.

Submitted 23 February, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: 39 Pages

arXiv:2102.01822 [pdf, other]

Multi-class probabilistic atlas-based whole heart segmentation method in cardiac CT and MRI

Authors: Tarun Kanti Ghosh, Md. Kamrul Hasan, Shidhartho Roy, Md. Ashraful Alam, Eklas Hossain, Mohiuddin Ahmad

Abstract: Accurate and robust whole heart substructure segmentation is crucial in developing clinical applications, such as computer-aided diagnosis and computer-aided surgery. However, segmentation of different heart substructures is challenging because of inadequate edge or boundary information, the complexity of the background and texture, and the diversity in different substructures' sizes and shapes. This article proposes a framework for multi-class whole heart segmentation employing non-rigid registration-based probabilistic atlas incorporating the Bayesian framework. We also propose a non-rigid registration pipeline utilizing a multi-resolution strategy for obtaining the highest attainable mutual information between the moving and fixed images. We further incorporate non-rigid registration into the expectation-maximization algorithm and implement different deep convolutional neural network-based encoder-decoder networks for ablation studies. All the extensive experiments are conducted utilizing the publicly available dataset for the whole heart segmentation containing 20 MRI and 20 CT cardiac images. The proposed approach exhibits an encouraging achievement, yielding a mean volume overlapping error of 14.5 % for CT scans exceeding the state-of-the-art results by a margin of 1.3 % in terms of the same metric. As the proposed approach provides better results to delineate the different substructures of the heart, it can be a medical diagnostic aiding tool for helping experts with quicker and more accurate results.

Submitted 2 February, 2021; originally announced February 2021.

Comments: 17 pages

arXiv:2007.11993 [pdf, other]

CVR-Net: A deep convolutional neural network for coronavirus recognition from chest radiography images

Authors: Md. Kamrul Hasan, Md. Ashraful Alam, Md. Toufick E Elahi, Shidhartho Roy, Sifat Redwan Wahid

Abstract: The novel Coronavirus Disease 2019 (COVID-19) is a global pandemic disease spreading rapidly around the world. A robust and automatic early recognition of COVID-19, via auxiliary computer-aided diagnostic tools, is essential for disease cure and control. The chest radiography images, such as Computed Tomography (CT) and X-ray, and deep Convolutional Neural Networks (CNNs), can be a significant and useful material for designing such tools. However, designing such an automated tool is challenging as a massive number of manually annotated datasets are not publicly available yet, which is the core requirement of supervised learning systems. In this article, we propose a robust CNN-based network, called CVR-Net (Coronavirus Recognition Network), for the automatic recognition of the coronavirus from CT or X-ray images. The proposed end-to-end CVR-Net is a multi-scale-multi-encoder ensemble model, where we have aggregated the outputs from two different encoders and their different scales to obtain the final prediction probability. We train and test the proposed CVR-Net on three different datasets, where the images were collected from different open-source repositories. We compare our proposed CVR-Net with state-of-the-art methods, which are trained and tested on the same datasets. We split three datasets into five different tasks, where each task has a different number of classes, to evaluate the multi-tasking CVR-Net.
Our model achieves an overall F1-score & accuracy of 0.997 & 0.998; 0.963 & 0.964; 0.816 & 0.820; 0.961 & 0.961; and 0.780 & 0.780, respectively, for task-1 to task-5. As the CVR-Net provides promising results on the small datasets, it can be an auspicious computer-aided diagnostic tool for the diagnosis of coronavirus to assist the clinical practitioners and radiologists. Our source codes and model are publicly available at https://github.com/kamruleee51/CVR-Net.

Submitted 21 July, 2020; originally announced July 2020.

Comments: 31 Pages

arXiv:2006.12738 [pdf, other]

doi 10.1109/ICCITechn.2015.7488045

Better User Recommendations using Enhancing Software Development Process Repository

Authors: Ziaur Rahman, Md. Kamrul Hasan

Abstract: Reusing previously completed software repositories to enhance the development process is a common practice. If developers get suggestions from existing projects, they may benefit considerably while coding. The strategies available in this field have been changing rapidly. A number of efforts have focused on the mining process and on constructing repositories. Some of them have emphasized web-based code searching, while others have integrated web-based code searching into their customized tools. However, web-based approaches are inefficient, especially in building the repository on which they apply mining technologies. To search for code snippets in response to a user query, we need an enriched repository with better representation and abstraction. To ensure such a repository before the mining process, we have developed a concept based on Enhancing Software Development Process (ESDP). In the ESDP approach, multiple sources of code from both online and offline storage are considered to construct a central repository with XML representation, and mining techniques are applied on the client side. The respective evaluation shows that the ESDP approach works much better in response time and performance than many other existing approaches available today.

Submitted 21 June, 2020; originally announced June 2020.

Comments: 6 Pages 6 Figures 8 Tables

ACM Class: K.6.3

Journal ref: 2015 18th International Conference on Computer and Information Technology (ICCIT), Dhaka, 2015, pp. 70-75

arXiv:2006.02578 [pdf, other]

doi 10.13140/RG.2.2.18341.86249

DFR-TSD: A Deep Learning Based Framework for Robust Traffic Sign Detection Under Challenging Weather Conditions

Authors: Sabbir Ahmed, Uday Kamal, Md. Kamrul Hasan

Abstract: Robust traffic sign detection and recognition (TSDR) is of paramount importance for the successful realization of autonomous vehicle technology. The importance of this task has led to a vast amount of research efforts and many promising methods have been proposed in the existing literature. However, the state-of-the-art (SOTA) methods have been evaluated on clean and challenge-free datasets and overlooked the performance deterioration associated with different challenging conditions (CCs) that obscure the traffic images captured in the wild. In this paper, we look at the TSDR problem under CCs and focus on the performance degradation associated with them. To overcome this, we propose a Convolutional Neural Network (CNN) based TSDR framework with prior enhancement. Our modular approach consists of a CNN-based challenge classifier, Enhance-Net, an encoder-decoder CNN architecture for image enhancement, and two separate CNN architectures for sign-detection and classification. We propose a novel training pipeline for Enhance-Net that focuses on the enhancement of the traffic sign regions (instead of the whole image) in the challenging images subject to their accurate detection. We used the CURE-TSD dataset, consisting of traffic videos captured under different CCs, to evaluate the efficacy of our approach. We experimentally show that our method obtains an overall precision and recall of 91.1% and 70.71%, which is a 7.58% and 35.90% improvement in precision and recall, respectively, compared to the current benchmark. Furthermore, we compare our approach with SOTA object detection networks, Faster-RCNN and R-FCN, and show that our approach outperforms them by a large margin.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:2006.00205 [pdf]

doi 10.1109/ICAIIC.2019.8668981

Opportunities of Optical Spectrum for Future Wireless Communications

Authors: Mostafa Zaman Chowdhury, Moh Khalid Hasan, Md Shahjalal, Eun Bi Shin, Yeong Min Jang

Abstract: The service-quality requirements of future fifth-generation (5G) communication, such as data rate, latency, power consumption, and number of connections, are very high. Moreover, the Internet of Things (IoT) requires massive connectivity. Optical wireless communication (OWC) technologies such as visible light communication, light fidelity, optical camera communication, and free space optical communication can effectively serve for the successful deployment of 5G and IoT. This paper presents the contributions of OWC networks for 5G and IoT solutions.

Submitted 30 May, 2020; originally announced June 2020.

Comments: 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)

arXiv:2006.00204 [pdf]

doi 10.1109/ICTC.2018.8539460

Optical wireless hybrid networks for 5G and beyond communications

Authors: Mostafa Zaman Chowdhury, Moh Khalid Hasan, Md Shahjalal, Md Tanvir Hossan, Yeong Min Jang

Abstract: The next fifth-generation (5G) and beyond communication systems, with ultra-high speed, ultra-low latency, and extremely high reliability, will consist of heterogeneous networks. These heterogeneous networks will consist of not only radio frequency (RF) based systems but also optical wireless based systems. Hybrid architectures among different networks are an excellent approach for achieving the required level of service quality. In this paper, we present the opportunities brought by hybrid systems considering RF as well as optical wireless based communication technologies. We also discuss the key research directions of hybrid network systems.

Submitted 30 May, 2020; originally announced June 2020.

Comments: 2018 International Conference on Information and Communication Technology Convergence (ICTC)

arXiv:2004.11253 [pdf, other]

L-CO-Net: Learned Condensation-Optimization Network for Clinical Parameter Estimation from Cardiac Cine MRI

Authors: S. M. Kamrul Hasan, Cristian A. Linte

Abstract: In this work, we implement a fully convolutional segmenter featuring both a learned group structure and a regularized weight-pruner to reduce the high computational cost in volumetric image segmentation. We validated our framework on the ACDC dataset featuring one healthy and four pathology groups imaged throughout the cardiac cycle. Our technique achieved Dice scores of 96.8% (LV blood-pool), 93.3% (RV blood-pool) and 90.0% (LV Myocardium) with five-fold cross-validation and yielded similar clinical parameters as those estimated from the ground truth segmentation data. Based on these results, this technique has the potential to become an efficient and competitive cardiac image segmentation tool that may be used for cardiac computer-aided diagnosis, planning, and guidance applications.

Submitted 21 April, 2020; originally announced April 2020.

Comments: 6 pages, 5 figures, IEEE Conference. arXiv admin note: text overlap with arXiv:2004.02249

arXiv:2004.02249 [pdf, other]

CondenseUNet: A Memory-Efficient Condensely-Connected Architecture for Bi-ventricular Blood Pool and Myocardium Segmentation

Authors: S. M. Kamrul Hasan, Cristian A. Linte

Abstract: With the advent of Cardiac Cine Magnetic Resonance (CMR) Imaging, there has been a paradigm shift in medical technology, thanks to its capability of imaging different structures within the heart without ionizing radiation. However, it is very challenging to conduct pre-operative planning of minimally invasive cardiac procedures without accurate segmentation and identification of the left ventricle (LV), right ventricle (RV) blood-pool, and LV-myocardium. Manual segmentation of those structures, nevertheless, is time-consuming and often prone to error and biased outcomes. Hence, automatic and computationally efficient segmentation techniques are paramount. In this work, we propose a novel memory-efficient Convolutional Neural Network (CNN) architecture as a modification of both CondenseNet, as well as DenseNet for ventricular blood-pool segmentation by introducing a bottleneck block and an upsampling path. Our experiments show that the proposed architecture runs on the Automated Cardiac Diagnosis Challenge (ACDC) dataset using half (50%) the memory requirement of DenseNet and one-twelfth (~ 8%) of the memory requirements of U-Net, while still maintaining excellent accuracy of cardiac segmentation. We validated the framework on the ACDC dataset featuring one healthy and four pathology groups whose heart images were acquired throughout the cardiac cycle and achieved the mean dice scores of 96.78% (LV blood-pool), 93.46% (RV blood-pool) and 90.1% (LV-Myocardium). These results are promising and promote the proposed methods as a competitive tool for cardiac image segmentation and clinical parameter estimation that has the potential to provide fast and accurate results, as needed for pre-procedural planning and/or pre-operative applications.

Submitted 5 April, 2020; originally announced April 2020.

Comments: 7 pages, 3 figures

arXiv:1911.05305 [pdf, other]

Emotion Recognition with Forearm-based Electromyography

Authors: Muhammad Shihab Rashid, Zubayet Zaman, Hasan Mahmud, Md. Kamrul Hasan

Abstract: Electromyography is an unexplored field of study when it comes to alternate input modality while interacting with a computer. However, making computers understand human emotions is pivotal in the area of human-computer interaction and in assistive technology. Traditional input devices used currently have limitations and restrictions when it comes to expressing human emotions. The applications regarding computers and emotions are vast. In this paper we analyze EMG signals recorded from a low cost MyoSensor and classify them into two classes: Relaxed and Angry. In order to perform this classification we have created a dataset collected from 10 users, extracted 8 significant features and classified them using the Support Vector Machine algorithm. We show uniquely that forearm-based EMG signals can express emotions. Experimental results show an accuracy of 88.1% after 300 iterations. This shows significant opportunities in various fields of computer science such as gaming and e-learning tools where EMG signals can be used to detect human emotions and make the system provide feedback based on it. We discuss further applications of the method that seeks to expand the range of human-computer interaction beyond the button box.

Submitted 13 November, 2019; originally announced November 2019.

arXiv:1910.02579 [pdf]

A Novel Technique of Noninvasive Hemoglobin Level Measurement Using HSV Value of Fingertip Image

Authors: Md Kamrul Hasan, Nazmus Sakib, Joshua Field, Richard R. Love, Sheikh I. Ahamed

Abstract: Over the last decade, smartphones have changed radically to support us with mHealth technology, cloud computing, and machine learning algorithms. Drawing on these multifaceted facilities, we present a novel smartphone-based noninvasive hemoglobin (Hb) level prediction model by analyzing hue, saturation and value (HSV) of a fingertip video. Here, we collect 60 videos of 60 subjects from two different locations: Blood Center of Wisconsin, USA and AmaderGram, Bangladesh. We extract red, green, and blue (RGB) pixel intensities of selected images of those videos captured by the smartphone camera with flash on. Then we convert RGB values of selected video frames of a fingertip video into HSV color space and we generate histogram values of these HSV pixel intensities. We average these histogram values of a fingertip video and consider the result as an observation against the gold standard Hb concentration. We generate two input feature matrices based on observations of two different data sets. The Partial Least Squares (PLS) algorithm is applied on the input feature matrix. We observe R² = 0.95 in both data sets through our research. We analyze our data using Python OpenCV, Matlab, and the R statistics tool.

Submitted 6 October, 2019; originally announced October 2019.
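
The abstract above describes converting the RGB values of selected frames to HSV, building histograms of the HSV intensities, and averaging them per video. A minimal sketch of that idea follows, using only the Python standard library. The function names, the bin count, and the sample pixels are hypothetical illustrations, not the authors' implementation (which the abstract says used Python OpenCV, Matlab, and R).

```python
import colorsys


def hsv_histograms(pixels, bins=8):
    """Return [H, S, V] histograms for a list of 8-bit (r, g, b) pixels.

    The bin count is an assumption made for this sketch.
    """
    hists = [[0] * bins for _ in range(3)]
    for r, g, b in pixels:
        # colorsys expects and returns floats in the range [0, 1].
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for channel, value in enumerate(hsv):
            # Clamp so that a value of exactly 1.0 falls in the last bin.
            index = min(int(value * bins), bins - 1)
            hists[channel][index] += 1
    return hists


def average_histograms(frame_hists):
    """Average per-frame [H, S, V] histograms element-wise over all frames."""
    n = len(frame_hists)
    bins = len(frame_hists[0][0])
    return [
        [sum(frame[c][i] for frame in frame_hists) / n for i in range(bins)]
        for c in range(3)
    ]


# Hypothetical pixels from two frames of one fingertip video.
frame_a = [(255, 0, 0), (200, 10, 10)]
frame_b = [(250, 5, 5), (128, 0, 0)]
features = average_histograms([hsv_histograms(frame_a), hsv_histograms(frame_b)])
```

Here `features` plays the role of one observation (one video); a regression such as PLS would then be fitted on a matrix of such rows against measured Hb values.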

arXiv:1908.05787 [pdf, other]

Integrating Multimodal Information in Large Pretrained Transformers

Authors: Wasifur Rahman, Md. Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque

Abstract: Recent Transformer-based contextual word representations, including BERT and XLNet, have shown state-of-the-art performance in multiple disciplines within NLP. Fine-tuning the trained contextual models on task-specific datasets has been the key to achieving superior performance downstream. While fine-tuning these pre-trained models is straightforward for lexical applications (applications with only language modality), it is not trivial for multimodal language (a growing area in NLP focused on modeling face-to-face communication). Pre-trained models don't have the necessary components to accept two extra modalities of vision and acoustic. In this paper, we proposed an attachment to BERT and XLNet called Multimodal Adaptation Gate (MAG). MAG allows BERT and XLNet to accept multimodal nonverbal data during fine-tuning. It does so by generating a shift to internal representation of BERT and XLNet; a shift that is conditioned on the visual and acoustic modalities. In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis. Fine-tuning MAG-BERT and MAG-XLNet significantly boosts the sentiment analysis performance over previous baselines as well as language-only fine-tuning of BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time in the NLP community.

Submitted 21 November, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

arXiv:1907.04424 [pdf, other]

Automatic Mass Detection in Breast Using Deep Convolutional Neural Network and SVM Classifier

Authors: Md. Kamrul Hasan, Tajwar Abrar Aleef

Abstract: Mammography is the most widely used gold standard for screening breast cancer, where mass detection is considered the prominent step. Detecting masses in the breast is, however, an arduous problem as they usually have large variations between them in terms of shape, size, boundary, and texture. In this paper, the process of mass detection is automated with the use of transfer learning techniques of Deep Convolutional Neural Networks (DCNN). A pre-trained VGG19 network is used to extract features, followed by a bagged decision tree for feature selection, and then a Support Vector Machine (SVM) classifier is trained and used for classifying between mass and non-mass. Area Under ROC Curve (AUC) is chosen as the performance metric, which is then maximized during classifier selection and hyper-parameter tuning. The robustness of the two selected types of classifiers, C-SVM and υ-SVM, is investigated with extensive experiments before selecting the best performing classifier. All experiments in this paper were conducted using the INbreast dataset. The best AUC obtained from the experimental results is 0.994 +/- 0.003, i.e., [0.991, 0.997]. Our results conclude that by using a pre-trained VGG19 network, high-level distinctive features can be extracted from mammograms which, when used with the proposed SVM classifier, are able to robustly distinguish between the mass and non-mass present in the breast.

Submitted 9 July, 2019; originally announced July 2019.

Comments: 11 pages

arXiv:1907.04305 [pdf, other]

DSNet: Automatic Dermoscopic Skin Lesion Segmentation

Authors: Md. Kamrul Hasan, Lavsen Dahal, Prasad N. Samarakoon, Fakrul Islam Tushar, Robert Marti Marly

Abstract: Automatic segmentation of skin lesion is considered a crucial step in Computer Aided Diagnosis (CAD) for melanoma diagnosis. Despite its significance, skin lesion segmentation remains a challenging task due to their diverse color, texture, and indistinguishable boundaries and forms an open problem. Through this study, we present a new and automatic semantic segmentation network for robust skin lesion segmentation named Dermoscopic Skin Network (DSNet). In order to reduce the number of parameters to make the network lightweight, we used depth-wise separable convolution in lieu of standard convolution to project the learned discriminating features onto the pixel space at different stages of the encoder. Additionally, we implemented U-Net and Fully Convolutional Network (FCN8s) to compare against the proposed DSNet. We evaluate our proposed model on two publicly available datasets, namely ISIC-2017 and PH2. The obtained mean Intersection over Union (mIoU) is 77.5 % and 87.0 % respectively for ISIC-2017 and PH2 datasets which outperformed the ISIC-2017 challenge winner by 1.0 % with respect to mIoU. Our proposed network also outperformed U-Net and FCN8s respectively by 3.6 % and 6.8 % with respect to mIoU on the ISIC-2017 dataset. Our network for skin lesion segmentation outperforms other methods and can provide better segmented masks on two different test datasets which can lead to better performance in melanoma detection. Our trained model along with the source code and predicted masks are made publicly available.

Submitted 23 January, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

Comments: 25 pages

arXiv:1905.08392 [pdf, other]

A Causality-Guided Prediction of the TED Talk Ratings from the Speech-Transcripts using Neural Networks

Authors: Md Iftekhar Tanveer, Md Kamrul Hasan, Daniel Gildea, M. Ehsan Hoque

Abstract: Automated prediction of public speaking performance enables novel systems for tutoring public speaking skills. We use the largest open repository, TED Talks, to predict the ratings provided by the online viewers. The dataset contains over 2200 talk transcripts and the associated meta information including over 5.5 million ratings from spontaneous visitors to the website. We carefully removed the bias present in the dataset (e.g., the speakers' reputations, popularity gained by publicity, etc.) by modeling the data generating process using a causal diagram. We use a word sequence based recurrent architecture and a dependency tree based recursive architecture as the neural networks for predicting the TED talk ratings. Our neural network models can predict the ratings with an average F-score of 0.77 which largely outperforms the competitive baseline method.

Submitted 20 May, 2019; originally announced May 2019.

arXiv:1904.06618 [pdf, other]

doi 10.18653/v1/D19-1211

UR-FUNNY: A Multimodal Language Dataset for Understanding Humor

Authors: Md Kamrul Hasan, Wasifur Rahman, Amir Zadeh, Jianyuan Zhong, Md Iftekhar Tanveer, Louis-Philippe Morency, Mohammed Hoque

Abstract: Humor is a unique and creative communicative behavior displayed during social interactions. It is produced in a multimodal manner, through the usage of words (text), gestures (vision) and prosodic cues (acoustic). Understanding humor from these three modalities falls within boundaries of multimodal language; a recent research trend in natural language processing that models natural language as it happens in face-to-face communication. Although humor detection is an established research area in NLP, in a multimodal context it is an understudied area. This paper presents a diverse multimodal dataset, called UR-FUNNY, to open the door to understanding multimodal language used in expressing humor. The dataset and accompanying studies present a framework in multimodal humor detection for the natural language processing community. UR-FUNNY is publicly available for research.

Submitted 13 April, 2019; originally announced April 2019.

Journal ref: EMNLP-IJCNLP, 2019, 2046-2056

arXiv:1904.03075 [pdf, other]

Comparative Analysis of Automatic Skin Lesion Segmentation with Two Different Implementations

Authors: Md. Kamrul Hasan, Basel Alyafi, Fakrul Islam Tushar

Abstract: Lesion segmentation from the surrounding skin is the first task for developing automatic Computer-Aided Diagnosis of skin cancer. Variant features of lesions like uneven distribution of color, irregular shape, border and texture make this task challenging. The contribution of this paper is to present and compare two different approaches to skin lesion segmentation. The first approach uses watershed, while the second approach uses mean-shift. Pre-processing steps were performed in both approaches for removing hair and dark borders of microscopic images. The evaluation of the proposed approaches was performed using the Jaccard Index (Intersection over Union or IoU). An additional contribution of this paper is to present pipelines for performing pre-processing and segmentation applying existing segmentation and morphological algorithms which led to promising results. On average, the first approach showed better performance than the second one, with average Jaccard Indices over 200 ISIC-2017 challenge images of 89.16% and 76.94%, respectively.

Submitted 5 April, 2019; originally announced April 2019.

Comments: 4 pages, 4 figures, 4 tables, 4 sections

MSC Class: 68U10 ACM Class: I.4.6; I.5.3
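
The Jaccard Index (IoU) used for evaluation in the abstract above is the size of the overlap between the predicted and reference masks divided by the size of their union. A minimal sketch, assuming binary masks stored as equally sized nested lists; the function name and sample masks are hypothetical:

```python
def jaccard_index(pred, truth):
    """Intersection over Union of two binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention for this sketch: two empty masks are a perfect match.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 2x2 masks: one pixel overlaps, three pixels are in the union.
predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [1, 0]]
score = jaccard_index(predicted, reference)
```

Averaging this score over every image in a test set gives a dataset-level figure of the kind reported in the abstract.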

arXiv:1904.00068 [pdf]

Brain Tissue Segmentation Using NeuroNet With Different Pre-processing Techniques

Authors: Fakrul Islam Tushar, Basel Alyafi, Md. Kamrul Hasan, Lavsen Dahal

Abstract: Automatic segmentation of brain Magnetic Resonance Imaging (MRI) images is one of the vital steps for quantitative analysis of the brain for further inspection. In this paper, NeuroNet has been adopted to segment the brain tissues (white matter (WM), grey matter (GM) and cerebrospinal fluid (CSF)); it uses a Residual Network (ResNet) in the encoder and a Fully Convolutional Network (FCN) in the decoder. To achieve the best performance, various hyper-parameters have been tuned, while network parameters (kernel and bias) were initialized using the NeuroNet pre-trained model. Different pre-processing pipelines have also been introduced to get a robust trained model. The model has been trained and tested on the IBSR18 data-set. To validate the research outcome, performance was measured quantitatively using the Dice Similarity Coefficient (DSC) and is reported on average as 0.84 for CSF, 0.94 for GM, and 0.94 for WM. The outcome of the research indicates that, for the IBSR18 data-set, pre-processing and proper tuning of hyper-parameters for the NeuroNet model improve the DSC for brain tissue segmentation.

Submitted 29 March, 2019; originally announced April 2019.

Comments: 3rd International Conference on Imaging, Vision & Pattern Recognition (IVPR)2019
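
The Dice Similarity Coefficient reported per tissue class in the abstract above is twice the overlap between prediction and reference divided by the sum of their sizes, computed separately for each label. A minimal sketch, assuming label maps flattened to equal-length lists; the function name, label numbering, and sample data are hypothetical:

```python
def dice_per_label(pred, truth, labels):
    """Dice Similarity Coefficient for each label in two flat label maps."""
    scores = {}
    for label in labels:
        pred_count = sum(1 for p in pred if p == label)
        truth_count = sum(1 for t in truth if t == label)
        overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        denom = pred_count + truth_count
        # Assumed convention: a label absent from both maps scores 1.0.
        scores[label] = 1.0 if denom == 0 else 2.0 * overlap / denom
    return scores


# Hypothetical label numbering: 1 = CSF, 2 = GM, 3 = WM.
predicted = [1, 1, 2, 3]
reference = [1, 2, 2, 3]
scores = dice_per_label(predicted, reference, labels=[1, 2, 3])
```

For a single binary mask, Dice D and the Jaccard Index J are related by D = 2J / (1 + J), so the two metrics rank segmentations identically even though their values differ.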

arXiv:1902.08994 [pdf, other]

U-NetPlus: A Modified Encoder-Decoder U-Net Architecture for Semantic and Instance Segmentation of Surgical Instrument

Authors: S. M. Kamrul Hasan, Cristian A. Linte

Abstract: Conventional therapy approaches limit surgeons' dexterity control due to limited field-of-view. With the advent of robot-assisted surgery, there has been a paradigm shift in medical technology for minimally invasive surgery. However, it is very challenging to track the position of the surgical instruments in a surgical scene, and accurate detection & identification of surgical tools is paramount. Deep learning-based semantic segmentation in frames of surgery videos has the potential to facilitate this task. In this work, we modify the U-Net architecture named U-NetPlus, by introducing a pre-trained encoder and re-design the decoder part, by replacing the transposed convolution operation with an upsampling operation based on nearest-neighbor (NN) interpolation. To further improve performance, we also employ a very fast and flexible data augmentation technique. We trained the framework on 8 x 225 frame sequences of robotic surgical videos, available through the MICCAI 2017 EndoVis Challenge dataset and tested it on 8 x 75 frame and 2 x 300 frame videos. Using our U-NetPlus architecture, we report a 90.20% DICE for binary segmentation, 76.26% DICE for instrument part segmentation, and 46.07% for instrument type (i.e., all instruments) segmentation, outperforming the results of previous techniques implemented and tested on these data.

Submitted 24 February, 2019; originally announced February 2019.

Comments: 7 pages, 6 figures, IEEE conference submission
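
The abstract above mentions replacing transposed convolution in the decoder with upsampling based on nearest-neighbor interpolation. The sketch below illustrates only the interpolation step on a single 2D feature map, in plain Python; it is not the U-NetPlus decoder, and the function name and integer scale factor are assumptions made for illustration.

```python
def upsample_nearest(feature_map, scale=2):
    """Nearest-neighbor upsampling of a 2D grid by an integer scale factor.

    Each input value is copied into a scale x scale block of the output,
    so no weights are learned in this step.
    """
    output = []
    for row in feature_map:
        # Repeat every value horizontally, then repeat the widened row vertically.
        wide_row = [value for value in row for _ in range(scale)]
        output.extend(list(wide_row) for _ in range(scale))
    return output


# Hypothetical 2x2 feature map upsampled to 4x4.
upsampled = upsample_nearest([[1, 2], [3, 4]], scale=2)
```

In a decoder of this style, such an upsampling step is typically followed by an ordinary convolution, which supplies the learnable part that a transposed convolution would otherwise combine with the resizing.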

arXiv:1810.04637 [pdf]

Quantification of Trabeculae Inside the Heart from MRI Using Fractal Analysis

Authors: Md. Kamrul Hasan, Fakrul Islam Tushar

Abstract: Left ventricular non-compaction (LVNC) is a rare cardiomyopathy (CMP) that should be considered as a possible diagnosis because of its potential complications, which are heart failure, ventricular arrhythmias, and embolic events. For analyzing cardiac functionality, extracting information from the left ventricle (LV) is already a broad field of medical imaging. Different algorithms and strategies, ranging from semi-automated to automated, have already been developed to get useful information from such a critical structure of the heart. Trabeculae in the heart undergo different changes, such as from spongy to solid. Failure of this process leads to left ventricular non-compaction. In this project, we demonstrate the fractal dimension (FD) and manual segmentation of Magnetic Resonance Imaging (MRI) of the heart to quantify the amount of trabeculae inside the heart. A greater value of the fractal dimension inside the heart indicates a more complex pattern of the trabeculae in the heart.

Submitted 14 October, 2018; v1 submitted 30 September, 2018; originally announced October 2018.

arXiv:1810.02600 [pdf]

An Implementation Approach and Performance Analysis of Image Sensor Based Multilateral Indoor Localization and Navigation System

Authors: Md. Shahjalal, Md. Tanvir Hossan, Moh. Khalid Hasan, Mostafa Zaman Chowdhury, Nam Tuan Le, Yeong Min Jang

Abstract: Optical camera communication (OCC) exhibits considerable importance nowadays in various indoor camera-based services such as smart home and robot-based automation. An Android smartphone camera that is mounted on a mobile robot (MR) offers a uniform communication distance when the camera remains at the same level, which can reduce the communication error rate. Indoor mobile robot navigation (MRN) is considered to be a promising OCC application in which white light emitting diodes (LEDs) and an MR camera are used as transmitters and receiver, respectively. Positioning is a key issue in MRN systems in terms of accuracy, data rate, and distance. We propose a combined indoor navigation and positioning algorithm and further evaluate its performance. An Android application is developed to support data acquisition from multiple simultaneous transmitter links. Experimentally, we received data from four links, which are required to ensure a higher positioning accuracy.

Submitted 5 October, 2018; originally announced October 2018.

Journal ref: Wireless Communications and Mobile Computing, 2018

arXiv:1810.02597 [pdf]

doi 10.1007/s11277-018-5971-3

Integrated RF/Optical Wireless Networks for Improving QoS in Indoor and Transportation Applications

Authors: Mostafa Zaman Chowdhury, Md. Tanvir Hossan, Moh. Khalid Hasan, Yeong Min Jang

Abstract: Communications based solely on radio frequency (RF) networks cannot provide adequate quality of service for the rapidly growing demands of wireless connectivity. Since devices operating in the optical spectrum do not interfere with those using the RF spectrum, wireless networks based on the optical spectrum can be added to existing RF networks to fulfill this demand. Hence, optical wireless communication (OWC) technology can be an excellent complement to RF-based technology to provide improved service. Promising OWC systems include light fidelity (LiFi), visible light communication, optical camera communication (OCC), and free-space optical communication (FSOC). OWC and RF systems have differing limitations, and the integration of RF and optical wireless networks can overcome the limitations of both systems. This paper describes a LiFi/femtocell hybrid network system for indoor environments. Low signal-to-interference-plus-noise ratios and the bandwidth shortage problems of existing RF femtocell networks can be overcome with the proposed hybrid model. Moreover, we describe an integrated RF/optical wireless system that can be employed for users inside a vehicle, remote monitoring of ambulance patients, vehicle tracking, and vehicle-to-vehicle communications. We consider LiFi, OCC, and FSOC as the optical wireless technologies to be used for communication support in transportation, and assume macrocells, femtocells, and wireless fidelity to be the corresponding RF technologies. We describe handover management (including detailed call flow, interference management, link reliability improvement, and group handover provisioning) for integrated networks. Performance analyses demonstrate the significance of the proposed integrated RF/optical wireless systems.

Submitted 5 October, 2018; originally announced October 2018.

Journal ref: Sept. 2018

arXiv:1810.02589 [pdf]

doi 10.1155/2018/8501898

A New Vehicle Localization Scheme Based on Combined Optical Camera Communication and Photogrammetry

Authors: Md. Tanvir Hossan, Mostafa Zaman Chowdhury, Moh. Khalid Hasan, Md. Shahjalal, Trang Nguyen, Nam Tuan Le, Yeong Min Jang

Abstract: The demand for autonomous vehicles is increasing gradually owing to their enormous potential benefits. However, several challenges, such as vehicle localization, are involved in the development of autonomous vehicles. A simple and secure algorithm for vehicle positioning is proposed herein without massively modifying the existing transportation infrastructure. For vehicle localization, vehicles on the road are classified into two categories: host vehicles (HVs) are the ones used to estimate other vehicles' positions and forwarding vehicles (FVs) are the ones that move in front of the HVs. The FV transmits modulated data from the tail (or back) light, and the camera of the HV receives that signal using optical camera communication (OCC). In addition, the streetlight (SL) data are considered to ensure the position accuracy of the HV. Determining the HV position minimizes the relative position variation between the HV and FV. Using photogrammetry, the distance between FV or SL and the camera of the HV is calculated by measuring the occupied image area on the image sensor. Comparing the change in distance between HV and SLs with the change in distance between HV and FV, the positions of FVs are determined. The performance of the proposed technique is analyzed, and the results indicate a significant improvement in performance. The experimental distance measurement validated the feasibility of the proposed scheme.

Submitted 5 October, 2018; originally announced October 2018.

Journal ref: Mobile Information Systems, vol. 2018, March 2018
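
The abstract above states that distance is obtained from the image area a light source occupies on the sensor. One way to see why this works is the ideal pinhole-camera relation, under which a source of physical area A at distance d projects to an image area a = f² A / d², so d = f · sqrt(A / a). The sketch below assumes that simple model (no lens distortion, source facing the camera); it is not taken from the paper, and the function name and all numeric values are hypothetical.

```python
import math


def distance_from_image_area(focal_length_m, source_area_m2, image_area_m2):
    """Estimate camera-to-source distance with an ideal pinhole model.

    Assumes the source faces the camera, so its projected area on the
    sensor scales with (focal_length / distance) ** 2.
    """
    if image_area_m2 <= 0 or source_area_m2 <= 0:
        raise ValueError("areas must be positive")
    return focal_length_m * math.sqrt(source_area_m2 / image_area_m2)


# Hypothetical values: a 0.2 m x 0.2 m tail light, a 4 mm lens, and
# 6.4e-9 m^2 of occupied sensor area.
d_near = distance_from_image_area(0.004, 0.04, 6.4e-9)

# If the occupied area shrinks to a quarter, the estimated distance doubles.
d_far = distance_from_image_area(0.004, 0.04, 1.6e-9)
```

Under this model the occupied area falls with the square of the distance, which is why tracking the change in area over time gives the change in distance that the abstract compares between streetlights and forwarding vehicles.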

arXiv:1709.02414 [pdf]

Automated Dyadic Data Recorder (ADDR) Framework and Analysis of Facial Cues in Deceptive Communication

Authors: Tayan Sen, Md Kamrul Hasan, Zach Teicher, M. Ehsan Hoque

Abstract: We developed an online framework that can automatically pair two crowd-sourced participants, prompt them to follow a research protocol, and record their audio and video on a remote server. The framework comprises two web applications: an Automatic Quality Gatekeeper for ensuring only high-quality crowd-sourced participants are recruited for the study, and a Session Controller which directs participants to play a research protocol, such as an interrogation game. This framework was used to run a research study for analyzing facial expressions during honest and deceptive communication using a novel interrogation protocol. The protocol gathers two sets of nonverbal facial cues in participants: features expressed during questions relating to the interrogation topic and features expressed during control questions. The framework and protocol were used to gather 151 dyadic conversations (1.3 million video frames). Interrogators who were lied to expressed the smile-related lip corner puller cue more often than interrogators who were being told the truth, suggesting that facial cues from interrogators may be useful in evaluating the honesty of witnesses in some contexts. Overall, these results demonstrate that this framework is capable of gathering high-quality data which can identify statistically significant results in a communication study.

Submitted 7 September, 2017; originally announced September 2017.

arXiv:1707.01886 [pdf, ps, other]

doi 10.1057/s41599-018-0116-6

Buildup of Speaking Skills in an Online Learning Community: A Network-Analytic Exploration

Authors: Rasoul Shafipour, Raiyan Abdul Baten, Md Kamrul Hasan, Gourab Ghoshal, Gonzalo Mateos, Mohammed Ehsan Hoque

Abstract: In this study, we explore peer-interaction effects in online networks on speaking skill development. In particular, we present evidence for gradual buildup of skills in a small-group setting that has not been reported in the literature. We introduce a novel dataset of six online communities consisting of 158 participants focusing on improving their speaking skills. They video-record speeches for 5 prompts in 10 days and exchange comments and performance ratings with their peers. We ask (i) whether the participants' ratings are affected by their interaction patterns with peers, and (ii) whether there is any gradual buildup of speaking skills in the communities towards homogeneity. To analyze the data, we employ tools from the emerging field of Graph Signal Processing (GSP). GSP enjoys a distinction from Social Network Analysis in that the latter is concerned primarily with the connection structures of graphs, while the former studies signals on top of graphs. We study the performance ratings of the participants as graph signals atop underlying interaction topologies. Total variation analysis of the graph signals shows that the participants' rating differences decrease with time (slope=-0.04, p<0.01), while average ratings increase (slope=0.07, p<0.05), thereby gradually building up the ratings towards community-wide homogeneity. We provide evidence for peer-influence through a prediction formulation. Our consensus-based prediction model outperforms baseline network-agnostic regression models by about 23% in predicting performance ratings. This, in turn, shows that participants' ratings are affected by their peers' ratings and the associated interaction patterns, corroborating previous findings. Then, we formulate a consensus-based diffusion model that captures these observations of peer-influence from our analyses.

Submitted 12 March, 2018; v1 submitted 6 July, 2017; originally announced July 2017.

Journal ref: Palgrave Communications, vol. 4, June 2018
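
The abstract above refers to total variation analysis of graph signals. A common definition in graph signal processing is the Laplacian quadratic form, TV(x) = xᵀLx = Σ over edges (i, j) of w_ij (x_i − x_j)², which is small when connected nodes carry similar values. The sketch below assumes that definition; the abstract does not state which variant the authors use, and the edge list, weights, and ratings here are hypothetical.

```python
def total_variation(edges, signal):
    """Laplacian quadratic form of a graph signal.

    edges  : iterable of (i, j, weight) tuples for an undirected graph,
             each edge listed once
    signal : list of node values, e.g. one rating per participant
    """
    return sum(w * (signal[i] - signal[j]) ** 2 for i, j, w in edges)


# Hypothetical 3-participant community; weights stand in for how
# strongly each pair interacted.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]

early_ratings = [3.0, 4.0, 5.0]  # ratings far apart
late_ratings = [4.0, 4.5, 4.5]   # ratings closer together and higher

tv_early = total_variation(edges, early_ratings)
tv_late = total_variation(edges, late_ratings)
```

In this toy case the total variation drops while the mean rating rises, which is the qualitative pattern the abstract describes as a buildup towards community-wide homogeneity.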

Showing 1–50 of 54 results for author: Hasan, M K