Search | arXiv e-print repository

arXiv:2406.07716 [pdf]

Unleashing the Power of Transfer Learning Model for Sophisticated Insect Detection: Revolutionizing Insect Classification

Authors: Md. Mahmudul Hasan, SM Shaqib, Ms. Sharmin Akter, Rabiul Alam, Afraz Ul Haque, Shahrun akter khushbu

Abstract: The purpose of the Insect Detection System for Crop and Plant Health is to keep an eye out for and identify insect infestations in farming areas. By utilizing cutting-edge technology like computer vision and machine learning, the system seeks to identify hazardous insects early and accurately. This would enable prompt response to save crops and maintain optimal plant health. The Method of this stu… ▽ More The purpose of the Insect Detection System for Crop and Plant Health is to keep an eye out for and identify insect infestations in farming areas. By utilizing cutting-edge technology like computer vision and machine learning, the system seeks to identify hazardous insects early and accurately. This would enable prompt response to save crops and maintain optimal plant health. The Method of this study includes Data Acquisition, Preprocessing, Data splitting, Model Implementation and Model evaluation. Different models like MobileNetV2, ResNet152V2, Xecption, Custom CNN was used in this study. In order to categorize insect photos, a Convolutional Neural Network (CNN) based on the ResNet152V2 architecture is constructed and evaluated in this work. Achieving 99% training accuracy and 97% testing accuracy, ResNet152V2 demonstrates superior performance among four implemented models. The results highlight its potential for real-world applications in insect classification and entomology studies, emphasizing efficiency and accuracy. To ensure food security and sustain agricultural output globally, finding insects is crucial. Cutting-edge technology, such as ResNet152V2 models, greatly influence automating and improving the accuracy of insect identification. Efficient insect detection not only minimizes crop losses but also enhances agricultural productivity, contributing to sustainable food production. This underscores the pivotal role of technology in addressing challenges related to global food security. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2402.09795 [pdf, other]

doi 10.1016/j.inffus.2023.102004

An advanced data fabric architecture leveraging homomorphic encryption and federated learning

Authors: Sakib Anwar Rieyan, Md. Raisul Kabir News, A. B. M. Muntasir Rahman, Sadia Afrin Khan, Sultan Tasneem Jawad Zaarif, Md. Golam Rabiul Alam, Mohammad Mehedi Hassan, Michele Ianni, Giancarlo Fortino

Abstract: Data fabric is an automated and AI-driven data fusion approach to accomplish data management unification without moving data to a centralized location for solving complex data problems. In a Federated learning architecture, the global model is trained based on the learned parameters of several local models that eliminate the necessity of moving data to a centralized repository for machine learning… ▽ More Data fabric is an automated and AI-driven data fusion approach to accomplish data management unification without moving data to a centralized location for solving complex data problems. In a Federated learning architecture, the global model is trained based on the learned parameters of several local models that eliminate the necessity of moving data to a centralized repository for machine learning. This paper introduces a secure approach for medical image analysis using federated learning and partially homomorphic encryption within a distributed data fabric architecture. With this method, multiple parties can collaborate in training a machine-learning model without exchanging raw data but using the learned or fused features. The approach complies with laws and regulations such as HIPAA and GDPR, ensuring the privacy and security of the data. The study demonstrates the method's effectiveness through a case study on pituitary tumor classification, achieving a significant level of accuracy. However, the primary focus of the study is on the development and evaluation of federated learning and partially homomorphic encryption as tools for secure medical image analysis. The results highlight the potential of these techniques to be applied to other privacy-sensitive domains and contribute to the growing body of research on secure and privacy-preserving machine learning. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Journal ref: Information Fusion, 102, 102004 (2024)

arXiv:2402.03417 [pdf, other]

A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model

Authors: Murad Hasan, Shahriar Iqbal, Md. Billal Hossain Faisal, Md. Musnad Hossin Neloy, Md. Tonmoy Kabir, Md. Tanzim Reza, Md. Golam Rabiul Alam, Md Zia Uddin

Abstract: Criminal and suspicious activity detection has become a popular research topic in recent years. The rapid growth of computer vision technologies has had a crucial impact on solving this issue. However, physical stalking detection is still a less explored area despite the evolution of modern technology. Nowadays, stalking in public places has become a common occurrence with women being the most aff… ▽ More Criminal and suspicious activity detection has become a popular research topic in recent years. The rapid growth of computer vision technologies has had a crucial impact on solving this issue. However, physical stalking detection is still a less explored area despite the evolution of modern technology. Nowadays, stalking in public places has become a common occurrence with women being the most affected. Stalking is a visible action that usually occurs before any criminal activity begins as the stalker begins to follow, loiter, and stare at the victim before committing any criminal activity such as assault, kidnap**, rape, and so on. Therefore, it has become a necessity to detect stalking as all of these criminal activities can be stopped in the first place through stalking detection. In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. This data is fed into a multilayer perceptron (MLP) to perform a classification task between a stalking and a non-stalking scenario. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract the spatio-temporal features. We use a fusion of these numerical and spatio-temporal features to build a classifier to detect stalking incidents. Additionally, we introduce a dataset consisting of stalking and non-stalking videos gathered from various feature films and television series, which is also used to train the model. The experimental results show the efficiency and dynamism of our proposed stalker detection system, achieving 89.58% testing accuracy with a significant improvement as compared to the state-of-the-art approaches. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: Under review for publication in the PLOS ONE journal, 17 pages, 9 figures

arXiv:2401.05378 [pdf]

Detecting QT prolongation From a Single-lead ECG With Deep Learning

Authors: Ridwan Alam, Aaron Aguirre, Collin Stultz

Abstract: For a number of antiarrhythmics, drug loading requires a 3 day hospitalization with monitoring for QT prolongation. Automated QT monitoring with wearable ECG monitors would facilitate out-of-hospital care. We develop a deep learning model that infers QT intervals from ECG lead-I - the lead most often acquired from ambulatory ECG monitors - and to use this model to detect clinically meaningful QT-p… ▽ More For a number of antiarrhythmics, drug loading requires a 3 day hospitalization with monitoring for QT prolongation. Automated QT monitoring with wearable ECG monitors would facilitate out-of-hospital care. We develop a deep learning model that infers QT intervals from ECG lead-I - the lead most often acquired from ambulatory ECG monitors - and to use this model to detect clinically meaningful QT-prolongation episodes during Dofetilide drug loading. Using 4.22 million 12-lead ECG recordings from 903.6 thousand patients at the Massachusetts General Hospital, we develop a deep learning model, QTNet, that infers QT intervals from lead-I. Over 3 million ECGs from 653 thousand patients are used to train the model and an internal-test set containing 633 thousand ECGs from 135 thousand patients was used for testing. QTNet is further evaluated on an external-validation set containing 3.1 million ECGs from 667 thousand patients at another institution. QTNet was used to detect Dofetilide-induced QT prolongation in a publicly available database (ECGRDVQ-dataset) containing ECGs from subjects enrolled in a clinical trial evaluating the effects of antiarrhythmic drugs. QTNet achieves mean absolute errors of 12.63ms (internal-test) and 12.30ms (external-validation) for estimating absolute QT intervals. The associated Pearson correlation coefficients are 0.91 (internal-test) and 0.92 (external-validation). For the ECGRDVQ-dataset, QTNet detects Dofetilide-induced QTc prolongation with 87% sensitivity and 77% specificity. The negative predictive value of the model is greater than 95% when the pre-test probability of drug-induced QTc prolongation is below 25%. Drug-induced QT prolongation risk can be tracked from ECG lead-I using deep learning. △ Less

Submitted 17 December, 2023; originally announced January 2024.

arXiv:2311.07750 [pdf, other]

doi 10.1109/ICCIT60459.2023.10441433

SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification

Authors: S. M. Nabil Ashraf, Md. Adyelullahil Mamun, Hasnat Md. Abdullah, Md. Golam Rabiul Alam

Abstract: Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which is crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We co… ▽ More Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which is crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We conducted experiments on the "ChestX-ray14" dataset using various pre-trained CNNs, transformers, hybrid(CNN+Transformer) models and classical models. The best individual model was the CoAtNet, which achieved an area under the receiver operating characteristic curve (AUROC) of 84.2%. By combining the predictions of all trained models using a weighted average ensemble where the weight of each model was determined using differential evolution, we further improved the AUROC to 85.4%, outperforming other state-of-the-art methods in this field. Our findings demonstrate the potential of deep learning techniques, particularly ensemble deep learning, for improving the accuracy of automatic diagnosis of thoracic diseases from chest X-rays. Code available at:https://github.com/syednabilashraf/SynthEnsemble △ Less

Submitted 22 May, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Published in International Conference on Computer and Information Technology (ICCIT) 2023

ACM Class: I.4; I.5

Journal ref: 2023 26th International Conference on Computer and Information Technology (ICCIT), Cox's Bazar, Bangladesh, 2023, pp. 1-6

arXiv:2309.05235 [pdf, other]

P2LSG: Powers-of-2 Low-Discrepancy Sequence Generator for Stochastic Computing

Authors: Mehran Shoushtari Moghadam, Sercan Aygun, Mohsen Riahi Alam, M. Hassan Najafi

Abstract: Stochastic Computing (SC) is an unconventional computing paradigm processing data in the form of random bit-streams. The accuracy and energy efficiency of SC systems highly depend on the stochastic number generator (SNG) unit that converts the data from conventional binary to stochastic bit-streams. Recent work has shown significant improvement in the efficiency of SC systems by employing low-disc… ▽ More Stochastic Computing (SC) is an unconventional computing paradigm processing data in the form of random bit-streams. The accuracy and energy efficiency of SC systems highly depend on the stochastic number generator (SNG) unit that converts the data from conventional binary to stochastic bit-streams. Recent work has shown significant improvement in the efficiency of SC systems by employing low-discrepancy (LD) sequences such as Sobol and Halton sequences in the SNG unit. Still, the usage of many well-known random sequences for SC remains unexplored. This work studies some new random sequences for potential application in SC. Our design space exploration proposes a promising random number generator for accurate and energy-efficient SC. We propose P2LSG, a low-cost and energy-efficient Low-discrepancy Sequence Generator derived from Powers-of-2 VDC (Van der Corput) sequences. We evaluate the performance of our novel bit-stream generator for two SC image and video processing case studies: image scaling and scene merging. For the scene merging task, we propose a novel SC design for the first time. Our experimental results show higher accuracy and lower hardware cost and energy consumption compared to the state-of-the-art. △ Less

Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 7 pages, 7 figures, Accepted in 29th ASP-DAC 2024 Conference

arXiv:2308.06549 [pdf, other]

Human Behavior-based Personalized Meal Recommendation and Menu Planning Social System

Authors: Tanvir Islam, Anika Rahman Joyita, Md. Golam Rabiul Alam, Mohammad Mehedi Hassan, Md. Rafiul Hassan, Raffaele Gravina

Abstract: The traditional dietary recommendation systems are basically nutrition or health-aware where the human feelings on food are ignored. Human affects vary when it comes to food cravings, and not all foods are appealing in all moods. A questionnaire-based and preference-aware meal recommendation system can be a solution. However, automated recognition of social affects on different foods and planning… ▽ More The traditional dietary recommendation systems are basically nutrition or health-aware where the human feelings on food are ignored. Human affects vary when it comes to food cravings, and not all foods are appealing in all moods. A questionnaire-based and preference-aware meal recommendation system can be a solution. However, automated recognition of social affects on different foods and planning the menu considering nutritional demand and social-affect has some significant benefits of the questionnaire-based and preference-aware meal recommendations. A patient with severe illness, a person in a coma, or patients with locked-in syndrome and amyotrophic lateral sclerosis (ALS) cannot express their meal preferences. Therefore, the proposed framework includes a social-affective computing module to recognize the affects of different meals where the person's affect is detected using electroencephalography signals. EEG allows to capture the brain signals and analyze them to anticipate affective toward a food. In this study, we have used a 14-channel wireless Emotive Epoc+ to measure affectivity for different food items. A hierarchical ensemble method is applied to predict affectivity upon multiple feature extraction methods and TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is used to generate a food list based on the predicted affectivity. In addition to the meal recommendation, an automated menu planning approach is also proposed considering a person's energy intake requirement, affectivity, and nutritional values of the different menus. The bin-packing algorithm is used for the personalized menu planning of breakfast, lunch, dinner, and snacks. The experimental findings reveal that the suggested affective computing, meal recommendation, and menu planning algorithms perform well across a variety of assessment parameters. △ Less

Submitted 12 August, 2023; originally announced August 2023.

Journal ref: IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS. 2022

arXiv:2307.11519 [pdf, other]

doi 10.1109/BigData52589.2021.9671955

Multi-modal Hate Speech Detection using Machine Learning

Authors: Fariha Tahosin Boishakhi, Ponkoj Chandra Shill, Md. Golam Rabiul Alam

Abstract: With the continuous growth of internet users and media content, it is very hard to track down hateful speech in audio and video. Converting video or audio into text does not detect hate speech accurately as human sometimes uses hateful words as humorous or pleasant in sense and also uses different voice tones or show different action in the video. The state-ofthe-art hate speech detection models w… ▽ More With the continuous growth of internet users and media content, it is very hard to track down hateful speech in audio and video. Converting video or audio into text does not detect hate speech accurately as human sometimes uses hateful words as humorous or pleasant in sense and also uses different voice tones or show different action in the video. The state-ofthe-art hate speech detection models were mostly developed on a single modality. In this research, a combined approach of multimodal system has been proposed to detect hate speech from video contents by extracting feature images, feature values extracted from the audio, text and used machine learning and Natural language processing. △ Less

Submitted 15 June, 2023; originally announced July 2023.

Comments: 5 pages, 2 figures, conference

arXiv:2307.10923 [pdf, other]

Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

Authors: Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, Collin M. Stultz

Abstract: Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are highly rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab… ▽ More Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are highly rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vitals signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method -- Sequential Multi-Dimensional SSL -- where a SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level -- it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vitals signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: ICML 2023

arXiv:2305.10468 [pdf, other]

Connected Hidden Neurons (CHNNet): An Artificial Neural Network for Rapid Convergence

Authors: Rafiad Sadat Shahir, Zayed Humayun, Mashrufa Akter Tamim, Shouri Saha, Md. Golam Rabiul Alam

Abstract: Despite artificial neural networks being inspired by the functionalities of biological neural networks, unlike biological neural networks, conventional artificial neural networks are often structured hierarchically, which can impede the flow of information between neurons as the neurons in the same layer have no connections between them. Hence, we propose a more robust model of artificial neural n… ▽ More Despite artificial neural networks being inspired by the functionalities of biological neural networks, unlike biological neural networks, conventional artificial neural networks are often structured hierarchically, which can impede the flow of information between neurons as the neurons in the same layer have no connections between them. Hence, we propose a more robust model of artificial neural networks where the hidden neurons, residing in the same hidden layer, are interconnected that leads to rapid convergence. With the experimental study of our proposed model in deep networks, we demonstrate that the model results in a noticeable increase in convergence rate compared to the conventional feed-forward neural network. △ Less

Submitted 24 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2304.13511 [pdf]

A Secure Medical Record Sharing Scheme Based on Blockchain and Two-fold Encryption

Authors: Md. Ahsan Habib, Kazi Md. Rokibul Alam, Yasuhiko Morimoto

Abstract: Usually, a medical record (MR) contains the patients disease-oriented sensitive information. In addition, the MR needs to be shared among different bodies, e.g., diagnostic centres, hospitals, physicians, etc. Hence, retaining the privacy and integrity of MR is crucial. A blockchain based secure MR sharing system can manage these aspects properly. This paper proposes a blockchain based electronic… ▽ More Usually, a medical record (MR) contains the patients disease-oriented sensitive information. In addition, the MR needs to be shared among different bodies, e.g., diagnostic centres, hospitals, physicians, etc. Hence, retaining the privacy and integrity of MR is crucial. A blockchain based secure MR sharing system can manage these aspects properly. This paper proposes a blockchain based electronic (e-) MR sharing scheme that (i) considers the medical image and the text as the input, (ii) enriches the data privacy through a two-fold encryption mechanism consisting of an asymmetric cryptosystem and the dynamic DNA encoding, (iii) assures data integrity by storing the encrypted e-MR in the distinct block designated for each user in the blockchain, and (iv) eventually, enables authorized entities to regain the e-MR through decryption. Preliminary evaluations, analyses, comparisons with state-of-the-art works, etc., imply the efficacy of the proposed scheme. △ Less

Submitted 9 February, 2023; originally announced April 2023.

Comments: 6 pages, 3 tables, 8 figures, ICCIT 2022

arXiv:2304.11046 [pdf]

doi 10.1007/s11042-023-14597-6

Affective social anthropomorphic intelligent system

Authors: Md. Adyelullahil Mamun, Hasnat Md. Abdullah, Md. Golam Rabiul Alam, Muhammad Mehedi Hassan, Md. Zia Uddin

Abstract: Human conversational styles are measured by the sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most of the state-of-the-art intelligent virtual assistants (IVAs) are failed to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that ca… ▽ More Human conversational styles are measured by the sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most of the state-of-the-art intelligent virtual assistants (IVAs) are failed to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality. A voice style transfer method is also proposed to map the attributes of a specific emotion. Initially, the frequency domain data (Mel-Spectrogram) is created by converting the temporal audio wave data, which comprises discrete patterns for audio features such as notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used to predict seven different affective states from voice. The voice is also fed parallelly to the deep-speech, an RNN model that generates the text transcription from the spectrogram. Then the transcripted text is transferred to the multi-domain conversation agent using blended skill talk, transformer-based retrieve-and-generate generation strategy, and beam-search decoding, and an appropriate textual response is generated. The system learns an invertible map** of data to a latent space that can be manipulated and generates a Mel-spectrogram frame based on previous Mel-spectrogram frames to voice synthesize and style transfer. Finally, the waveform is generated using WaveGlow from the spectrogram. The outcomes of the studies we conducted on individual models were auspicious. Furthermore, users who interacted with the system provided positive feedback, demonstrating the system's effectiveness. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: Multimedia Tools and Applications (2023)

arXiv:2303.12772 [pdf, other]

Interpretable Bangla Sarcasm Detection using BERT and Explainable AI

Authors: Ramisa Anan, Tasnim Sakib Apon, Zeba Tahsin Hossain, Elizabeth Antora Modhu, Sudipta Mondal, MD. Golam Rabiul Alam

Abstract: A positive phrase or a sentence with an underlying negative motive is usually defined as sarcasm that is widely used in today's social media platforms such as Facebook, Twitter, Reddit, etc. In recent times active users in social media platforms are increasing dramatically which raises the need for an automated NLP-based system that can be utilized in various tasks such as determining market deman… ▽ More A positive phrase or a sentence with an underlying negative motive is usually defined as sarcasm that is widely used in today's social media platforms such as Facebook, Twitter, Reddit, etc. In recent times active users in social media platforms are increasing dramatically which raises the need for an automated NLP-based system that can be utilized in various tasks such as determining market demand, sentiment analysis, threat detection, etc. However, since sarcasm usually implies the opposite meaning and its detection is frequently a challenging issue, data meaning extraction through an NLP-based model becomes more complicated. As a result, there has been a lot of study on sarcasm detection in English over the past several years, and there's been a noticeable improvement and yet sarcasm detection in the Bangla language's state remains the same. In this article, we present a BERT-based system that can achieve 99.60\% while the utilized traditional machine learning algorithms are only capable of achieving 89.93\%. Additionally, we have employed Local Interpretable Model-Agnostic Explanations that introduce explainability to our system. Moreover, we have utilized a newly collected bangla sarcasm dataset, BanglaSarc that was constructed specifically for the evaluation of this study. This dataset consists of fresh records of sarcastic and non-sarcastic comments, the majority of which are acquired from Facebook and YouTube comment sections. △ Less

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.09306 [pdf, ps, other]

BanglaCoNER: Towards Robust Bangla Complex Named Entity Recognition

Authors: HAZ Sameen Shahgir, Ramisa Alam, Md. Zarif Ul Alam

Abstract: Named Entity Recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying named entities in text. But much work hasn't been done for complex named entity recognition in Bangla, despite being the seventh most spoken language globally. CNER is a more challenging task than traditional NER as it involves identifying and classifying complex and compou… ▽ More Named Entity Recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying named entities in text. But much work hasn't been done for complex named entity recognition in Bangla, despite being the seventh most spoken language globally. CNER is a more challenging task than traditional NER as it involves identifying and classifying complex and compound entities, which are not common in Bangla language. In this paper, we present the winning solution of Bangla Complex Named Entity Recognition Challenge - addressing the CNER task on BanglaCoNER dataset using two different approaches, namely Conditional Random Fields (CRF) and finetuning transformer based Deep Learning models such as BanglaBERT. The dataset consisted of 15300 sentences for training and 800 sentences for validation, in the .conll format. Exploratory Data Analysis (EDA) on the dataset revealed that the dataset had 7 different NER tags, with notable presence of English words, suggesting that the dataset is synthetic and likely a product of translation. We experimented with a variety of feature combinations including Part of Speech (POS) tags, word suffixes, Gazetteers, and cluster information from embeddings, while also finetuning the BanglaBERT (large) model for NER. We found that not all linguistic patterns are immediately apparent or even intuitive to humans, which is why Deep Learning based models has proved to be the more effective model in NLP, including CNER task. Our fine tuned BanglaBERT (large) model achieves an F1 Score of 0.79 on the validation set. Overall, our study highlights the importance of Bangla Complex Named Entity Recognition, particularly in the context of synthetic datasets. Our findings also demonstrate the efficacy of Deep Learning models such as BanglaBERT for NER in Bangla language. △ Less

Submitted 17 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Winning Solution for the Bangla Complex Named Entity Recognition Challenge

arXiv:2302.08911 [pdf, other]

DSE Stock Price Prediction using Hidden Markov Model

Authors: Raihan Tanvir, Md Tanvir Rouf Shawon, Md. Golam Rabiul Alam

Abstract: Stock market forecasting is a classic problem that has been thoroughly investigated using machine learning and artificial neural network based tools and techniques. Interesting aspects of this problem include its time reliance as well as its volatility and other complex relationships. To combine them, hidden markov models (HMMs) have been utilized to anticipate the price of stocks. We demonstrated… ▽ More Stock market forecasting is a classic problem that has been thoroughly investigated using machine learning and artificial neural network based tools and techniques. Interesting aspects of this problem include its time reliance as well as its volatility and other complex relationships. To combine them, hidden markov models (HMMs) have been utilized to anticipate the price of stocks. We demonstrated the Maximum A Posteriori (MAP) HMM method for predicting stock prices for the next day based on previous data. An HMM is trained by analyzing the fractional change in the stock price as well as the intraday high and low values. It is then utilized to produce a MAP estimate across all possible stock prices for the next day. The approach demonstrated in our work is quite generalized and can be used to predict the stock price for any company, given that the HMM is trained on the dataset of that company's stocks dataset. We evaluated the accuracy of our models using some extensively used accuracy metrics for regression problems and came up with a satisfactory outcome. △ Less

Submitted 26 January, 2023; originally announced February 2023.

Comments: 6 pages

arXiv:2212.12146 [pdf, other]

Bengali Handwritten Digit Recognition using CNN with Explainable AI

Authors: Md Tanvir Rouf Shawon, Raihan Tanvir, Md. Golam Rabiul Alam

Abstract: Handwritten character recognition is a hot topic for research nowadays. If we can convert a handwritten piece of paper into a text-searchable document using the Optical Character Recognition (OCR) technique, we can easily understand the content and do not need to read the handwritten document. OCR in the English language is very common, but in the Bengali language, it is very hard to find a good q… ▽ More Handwritten character recognition is a hot topic for research nowadays. If we can convert a handwritten piece of paper into a text-searchable document using the Optical Character Recognition (OCR) technique, we can easily understand the content and do not need to read the handwritten document. OCR in the English language is very common, but in the Bengali language, it is very hard to find a good quality OCR application. If we can merge machine learning and deep learning with OCR, it could be a huge contribution to this field. Various researchers have proposed a number of strategies for recognizing Bengali handwritten characters. A lot of ML algorithms and deep neural networks were used in their work, but the explanations of their models are not available. In our work, we have used various machine learning algorithms and CNN to recognize handwritten Bengali digits. We have got acceptable accuracy from some ML models, and CNN has given us great testing accuracy. Grad-CAM was used as an XAI method on our CNN model, which gave us insights into the model and helped us detect the origin of interest for recognizing a digit from an image. △ Less

Submitted 22 December, 2022; originally announced December 2022.

Comments: 2022 4th International Conference on Sustainable Technologies for Industry 4.0 (STI), pp. 1-6

arXiv:2211.14607 [pdf, other]

Sketch2FullStack: Generating Skeleton Code of Full Stack Website and Application from Sketch using Deep Learning and Computer Vision

Authors: Somoy Subandhu Barua, Imam Mohammad Zulkarnain, Abhishek Roy, Md. Golam Rabiul Alam, Md Zia Uddin

Abstract: For a full-stack web or app development, it requires a software firm or more specifically a team of experienced developers to contribute a large portion of their time and resources to design the website and then convert it to code. As a result, the efficiency of the development team is significantly reduced when it comes to converting UI wireframes and database schemas into an actual working syste… ▽ More For a full-stack web or app development, it requires a software firm or more specifically a team of experienced developers to contribute a large portion of their time and resources to design the website and then convert it to code. As a result, the efficiency of the development team is significantly reduced when it comes to converting UI wireframes and database schemas into an actual working system. It would save valuable resources and fasten the overall workflow if the clients or developers can automate this process of converting the pre-made full-stack website design to get a partially working if not fully working code. In this paper, we present a novel approach of generating the skeleton code from sketched images using Deep Learning and Computer Vision approaches. The dataset for training are first-hand sketched images of low fidelity wireframes, database schemas and class diagrams. The approach consists of three parts. First, the front-end or UI elements detection and extraction from custom-made UI wireframes. Second, individual database table creation from schema designs and lastly, creating a class file from class diagrams. △ Less

Submitted 26 November, 2022; originally announced November 2022.

Comments: 12 pages, 10 figures, preprint

MSC Class: 68T07 (Primary) ACM Class: I.2.2; I.2.10; I.2.5; I.4.0; I.4.9; I.7.0; D.2.1; D.2.2

arXiv:2210.03332 [pdf, other]

Explainable AI based Glaucoma Detection using Transfer Learning and LIME

Authors: Touhidul Islam Chayan, Anita Islam, Eftykhar Rahman, Md. Tanzim Reza, Tasnim Sakib Apon, MD. Golam Rabiul Alam

Abstract: Glaucoma is the second driving reason for partial or complete blindness among all the visual deficiencies which mainly occurs because of excessive pressure in the eye due to anxiety or depression which damages the optic nerve and creates complications in vision. Traditional glaucoma screening is a time-consuming process that necessitates the medical professionals' constant attention, and even so t… ▽ More Glaucoma is the second driving reason for partial or complete blindness among all the visual deficiencies which mainly occurs because of excessive pressure in the eye due to anxiety or depression which damages the optic nerve and creates complications in vision. Traditional glaucoma screening is a time-consuming process that necessitates the medical professionals' constant attention, and even so time to time due to the time constrains and pressure they fail to classify correctly that leads to wrong treatment. Numerous efforts have been made to automate the entire glaucoma classification procedure however, these existing models in general have a black box characteristics that prevents users from understanding the key reasons behind the prediction and thus medical practitioners generally can not rely on these system. In this article after comparing with various pre-trained models, we propose a transfer learning model that is able to classify Glaucoma with 94.71\% accuracy. In addition, we have utilized Local Interpretable Model-Agnostic Explanations(LIME) that introduces explainability in our system. This improvement enables medical professionals obtain important and comprehensive information that aid them in making judgments. It also lessen the opacity and fragility of the traditional deep learning models. △ Less

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2209.13461 [pdf, other]

BanglaSarc: A Dataset for Sarcasm Detection

Authors: Tasnim Sakib Apon, Ramisa Anan, Elizabeth Antora Modhu, Arjun Suter, Ifrit Jamal Sneha, MD. Golam Rabiul Alam

Abstract: Being one of the most widely spoken language in the world, the use of Bangla has been increasing in the world of social media as well. Sarcasm is a positive statement or remark with an underlying negative motivation that is extensively employed in today's social media platforms. There has been a significant improvement in sarcasm detection in English over the previous many years, however the situa… ▽ More Being one of the most widely spoken language in the world, the use of Bangla has been increasing in the world of social media as well. Sarcasm is a positive statement or remark with an underlying negative motivation that is extensively employed in today's social media platforms. There has been a significant improvement in sarcasm detection in English over the previous many years, however the situation regarding Bangla sarcasm detection remains unchanged. As a result, it is still difficult to identify sarcasm in bangla, and a lack of high-quality data is a major contributing factor. This article proposes BanglaSarc, a dataset constructed specifically for bangla textual data sarcasm detection. This dataset contains of 5112 comments/status and contents collected from various online social platforms such as Facebook, YouTube, along with a few online blogs. Due to the limited amount of data collection of categorized comments in Bengali, this dataset will aid in the of study identifying sarcasm, recognizing people's emotion, detecting various types of Bengali expressions, and other domains. The dataset is publicly available at https://www.kaggle.com/datasets/sakibapon/banglasarc. △ Less

Submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.12652 [pdf, other]

AI-powered Language Assessment Tools for Dementia

Authors: Mahboobeh Parsapoor, Muhammad Raisul Alam, Alex Mihailidis

Abstract: The main objective of this paper is to propose an approach for develo** an Artificial Intelligence (AI)-powered Language Assessment (LA) tool. Such tools can be used to assess language impairments associated with dementia in older adults. The Machine Learning (ML) classifiers are the main parts of our proposed approach, therefore to develop an accurate tool with high sensitivity and specificity,… ▽ More The main objective of this paper is to propose an approach for develo** an Artificial Intelligence (AI)-powered Language Assessment (LA) tool. Such tools can be used to assess language impairments associated with dementia in older adults. The Machine Learning (ML) classifiers are the main parts of our proposed approach, therefore to develop an accurate tool with high sensitivity and specificity, we consider different binary classifiers and evaluate their performances. We also assess the reliability and validity of our approach by comparing the impact of different types of language tasks, features, and recording media on the performance of ML classifiers. △ Less

Submitted 13 September, 2022; originally announced September 2022.

Comments: 27 Pages, 11 Tables, 16 Figures

arXiv:2112.06456 [pdf, other]

Real Time Action Recognition from Video Footage

Authors: Tasnim Sakib Apon, Mushfiqul Islam Chowdhury, MD Zubair Reza, Arpita Datta, Syeda Tan**a Hasan, MD. Golam Rabiul Alam

Abstract: Crime rate is increasing proportionally with the increasing rate of the population. The most prominent approach was to introduce Closed-Circuit Television (CCTV) camera-based surveillance to tackle the issue. Video surveillance cameras have added a new dimension to detect crime. Several research works on autonomous security camera surveillance are currently ongoing, where the fundamental goal is t… ▽ More Crime rate is increasing proportionally with the increasing rate of the population. The most prominent approach was to introduce Closed-Circuit Television (CCTV) camera-based surveillance to tackle the issue. Video surveillance cameras have added a new dimension to detect crime. Several research works on autonomous security camera surveillance are currently ongoing, where the fundamental goal is to discover violent activity from video feeds. From the technical viewpoint, this is a challenging problem because analyzing a set of frames, i.e., videos in temporal dimension to detect violence might need careful machine learning model training to reduce false results. This research focuses on this problem by integrating state-of-the-art Deep Learning methods to ensure a robust pipeline for autonomous surveillance for detecting violent activities, e.g., kicking, punching, and slap**. Initially, we designed a dataset of this specific interest, which contains 600 videos (200 for each action). Later, we have utilized existing pre-trained model architectures to extract features, and later used deep learning network for classification. Also, We have classified our models' accuracy, and confusion matrix on different pre-trained architectures like VGG16, InceptionV3, ResNet50, Xception and MobileNet V2 among which VGG16 and MobileNet V2 performed better. △ Less

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2111.03890 [pdf, other]

doi 10.1109/CSDE53843.2021.9718400

Demystifying Deep Learning Models for Retinal OCT Disease Classification using Explainable AI

Authors: Tasnim Sakib Apon, Mohammad Mahmudul Hasan, Abrar Islam, MD. Golam Rabiul Alam

Abstract: In the world of medical diagnostics, the adoption of various deep learning techniques is quite common as well as effective, and its statement is equally true when it comes to implementing it into the retina Optical Coherence Tomography (OCT) sector, but (i)These techniques have the black box characteristics that prevent the medical professionals to completely trust the results generated from them… ▽ More In the world of medical diagnostics, the adoption of various deep learning techniques is quite common as well as effective, and its statement is equally true when it comes to implementing it into the retina Optical Coherence Tomography (OCT) sector, but (i)These techniques have the black box characteristics that prevent the medical professionals to completely trust the results generated from them (ii)Lack of precision of these methods restricts their implementation in clinical and complex cases (iii)The existing works and models on the OCT classification are substantially large and complicated and they require a considerable amount of memory and computational power, reducing the quality of classifiers in real-time applications. To meet these problems, in this paper a self-developed CNN model has been proposed which is comparatively smaller and simpler along with the use of Lime that introduces Explainable AI to the study and helps to increase the interpretability of the model. This addition will be an asset to the medical experts for getting major and detailed information and will help them in making final decisions and will also reduce the opacity and vulnerability of the conventional deep learning models. △ Less

Submitted 6 November, 2021; originally announced November 2021.

arXiv:2111.03882 [pdf, other]

doi 10.1109/ICTS52701.2021.9608407

Action Recognition using Transfer Learning and Majority Voting for CSGO

Authors: Tasnim Sakib Apon, Abrar Islam, MD. Golam Rabiul Alam

Abstract: Presently online video games have become a progressively favorite source of recreation and Counter Strike: Global Offensive (CS: GO) is one of the top-listed online first-person shooting games. Numerous competitive games are arranged every year by Esports. Nonetheless, (i) No study has been conducted on video analysis and action recognition of CS: GO game-play which can play a substantial role in… ▽ More Presently online video games have become a progressively favorite source of recreation and Counter Strike: Global Offensive (CS: GO) is one of the top-listed online first-person shooting games. Numerous competitive games are arranged every year by Esports. Nonetheless, (i) No study has been conducted on video analysis and action recognition of CS: GO game-play which can play a substantial role in the gaming industry for prediction model (ii) No work has been done on the real-time application on the actions and results of a CS: GO match (iii) Game data of a match is usually available in the HLTV as a CSV formatted file however it does not have open access and HLTV tends to prevent users from taking data. This manuscript aims to develop a model for accurate prediction of 4 different actions and compare the performance among the five different transfer learning models with our self-developed deep neural network and identify the best-fitted model and also including major voting later on, which is qualified to provide real time prediction and the result of this model aids to the construction of the automated system of gathering and processing more data alongside solving the issue of collecting data from HLTV. △ Less

Submitted 6 November, 2021; originally announced November 2021.

arXiv:2109.02838 [pdf]

Understanding the Social Determinants of Mental Health of the Undergraduate Students in Bangladesh: Interview Study

Authors: Ananya Bhattacharjee, S M Taiabul Haque, Abdul Hady, S. M. Raihanul Alam, Mashfiqui Rabbi, Muhammad Ashad Kabir, Syed Ishtiaque Ahmed

Abstract: Objective: This study aims to identify the social determinants of mental health among undergraduate students in Bangladesh, a develo** nation in South Asia. Our goal is to identify the broader social determinants of mental health among this population, study the manifestation of these determinants in their day-to-day life, and explore the feasibility of self-monitoring tools in hel** them iden… ▽ More Objective: This study aims to identify the social determinants of mental health among undergraduate students in Bangladesh, a develo** nation in South Asia. Our goal is to identify the broader social determinants of mental health among this population, study the manifestation of these determinants in their day-to-day life, and explore the feasibility of self-monitoring tools in hel** them identify the specific factors or relationships that impact their mental health. Methods: We conducted a 21-day study with 38 undergraduate students from seven universities in Bangladesh. We conducted two semi-structured interviews: one pre-study and one post-study. During the 21-day study, participants used an Android application to self-report and self-monitor their mood after each phone conversation. The app prompted participants to report their mood after each phone conversation and provided graphs and charts so that participants could independently review their mood and conversation patterns. Results: Our results show that academics, family, job and economic condition, romantic relationships, and religion are the major social determinants of mental health among undergraduate students in Bangladesh. Our app helped the participants pinpoint the specific issues related to these factors as participants could review the pattern of their moods and emotions from past conversation history. Although our app does not provide any explicit recommendation, participants took certain steps on their own to improve their mental health (e.g., reduced the frequency of communication with certain persons). Conclusions: Overall, the findings from this study would provide better insights for the researchers to design better solutions to help the younger population from this part of the world. △ Less

Submitted 6 September, 2021; originally announced September 2021.

arXiv:2107.13148 [pdf, other]

doi 10.1504/IJCSE.2023.129152

Combining Machine Learning Classifiers for Stock Trading with Effective Feature Extraction

Authors: A. K. M. Amanat Ullah, Fahim Imtiaz, Miftah Uddin Md Ihsan, Md. Golam Rabiul Alam, Mahbub Majumdar

Abstract: The unpredictability and volatility of the stock market render it challenging to make a substantial profit using any generalised scheme. Many previous studies tried different techniques to build a machine learning model, which can make a significant profit in the US stock market by performing live trading. However, very few studies have focused on the importance of finding the best features for a… ▽ More The unpredictability and volatility of the stock market render it challenging to make a substantial profit using any generalised scheme. Many previous studies tried different techniques to build a machine learning model, which can make a significant profit in the US stock market by performing live trading. However, very few studies have focused on the importance of finding the best features for a particular trading period. Our top approach used the performance to narrow down the features from a total of 148 to about 30. Furthermore, the top 25 features were dynamically selected before each time training our machine learning model. It uses ensemble learning with four classifiers: Gaussian Naive Bayes, Decision Tree, Logistic Regression with L1 regularization, and Stochastic Gradient Descent, to decide whether to go long or short on a particular stock. Our best model performed daily trade between July 2011 and January 2019, generating 54.35% profit. Finally, our work showcased that mixtures of weighted classifiers perform better than any individual predictor of making trading decisions in the stock market. △ Less

Submitted 11 August, 2023; v1 submitted 27 July, 2021; originally announced July 2021.

Journal ref: Int. J. Computational Science and Engineering, Vol. 26 No.1, (2023)

arXiv:2106.09076 [pdf, other]

doi 10.1002/mp.15075

Deformation Driven Seq2Seq Longitudinal Tumor and Organs-at-Risk Prediction for Radiotherapy

Authors: Donghoon Lee, Sadegh R Alam, Jue Jiang, Pengpeng Zhang, Saad Nadeem, Yu-Chi Hu

Abstract: Purpose: Radiotherapy presents unique challenges and clinical requirements for longitudinal tumor and organ-at-risk (OAR) prediction during treatment. The challenges include tumor inflammation/edema and radiation-induced changes in organ geometry, whereas the clinical requirements demand flexibility in input/output sequence timepoints to update the predictions on rolling basis and the grounding of… ▽ More Purpose: Radiotherapy presents unique challenges and clinical requirements for longitudinal tumor and organ-at-risk (OAR) prediction during treatment. The challenges include tumor inflammation/edema and radiation-induced changes in organ geometry, whereas the clinical requirements demand flexibility in input/output sequence timepoints to update the predictions on rolling basis and the grounding of all predictions in relationship to the pre-treatment imaging information for response and toxicity assessment in adaptive radiotherapy. Methods: To deal with the aforementioned challenges and to comply with the clinical requirements, we present a novel 3D sequence-to-sequence model based on Convolution Long Short Term Memory (ConvLSTM) that makes use of series of deformation vector fields (DVF) between individual timepoints and reference pre-treatment/planning CTs to predict future anatomical deformations and changes in gross tumor volume as well as critical OARs. High-quality DVF training data is created by employing hyper-parameter optimization on the subset of the training data with DICE coefficient and mutual information metric. We validated our model on two radiotherapy datasets: a publicly available head-and-neck dataset (28 patients with manually contoured pre-, mid-, and post-treatment CTs), and an internal non-small cell lung cancer dataset (63 patients with manually contoured planning CT and 6 weekly CBCTs). Results: The use of DVF representation and skip connections overcomes the blurring issue of ConvLSTM prediction with the traditional image representation. The mean and standard deviation of DICE for predictions of lung GTV at week 4, 5, and 6 were 0.83$\pm$0.09, 0.82$\pm$0.08, and 0.81$\pm$0.10, respectively, and for post-treatment ipsilateral and contralateral parotids, were 0.81$\pm$0.06 and 0.85$\pm$0.02. △ Less

Submitted 30 August, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Medical Physics 2021, Saad Nadeem and Yu-Chi Hu contributed equally

arXiv:2103.05690 [pdf, other]

doi 10.1002/mp.15083

Multitask 3D CBCT-to-CT Translation and Organs-at-Risk Segmentation Using Physics-Based Data Augmentation

Authors: Navdeep Dahiya, Sadegh R Alam, Pengpeng Zhang, Si-Yuan Zhang, Anthony Yezzi, Saad Nadeem

Abstract: In current clinical practice, noisy and artifact-ridden weekly cone-beam computed tomography (CBCT) images are only used for patient setup during radiotherapy. Treatment planning is done once at the beginning of the treatment using high-quality planning CT (pCT) images and manual contours for organs-at-risk (OARs) structures. If the quality of the weekly CBCT images can be improved while simultane… ▽ More In current clinical practice, noisy and artifact-ridden weekly cone-beam computed tomography (CBCT) images are only used for patient setup during radiotherapy. Treatment planning is done once at the beginning of the treatment using high-quality planning CT (pCT) images and manual contours for organs-at-risk (OARs) structures. If the quality of the weekly CBCT images can be improved while simultaneously segmenting OAR structures, this can provide critical information for adapting radiotherapy mid-treatment as well as for deriving biomarkers for treatment response. Using a novel physics-based data augmentation strategy, we synthesize a large dataset of perfectly/inherently registered planning CT and synthetic-CBCT pairs for locally advanced lung cancer patient cohort, which are then used in a multitask 3D deep learning framework to simultaneously segment and translate real weekly CBCT images to high-quality planning CT-like images. We compared the synthetic CT and OAR segmentations generated by the model to real planning CT and manual OAR segmentations and showed promising results. The real week 1 (baseline) CBCT images which had an average MAE of 162.77 HU compared to pCT images are translated to synthetic CT images that exhibit a drastically improved average MAE of 29.31 HU and average structural similarity of 92% with the pCT images. The average DICE scores of the 3D organs-at-risk segmentations are: lungs 0.96, heart 0.88, spinal cord 0.83 and esophagus 0.66. This approach could allow clinicians to adjust treatment plans using only the routine low-quality CBCT images, potentially improving patient outcomes. Our code, data, and pre-trained models will be made available via our physics-based data augmentation library, Physics-ArX, at https://github.com/nadeemlab/Physics-ArX. △ Less

Submitted 30 August, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Comments: Medical Physics 2021

arXiv:2102.08556 [pdf, other]

doi 10.1002/mp.14902

Deep cross-modality (MR-CT) educed distillation learning for cone beam CT lung tumor segmentation

Authors: Jue Jiang, Sadegh Riyahi Alam, Ishita Chen, Perry Zhang, Andreas Rimner, Joseph O. Deasy, Harini Veeraraghavan

Abstract: Despite the widespread availability of in-treatment room cone beam computed tomography (CBCT) imaging, due to the lack of reliable segmentation methods, CBCT is only used for gross set up corrections in lung radiotherapies. Accurate and reliable auto-segmentation tools could potentiate volumetric response assessment and geometry-guided adaptive radiation therapies. Therefore, we developed a new de… ▽ More Despite the widespread availability of in-treatment room cone beam computed tomography (CBCT) imaging, due to the lack of reliable segmentation methods, CBCT is only used for gross set up corrections in lung radiotherapies. Accurate and reliable auto-segmentation tools could potentiate volumetric response assessment and geometry-guided adaptive radiation therapies. Therefore, we developed a new deep learning CBCT lung tumor segmentation method. Methods: The key idea of our approach called cross modality educed distillation (CMEDL) is to use magnetic resonance imaging (MRI) to guide a CBCT segmentation network training to extract more informative features during training. We accomplish this by training an end-to-end network comprised of unpaired domain adaptation (UDA) and cross-domain segmentation distillation networks (SDN) using unpaired CBCT and MRI datasets. Feature distillation regularizes the student network to extract CBCT features that match the statistical distribution of MRI features extracted by the teacher network and obtain better differentiation of tumor from background.} We also compared against an alternative framework that used UDA with MR segmentation network, whereby segmentation was done on the synthesized pseudo MRI representation. All networks were trained with 216 weekly CBCTs and 82 T2-weighted turbo spin echo MRI acquired from different patient cohorts. Validation was done on 20 weekly CBCTs from patients not used in training. Independent testing was done on 38 weekly CBCTs from patients not used in training or validation. Segmentation accuracy was measured using surface Dice similarity coefficient (SDSC) and Hausdroff distance at 95th percentile (HD95) metrics. △ Less

Submitted 20 April, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: The paper has been accepted to Medical Physics

arXiv:2102.07127 [pdf]

Affective State Recognition through EEG Signals Feature Level Fusion and Ensemble Classifier

Authors: Md. Mahbubur Rahman, Akash Poddar, Md. Golam Rabiul Alam, Samrat Kumar Dey

Abstract: Human affects are complex paradox and an active research domain in affective computing. Affects are traditionally determined through a self-report based psychometric questionnaire or through facial expression recognition. However, few state-of-the-arts pieces of research have shown the possibilities of recognizing human affects from psychophysiological and neurological signals. In this article, el… ▽ More Human affects are complex paradox and an active research domain in affective computing. Affects are traditionally determined through a self-report based psychometric questionnaire or through facial expression recognition. However, few state-of-the-arts pieces of research have shown the possibilities of recognizing human affects from psychophysiological and neurological signals. In this article, electroencephalogram (EEG) signals are used to recognize human affects. The electroencephalogram (EEG) of 100 participants are collected where they are given to watch one-minute video stimuli to induce different affective states. The videos with emotional tags have a variety range of affects including happy, sad, disgust, and peaceful. The experimental stimuli are collected and analyzed intensively. The interrelationship between the EEG signal frequencies and the ratings given by the participants are taken into consideration for classifying affective states. Advanced feature extraction techniques are applied along with the statistical features to prepare a fused feature vector of affective state recognition. Factor analysis methods are also applied to select discriminative features. Finally, several popular supervised machine learning classifier is applied to recognize different affective states from the discriminative feature vector. Based on the experiment, the designed random forest classifier produces 89.06% accuracy in classifying four basic affective states. △ Less

Submitted 14 February, 2021; originally announced February 2021.

Comments: 18 pages, 7 figures

arXiv:2012.09918 [pdf, other]

Sorting in Memristive Memory

Authors: Mohsen Riahi Alam, M. Hassan Najafi, Nima TaheriNejad

Abstract: Sorting is needed in many application domains. The data is read from memory and sent to a general purpose processor or application specific hardware for sorting. The sorted data is then written back to the memory. Reading/writing data from/to memory and transferring data between memory and processing unit incur a large latency and energy overhead. In this work, we develop, to the best of our knowl… ▽ More Sorting is needed in many application domains. The data is read from memory and sent to a general purpose processor or application specific hardware for sorting. The sorted data is then written back to the memory. Reading/writing data from/to memory and transferring data between memory and processing unit incur a large latency and energy overhead. In this work, we develop, to the best of our knowledge, the first architectures for in-memory sorting of data. We propose two architectures. The first architecture is applicable to the conventional format of representing data, weighted binary radix. The second architecture is proposed for the develo** unary processing systems where data is encoded as uniform unary bitstreams. The two architectures have different advantages and disadvantages, making one or the other more suitable for a specific application. However, the common property of both is a significant reduction in the processing time compared to prior sorting designs. Our evaluations show on average 37x and 138x energy reduction for binary and unary designs, respectively, as compared to conventional CMOS off-memory sorting systems. △ Less

Submitted 4 February, 2022; v1 submitted 17 December, 2020; originally announced December 2020.

arXiv:2008.10148 [pdf, other]

Drive Safe: Cognitive-Behavioral Mining for Intelligent Transportation Cyber-Physical System

Authors: Md. Shirajum Munir, Sarder Fakhrul Abedin, Ki Tae Kim, Do Hyeon Kim, Md. Golam Rabiul Alam, Choong Seon Hong

Abstract: This paper presents a cognitive behavioral-based driver mood repairment platform in intelligent transportation cyber-physical systems (IT-CPS) for road safety. In particular, we propose a driving safety platform for distracted drivers, namely \emph{drive safe}, in IT-CPS. The proposed platform recognizes the distracting activities of the drivers as well as their emotions for mood repair. Further,… ▽ More This paper presents a cognitive behavioral-based driver mood repairment platform in intelligent transportation cyber-physical systems (IT-CPS) for road safety. In particular, we propose a driving safety platform for distracted drivers, namely \emph{drive safe}, in IT-CPS. The proposed platform recognizes the distracting activities of the drivers as well as their emotions for mood repair. Further, we develop a prototype of the proposed drive safe platform to establish proof-of-concept (PoC) for the road safety in IT-CPS. In the developed driving safety platform, we employ five AI and statistical-based models to infer a vehicle driver's cognitive-behavioral mining to ensure safe driving during the drive. Especially, capsule network (CN), maximum likelihood (ML), convolutional neural network (CNN), Apriori algorithm, and Bayesian network (BN) are deployed for driver activity recognition, environmental feature extraction, mood recognition, sequential pattern mining, and content recommendation for affective mood repairment of the driver, respectively. Besides, we develop a communication module to interact with the systems in IT-CPS asynchronously. Thus, the developed drive safe PoC can guide the vehicle drivers when they are distracted from driving due to the cognitive-behavioral factors. Finally, we have performed a qualitative evaluation to measure the usability and effectiveness of the developed drive safe platform. We observe that the P-value is 0.0041 (i.e., < 0.05) in the ANOVA test. Moreover, the confidence interval analysis also shows significant gains in prevalence value which is around 0.93 for a 95% confidence level. The aforementioned statistical results indicate high reliability in terms of driver's safety and mental state. △ Less

Submitted 23 August, 2020; originally announced August 2020.

Comments: Submitted to IEEE Transactions on Intelligent Transportation Systems, Special Issue on Technologies for risk mitigation and support of impaired drivers

arXiv:2007.05831 [pdf, other]

MFED: A System for Monitoring Family Eating Dynamics

Authors: Md Abu Sayeed Mondol, Brooke Bell, Meiyi Ma, Ridwan Alam, Ifat Emi, Sarah Masud Preum, Kayla de la Haye, Donna Spruijt-Metz, John C. Lach, John A. Stankovic

Abstract: Obesity is a risk factor for many health issues, including heart disease, diabetes, osteoarthritis, and certain cancers. One of the primary behavioral causes, dietary intake, has proven particularly challenging to measure and track. Current behavioral science suggests that family eating dynamics (FED) have high potential to impact child and parent dietary intake, and ultimately the risk of obesity… ▽ More Obesity is a risk factor for many health issues, including heart disease, diabetes, osteoarthritis, and certain cancers. One of the primary behavioral causes, dietary intake, has proven particularly challenging to measure and track. Current behavioral science suggests that family eating dynamics (FED) have high potential to impact child and parent dietary intake, and ultimately the risk of obesity. Monitoring FED requires information about when and where eating events are occurring, the presence or absence of family members during eating events, and some person-level states such as stress, mood, and hunger. To date, there exists no system for real-time monitoring of FED. This paper presents MFED, the first of its kind of system for monitoring FED in the wild in real-time. Smart wearables and Bluetooth beacons are used to monitor and detect eating activities and the location of the users at home. A smartphone is used for the Ecological Momentary Assessment (EMA) of a number of behaviors, states, and situations. While the system itself is novel, we also present a novel and efficient algorithm for detecting eating events from wrist-worn accelerometer data. The algorithm improves eating gesture detection F1-score by 19% with less than 20% computation compared to the state-of-the-art methods. To date, the MFED system has been deployed in 20 homes with a total of 74 participants, and responses from 4750 EMA surveys have been collected. This paper describes the system components, reports on the eating detection results from the deployments, proposes two techniques for improving ground truth collection after the system is deployed, and provides an overview of the FED data, generated from the multi-component system, that can be used to model and more comprehensively understand insights into the monitoring of family eating dynamics. △ Less

Submitted 11 July, 2020; originally announced July 2020.

arXiv:2007.01413 [pdf]

doi 10.1109/JBHI.2020.3035776

Wearable Respiration Monitoring: Interpretable Inference with Context and Sensor Biomarkers

Authors: Ridwan Alam, David B. Peden, John C. Lach

Abstract: Breathing rate (BR), minute ventilation (VE), and other respiratory parameters are essential for real-time patient monitoring in many acute health conditions, such as asthma. The clinical standard for measuring respiration, namely Spirometry, is hardly suitable for continuous use. Wearables can track many physiological signals, like ECG and motion, yet not respiration. Deriving respiration from ot… ▽ More Breathing rate (BR), minute ventilation (VE), and other respiratory parameters are essential for real-time patient monitoring in many acute health conditions, such as asthma. The clinical standard for measuring respiration, namely Spirometry, is hardly suitable for continuous use. Wearables can track many physiological signals, like ECG and motion, yet not respiration. Deriving respiration from other modalities has become an area of active research. In this work, we infer respiratory parameters from wearable ECG and wrist motion signals. We propose a modular and generalizable classification-regression pipeline to utilize available context information, such as physical activity, in learning context-conditioned inference models. Morphological and power domain novel features from the wearable ECG are extracted to use with these models. Exploratory feature selection methods are incorporated in this pipeline to discover application-specific interpretable biomarkers. Using data from 15 subjects, we evaluate two implementations of the proposed pipeline: for inferring BR and VE. Each implementation compares generalized linear model, random forest, support vector machine, Gaussian process regression, and neighborhood component analysis as contextual regression models. Permutation, regularization, and relevance determination methods are used to rank the ECG features to identify robust ECG biomarkers across models and activities. This work demonstrates the potential of wearable sensors not only in continuous monitoring, but also in designing biomarker-driven preventive measures. △ Less

Submitted 2 July, 2020; originally announced July 2020.

Comments: 10 pages, 10 figures

ACM Class: I.2.1

arXiv:2006.15713 [pdf]

Generalizable Cone Beam CT Esophagus Segmentation Using Physics-Based Data Augmentation

Authors: Sadegh R Alam, Tianfang Li, Pengpeng Zhang, Si-Yuan Zhang, Saad Nadeem

Abstract: Automated segmentation of esophagus is critical in image guided/adaptive radiotherapy of lung cancer to minimize radiation-induced toxicities such as acute esophagitis. We developed a semantic physics-based data augmentation method for segmenting esophagus in both planning CT (pCT) and cone-beam CT (CBCT) using 3D convolutional neural networks. 191 cases with their pCT and CBCTs from four independ… ▽ More Automated segmentation of esophagus is critical in image guided/adaptive radiotherapy of lung cancer to minimize radiation-induced toxicities such as acute esophagitis. We developed a semantic physics-based data augmentation method for segmenting esophagus in both planning CT (pCT) and cone-beam CT (CBCT) using 3D convolutional neural networks. 191 cases with their pCT and CBCTs from four independent datasets were used to train a modified 3D-Unet architecture with a multi-objective loss function specifically designed for soft-tissue organs such as esophagus. Scatter artifacts and noise were extracted from week 1 CBCTs using power law adaptive histogram equalization method and induced to the corresponding pCT followed by reconstruction using CBCT reconstruction parameters. Moreover, we leverage physics-based artifact induced pCTs to drive the esophagus segmentation in real weekly CBCTs. Segmentations were evaluated using geometric Dice and Hausdorff distance as well as dosimetrically using mean esophagus dose and D5cc. Due to the physics-based data augmentation, our model trained just on the synthetic CBCTs was robust and generalizable enough to also produce state-of-the-art results on the pCTs and CBCTs, achieving 0.81 and 0.74 Dice overlap. Our physics-based data augmentation spans the realistic noise/artifact spectrum across patient CBCT/pCT data and can generalize well across modalities with the potential to improve the accuracy of treatment setup and response analysis. △ Less

Submitted 30 January, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

Comments: Accepted to Physics in Medicine & Biology 2021

arXiv:1807.03909 [pdf]

doi 10.1109/ICIEV.2014.6850685

Emotion Recognition from Speech based on Relevant Feature and Majority Voting

Authors: Md. Kamruzzaman Sarker, Kazi Md. Rokibul Alam, Md. Arifuzzaman

Abstract: This paper proposes an approach to detect emotion from human speech employing majority voting technique over several machine learning techniques. The contribution of this work is in two folds: firstly it selects those features of speech which is most promising for classification and secondly it uses the majority voting technique that selects the exact class of emotion. Here, majority voting techni… ▽ More This paper proposes an approach to detect emotion from human speech employing majority voting technique over several machine learning techniques. The contribution of this work is in two folds: firstly it selects those features of speech which is most promising for classification and secondly it uses the majority voting technique that selects the exact class of emotion. Here, majority voting technique has been applied over Neural Network (NN), Decision Tree (DT), Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). Input vector of NN, DT, SVM and KNN consists of various acoustic and prosodic features like Pitch, Mel-Frequency Cepstral coefficients etc. From speech signal many feature have been extracted and only promising features have been selected. To consider a feature as promising, Fast Correlation based feature selection (FCBF) and Fisher score algorithms have been used and only those features are selected which are highly ranked by both of them. The proposed approach has been tested on Berlin dataset of emotional speech [3] and Electromagnetic Articulography (EMA) dataset [4]. The experimental result shows that majority voting technique attains better accuracy over individual machine learning techniques. The employment of the proposed approach can effectively recognize the emotion of human beings in case of social robot, intelligent chat client, call-center of a company etc. △ Less

Submitted 10 July, 2018; originally announced July 2018.

Journal ref: International Conference on Informatics, Electronics & Vision (ICIEV) (2014) 1-5

arXiv:1512.05596 [pdf]

doi 10.12691/iscf-3-2-2

An Incoercible E-Voting Scheme Based on Revised Simplified Verifiable Re-encryption Mix-nets

Authors: Shinsuke Tamura, Hazim A. Haddad, Nazmul Islam, Kazi Md. Rokibul Alam

Abstract: Simplified verifiable re-encryption mix-net (SVRM) is revised and a scheme for e-voting systems is developed based on it. The developed scheme enables e-voting systems to satisfy all essential requirements of elections. Namely, they satisfy requirements about privacy, verifiability, fairness and robustness. It also successfully protects voters from coercers except cases where the coercers force vo… ▽ More Simplified verifiable re-encryption mix-net (SVRM) is revised and a scheme for e-voting systems is developed based on it. The developed scheme enables e-voting systems to satisfy all essential requirements of elections. Namely, they satisfy requirements about privacy, verifiability, fairness and robustness. It also successfully protects voters from coercers except cases where the coercers force voters to abstain from elections. In detail, voters can conceal correspondences between them and their votes, anyone can verify the accuracy of election results, and interim election results are concealed from any entity. About incoercibility, provided that erasable-state voting booths which disable voters to memorize complete information exchanged between them and election authorities for constructing votes are available, coercer C cannot know candidates that voters coerced by C had chosen even if the candidates are unique to the voters. In addition, elections can be completed without reelections even when votes were handled illegitimately. △ Less

Submitted 17 December, 2015; originally announced December 2015.

Journal ref: Information Security and Computer Fraud 3 (2015) 32-38

arXiv:1205.6229 [pdf]

An Approach of Digital Image Copyright Protection by Using Watermarking Technology

Authors: Md. Selim Reza, Mohammed Shafiul Alam Khan, Md. Golam Robiul Alam, Serajul Islam

Abstract: Digital watermarking system is a paramount for safeguarding valuable resources and information. Digital watermarks are generally imperceptible to the human eye and ear. Digital watermark can be used in video, audio and digital images for a wide variety of applications such as copy prevention right management, authentication and filtering of internet content. The proposed system is able to protect… ▽ More Digital watermarking system is a paramount for safeguarding valuable resources and information. Digital watermarks are generally imperceptible to the human eye and ear. Digital watermark can be used in video, audio and digital images for a wide variety of applications such as copy prevention right management, authentication and filtering of internet content. The proposed system is able to protect copyright or owner identification of digital media, such as audio, image, video, or text. The system permutated the watermark and embed the permutated watermark into the wavelet coefficients of the original image by using a key. The key is randomly generated and used to select the locations in the wavelet domain in which to embed the permutated watermark. Finally, the system combines the concept of cryptography and digital watermarking techniques to implement a more secure digital watermarking system. △ Less

Submitted 28 May, 2012; originally announced May 2012.

Comments: 7 pages, 6 figures. arXiv admin note: text overlap with arXiv:1103.3802 by other authors

Journal ref: International Journal of Computer Science Issues, Vol. 9, Issue 2, No 2, 2012, pp:280-286

arXiv:1202.1918 [pdf]

A Reliable Semi-Distributed Load Balancing Architecture of Heterogeneous Wireless Networks

Authors: Md. Golam Rabiul Alam, Chayan Biswas, Naushin Nower, Mohammed Shafiul Alam Khan

Abstract: Now a day's Heterogeneous wireless network is a promising field of research interest. Various challenges exist in this hybrid combination like load balancing, resource management and so on. In this paper we introduce a reliable load balancing architecture for heterogeneous wireless communications to ensure certain level of quality of service. To conquer the problem of centralized and distributed d… ▽ More Now a day's Heterogeneous wireless network is a promising field of research interest. Various challenges exist in this hybrid combination like load balancing, resource management and so on. In this paper we introduce a reliable load balancing architecture for heterogeneous wireless communications to ensure certain level of quality of service. To conquer the problem of centralized and distributed design, a semi distributed load balancing architecture for multiple access networks is introduced. In this grid based design multiple Load and Mobile Agent Management Units is incorporated. To prove the compactness of the design, integrated reliability, signalling overhead and total processing time is calculated. And finally simulation result shows that overall system performance is improved by enhancing reliability, reducing signalling overhead and processing time. △ Less

Submitted 9 February, 2012; originally announced February 2012.

Comments: Page 15 No of figure: 8

Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.4, No.1, January 2012

Showing 1–38 of 38 results for author: Alam, R