-
Sound Tagging in Infant-centric Home Soundscapes
Authors:
Mohammad Nur Hossain Khan,
Jialu Li,
Nancy L. McElwain,
Mark Hasegawa-Johnson,
Bashima Islam
Abstract:
Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively.
Submitted 24 June, 2024;
originally announced June 2024.
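The two metrics quoted in the abstract above, F1-score and Cohen's Kappa, can be illustrated with a minimal sketch. It assumes a single binary tag (sound present or absent); the abstract does not say how scores are averaged across tags, and the function names `f1_binary` and `cohen_kappa` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Kappa = (p_o - p_e) / (1 - p_e): agreement corrected for chance."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies per class.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a choice made for this sketch
    return (observed - expected) / (1.0 - expected)


# Toy, made-up labels for one tag (1 = tagged sound present).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(f1_binary(y_true, y_pred), 4))
print(round(cohen_kappa(y_true, y_pred), 4))
```

Kappa discounts agreement that would occur by chance, which matters on imbalanced data: a model that always predicts the majority label can reach high raw accuracy while its kappa stays near zero.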
-
LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors
Authors:
Sheikh Asif Imran,
Mohammad Nur Hossain Khan,
Subrata Biswas,
Bashima Islam
Abstract:
Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enhancing human activity understanding. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpreting and responding to activity and motion analysis queries. Our evaluation demonstrates LLaSA's effectiveness in activity classification and question answering, highlighting its potential in healthcare, sports science, and human-computer interaction. These contributions advance sensor-aware language models and open new research avenues. Our code repository and datasets can be found on https://github.com/BASHLab/LLaSA.
Submitted 20 June, 2024;
originally announced June 2024.
-
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
Authors:
Payal Mohapatra,
Shamika Likhite,
Subrata Biswas,
Bashima Islam,
Qi Zhu
Abstract:
Most existing speech disfluency detection techniques only rely upon acoustic data. In this work, we present a practical multimodal disfluency detection approach that leverages available video data together with audio. We curate an audiovisual dataset and propose a novel fusion technique with unified weight-sharing modality-agnostic encoders to learn the temporal and semantic context. Our resilient design accommodates real-world scenarios where the video modality may sometimes be missing during inference. We also present alternative fusion strategies when both modalities are assured to be complete. In experiments across five disfluency-detection tasks, our unified multimodal approach significantly outperforms Audio-only unimodal methods, yielding an average absolute improvement of 10% (i.e., 10 percentage point increase) when both video and audio modalities are always available, and 7% even when video modality is missing in half of the samples.
Submitted 11 June, 2024;
originally announced June 2024.
-
Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems
Authors:
Pietro Farina,
Subrata Biswas,
Eren Yıldız,
Khakim Akhunov,
Saad Ahmed,
Bashima Islam,
Kasım Sinan Yıldırım
Abstract:
Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only limited memory space for storing ultra-tiny deep neural networks (DNNs). Besides, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and have significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs on batteryless devices with extreme memory constraints is more challenging than on traditional microcontrollers. We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique to reduce the model footprint and runtime memory requirements simultaneously, making them executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments showed that FreeML reduces the model sizes by up to $95\times$, supports adaptive inference with $2.03$ to $19.65\times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.
Submitted 16 May, 2024;
originally announced May 2024.
-
Offensive Language Identification in Transliterated and Code-Mixed Bangla
Authors:
Md Nishat Raihan,
Umma Hani Tanmoy,
Anika Binte Islam,
Kai North,
Tharindu Ranasinghe,
Antonios Anastasopoulos,
Marcos Zampieri
Abstract:
Identifying offensive content in social media is vital for creating safe online communities. Several recent studies have addressed this problem by creating datasets for various languages. In this paper, we explore offensive language identification in texts with transliterations and code-mixing, linguistic phenomena common in multilingual societies, and a known challenge for NLP systems. We introduce TB-OLID, a transliterated Bangla offensive language dataset containing 5,000 manually annotated comments. We train and fine-tune machine learning models on TB-OLID, and we evaluate their results on this dataset. Our results show that English pre-trained transformer-based models, such as fBERT and HateBERT, achieve the best performance on this dataset.
Submitted 25 November, 2023;
originally announced November 2023.
-
An Empirical Analysis on Remittances and Financial Development in Latin American Countries
Authors:
Sumaiya Binta Islam,
Laboni Mondal
Abstract:
Remittances have become one of the driving forces of development for countries all over the world, especially in lower-middle-income nations. This paper empirically investigates the association between remittance flows and financial development in four lower-middle-income countries of Latin America. Using a panel data set from 1996 to 2019, the study found that remittances and financial development are positively associated in these countries. The study also found that foreign direct investment and inflation were positively correlated with financial development, while trade openness had a negative association with financial development. Therefore, policymakers in these countries should formulate and implement policies that give migrant workers incentives to send money through formal channels, which will augment the effect of remittances on the recipient country.
Submitted 15 September, 2023;
originally announced September 2023.
-
bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents
Authors:
Imam Mohammad Zulkarnain,
Shayekh Bin Islam,
Md. Zami Al Zunaed Farabe,
Md. Mehedi Hasan Shawon,
Jawaril Munshad Abedin,
Beig Rajibul Hasan,
Marsia Haque,
Istiak Shihab,
Syed Mobassir,
MD. Nazmuddoha Ansary,
Asif Sushmit,
Farig Sadeque
Abstract:
Despite the existence of numerous Optical Character Recognition (OCR) tools, the lack of comprehensive open-source systems hampers the progress of document digitization in various low-resource languages, including Bengali. Low-resource languages, especially those with an alphasyllabary writing system, suffer from the lack of large-scale datasets for various document OCR components such as word-level OCR, document layout extraction, and distortion correction, which are available as individual modules in high-resource languages. In this paper, we introduce Bengali.AI-BRACU-OCR (bbOCR): an open-source scalable document OCR system that can reconstruct Bengali documents into a structured searchable digitized format and that leverages a novel Bengali text recognition model and two novel synthetic datasets. We present extensive component-level and system-level evaluation; both use a novel diversified evaluation dataset and comprehensive evaluation metrics. Our extensive evaluation suggests that our proposed solution is preferable to the current state-of-the-art Bengali OCR systems. The source code and datasets are available here: https://bengaliai.github.io/bbocr.
Submitted 21 August, 2023; v1 submitted 21 August, 2023;
originally announced August 2023.
-
Denoising Diffusion Probabilistic Model for Retinal Image Generation and Segmentation
Authors:
Alnur Alimanov,
Md Baharul Islam
Abstract:
Experts use retinal images and vessel trees to detect and diagnose various eye, blood circulation, and brain-related diseases. However, manual segmentation of retinal images is a time-consuming process that requires high expertise and is difficult due to privacy issues. Many methods have been proposed to segment images, but the need for large retinal image datasets limits the performance of these methods. Several methods use deep learning models based on Generative Adversarial Networks (GANs) to synthesize images, but they generate limited sample varieties. This paper proposes a novel Denoising Diffusion Probabilistic Model (DDPM) that outperformed GANs in image synthesis. We developed a Retinal Trees (ReTree) dataset consisting of retinal images and corresponding vessel trees, and a segmentation network based on DDPM trained with images from the ReTree dataset. We develop a two-stage DDPM. In the first stage, it generates vessel trees from random numbers drawn from a standard normal distribution. In the second stage, the model is guided to generate fundus images from the given vessel trees and a random distribution. The proposed dataset has been evaluated quantitatively and qualitatively. Quantitative evaluation metrics include Frechet Inception Distance (FID) score, Jaccard similarity coefficient, Cohen's kappa, Matthews Correlation Coefficient (MCC), precision, recall, F1-score, and accuracy. To validate our dataset's efficiency, we trained the vessel segmentation model with synthetic data and tested it on authentic data. Our developed dataset and source code are available at https://github.com/AAleka/retree.
Submitted 16 August, 2023;
originally announced August 2023.
-
Uncovering local aggregated air quality index with smartphone captured images leveraging efficient deep convolutional neural network
Authors:
Joyanta Jyoti Mondal,
Md. Farhadul Islam,
Raima Islam,
Nowsin Kabir Rhidi,
Sarfaraz Newaz,
Meem Arafat Manab,
A. B. M. Alim Al Islam,
Jannatun Noor
Abstract:
The prevalence and mobility of smartphones make them a widely used tool for environmental health research. However, their potential for determining aggregated air quality index (AQI) based on PM2.5 concentration in specific locations remains largely unexplored in the existing literature. In this paper, we thoroughly examine the challenges associated with predicting location-specific PM2.5 concentration using images taken with smartphone cameras. The focus of our study is on Dhaka, the capital of Bangladesh, due to its significant air pollution levels and the large population exposed to it. Our research involves the development of a Deep Convolutional Neural Network (DCNN), which we train using more than a thousand annotated outdoor images. These photos are captured at various locations in Dhaka, and their labels are based on PM2.5 concentration data obtained from the local US consulate, calculated using the NowCast algorithm. Through supervised learning, our model establishes a correlation index during training, enhancing its ability to function as a Picture-based Predictor of PM2.5 Concentration (PPPC). This enables the algorithm to calculate an equivalent daily averaged AQI from a smartphone image. Unlike popular over-parameterized models, our model is resource-efficient because it uses fewer parameters. Furthermore, test results indicate that our model outperforms popular models like ViT and INN, as well as popular CNN-based models such as VGG19, ResNet50, and MobileNetV2, in predicting location-specific PM2.5 concentration. Our dataset is the first publicly available collection that includes atmospheric images and corresponding PM2.5 measurements from Dhaka. Our code and dataset are available at https://github.com/lepotatoguy/aqi.
Submitted 18 January, 2024; v1 submitted 6 August, 2023;
originally announced August 2023.
-
Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data
Authors:
Kai Chieh Chang,
Mark Hasegawa-Johnson,
Nancy L. McElwain,
Bashima Islam
Abstract:
Infant sleep is critical to brain and behavioral development. Prior studies on infant sleep/wake classification have been largely limited to reliance on expensive and burdensome polysomnography (PSG) tests in the laboratory or wearable devices that collect single-modality data. To facilitate data collection and accuracy of detection, we aimed to advance this field of study by using a multi-modal wearable device, LittleBeats (LB), to collect audio, electrocardiogram (ECG), and inertial measurement unit (IMU) data among a cohort of 28 infants. We employed a 3-branch (audio/ECG/IMU) large scale transformer-based neural network (NN) to demonstrate the potential of such multi-modal data. We pretrained each branch independently with its respective modality, then finetuned the model by fusing the pretrained transformer layers with cross-attention. We show that multi-modal data significantly improves sleep/wake classification (accuracy = 0.880), compared with use of a single modality (accuracy = 0.732). Our approach to multi-modal mid-level fusion may be adaptable to a diverse range of architectures and tasks, expanding future directions of infant behavioral research.
Submitted 27 June, 2023;
originally announced June 2023.
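The abstract above fuses pretrained branches with cross-attention. A minimal sketch of the underlying operation, scaled dot-product cross-attention, is given here under loud assumptions: it omits the learned query/key/value projections and multiple heads that transformer layers normally use, it is not the paper's architecture, and the pairing of audio tokens as queries with ECG tokens as keys and values is purely illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query (one modality) attends over keys/values (another modality).

    queries: list of d-dimensional vectors
    keys:    list of d-dimensional vectors
    values:  list of vectors, one per key
    Returns one fused vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy tokens: one "audio" query attending over two "ECG" tokens.
audio_tokens = [[1.0, 0.0]]
ecg_keys = [[0.5, 0.5], [0.5, 0.5]]
ecg_values = [[1.0, 2.0], [3.0, 4.0]]
print(cross_attention(audio_tokens, ecg_keys, ecg_values))
```

Because the two keys in the toy example are identical, the attention weights are equal and the fused vector is the plain average of the two value vectors.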
-
Saliency-aware Stereoscopic Video Retargeting
Authors:
Hassan Imani,
Md Baharul Islam,
Lai-Kuan Wong
Abstract:
Stereo video retargeting aims to resize a stereo video to a desired aspect ratio. The quality of retargeted videos can be significantly impacted by the stereo video's spatial, temporal, and disparity coherence, all of which can be affected by the retargeting process. Due to the lack of a publicly accessible annotated dataset, there is little research on deep learning-based methods for stereo video retargeting. This paper proposes an unsupervised deep learning-based stereo video retargeting network. Our model first detects the salient objects, then shifts and warps all objects so as to minimize the distortion of the salient parts of the stereo frames. We use 1D convolution for shifting the salient objects and design a stereo video Transformer to assist the retargeting process. To train the network, we use the parallax attention mechanism to fuse the left and right views and feed the retargeted frames to a reconstruction module that reverses the retargeted frames to the input frames. Therefore, the network is trained in an unsupervised manner. Extensive qualitative and quantitative experiments and ablation studies on the KITTI stereo 2012 and 2015 datasets demonstrate the efficiency of the proposed method over the existing state-of-the-art methods. The code is available at https://github.com/z65451/SVR/.
Submitted 18 April, 2023;
originally announced April 2023.
-
Amalgamated Intermittent Computing Systems
Authors:
Bashima Islam,
Yubo Luo,
Shahriar Nirjon
Abstract:
Intermittent computing systems undergo frequent power failures, hindering necessary data sample capture or timely on-device computation. These missing samples and deadlines limit the potential usage of intermittent computing systems in many time-sensitive and fault-tolerant applications. However, a group or swarm of intermittent nodes may amalgamate to sense and process all the samples by taking turns waking up, extending their collective on-time. Coordinating a swarm of intermittent computing nodes, though, requires frequent and power-hungry communication, which is often infeasible with limited energy. Though previous works have shown promise when all intermittent nodes have access to the same amount of energy to harvest, scenarios in which the available energy distribution differs for each node have yet to be examined. The proposed AICS framework provides an amalgamated intermittent computing system where each node plans its wake-ups based on its duty cycle without communication overhead. We propose one offline tailored duty cycle selection method (Prime-Co-Prime), which schedules wake-up and sleep cycles for each node based on the measured energy available for that node to harvest and prior knowledge or an estimate of the relative energy distribution. When the energy is variable, however, the problem is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Each node uses a group of heuristics to solve the Dec-POMDP and schedule its wake-up cycle.
Submitted 22 March, 2023;
originally announced March 2023.
-
Sherlock in OSS: A Novel Approach of Content-Based Searching in Object Storage System
Authors:
Jannatun Noor,
Rizwanul Haque Ratul,
Mir Rownak Ali Uday,
Joyanta Jyoti Mondal,
Md. Sadiqul Islam Sakif,
A. B. M. Alim Al Islam
Abstract:
Object Storage Systems (OSS) inside a cloud promise scalability, durability, availability, and concurrency. However, open-source OSS does not have a specific approach to letting users and administrators search based on the data, which is contained inside the object storage, without involving the entire cloud infrastructure. Therefore, in this paper, we propose Sherlock, a novel Content-Based Searching (CoBS) architecture to extract additional information from images and documents. Here, we store the additional information in an Elasticsearch-enabled database, which helps us to search for our desired data based on its contents. This approach works in two sequential stages. First, the data will be uploaded to a classifier that will determine the data type and send it to the specific model for the data. Here, the images that are being uploaded are sent to our trained model for object detection, and the documents are sent for keyword extraction. Next, the extracted information is sent to Elasticsearch, which enables searching based on the contents. Because the precision of the models is so fundamental to the search's correctness, we train our models with comprehensive datasets (Microsoft COCO Dataset for multimedia data and SemEval2017 Dataset for document data). Furthermore, we put our designed architecture to the test with a real-world implementation of an open-source OSS called OpenStack Swift. We upload images into the dataset of our implementation in various segments to find out the efficacy of our proposed model in real-life Swift object storage.
Submitted 6 May, 2023; v1 submitted 24 January, 2023;
originally announced March 2023.
-
Retinal Image Restoration using Transformer and Cycle-Consistent Generative Adversarial Network
Authors:
Alnur Alimanov,
Md Baharul Islam
Abstract:
Medical imaging plays a significant role in detecting and treating various diseases. However, these images are often of poor quality, leading to decreased efficiency, extra expenses, and even incorrect diagnoses. Therefore, we propose a retinal image enhancement method using a vision transformer and convolutional neural network. It builds a cycle-consistent generative adversarial network that relies on unpaired datasets. It consists of two generators that translate images from one domain to another (e.g., low- to high-quality and vice versa), playing an adversarial game with two discriminators. The generators try to produce images that the discriminators, which try to tell original images from generated ones, cannot distinguish. Generators are a combination of a vision transformer (ViT) encoder and a convolutional neural network (CNN) decoder. Discriminators include traditional CNN encoders. The resulting improved images have been evaluated quantitatively using peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), and qualitatively through vessel segmentation. The proposed method successfully reduces the adverse effects of blurring, noise, illumination disturbances, and color distortions while significantly preserving structural and color information. Experimental results show the superiority of the proposed method. Our testing PSNR is 31.138 dB for the first and 27.798 dB for the second dataset. Testing SSIM is 0.919 and 0.904, respectively.
Submitted 3 March, 2023;
originally announced March 2023.
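The PSNR figures quoted in the abstract above are in decibels. A minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE), is given here. It assumes 8-bit images flattened to lists of pixel intensities with a peak value of 255; the function name `psnr` and the toy pixel values are hypothetical and do not come from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(restored):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy, made-up pixel values: every pixel is off by exactly one grey level.
reference = [10, 20, 30, 40]
restored = [11, 21, 31, 41]
print(round(psnr(reference, restored), 2))
```

Higher values mean the restored image is closer to the reference. Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.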
-
RecNet: Early Attention Guided Feature Recovery
Authors:
Subrata Biswas,
Bashima Islam
Abstract:
Uncertainty in sensors results in corrupted input streams and hinders the performance of Deep Neural Networks (DNN), which focus on deducing information from data. However, for sensors with multiple input streams, the relevant information among the streams correlates and hence contains mutual information. This paper utilizes this opportunity to recover the perturbed information due to corrupted input streams. We propose RecNet, which estimates the information entropy at every element of the input feature to the network and interpolates the missing information in the input feature matrix. Finally, using the estimated information entropy and interpolated data, we introduce a novel guided replacement procedure to recover the complete information that is the input to the downstream DNN task. We evaluate the proposed algorithm on a sound event detection and localization application where audio streams from the microphone array are corrupted. We have recovered the performance drop due to the corrupted input stream and reduced the localization error with non-corrupted input streams.
Submitted 18 February, 2023;
originally announced February 2023.
-
HiMFR: A Hybrid Masked Face Recognition Through Face Inpainting
Authors:
Md Imran Hosen,
Md Baharul Islam
Abstract:
To recognize a masked face, one possible solution is to restore the occluded part of the face first and then apply a face recognition method. Inspired by recent image inpainting methods, we propose an end-to-end hybrid masked face recognition system, namely HiMFR, consisting of three significant parts: masked face detector, face inpainting, and face recognition. The masked face detector module applies a pretrained Vision Transformer (ViT_b32) to detect whether faces are covered with masks or not. The inpainting module uses a fine-tuned image inpainting model based on a Generative Adversarial Network (GAN) to restore faces. Finally, the hybrid face recognition module based on ViT with an EfficientNetB3 backbone recognizes the faces. We have implemented and evaluated our proposed method on four different publicly available datasets (CelebA, SSDMNV2, MAFA, and Pubfig83) together with our locally collected small dataset, namely Face5. Comprehensive experimental results show the efficacy of the proposed HiMFR method with competitive performance. Code is available at https://github.com/mdhosen/HiMFR
Submitted 19 September, 2022;
originally announced September 2022.
-
Masked Face Inpainting Through Residual Attention UNet
Authors:
Md Imran Hosen,
Md Baharul Islam
Abstract:
Realistic image restoration in high-texture areas, such as removing face masks, is challenging. The state-of-the-art deep learning-based methods fail to guarantee high fidelity and suffer from training instability due to vanishing gradient problems (e.g., weights are updated only slightly in initial layers) and spatial information loss. They also depend on an intermediary stage such as segmentation, meaning they require an external mask. This paper proposes a blind mask face inpainting method using a residual attention UNet to remove the face mask and restore the face with fine details while minimizing the gap with the ground truth face structure. A residual block feeds information to the next layer and directly into the layers about two hops away to solve the vanishing gradient problem. Besides, the attention unit helps the model focus on the relevant mask region, reducing resources and making the model faster. Extensive experiments on the publicly available CelebA dataset show the feasibility and robustness of our proposed model. Code is available at https://github.com/mdhosen/Mask-Face-Inpainting-Using-Residual-Attention-Unet
Submitted 19 September, 2022;
originally announced September 2022.
-
Retinal Image Restoration and Vessel Segmentation using Modified Cycle-CBAM and CBAM-UNet
Authors:
Alnur Alimanov,
Md Baharul Islam
Abstract:
Clinical screening with low-quality fundus images is challenging and significantly leads to misdiagnosis. This paper addresses the issue of improving the retinal image quality and vessel segmentation through retinal image restoration. More specifically, a cycle-consistent generative adversarial network (CycleGAN) with a convolution block attention module (CBAM) is used for retinal image restoration. A modified UNet is used for retinal vessel segmentation for the restored retinal images (CBAM-UNet). The proposed model consists of two generators and two discriminators. Generators translate images from one domain to another, i.e., from low to high quality and vice versa. Discriminators classify generated and original images. The retinal vessel segmentation model uses downsampling, bottlenecking, and upsampling layers to generate segmented images. The CBAM has been used to enhance the feature extraction of these models. The proposed method does not require paired image datasets, which are challenging to produce. Instead, it uses unpaired data that consists of low- and high-quality fundus images retrieved from publicly available datasets. The restoration performance of the proposed method was evaluated using full-reference evaluation metrics, e.g., peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). The retinal vessel segmentation performance was compared with the ground-truth fundus images. The proposed method can significantly reduce the degradation effects caused by out-of-focus blurring, color distortion, low, high, and uneven illumination. Experimental results show the effectiveness of the proposed method for retinal image restoration and vessel segmentation.
Submitted 5 October, 2022; v1 submitted 9 September, 2022;
originally announced September 2022.
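The retinal restoration entry above names PSNR as one of its full-reference metrics. As a rough illustration only, the following sketch shows how PSNR is commonly computed, assuming 8-bit pixel values supplied as flat, equal-length sequences. The function name `psnr` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Assumption: pixels are plain numbers in [0, max_value]; this is a generic
    textbook definition, not the evaluation code of the listed paper.
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the reference and the restored signal.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value indicates that the restored image is closer to the reference; identical inputs yield infinity under this definition.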
-
Shashthosheba: Dissecting Perception of Bangladeshi People towards Telemedicine Apps through the Lens of Features of the Apps
Authors:
Waqar Hassan Khan,
Md Al Imran,
Ahmed Nafis Fuad,
Mohammed Latif Siddiq,
A. B. M. Alim Al Islam
Abstract:
Bangladesh, a developing country with a large and dense population, has recently seen significant economic as well as technological developments. The growth of technology has resulted in a dramatic increase in the number of smartphone users in Bangladesh, and as such, mobile apps have become an increasingly important part of people's lives, even encompassing healthcare services. However, the apps used in healthcare (telemedicine to be specific) in Bangladesh are yet to be studied from the perspective of their features as per the voices of the users as well as service providers. Therefore, in this study, we focus on the features of the telemedicine apps used in Bangladesh. First, we evaluated the present status of existing telemedicine apps in Bangladesh, as well as their benefits and drawbacks in the context of HCI. We analyzed publicly accessible reviews of several Bangladeshi telemedicine apps (N = 14) to evaluate the user impressions. Additionally, to ascertain the public opinion of these apps, we performed a survey in which the patients (N = 87) participated willingly. Our analysis of the collected opinions reveals what users experience, what they appreciate, and what they are concerned about when they use telemedicine apps. Additionally, our study demonstrates what users expect from telemedicine apps, independent of their past experience. Finally, we explore how to address the issues we discovered and how telemedicine may be used to effectively offer healthcare services throughout the country. To the best of our knowledge, this study is the first to analyze the perception of the people of Bangladesh towards telemedicine apps from the perspective of features of the apps.
Submitted 6 May, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
-
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Authors:
Hassan Imani,
Md Baharul Islam,
Lai-Kuan Wong
Abstract:
Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of the low-resolution video by reconstructing the high-resolution video. The key challenges in SVSR are preserving the stereo-consistency and temporal-consistency, without which viewers may experience 3D fatigue. There are several notable works on stereoscopic image super-resolution, but there is little research on stereo video super-resolution. In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components: a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer that discovers the correlation across different video frames and aligns the features. The parallax attention mechanism (PAM) that uses the cross-view information to consider the significant disparities is used to fuse the stereo views. Due to the lack of a benchmark dataset suitable for the SVSR task, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos captured using a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that the Trans-SVSR can achieve competitive performance compared to the state-of-the-art methods. Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/
Submitted 21 April, 2022;
originally announced April 2022.
-
An Efficient End-to-End Deep Neural Network for Interstitial Lung Disease Recognition and Classification
Authors:
Masum Shah Junayed,
Afsana Ahsan Jeny,
Md Baharul Islam,
Ikhtiar Ahmed,
A F M Shahen Shah
Abstract:
The automated Interstitial Lung Diseases (ILDs) classification technique is essential for assisting clinicians during the diagnosis process. Detecting and classifying ILD patterns is a challenging problem. This paper introduces an end-to-end deep convolutional neural network (CNN) for classifying ILD patterns. The proposed model comprises four convolutional layers with different kernel sizes and the Rectified Linear Unit (ReLU) activation function, followed by batch normalization and max-pooling with a size equal to the final feature map size, as well as four dense layers. We used the ADAM optimizer to minimize categorical cross-entropy. A dataset consisting of 21328 image patches of 128 CT scans with five classes is taken to train and assess the proposed model. A comparison study showed that the presented model outperformed pre-trained CNNs and five-fold cross-validation on the same dataset. For ILD pattern classification, the proposed approach achieved an accuracy score of 99.09% and an average F score of 97.9%, outperforming three pre-trained CNNs. These outcomes show that the proposed model is relatively state-of-the-art in precision, recall, F score, and accuracy.
Submitted 21 April, 2022;
originally announced April 2022.
-
HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model
Authors:
Masum Shah Junayed,
Arezoo Sadeghzadeh,
Md Baharul Islam,
Lai-Kuan Wong,
Tarkan Aydin
Abstract:
Monocular omnidirectional depth estimation is receiving considerable research attention due to its broad applications for sensing 360° surroundings. Existing approaches in this field suffer from limitations in recovering small object details and data lost during the ground-truth depth map acquisition. In this paper, a novel monocular omnidirectional depth estimation model, namely HiMODE, is proposed based on a hybrid CNN+Transformer (encoder-decoder) architecture whose modules are efficiently designed to mitigate distortion and computational cost, without performance degradation. Firstly, we design a feature pyramid network based on the HNet block to extract high-resolution features near the edges. The performance is further improved, benefiting from a self and cross attention layer and spatial/temporal patches in the Transformer encoder and decoder, respectively. Besides, a spatial residual block is employed to reduce the number of parameters. By jointly passing the deep features extracted from an input image at each backbone block, along with the raw depth maps predicted by the transformer encoder-decoder, through a context adjustment layer, our model can produce resulting depth maps with better visual quality than the ground-truth. Comprehensive ablation studies demonstrate the significance of each individual module. Extensive experiments conducted on three datasets (Stanford3D, Matterport3D, and SunCG) demonstrate that HiMODE can achieve state-of-the-art performance for 360° monocular depth estimation.
Submitted 11 April, 2022;
originally announced April 2022.
-
Let's Vibrate with Vibration: Augmenting Structural Engineering with Low-Cost Vibration Sensing
Authors:
Masfiqur Rahaman,
MD. Nazmul Hasan Sakib,
Nafisa Islam,
Saiful Islam Salim,
Uday Kamal,
Raihan Rasheed,
A. B. M. Alim Al Islam
Abstract:
Using low-cost piezoelectric sensors to sense structural vibration exhibits great potential in augmenting structural engineering, which is yet to be explored in the literature to the best of our knowledge. Examples of such unexplored augmentation include classifying diverse structures (such as buildings, flyovers, foot over-bridges, etc.), and relating the extent of vibration generated at different heights of a structure to the associated height. Accordingly, to explore these cases, we develop a low-cost piezoelectric sensor-based vibration sensing system aiming to remotely collect real vibration data from diversified civil structures. We dig into our collected sensed data to classify five different types of structures through rigorous statistical and machine-learning-based analyses. Our analyses achieve a classification accuracy of up to 97% with an F1 score of 0.97. Moreover, in the rarely explored time domain, our analyses reveal a novel modality of relation between vibration generated at different heights of a structure and the associated height, which was explored in the frequency domain earlier in the literature with expensive sensors.
Submitted 8 December, 2020;
originally announced December 2020.
-
To Lane or Not to Lane? Comparing On-Road Experiences in Developing and Developed Countries using a New Simulator "RoadBird"
Authors:
Md. Masum Mushfiq,
Tarik Reza Toha,
Saiful Islam Salim,
Aaiyeesha Mostak,
Masfiqur Rahaman,
Najla Abdulrahman Al-Nabhan,
Arif Mohamin Sadri,
A. B. M. Alim Al Islam
Abstract:
Even though the traffic systems in developed countries have been analyzed with rigor and operated efficiently, the same does not generally hold for developing countries due to inadequate planning, design, and operations of their transportation systems. Because of inherent differences between internal infrastructures, the systems deployed in developed countries may not be amenable to developing ones. Besides, the traffic systems of developing countries are not well-studied in the literature to the best of our knowledge. For example, it is yet to be explored how a developed country's lane-based traffic flow would perform in the context of a developing country, which generally experiences non-lane-based traffic. As such, by using our newly developed traffic simulator 'RoadBird', we investigate outcomes of both lane-based and non-lane-based traffic from the contexts of both developing and developed countries. To do so, we run simulations over real road topologies (extracted from the GIS maps of major cities such as Dhaka, Miami, and Riyadh) considering different scenarios such as lane-based or non-lane-based flows, homogeneous or heterogeneous traffic, with or without pedestrians, etc. We also incorporate different car-following and lane-changing models to mimic traffic behaviors and investigate their performances. While the lane changing dilemma remains an open research question, our experimental evidence indicates: (i) lane-based approaches will not necessarily perform better in the case of currently-adopted non-lane-based scenarios; and (ii) non-lane-based strategies may benefit system performance in lane-based scenarios while having heavy mixed traffic. In addition, we reveal several new insights for on-road experiences both in developing and developed countries.
Submitted 15 October, 2020;
originally announced October 2020.
-
Enhancing Fidelity of Quantum Cryptography using Maximally Entangled Qubits
Authors:
Saiful Islam Salim,
Adnan Quaium,
Sriram Chellappan,
A. B. M. Alim Al Islam
Abstract:
Securing information transmission is critical today. However, with rapidly developing powerful quantum technologies, conventional cryptography techniques are becoming more prone to attacks each day. New techniques in the realm of quantum cryptography to preserve security against powerful attacks are slowly emerging. What is important now, though, is the fidelity of the cryptography, because security with massive processing power is not worth much if it is not correct. Focusing on this issue, we propose a method to enhance the fidelity of quantum cryptography using maximally entangled qubit pairs. To do so, we created a graph state along a path consisting of all the qubits of ibmqx4 and ibmq_16_melbourne, respectively, and we measure the strength of the entanglement using negativity measurement of the qubit pairs. Then, using the qubits with maximal entanglement, we send the modified encryption key to the receiver. The key is modified by permutation and superdense coding before transmission. The receiver reverts the process and gets the actual key. We carried out the complete experiment in the IBM Quantum Experience project. Our result shows a 15% to 20% higher fidelity of encryption and decryption than a random selection of qubits.
Submitted 9 September, 2020;
originally announced September 2020.
-
The Past, Present, and Future of COVID-19: A Data-Driven Perspective
Authors:
Ajwad Akil,
Ishrat Jahan Eliza,
Md. Hasibul Hussain Hisham,
Fahim Morshed,
Nazmus Sakib,
Nuwaisir Rabi,
Abir Mohammad Turza,
Sriram Chellappan,
A. B. M. Alim Al Islam
Abstract:
Epidemics and pandemics have ravaged human life since time immemorial. To combat these, novel ideas have always been created and deployed by humanity, with varying degrees of success. At this very moment, the COVID-19 pandemic is the singular global health crisis. Now, perhaps for the first time in human history, almost the whole of humanity is experiencing some form of hardship as a result of one invisible pathogen. This once again entails novel ideas for quick eradication, healing and recovery, whether it is healthcare, banking, travel, education or any other. For efficient policy-making, clear trends of past, present and future are vital for policy-makers. With the global impacts of COVID-19 so severe, equally important is the analysis of correlations between disease spread and various socio-economic and environmental factors. Furthermore, all of these need to be presented in an integrated manner in real-time to facilitate efficient policy making. To address these issues, in this study, we report results on our development and deployment of a web-based integrated real-time operational dashboard as an important decision support system for COVID-19. In our study, we conducted data-driven analysis based on available data from diverse authenticated sources to predict upcoming consequences of the pandemic through rigorous modeling and statistical analyses. We also explored correlations between pandemic spread and important socio-economic and environmental factors. Furthermore, we also present how outcomes of our work can facilitate efficient policy making in this critical hour.
Submitted 12 August, 2020;
originally announced August 2020.
-
As You Are, So Shall You Move Your Head: A System-Level Analysis between Head Movements and Corresponding Traits and Emotions
Authors:
Sharmin Akther Purabi,
Rayhan Rashed,
Md. Mirajul Islam,
Md. Nahiyan Uddin,
Mahmuda Naznin,
A. B. M. Alim Al Islam
Abstract:
Identifying physical traits and emotions based on system-sensed physical activities is a challenging problem in the realm of human-computer interaction. Our work contributes in this context by investigating an underlying connection between head movements and corresponding traits and emotions. To do so, we utilize a head movement measuring device called eSense, which gives acceleration and rotation of a head. Here, first, we conduct a thorough study over head movement data collected from 46 persons using eSense while inducing five different emotional states over them in isolation. Our analysis reveals several new head movement based findings, which in turn, leads us to a novel unified solution for identifying different human traits and emotions through exploiting machine learning techniques over head movement data. Our analysis confirms that the proposed solution can result in high accuracy over the collected data. Accordingly, we develop an integrated unified solution for real-time emotion and trait identification using head movement data leveraging outcomes of our analysis.
Submitted 11 October, 2019;
originally announced October 2019.
-
An Interactive Control Approach to 3D Shape Reconstruction
Authors:
Bipul Islam,
Ji Liu,
Anthony Yezzi,
Romeil Sandhu
Abstract:
The ability to accurately reconstruct the 3D facets of a scene is one of the key problems in robotic vision. However, even with recent advances with machine learning, there is no high-fidelity universal 3D reconstruction method for this optimization problem as schemes often cater to specific image modalities and are often biased by scene abnormalities. Simply put, there always remains an information gap due to the dynamic nature of real-world scenarios. To this end, we demonstrate a feedback control framework which invokes operator inputs (also prone to errors) in order to augment existing reconstruction schemes. For proof-of-concept, we choose a classical region-based stereoscopic reconstruction approach and show how an ill-posed model can be augmented with operator input to be much more robust to scene artifacts. We provide necessary conditions for stability via Lyapunov analysis and perhaps more importantly, we show that the stability depends on a notion of absolute curvature. Mathematically, this aligns with previous work that has shown Ricci curvature as proxy for functional robustness of dynamical networked systems. We conclude with results that show how our method can improve standalone reconstruction schemes.
Submitted 7 October, 2019;
originally announced October 2019.
-
A Sweet Recipe for Consolidated Vulnerabilities: Attacking a Live Website by Harnessing a Killer Combination of Vulnerabilities
Authors:
Mazharul Islam,
MD. Nazmuddoha Ansary,
Novia Nurain,
Salauddin Parvez Shams,
A. B. M. Alim Al Islam
Abstract:
The recent emergence of new vulnerabilities is an epoch-making problem in the complex world of website security. Most websites fail to keep up to date against these new vulnerabilities, leaving their weaknesses unrecognized. As a result, when cyber-criminals scan such outdated websites, the scanner will report a set of vulnerabilities. Once found, these vulnerabilities are then exploited to steal data, distribute malicious content, or inject defacement and spam content into the vulnerable websites. Furthermore, a combination of different vulnerabilities is able to cause more damage than anticipated. Therefore, in this paper, we endeavor to find connections among various vulnerabilities such as cross-site scripting, local file inclusion, remote file inclusion, buffer overflow, CSRF, etc. To do so, we develop a Finite State Machine (FSM) attacking model, which analyzes a set of vulnerabilities to find connections among them. We demonstrate the efficacy of our model by applying it to the set of vulnerabilities found on two live websites.
Submitted 27 June, 2019;
originally announced June 2019.
-
Zygarde: Time-Sensitive On-Device Deep Inference and Adaptation on Intermittently-Powered Systems
Authors:
Bashima Islam,
Shahriar Nirjon
Abstract:
We propose Zygarde, an energy- and accuracy-aware soft real-time task scheduling framework for batteryless systems that flexibly execute deep learning tasks that are suitable for running on microcontrollers. The sporadic nature of harvested energy, resource constraints of the embedded platform, and the computational demand of deep neural networks (DNNs) pose a unique and challenging real-time scheduling problem for which no solutions have been proposed in the literature. We empirically study the problem and model the energy harvesting pattern as well as the trade-off between the accuracy and execution of a DNN. We develop an imprecise computing-based scheduling algorithm that improves the timeliness of DNN tasks on intermittently powered systems. We evaluate Zygarde using four standard datasets as well as by deploying it in six real-life applications involving audio and camera sensor systems. Results show that Zygarde decreases the execution time by up to 26% and schedules 9%-34% more tasks with up to 21% higher inference accuracy, compared to traditional schedulers such as the earliest deadline first (EDF).
Submitted 7 September, 2020; v1 submitted 4 May, 2019;
originally announced May 2019.
-
Intermittent Learning: On-Device Machine Learning on Intermittently Powered System
Authors:
Seulki Lee,
Bashima Islam,
Yubo Luo,
Shahriar Nirjon
Abstract:
This paper introduces intermittent learning - the goal of which is to enable energy harvested computing platforms capable of executing certain classes of machine learning tasks effectively and efficiently. We identify unique challenges to intermittent learning relating to the data and application semantics of machine learning tasks, and to address these challenges, we devise 1) an algorithm that determines a sequence of actions to achieve the desired learning objective under tight energy constraints, and 2) propose three heuristics that help an intermittent learner decide whether to learn or discard training examples at run-time which increases the energy efficiency of the system. We implement and evaluate three intermittent learning applications that learn the 1) air quality, 2) human presence, and 3) vibration using solar, RF, and kinetic energy harvesters, respectively. We demonstrate that the proposed framework improves the energy efficiency of a learner by up to 100% and cuts down the number of learning examples by up to 50% when compared to state-of-the-art intermittent computing systems that do not implement the proposed intermittent learning framework.
Submitted 15 December, 2019; v1 submitted 21 April, 2019;
originally announced April 2019.
-
Characterizing Distances of Networks on the Tensor Manifold
Authors:
Bipul Islam,
Ji Liu,
Romeil Sandhu
Abstract:
At the core of understanding dynamical systems is the ability to maintain and control the system's behavior, which includes notions of robustness, heterogeneity, or regime-shift detection. Recently, to explore such functional properties, a convenient representation has been to model such dynamical systems as a weighted graph consisting of a finite, but very large number of interacting agents. This said, there exists very limited relevant statistical theory that is able to cope with real-life data, i.e., how does one perform analysis and/or statistics over a family of networks as opposed to a specific network or network-to-network variation? Here, we are interested in the analysis of network families whereby each network represents a point on an underlying statistical manifold. To do so, we explore the Riemannian structure of the tensor manifold developed by Pennec, previously applied to Diffusion Tensor Imaging (DTI), towards the problem of network analysis. In particular, while this note focuses on Pennec's definition of geodesics amongst a family of networks, we show how it lays the foundation for future work on developing measures of network robustness for regime-shift detection. We conclude with experiments highlighting the proposed distance on synthetic networks and an application towards biological (stem-cell) systems.
Submitted 6 October, 2019; v1 submitted 25 August, 2017;
originally announced August 2017.
-
Folding of guanine quadruplex molecules -- funnel-like mechanism or kinetic partitioning? An overview from MD simulation studies
Authors:
Jiří Šponer,
Giovanni Bussi,
Petr Stadlbauer,
Petra Kührová,
Pavel Banáš,
Barira Islam,
Shozeb Haider,
Stephen Neidle,
Michal Otyepka
Abstract:
Background: Guanine quadruplexes (GQs) play vital roles in many cellular processes and are of much interest as drug targets. In contrast to the availability of many structural studies, there is still limited knowledge on GQ folding.
Scope of review: We review recent molecular dynamics (MD) simulation studies of the folding of GQs, with an emphasis paid to the human telomeric DNA GQ. We explain the basic principles and limitations of all types of MD methods used to study unfolding and folding in a way accessible to non-specialists. We discuss the potential role of G-hairpin, G-triplex and alternative GQ intermediates in the folding process. We argue that, in general, folding of GQs is fundamentally different from funneled folding of small fast-folding proteins, and can be best described by a kinetic partitioning (KP) mechanism. KP is a competition between at least two (but often many) well-separated and structurally different conformational ensembles.
Major conclusions: The KP mechanism is the only plausible way to explain experiments reporting long time-scales of GQ folding and the existence of long-lived sub-states. A significant part of the natural partitioning of the free energy landscape of GQs comes from the ability of the GQ-forming sequences to populate a large number of syn-anti patterns in their G-tracts. The extreme complexity of the KP of GQs typically prevents an appropriate description of the folding landscape using just a few order parameters or collective variables.
General significance: We reconcile available computational and experimental studies of GQ folding and formulate basic principles characterizing GQ folding landscapes.
Submitted 18 December, 2016;
originally announced December 2016.
-
A System for Identifying and Visualizing Influential Communities
Authors:
Md Tamzeed Islam,
Bashima Islam,
Mohammed Eunus Ali
Abstract:
In this paper, we introduce the concept of influential communities in a co-author network. We term a community as the most influential if the community has the highest influence among all other communities in the entire network. Influence of a community depends on the impact of the contents (e.g., citations of papers) generated by the members of that community. We propose an algorithm to identify the top K influential communities of an online social network. As a working prototype, we develop a visualization system that allows a user to find the top K influential communities from a co-author network. A user can search top K influential communities of particular research fields and our system provides him/her with a visualization of these communities. A user can explore the details of a community, such as authors, citations, and collaborations with other communities.
Submitted 20 October, 2016;
originally announced October 2016.
-
Bengali to Assamese Statistical Machine Translation using Moses (Corpus Based)
Authors:
Nayan Jyoti Kalita,
Baharul Islam
Abstract:
Machine translation plays a major role in facilitating man-machine communication as well as human-to-human communication in Natural Language Processing (NLP). Machine Translation (MT) refers to using a machine to convert one language into another. Statistical Machine Translation is a type of MT consisting of a Language Model (LM), a Translation Model (TM) and a decoder. In this paper, a Bengali to Assamese Statistical Machine Translation model has been created using Moses. Other translation tools, such as IRSTLM for the Language Model and GIZA-PP-V1.0.7 for the Translation Model, are used within this framework, which is available in Linux environments. The purpose of the LM is to encourage fluent output, the purpose of the TM is to encourage similarity between input and output, and the decoder maximizes the probability of the translated text in the target language. A parallel corpus of 17100 sentences in Bengali and Assamese has been used for training within this framework. Statistical MT techniques have not so far been widely explored for Indian languages. It might be interesting to discover to what extent these models can help the immense ongoing MT efforts in the country.
Submitted 5 April, 2015;
originally announced April 2015.
-
Interactive Digital Learning Materials for Kindergarten Students in Bangladesh
Authors:
Md. Baharul Islam,
Md. Kabirul Islam,
Arif Ahmed,
Abu Kalam Shamsuddin
Abstract:
The pedagogy of teaching and learning has changed with the proliferation of communication technology, and it is necessary to develop interactive learning materials for children that may improve their learning, catching, and memorizing capabilities. Perhaps one of the most important innovations in the age of technology is multimedia and its application. It is imperative to create a high quality and realistic learning environment for children. Interactive learning materials can be easier to understand and can help children with their first learning. We developed some interactive learning materials in the form of a video for Playgroup using multimedia application tools. This study investigated the impact on students' abilities to acquire new knowledge or skills through interactive learning materials. We visited one kindergarten (nursery school) and interviewed class teachers about their teaching methods and the level of students' ability to recognize English alphabets, pictures, etc. The course teachers were provided interactive learning materials to show their playgroups for a number of sessions. The video included English alphabets with related words and pictures, and motivational fun. We noticed that almost all children were very interested in interacting with their learning video. The students were assessed individually and asked to recognize the alphabets and pictures. The students adapted to their first alphabets very quickly. However, there were individual differences in their cognitive development. This interactive multimedia can be an alternative to traditional pedagogy for teaching playgroups.
Submitted 7 November, 2014;
originally announced November 2014.
-
Child Education Through Animation: An Experimental Study
Authors:
Md Baharul Islam,
Arif Ahmed,
Md Kabirul Islam,
Abu Kalam Shamsuddin
Abstract:
Teachers have tried to teach their students by introducing text books along with verbal instructions in the traditional education system. However, teaching and learning methods could be changed for developing Information and Communication Technology. It's time to adapt students to an interactive learning system so that they can improve their learning, catching, and memorizing capabilities. It is indispensable to create a high quality and realistic learning environment for students. Visual learning can be easier to understand and can help students with their learning. We developed visual learning materials in the form of video for students of primary level using different multimedia application tools. The objective of this paper is to examine the impact on students' abilities to acquire new knowledge or skills through visual learning materials and blended learning, that is, the integration of visual learning materials with teachers' instructions. We visited a primary school in Dhaka city for this study and conducted teaching with three different groups of students: (i) a teacher taught students by the traditional system on the same materials and marked the level of students' ability to adapt by a set of questions, (ii) another group was taught with only visual learning material and assessment was done with 15 questionnaires, (iii) the third group was taught with the video of the solar system combined with teachers' instructions and assessed with the same questionnaires. This integration of visual materials with verbal instructions is a blended approach of learning. The interactive blended approach greatly promoted students' ability to acquire knowledge and skills. Students' response and perception were more positive towards the blended technique than the other two methods. This interactive blended learning system may be an appropriate method, especially for school children.
Submitted 7 November, 2014;
originally announced November 2014.