Data Quality Matters: Suicide Intention Detection on Social Media Posts Using a RoBERTa-CNN Model

Authors: Emily Lin, Jian Sun, Hsingyu Chen, Mohammad H. Mahoor

Abstract: Suicide remains a global health concern that urgently needs innovative approaches for early detection and intervention. In this paper, we focus on identifying suicidal intentions in SuicideWatch Reddit posts and present a novel approach to suicide detection using the cutting-edge RoBERTa-CNN model, a variant of RoBERTa (Robustly optimized BERT approach). RoBERTa is used for various Natural Language Processing (NLP) tasks, including text classification and sentiment analysis. The effectiveness of RoBERTa lies in its ability to capture textual information and form semantic relationships within texts. By adding a Convolutional Neural Network (CNN) layer to the original model, RoBERTa-CNN enhances the ability to capture important patterns from heavy datasets. To evaluate RoBERTa-CNN, we experimented on the Suicide and Depression Detection dataset and obtained solid results. For example, RoBERTa-CNN achieves 98% mean accuracy with a standard deviation (STD) of 0.0009. It also reaches over 97.5% mean AUC value with an STD of 0.0013. Meanwhile, RoBERTa-CNN outperforms competitive methods, demonstrating its robustness and ability to capture nuanced linguistic patterns for suicidal intentions. Therefore, RoBERTa-CNN can detect suicide intention in text data very well.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 4 pages, 1 figure, 4 tables

arXiv:2402.01690 [pdf, other]

Linguistic-Based Mild Cognitive Impairment Detection Using Informative Loss

Authors: Ali Pourramezan Fard, Mohammad H. Mahoor, Muath Alsuhaibani, Hiroko H. Dodge

Abstract: This paper presents a deep learning method using Natural Language Processing (NLP) techniques to distinguish between Mild Cognitive Impairment (MCI) and Normal Cognitive (NC) conditions in older adults. We propose a framework that analyzes transcripts generated from video interviews collected within the I-CONECT study project, a randomized controlled trial aimed at improving cognitive functions through video chats. Our proposed NLP framework consists of two Transformer-based modules, namely Sentence Embedding (SE) and Sentence Cross Attention (SCA). First, the SE module captures contextual relationships between words within each sentence. Subsequently, the SCA module extracts temporal features from a sequence of sentences. This feature is then used by a Multi-Layer Perceptron (MLP) for the classification of subjects into MCI or NC. To build a robust model, we propose a novel loss function, called InfoLoss, that considers the reduction in entropy by observing each sequence of sentences to ultimately enhance the classification accuracy. The results of our comprehensive model evaluation using the I-CONECT dataset show that our framework can distinguish between MCI and NC with an average area under the curve of 84.75%.

Submitted 23 January, 2024; originally announced February 2024.

arXiv:2308.15624 [pdf, other]

doi 10.1016/j.eswa.2024.124185

Detection of Mild Cognitive Impairment Using Facial Features in Video Conversations

Authors: Muath Alsuhaibani, Hiroko H. Dodge, Mohammad H. Mahoor

Abstract: Early detection of Mild Cognitive Impairment (MCI) leads to early interventions to slow the progression from MCI into dementia. Deep Learning (DL) algorithms could help achieve early non-invasive, low-cost detection of MCI. This paper presents the detection of MCI in older adults using DL models based only on facial features extracted from video-recorded conversations at home. We used the data collected from the I-CONECT behavioral intervention study (NCT02871921), where several sessions of semi-structured interviews between socially isolated older individuals and interviewers were video recorded. We develop a framework that extracts spatial holistic facial features using a convolutional autoencoder and temporal information using transformers. Our proposed DL model was able to detect the I-CONECT study participants' cognitive conditions (MCI vs. those with normal cognition (NC)) using facial features. The segments and sequence information of the facial features improved the prediction performance compared with the non-temporal features. The detection accuracy using this combined method reached 88% whereas 84% is the accuracy without applying the segments and sequences information of the facial features within a video on a certain theme.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2304.05292 [pdf, other]

doi 10.1016/j.eswa.2023.121929

MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos

Authors: Jian Sun, Hiroko H. Dodge, Mohammad H. Mahoor

Abstract: Deep machine learning models including Convolutional Neural Networks (CNN) have been successful in the detection of Mild Cognitive Impairment (MCI) using medical images, questionnaires, and videos. This paper proposes a novel Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to distinguish MCI from those with normal cognition by analyzing facial features. The data comes from I-CONECT, a behavioral intervention trial aimed at improving cognitive function by providing frequent video chats. MC-ViViT extracts spatiotemporal features of videos in one branch and augments representations by the MC module. The I-CONECT dataset is challenging as the dataset is imbalanced, containing Hard-Easy and Positive-Negative samples, which impedes the performance of MC-ViViT. We propose a loss function for Hard-Easy and Positive-Negative Samples (HP Loss) by combining Focal loss and AD-CORRE loss to address the imbalance problem. Our experimental results on the I-CONECT dataset show the great potential of MC-ViViT in predicting MCI, with a high accuracy of 90.63% on some of the interview videos.

Submitted 5 January, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: 13 pages, 7 tables, 7 figures, 9 equations

Journal ref: Expert Systems with Applications, 238, 121929 (2023)

arXiv:2302.00908 [pdf, other]

GANalyzer: Analysis and Manipulation of GANs Latent Space for Controllable Face Synthesis

Authors: Ali Pourramezan Fard, Mohammad H. Mahoor, Sarah Ariel Lamer, Timothy Sweeny

Abstract: Generative Adversarial Networks (GANs) are capable of synthesizing high-quality facial images. Despite their success, GANs do not provide any information about the relationship between the input vectors and the generated images. Currently, facial GANs are trained on imbalanced datasets, so they generate less diverse images. For example, more than 77% of 100K images that we randomly synthesized using StyleGAN3 are classified as Happy, and only around 3% are Angry. The problem becomes even worse when a mixture of facial attributes is desired: less than 1% of the generated samples are Angry Woman, and only around 2% are Happy Black. To address these problems, this paper proposes a framework, called GANalyzer, for the analysis and manipulation of the latent space of well-trained GANs. GANalyzer consists of a set of transformation functions designed to manipulate latent vectors for a specific facial attribute such as facial Expression, Age, Gender, and Race. We analyze facial attribute entanglement in the latent space of GANs and apply the proposed transformation for editing the disentangled facial attributes. Our experimental results demonstrate the strength of GANalyzer in editing facial attributes and generating any desired faces. We also create and release a balanced photo-realistic human face dataset. Our code is publicly available on GitHub.

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2205.14935 [pdf, other]

doi 10.1080/17489725.2020.1817582

Deep Learning Methods for Fingerprint-Based Indoor Positioning: A Review

Authors: Fahad Alhomayani, Mohammad H. Mahoor

Abstract: Outdoor positioning systems based on the Global Navigation Satellite System have several shortcomings that have rendered their use for indoor positioning impractical. Location fingerprinting, which utilizes machine learning, has emerged as a viable method and solution for indoor positioning due to its simple concept and accurate performance. In the past, shallow learning algorithms were traditionally used in location fingerprinting. Recently, the research community started utilizing deep learning methods for fingerprinting after witnessing the great success and superiority these methods have over traditional/shallow machine learning algorithms. This paper provides a comprehensive review of deep learning methods in indoor positioning. First, the advantages and disadvantages of various fingerprint types for indoor positioning are discussed. The solutions proposed in the literature are then analyzed, categorized, and compared against various performance evaluation metrics. Since data is key in fingerprinting, a detailed review of publicly available indoor positioning datasets is presented. While incorporating deep learning into fingerprinting has resulted in significant improvements, doing so has also introduced new challenges. These challenges, along with the common implementation pitfalls, are discussed. Finally, the paper is concluded with some remarks as well as future research trends.

Submitted 30 May, 2022; originally announced May 2022.

Journal ref: Journal of Location Based Services, 14:3, 129-200

arXiv:2205.14921 [pdf, other]

doi 10.1038/s41597-021-00832-y

OutFin, a multi-device and multi-modal dataset for outdoor localization based on the fingerprinting approach

Authors: Fahad Alhomayani, Mohammad H. Mahoor

Abstract: In recent years, fingerprint-based positioning has gained researchers' attention since it is a promising alternative to the Global Navigation Satellite System and cellular network-based localization in urban areas. Despite this, the lack of publicly available datasets that researchers can use to develop, evaluate, and compare fingerprint-based positioning solutions constitutes a high entry barrier for studies. As an effort to overcome this barrier and foster new research efforts, this paper presents OutFin, a novel dataset of outdoor location fingerprints that were collected using two different smartphones. OutFin comprises diverse data types such as WiFi, Bluetooth, and cellular signal strengths, in addition to measurements from various sensors including the magnetometer, accelerometer, gyroscope, barometer, and ambient light sensor. The collection area spanned four dispersed sites with a total of 122 reference points. Each site is different in terms of its visibility to the Global Navigation Satellite System and the number, arrangement, and spacing of its reference points. Before OutFin was made available to the public, several experiments were conducted to validate its technical quality.

Submitted 30 May, 2022; originally announced May 2022.

Journal ref: Sci Data 8, 66 (2021)

arXiv:2205.04251 [pdf, ps, other]

A Music-Therapy Robotic Platform for Children with Autism: A Pilot Study

Authors: Huanghao Feng, Mohammad H. Mahoor, Francesca Dino

Abstract: Children with Autism Spectrum Disorder (ASD) experience deficits in verbal and nonverbal communication skills including motor control, turn-taking, and emotion recognition. Innovative technology, such as socially assistive robots, has been shown to be a viable method for Autism therapy. This paper presents a novel robot-based music-therapy platform for modeling and improving the social responses and behaviors of children with ASD. Our autonomous social interactive system consists of three modules. We adopted Short-time Fourier Transform and Levenshtein distance to fulfill the design requirements: a) "music detection" and b) "smart scoring and feedback", which allows NAO to understand music and provide additional practice and oral feedback to the users as applicable. We designed and implemented six Human-Robot-Interaction (HRI) sessions including four intervention sessions. Nine children with ASD and seven Typically Developing children participated in a total of fifty HRI experimental sessions. Using our platform, we collected and analyzed data on social behavioral changes and emotion recognition using Electrodermal Activity (EDA) signals. The results of our experiments demonstrate that most of the participants were able to complete motor control tasks with ~70% accuracy. Six out of the 9 ASD participants showed stable turn-taking behavior when playing music. The results of automated emotion classification using Support Vector Machines illustrate that emotional arousal in the ASD group can be detected and well recognized via EDA bio-signals.
In summary, the results of our data analyses, including emotion classification using EDA signals, indicate that the proposed robot-music based therapy platform is an attractive and promising assistive tool to facilitate the improvement of fine motor control and turn-taking skills in children with ASD.

Submitted 9 May, 2022; originally announced May 2022.
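
The abstract above names Levenshtein distance as one ingredient of its "smart scoring and feedback" step. As a minimal sketch of that metric only, the following compares a target note sequence with a played one. The note-name encoding, the function name `levenshtein`, and the sample sequences are illustrative assumptions, not the authors' implementation.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from the first i items of a to an empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical example: notes are encoded as letter names (an assumption).
target = ["C", "D", "E", "C"]
played = ["C", "E", "E"]
distance = levenshtein(target, played)
print(distance)
```

A smaller distance means the played sequence is closer to the target; how such a distance is turned into a score or feedback is not specified in the abstract.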

arXiv:2203.15835 [pdf, other]

ACR Loss: Adaptive Coordinate-based Regression Loss for Face Alignment

Authors: Ali Pourramezan Fard, Mohammad H. Mahoor

Abstract: Although deep neural networks have achieved reasonable accuracy in solving face alignment, it is still a challenging task, specifically when we deal with facial images under occlusion or extreme head poses. Heatmap-based Regression (HBR) and Coordinate-based Regression (CBR) are the two mainly used methods for face alignment. CBR methods require less computer memory, though their performance is lower than that of HBR methods. In this paper, we propose an Adaptive Coordinate-based Regression (ACR) loss to improve the accuracy of CBR for face alignment. Inspired by the Active Shape Model (ASM), we generate Smooth-Face objects, a set of facial landmark points with less variation compared to the ground truth landmark points. We then introduce a method to estimate the level of difficulty in predicting each landmark point for the network by comparing the distribution of the ground truth landmark points and the corresponding Smooth-Face objects. Our proposed ACR Loss can adaptively modify its curvature and the influence of the loss based on the difficulty level of predicting each landmark point in a face. Accordingly, the ACR Loss guides the network toward challenging points more than easier points, which improves the accuracy of the face alignment task. Our extensive evaluation shows the capabilities of the proposed ACR Loss in predicting facial landmark points in various facial images.

Submitted 14 September, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted in International Conference on Pattern Recognition (ICPR) 2022

arXiv:2201.11167 [pdf, other]

doi 10.1109/TAFFC.2022.3143803

Artificial Emotional Intelligence in Socially Assistive Robots for Older Adults: A Pilot Study

Authors: Hojjat Abdollahi, Mohammad H. Mahoor, Rohola Zandie, Jarid Siewierski, Sara H. Qualls

Abstract: This paper presents our recent research on integrating artificial emotional intelligence in a social robot (Ryan) and studies the robot's effectiveness in engaging older adults. Ryan is a socially assistive robot designed to provide companionship for older adults with depression and dementia through conversation. We used two versions of Ryan for our study, empathic and non-empathic. The empathic Ryan utilizes a multimodal emotion recognition algorithm and a multimodal emotion expression system. Using different input modalities for emotion, i.e. facial expression and speech sentiment, the empathic Ryan detects users' emotional state and utilizes an affective dialogue manager to generate a response. On the other hand, the non-empathic Ryan lacks facial expression and uses scripted dialogues that do not factor in the users' emotional state. We studied these two versions of Ryan with 10 older adults living in a senior care facility. The statistically significant improvement in the users' reported face-scale mood measurement indicates an overall positive effect from the interaction with both the empathic and non-empathic versions of Ryan. However, the number of spoken words measurement and the exit survey analysis suggest that the users perceive the empathic Ryan as more engaging and likable.

Submitted 26 January, 2022; originally announced January 2022.

Comments: To be published in IEEE Transactions on Affective Computing

arXiv:2111.10854 [pdf, other]

doi 10.1007/s10846-023-01952-w

XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For Convolutional Neural Networks

Authors: Jian Sun, Ali Pourramezan Fard, Mohammad H. Mahoor

Abstract: Capsule Network is powerful at defining the positional relationship between features in deep neural networks for visual recognition tasks, but it is computationally expensive and not suitable for running on mobile devices. The bottleneck is in the computational complexity of the Dynamic Routing mechanism used between the capsules. On the other hand, XNOR-Net is fast and computationally efficient, though it suffers from low accuracy due to information loss in the binarization process. To address the computational burdens of the Dynamic Routing mechanism, this paper proposes new Fully Connected (FC) layers by xnorizing the linear projection outside or inside the Dynamic Routing within the CapsFC layer. Specifically, our proposed FC layers have two versions, XnODR (Xnorize the Linear Projection Outside Dynamic Routing) and XnIDR (Xnorize the Linear Projection Inside Dynamic Routing). To test the generalization of both XnODR and XnIDR, we insert them into two different networks, MobileNetV2 and ResNet-50. Our experiments on three datasets, MNIST, CIFAR-10, and MultiMNIST validate their effectiveness. The results demonstrate that both XnODR and XnIDR help networks to have high accuracy with lower FLOPs and fewer parameters (e.g., 96.14% correctness with 2.99M parameters and 311.74M FLOPs on CIFAR-10).

Submitted 19 September, 2023; v1 submitted 21 November, 2021; originally announced November 2021.

Comments: 19 pages, 5 figures, 9 tables, 2 algorithms

Journal ref: J Intell Robot Syst 109, 17 (2023)

arXiv:2111.07047 [pdf, other]

Facial Landmark Points Detection Using Knowledge Distillation-Based Neural Networks

Authors: Ali Pourramezan Fard, Mohammad H. Mahoor

Abstract: Facial landmark detection is a vital step for numerous facial image analysis applications. Although some deep learning-based methods have achieved good performance in this task, they are often not suitable for running on mobile devices. Such methods rely on networks with many parameters, which makes the training and inference time-consuming. Training lightweight neural networks such as MobileNets is often challenging, and the models might have low accuracy. Inspired by knowledge distillation (KD), this paper presents a novel loss function to train a lightweight Student network (e.g., MobileNetV2) for facial landmark detection. We use two Teacher networks, a Tolerant-Teacher and a Tough-Teacher, in conjunction with the Student network. The Tolerant-Teacher is trained using Soft-landmarks created by active shape models, while the Tough-Teacher is trained using the ground truth (aka Hard-landmarks) landmark points. To utilize the facial landmark points predicted by the Teacher networks, we define an Assistive Loss (ALoss) for each Teacher network. Moreover, we define a loss function called KD-Loss that utilizes the facial landmark points predicted by the two pre-trained Teacher networks (EfficientNet-b3) to guide the lightweight Student network towards predicting the Hard-landmarks. Our experimental results on three challenging facial datasets show that the proposed architecture will result in a better-trained Student network that can extract facial landmark points with high accuracy.

Submitted 13 November, 2021; originally announced November 2021.

Comments: Accepted in Computer Vision and Image Understanding Journal

arXiv:2108.13503 [pdf, other]

Oversampling Highly Imbalanced Indoor Positioning Data using Deep Generative Models

Authors: Fahad Alhomayani, Mohammad H. Mahoor

Abstract: The location fingerprinting method, which typically utilizes supervised learning, has been widely adopted as a viable solution for the indoor positioning problem. Many indoor positioning datasets are imbalanced. Models trained on imbalanced datasets may exhibit poor performance on the minority class(es). This problem, also known as the "curse of imbalanced data," becomes more evident when class distributions are highly imbalanced. Motivated by the recent advances in deep generative modeling, this paper proposes using Variational Autoencoders and Conditional Variational Autoencoders as oversampling tools to produce class-balanced fingerprints. Experimental results based on Bluetooth Low Energy fingerprints demonstrate that the proposed method outperforms SMOTE and ADASYN in both minority class precision and overall precision. To promote reproducibility and foster new research efforts, we made all the code associated with this work publicly available.

Submitted 30 August, 2021; originally announced August 2021.

Comments: to appear in IEEE SENSORS 2021

arXiv:2106.08468 [pdf, other]

RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis

Authors: Rohola Zandie, Mohammad H. Mahoor, Julia Madsen, Eshrat S. Emamian

Abstract: This paper introduces RyanSpeech, a new speech corpus for research on automated text-to-speech (TTS) systems. Publicly available TTS corpora are often noisy, recorded with multiple speakers, or lack quality male speech data. In order to meet the need for a high quality, publicly available male speech corpus within the field of speech recognition, we have designed and created RyanSpeech, which contains textual materials from real-world conversational settings. These materials contain over 10 hours of a professional male voice actor's speech recorded at 44.1 kHz. This corpus's design and pipeline make RyanSpeech ideal for developing TTS systems in real-world applications. To provide a baseline for future research, protocols, and benchmarks, we trained 4 state-of-the-art speech models and a vocoder on RyanSpeech. The results show 3.36 in mean opinion scores (MOS) in our best model. We have made both the corpus and trained models available for public use.

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2104.12269 [pdf, other]

A Bi-Encoder LSTM Model For Learning Unstructured Dialogs

Authors: Diwanshu Shekhar, Pooran S. Negi, Mohammad Mahoor

Abstract: Creating a data-driven model that is trained on a large dataset of unstructured dialogs is a crucial step in developing Retrieval-based Chatbot systems. This paper presents a Long Short Term Memory (LSTM) based architecture that learns unstructured multi-turn dialogs and provides results on the task of selecting the best response from a collection of given responses. Ubuntu Dialog Corpus Version 2 was used as the corpus for training. We show that our model achieves 0.8%, 1.0% and 0.3% higher accuracy for Recall@1, Recall@2 and Recall@5 respectively than the benchmark model. We also show results on experiments performed by using several similarity functions, model hyper-parameters and word embeddings on the proposed architecture.

Submitted 25 April, 2021; originally announced April 2021.
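
The abstract above reports Recall@1, Recall@2 and Recall@5 for response selection. A minimal sketch of how such a metric is commonly computed follows: for each dialog context, the candidates are ranked by model score and a hit is counted when the true response is among the top k. The function name `recall_at_k`, the data layout, and the toy scores are assumptions for illustration, not the paper's evaluation code.

```python
def recall_at_k(score_lists, true_indices, k):
    """Fraction of examples whose true response ranks within the top k candidates."""
    hits = 0
    for scores, true_idx in zip(score_lists, true_indices):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(score_lists)


# Hypothetical scores: three contexts, three candidate responses each.
scores = [[0.9, 0.1, 0.3],
          [0.2, 0.8, 0.5],
          [0.4, 0.6, 0.7]]
truth = [0, 2, 0]  # index of the correct response for each context

for k in (1, 2, 3):
    print(k, recall_at_k(scores, truth, k))
```

Recall@k can only stay the same or grow as k increases, which is why gains are usually quoted separately for each k.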

arXiv:2103.06434 [pdf, other]

Topical Language Generation using Transformers

Authors: Rohola Zandie, Mohammad H. Mahoor

Abstract: Large-scale transformer-based language models (LMs) demonstrate impressive capabilities in open text generation. However, controlling the generated text's properties such as the topic, style, and sentiment is challenging and often requires significant changes to the model architecture or retraining and fine-tuning the model on new supervised data. This paper presents a novel approach for Topical Language Generation (TLG) by combining a pre-trained LM with topic modeling information. We cast the problem using Bayesian probability formulation with topic probabilities as a prior, LM probabilities as the likelihood, and topical language generation probability as the posterior. In learning the model, we derive the topic probability distribution from the user-provided document's natural structure. Furthermore, we extend our model by introducing new parameters and functions to influence the quantity of the topical features presented in the generated text. This feature would allow us to easily control the topical properties of the generated text. Our experimental results demonstrate that our model outperforms the state-of-the-art results on coherency, diversity, and fluency while being faster in decoding.

Submitted 10 March, 2021; originally announced March 2021.

Comments: Accepted in the Journal of Natural Language Engineering
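
The abstract above describes a Bayesian combination of an LM term and a topic term. One way to write such a combination, as a sketch only, is shown below. The symbols ($x_i$ for the next token, $x_{<i}$ for the preceding tokens, $t$ for the chosen topic, $V$ for the vocabulary) and the conditional-independence assumption $P(t \mid x_i, x_{<i}) = P(t \mid x_i)$ are illustrative choices and are not taken from the paper.

```latex
P(x_i \mid x_{<i}, t)
  = \frac{P(x_i \mid x_{<i}) \, P(t \mid x_i)}
         {\sum_{v \in V} P(v \mid x_{<i}) \, P(t \mid v)}
```

Here $P(x_i \mid x_{<i})$ is the pre-trained LM's next-token probability, $P(t \mid x_i)$ comes from a topic model, and the left-hand side is the posterior used for topical generation. The denominator normalizes over the vocabulary.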

arXiv:2103.00119 [pdf, other]

ASMNet: a Lightweight Deep Neural Network for Face Alignment and Pose Estimation

Authors: Ali Pourramezan Fard, Hojjat Abdollahi, Mohammad Mahoor

Abstract: Active Shape Model (ASM) is a statistical model of object shapes that represents a target structure. ASM can guide machine learning algorithms to fit a set of points representing an object (e.g., face) onto an image. This paper presents a lightweight Convolutional Neural Network (CNN) architecture with a loss function being assisted by ASM for face alignment and estimating head pose in the wild. We use ASM to first guide the network towards learning a smoother distribution of the facial landmark points. Inspired by transfer learning, during the training process, we gradually harden the regression problem and guide the network towards learning the original landmark points distribution. We define multi-tasks in our loss function that are responsible for detecting facial landmark points as well as estimating the face pose. Learning multiple correlated tasks simultaneously builds synergy and improves the performance of individual tasks. We compare the performance of our proposed model called ASMNet with MobileNetV2 (which is about 2 times bigger than ASMNet) in both the face alignment and pose estimation tasks. Experimental results on challenging datasets show that by using the proposed ASM assisted loss function, the ASMNet performance is comparable with MobileNetV2 in the face alignment task. In addition, for face pose estimation, ASMNet performs much better than MobileNetV2.
ASMNet achieves an acceptable performance for facial landmark points detection and pose estimation while having a significantly smaller number of parameters and floating-point operations compared to many CNN-based models.

Submitted 7 May, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

Comments: Accepted at CVPR 2021 Biometrics Workshop, jointly with the Workshop on Analysis and Modeling of Faces and Gestures

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 1521-1530

arXiv:2009.13675 [pdf, other]

Deep Learning-based Symbolic Indoor Positioning using the Serving eNodeB

Authors: Fahad Alhomayani, Mohammad Mahoor

Abstract: This paper presents a novel indoor positioning method designed for residential apartments. The proposed method makes use of cellular signals emitted from a serving eNodeB, which eliminates the need for specialized positioning infrastructure. Additionally, it utilizes Denoising Autoencoders to mitigate the effects of cellular signal loss. We evaluated the proposed method using real-world data collected from two different smartphones inside a representative apartment of eight symbolic spaces. Experimental results verify that the proposed method outperforms conventional symbolic indoor positioning techniques in various performance metrics. To promote reproducibility and foster new research efforts, we made all the data and codes associated with this work publicly available.

Submitted 28 September, 2020; originally announced September 2020.

Comments: Accepted paper (ICMLA 2020); dataset and code: https://doi.org/10.6084/m9.figshare.13010387.v1

arXiv:2004.08495 [pdf, other]

doi 10.1109/TAFFC.2020.2986440

BReG-NeXt: Facial Affect Computing Using Adaptive Residual Networks With Bounded Gradient

Authors: Behzad Hasani, Pooran Singh Negi, Mohammad H. Mahoor

Abstract: This paper introduces BReG-NeXt, a residual-based network architecture using a function with bounded derivative instead of a simple shortcut path (a.k.a. identity mapping) in the residual units for automatic recognition of facial expressions based on the categorical and dimensional models of affect. Compared to ResNet, our proposed adaptive complex mapping results in a shallower network with fewer training parameters and floating point operations (FLOPs). Adding trainable parameters to the bypass function further improves fitting and training the network and hence recognizing subtle facial expressions such as contempt with a higher accuracy. We conducted comprehensive experiments on the categorical and dimensional models of affect on the challenging in-the-wild databases of AffectNet, FER2013, and Affect-in-Wild. Our experimental results show that our adaptive complex mapping approach outperforms the original ResNet consisting of a simple identity mapping as well as other state-of-the-art methods for Facial Expression Recognition (FER). Various metrics are reported in both affect models to provide a comprehensive evaluation of our method. In the categorical model, BReG-NeXt-50, with only 3.1M training parameters and 15 MFLOPs, achieves 68.50% and 71.53% accuracy on the AffectNet and FER2013 databases, respectively. In the dimensional model, BReG-NeXt achieves RMSE values of 0.2577 and 0.2882 on the AffectNet and Affect-in-Wild databases, respectively.

Submitted 17 April, 2020; originally announced April 2020.

Comments: To appear in IEEE Transactions on Affective Computing journal

Journal ref: 2020 IEEE Transactions on Affective Computing

arXiv:2003.02958 [pdf, other]

EmpTransfo: A Multi-head Transformer Architecture for Creating Empathetic Dialog Systems

Authors: Rohola Zandie, Mohammad H. Mahoor

Abstract: Understanding emotions and responding accordingly is one of the biggest challenges of dialog systems. This paper presents EmpTransfo, a multi-head Transformer architecture for creating an empathetic dialog system. EmpTransfo utilizes state-of-the-art pre-trained models (e.g., OpenAI-GPT) for language generation, though models with different sizes can be used. We show that utilizing the history of emotions and other metadata can improve the quality of the conversations generated by the dialog system. Our experimental results using a challenging language corpus show that the proposed approach outperforms other models in terms of Hit@1 and PPL (Perplexity).

Submitted 5 March, 2020; originally announced March 2020.

arXiv:1909.06670 [pdf, other]

Delivering Cognitive Behavioral Therapy Using A Conversational SocialRobot

Authors: Francesca Dino, Rohola Zandie, Hojjat Abdollahi, Sarah Schoeder, Mohammad H. Mahoor

Abstract: Social robots are becoming an integrated part of our daily life due to their ability to provide companionship and entertainment. A subfield of robotics, Socially Assistive Robotics (SAR), is particularly suitable for expanding these benefits into the healthcare setting because of its unique ability to provide cognitive, social, and emotional support. This paper presents our recent research on developing SAR by evaluating the ability of a life-like conversational social robot, called Ryan, to administer internet-delivered cognitive behavioral therapy (iCBT) to older adults with depression. For Ryan to administer the therapy, we developed a dialogue-management system, called Program-R. Using an accredited CBT manual for the treatment of depression, we created seven hour-long iCBT dialogues and integrated them into Program-R using Artificial Intelligence Markup Language (AIML). To assess the effectiveness of robot-based iCBT and users' likability of our approach, we conducted an HRI study with a cohort of elderly people with mild-to-moderate depression over a period of four weeks. Quantitative analyses of participants' spoken responses (e.g., word count and sentiment analysis), face-scale mood scores, and exit surveys strongly support the notion that robot-based iCBT is a viable alternative to traditional human-delivered therapy.

Submitted 14 September, 2019; originally announced September 2019.

Comments: Accepted in IROS 2019

arXiv:1903.02110 [pdf, other]

doi 10.1109/FG.2019.8756587

Bounded Residual Gradient Networks (BReG-Net) for Facial Affect Computing

Authors: Behzad Hasani, Pooran Singh Negi, Mohammad H. Mahoor

Abstract: Residual-based neural networks have shown remarkable results in various visual recognition tasks including Facial Expression Recognition (FER). Despite the tremendous efforts that have been made to improve the performance of FER systems using DNNs, existing methods are not generalizable enough for practical applications. This paper introduces Bounded Residual Gradient Networks (BReG-Net) for facial expression recognition, in which the shortcut connection between the input and the output of the ResNet module is replaced with a differentiable function with a bounded gradient. This configuration prevents the network from facing the vanishing or exploding gradient problem. We show that utilizing such non-linear units will result in shallower networks with better performance. Further, by using a weighted loss function which gives a higher priority to less represented categories, we can achieve an overall better recognition rate. The results of our experiments show that BReG-Nets outperform state-of-the-art methods on three publicly available facial databases in the wild, on both the categorical and dimensional models of affect.

Submitted 5 March, 2019; originally announced March 2019.

Comments: To appear in 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)

Journal ref: 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)

arXiv:1812.09744 [pdf, other]

Leveraging Class Similarity to Improve Deep Neural Network Robustness

Authors: Pooran Singh Negi, David chan, Mohammad Mahoor

Abstract: Traditionally, artificial neural networks (ANNs) are trained by minimizing the cross-entropy between a provided groundtruth delta distribution (encoded as a one-hot vector) and the ANN's predictive softmax distribution. It seems, however, unacceptable to penalize networks equally for misclassification between classes. Confusing the class "Automobile" with the class "Truck" should be penalized less than confusing the class "Automobile" with the class "Donkey". To avoid such representation issues and learn cleaner classification boundaries in the network, this paper presents a variation of cross-entropy loss which depends not only on the sample class but also on a data-driven prior "class-similarity distribution" across the classes encoded in a matrix form. We explore learning the class-similarity distribution using a data-driven method and then show that by training with our modified similarity-driven loss, we obtain slightly better generalization performance over multiple architectures and datasets as well as improved performance on noisy testing scenarios.

Submitted 27 December, 2018; v1 submitted 23 December, 2018; originally announced December 2018.
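
Illustrative note on arXiv:1812.09744: the idea of a similarity-driven cross-entropy described in the abstract above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' formulation: the abstract does not give the loss, so the row-normalized soft target, the function names, and the hand-written 3-class similarity matrix are all hypothetical (the paper learns the similarity distribution from data).

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def similarity_cross_entropy(logits, labels, similarity):
    # ASSUMPTION: the target for a sample of class c is row c of the
    # similarity matrix, normalized to sum to 1. With an identity matrix
    # this reduces to the usual one-hot cross-entropy.
    targets = similarity[labels]
    targets = targets / targets.sum(axis=-1, keepdims=True)
    log_probs = np.log(softmax(logits) + 1e-12)
    return float(-(targets * log_probs).sum(axis=-1).mean())

# Hypothetical similarity matrix for (automobile, truck, donkey):
# automobile and truck are treated as somewhat similar, donkey as unrelated.
SIMILARITY = np.array([
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

labels = np.array([0])                       # true class: automobile
logits_truck = np.array([[1.0, 3.0, 0.0]])   # model leans towards truck
logits_donkey = np.array([[1.0, 0.0, 3.0]])  # model leans towards donkey

loss_truck = similarity_cross_entropy(logits_truck, labels, SIMILARITY)
loss_donkey = similarity_cross_entropy(logits_donkey, labels, SIMILARITY)
print(loss_truck, loss_donkey)
```

Under these assumed values, confusing an automobile with a truck yields a smaller loss than confusing it with a donkey, which is the qualitative behavior the abstract argues for.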

arXiv:1812.04087 [pdf]

doi 10.1049/iet-stg.2018.0076

Distribution asset management through coordinated microgrid scheduling

Authors: Mohsen Mahoor, Alireza Majzoobi, Amin Khodaei

Abstract: Distribution Asset Management is an important task performed by utility companies to prolong the lifetime of the critical distribution assets and to accordingly ensure grid reliability by preventing unplanned outages. This study focuses on microgrid applications for distribution asset management as a viable and less expensive alternative to traditional utility practices in this area. A microgrid is an emerging distribution technology that encompasses a variety of distribution technologies including distributed generation, demand response, and energy storage. Moreover, the substation transformer, as the most critical component in a distribution grid, is selected as the component of choice for asset management studies. The resulting model is a microgrid-based distribution transformer asset management model in which the power the microgrid exchanges with the utility grid is reshaped in such a way that the distribution transformer lifetime is maximised. Numerical simulations on a test utility-owned microgrid demonstrate the effectiveness of the proposed model to reshape the loading of the distribution transformer at the point of interconnection in order to increase its lifetime.

Submitted 7 December, 2018; originally announced December 2018.

Comments: This is an open access article published by the IET under the Creative Commons Attribution-NonCommercial License

Journal ref: IET Smart Grid, 2018, Vol. 1 Iss. 4, pp. 159-168

arXiv:1807.07902 [pdf]

Battery Swapping Station as an Energy Storage for Capturing Distribution-Integrated Solar Variability

Authors: Zohreh S. Hosseini, Mohsen Mahoor, Amin Khodaei

Abstract: Managing the inherent variability of solar generation is a critical challenge for utility grid operators, particularly as the distribution grid-integrated solar generation is making fast inroads in power systems. This paper proposes to leverage Battery Swapping Station (BSS) as an energy storage for mitigating solar photovoltaic (PV) output fluctuations. Using mixed-integer programming, a model for the BSS optimal scheduling is proposed to capture solar generation variability. The proposed model aims at minimizing the BSS total operation cost, which represents the accumulated cost of exchanging power with the utility grid. The model is subject to four sets of constraints associated with the utility grid, the BSS system, individual batteries, and solar variability. Numerical simulations on a test BSS demonstrate the effectiveness of the proposed model and show its viability in helping the utility grids host a higher penetration of solar generation.

Submitted 18 July, 2018; originally announced July 2018.

arXiv:1804.03190 [pdf]

Studying the Effects of Deep Brain Stimulation and Medication on the Dynamics of STN-LFP Signals for Human Behavior Analysis

Authors: Hosein M. Golshan, Adam O. Hebb, Joshua Nedrud, Mohammad H. Mahoor

Abstract: This paper presents the results of our recent work on studying the effects of deep brain stimulation (DBS) and medication on the dynamics of brain local field potential (LFP) signals used for behavior analysis of patients with Parkinson's disease (PD). DBS is a technique used to alleviate the severe symptoms of PD when pharmacotherapy is not very effective. Behavior recognition from the LFP signals recorded from the subthalamic nucleus (STN) has application in developing closed-loop DBS systems, where the stimulation pulse is adaptively generated according to the behavior the subject is performing. Most of the existing studies on behavior recognition that use STN-LFPs are based on the DBS being off. This paper investigates how the performance and accuracy of automated behavior recognition from the LFP signals are affected under different paradigms of stimulation on/off. We first study the notion of beta power suppression in LFP signals under different scenarios (stimulation on/off and medication on/off). Afterward, we explore the accuracy of support vector machines in predicting human actions (button press and reach) using the spectrogram of STN-LFP signals. Our experiments on the recorded LFP signals of three subjects confirm that the beta power is suppressed significantly when the patients take medication (p-value<0.002) or stimulation (p-value<0.0003). The results also show that we can classify different behaviors with a reasonable accuracy of 85% even when the high-amplitude stimulation is applied.

Submitted 9 April, 2018; originally announced April 2018.

Comments: 40th IEEE International Conference on Engineering in Medicine and Biology (IEEE EMBC), Honolulu, Hawaii, July 17-21, 2018

arXiv:1712.02881 [pdf, other]

doi 10.1109/HUMANOIDS.2017.8246925

A Pilot Study on Using an Intelligent Life-like Robot as a Companion for Elderly Individuals with Dementia and Depression

Authors: Hojjat Abdollahi, Ali Mollahosseini, Josh T. Lane, Mohammad H. Mahoor

Abstract: This paper presents the design, development, methodology, and the results of a pilot study on using an intelligent, emotive and perceptive social robot (aka Companionbot) for improving the quality of life of elderly people with dementia and/or depression. Ryan Companionbot, prototyped in this project, is a rear-projected life-like conversational robot. Ryan is equipped with features that can (1) interpret and respond to users' emotions through facial expressions and spoken language, (2) proactively engage in conversations with users, and (3) remind them about their daily life schedules (e.g., taking their medicine on time). Ryan engages users in cognitive games and reminiscence activities. We conducted a pilot study with six elderly individuals with moderate dementia and/or depression living in a senior living facility in Denver. Each individual had 24/7 access to a Ryan in his/her room for a period of 4-6 weeks. Our observations of these individuals, interviews with them and their caregivers, and analyses of their interactions during this period revealed that they established rapport with the robot and greatly valued and enjoyed having a Companionbot in their room.

Submitted 7 December, 2017; originally announced December 2017.

Comments: Published in 2017 IEEE-RAS International Conference on Humanoid Robots

arXiv:1711.03606 [pdf]

Distribution market as a ramping aggregator for grid flexibility support

Authors: Alireza Majzoobi, Mohsen Mahoor, Amin Khodaei

Abstract: The growing proliferation of microgrids and distributed energy resources in distribution networks has resulted in the development of Distribution Market Operator (DMO). This new entity will facilitate the management of the distributed resources and their interactions with upstream network and the wholesale market. At the same time, DMOs can tap into the flexibility potential of these distributed resources to address many of the challenges that system operators are facing. This paper investigates this opportunity and develops a distribution market scheduling model based on upstream network ramping flexibility requirements. That is, the distribution network will play the role of a flexibility resource in the system, with a relatively large size and potential, to help bulk system operators to address emerging ramping concerns. Numerical simulations demonstrate the effectiveness of the proposed model when tested on a distribution system with several microgrids.

Submitted 9 November, 2017; originally announced November 2017.

Comments: IEEE PES Transmission and Distribution Conference and Exposition (T&D), Denver, CO, 16-19 Apr. 2018

arXiv:1711.03398 [pdf]

Data Fusion and Machine Learning Integration for Transformer Loss of Life Estimation

Authors: Mohsen Mahoor, Amin Khodaei

Abstract: Rapid growth of machine learning methodologies and their applications offers new opportunities for improved transformer asset management. Accordingly, power system operators are currently looking for data-driven methods to make better-informed decisions in terms of network management. In this paper, machine learning and data fusion techniques are integrated to estimate transformer loss of life. Using IEEE Std. C57.91-2011, a data synthesis process is proposed based on hourly transformer loading and ambient temperature values. This synthesized data is employed to estimate transformer loss of life by using Adaptive Network-Based Fuzzy Inference System (ANFIS) and Radial Basis Function (RBF) network, which are further fused together with the objective of improving the estimation accuracy. Among various data fusion techniques, Ordered Weighted Averaging (OWA) and sequential Kalman filter are selected to fuse the output results of the estimated ANFIS and RBF. Simulation results demonstrate the merit and the effectiveness of the proposed method.

Submitted 8 November, 2017; originally announced November 2017.

arXiv:1710.06895 [pdf]

Electric Vehicle Battery Swapping Station

Authors: Mohsen Mahoor, Zohreh S. Hosseini, Amin Khodaei, D. Kushner

Abstract: Providing adequate charging infrastructure plays a momentous role in rapid proliferation of Electric Vehicles (EVs). Easy access to such infrastructure would remove various obstacles regarding limited EV mobility range. A Battery Swapping Station (BSS) is an effective approach in supplying power to the EVs, while mitigating long waiting times in a Battery Charging Station (BCS). In contrast with the BCS, the BSS charges the batteries in advance and prepares them to be swapped in a considerably short time. Considering that these stations can serve as an intermediate entity between the EV owners and the power system, they can potentially provide unique benefits to the power system. This paper investigates the advantages of building the BSS from various perspectives. Accordingly, a model for the scheduling of battery charging from the station owner perspective is proposed. An illustrative example is provided to show how the proposed model would help BSS owners in managing their assets through scheduling battery charging time.

Submitted 2 October, 2017; originally announced October 2017.

arXiv:1710.03803 [pdf]

Day-Ahead Solar Forecasting Based on Multi-level Solar Measurements

Authors: Mohana Alanazi, Mohsen Mahoor, Amin Khodaei

Abstract: The growing proliferation in solar deployment, especially at distribution level, has made the case for power system operators to develop more accurate solar forecasting models. This paper proposes a solar photovoltaic (PV) generation forecasting model based on multi-level solar measurements and utilizing a nonlinear autoregressive with exogenous input (NARX) model to improve the training and achieve better forecasts. The proposed model consists of four stages: data preparation, establishment of the fitting model, model training, and forecasting. The model is tested under different weather conditions. Numerical simulations exhibit the acceptable performance of the model when compared to forecasting results obtained from two-level and single-level studies.

Submitted 10 October, 2017; originally announced October 2017.

arXiv:1708.03985 [pdf, other]

doi 10.1109/TAFFC.2017.2740923

AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild

Authors: Ali Mollahosseini, Behzad Hasani, Mohammad H. Mahoor

Abstract: Automated affective computing in the wild setting is a challenging problem in computer vision. Existing annotated databases of facial expressions in the wild are small and mostly cover discrete emotions (aka the categorical model). There are very limited annotated facial databases for affective computing in the continuous dimensional model (e.g., valence and arousal). To meet this need, we collected, annotated, and prepared for public distribution a new database of facial emotions in the wild (called AffectNet). AffectNet contains more than 1,000,000 facial images collected from the Internet by querying three major search engines using 1250 emotion related keywords in six different languages. About half of the retrieved images were manually annotated for the presence of seven discrete facial expressions and the intensity of valence and arousal. AffectNet is by far the largest database of facial expression, valence, and arousal in the wild enabling research in automated facial expression recognition in two different emotion models. Two baseline deep neural networks are used to classify images in the categorical model and predict the intensity of valence and arousal. Various evaluation metrics show that our deep neural network baselines can perform better than conventional machine learning methods and off-the-shelf facial expression recognition systems.

Submitted 9 October, 2017; v1 submitted 13 August, 2017; originally announced August 2017.

Comments: IEEE Transactions on Affective Computing, 2017

arXiv:1707.09680 [pdf]

Microgrid Value of Ramping

Authors: Alireza Majzoobi, Mohsen Mahoor, Amin Khodaei

Abstract: The growing penetration of renewable generation in distribution networks, primarily deployed by end-use electricity customers, is changing the traditional load profile and inevitably makes supply-load balancing more challenging for grid operators. Leveraging the potential flexibility of existing microgrids, that is to help with supply-load balance locally, is a viable solution to cope with this challenge and mitigate existing net load variability and intermittency in distribution networks. This paper discusses this timely topic and determines the microgrid value of ramping based on its available reserve using a cost-benefit analysis. To this end, a microgrid ramping-oriented optimal scheduling model is developed and tested through numerical simulations to prove the effectiveness and the merits of the proposed approach in microgrid ramping valuation.

Submitted 30 July, 2017; originally announced July 2017.

Comments: 2017 IEEE International Conference on Smart Grid Communications (IEEE SmartGridComm), Dresden, Germany, 23-26 Oct. 2017

arXiv:1707.01777 [pdf]

Improved Selective Harmonic Elimination for Reducing Torque Harmonics of Induction Motors in Wide DC Bus Voltage Variations

Authors: Hossein Valiyan Holagh, Tooraj Abbasian Najafabadi, Mohsen Mahoor

Abstract: Conventionally, the Selective Harmonic Elimination (SHE) method in 2-level inverters finds the best switching angles to bring the first voltage harmonic to the reference level and eliminate other harmonics simultaneously. Considering an Induction Motor (IM) as the inverter load, and wide DC bus voltage variations, the inverter must operate in both the over-modulation and linear modulation regions. The main objective of the modified SHE is to reduce harmonic torques by finding the best switching angles. In this paper, optimization is based on optimizing phasor equations in which harmonic torques are calculated. The procedure of this method is as follows: first, the ratio of the same torque harmonics is estimated; second, by using that estimation, the ratio of voltage harmonics that generates homogeneous torques is calculated. For the estimation and the calculation of the ratios, the motor parameters, the mechanical speed of the rotor, the applied frequency, and the concept of slip are used. The advantage of this approach is highlighted when mechanical load and DC bus voltage variations are taken into consideration. Simulation results are presented under a wide range of working conditions in an induction motor to demonstrate the effectiveness of the proposed method.

Submitted 5 July, 2017; originally announced July 2017.

arXiv:1706.08699 [pdf]

Two-Stage Hybrid Day-Ahead Solar Forecasting

Authors: Mohana Alanazi, Mohsen Mahoor, Amin Khodaei

Abstract: Power supply from renewable resources is on a global rise where it is forecasted that renewable generation will surpass other types of generation in a foreseeable future. Increased generation from renewable resources, mainly solar and wind, exposes the power grid to more vulnerabilities, conceivably due to their variable generation, thus highlighting the importance of accurate forecasting methods. This paper proposes a two-stage day-ahead solar forecasting method that breaks down the forecasting into linear and nonlinear parts, determines subsequent forecasts, and accordingly, improves accuracy of the obtained results. To further reduce the error resulting from nonstationarity of the historical solar radiation data, a data processing approach, including pre-process and post-process levels, is integrated with the proposed method. Numerical simulations on three test days with different weather conditions exhibit the effectiveness of the proposed two-stage model.

Submitted 27 June, 2017; originally announced June 2017.

arXiv:1706.06408 [pdf]

Leveraging Adaptive Model Predictive Controller for Active Cell Balancing in Li-ion Battery

Authors: Seyed Mahmoud Salamati, Seyed Ali Salamati, Mohsen Mahoor, Farzad Rajaei Salmasi

Abstract: The automotive industry is moving toward fully electric and hybrid electric vehicles. Accordingly, the energy storage unit is one of the most important blocks in these electric drives. Battery stacks, which contain a number of cells, are used to supply the vehicles' energy. Charge equalization for series-connected battery strings has a significant effect on battery life. In this paper, an adaptive model predictive controller (AMPC) is proposed to manage the cell equalizing process. The series-connected cells' voltages and currents are collected; then, leveraging the Recursive Least Square (RLS) method, the future voltage samples for all of the cells are predicted. The MPC controller specifies a sequence which results in the optimum balancing performance of the proposed circuit. Simulation results show that, using the suggested algorithm, the voltage set of the series cells has moved more uniformly.

Submitted 17 June, 2017; originally announced June 2017.

arXiv:1706.06255 [pdf]

Leveraging Sensory Data in Estimating Transformer Lifetime

Authors: Mohsen Mahoor, Alireza Majzoobi, Zohreh S. Hosseini, Amin Khodaei

Abstract: Transformer lifetime assessment plays a vital role in the reliable operation of power systems. In this paper, leveraging sensory data, an approach to estimating transformer lifetime is presented. The winding hottest-spot temperature, which is the pivotal driver that impacts transformer aging, is measured hourly via a temperature sensor; then the transformer loss of life is calculated based on the IEEE Std. C57.91-2011. A Cumulative Moving Average (CMA) model is subsequently applied to the data stream of the transformer loss of life to provide hourly estimates until convergence. Numerical examples demonstrate the effectiveness of the proposed approach for transformer lifetime estimation, and explore its efficiency and practical merits.

Submitted 19 June, 2017; originally announced June 2017.

Comments: 2017 North American Power Symposium (NAPS), Morgantown, WV, 17-19 Sep. 2017
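
The Cumulative Moving Average step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tolerance-based stopping rule, and the sample values are assumptions made for illustration only, and the loss-of-life values themselves would come from the IEEE standard calculation, which is not reproduced here.

```python
def cumulative_moving_average(stream):
    """Yield the running mean after each new sample (incremental CMA update)."""
    mean = 0.0
    for n, x in enumerate(stream, start=1):
        # CMA_n = CMA_{n-1} + (x_n - CMA_{n-1}) / n
        mean += (x - mean) / n
        yield mean


def cma_until_converged(stream, tol=1e-3):
    """Return (estimate, samples_used) once successive CMA values differ by < tol.

    The stopping rule and `tol` are hypothetical; the abstract only says
    estimates are produced "until convergence".
    """
    mean = 0.0
    n = 0
    for n, x in enumerate(stream, start=1):
        new_mean = mean + (x - mean) / n
        if n > 1 and abs(new_mean - mean) < tol:
            return new_mean, n
        mean = new_mean
    return mean, n


if __name__ == "__main__":
    # Made-up hourly loss-of-life samples, for demonstration only.
    hourly = [1.0, 2.0, 3.0]
    print(list(cumulative_moving_average(hourly)))
```

The incremental form avoids storing the full history, which suits an hourly data stream.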

arXiv:1705.07884 [pdf, other]

doi 10.1109/CVPRW.2017.245

Facial Affect Estimation in the Wild Using Deep Residual and Convolutional Networks

Authors: Behzad Hasani, Mohammad H. Mahoor

Abstract: Automated affective computing in the wild is a challenging task in the field of computer vision. This paper presents three neural network-based methods proposed for the task of facial affect estimation submitted to the First Affect-in-the-Wild challenge. These methods are based on Inception-ResNet modules redesigned specifically for the task of facial affect estimation. These methods are: Shallow Inception-ResNet, Deep Inception-ResNet, and Inception-ResNet with LSTMs. These networks extract facial features in different scales and simultaneously estimate both the valence and arousal in each frame. Root Mean Square Error (RMSE) rates of 0.4 and 0.3 are achieved for the valence and arousal respectively with corresponding Concordance Correlation Coefficient (CCC) rates of 0.04 and 0.29 using Deep Inception-ResNet method.

Submitted 22 May, 2017; originally announced May 2017.

Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Journal ref: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
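
For readers unfamiliar with the two metrics reported in the abstract above, the following is a minimal sketch of RMSE and the Concordance Correlation Coefficient using their standard definitions. The function names and sample values are illustrative assumptions, and population (not sample) variance is assumed; the paper's own evaluation code may differ.

```python
from math import sqrt
from statistics import fmean


def rmse(pred, truth):
    """Root Mean Square Error between two equal-length sequences."""
    return sqrt(fmean([(p - t) ** 2 for p, t in zip(pred, truth)]))


def ccc(pred, truth):
    """Concordance Correlation Coefficient.

    CCC = 2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t))**2),
    computed here with population variance and covariance (an assumption).
    """
    mp, mt = fmean(pred), fmean(truth)
    var_p = fmean([(p - mp) ** 2 for p in pred])
    var_t = fmean([(t - mt) ** 2 for t in truth])
    cov = fmean([(p - mp) * (t - mt) for p, t in zip(pred, truth)])
    return 2 * cov / (var_p + var_t + (mp - mt) ** 2)


if __name__ == "__main__":
    # Made-up per-frame valence predictions and labels, for demonstration only.
    predicted = [0.1, 0.4, 0.3]
    labels = [0.0, 0.5, 0.2]
    print(rmse(predicted, labels), ccc(predicted, labels))
```

Unlike plain correlation, CCC also penalizes a shift in mean or scale between predictions and labels, which is why a low RMSE can coexist with a low CCC.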

arXiv:1705.07871 [pdf, other]

doi 10.1109/CVPRW.2017.282

Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks

Authors: Behzad Hasani, Mohammad H. Mahoor

Abstract: Deep Neural Networks (DNNs) have been shown to outperform traditional methods in various visual recognition tasks including Facial Expression Recognition (FER). In spite of efforts made to improve the accuracy of FER systems using DNNs, existing methods still are not generalizable enough in practical applications. This paper proposes a 3D Convolutional Neural Network method for FER in videos. This new network architecture consists of 3D Inception-ResNet layers followed by an LSTM unit that together extract the spatial relations within facial images as well as the temporal relations between different frames in the video. Facial landmark points are also used as inputs to our network, which emphasizes the importance of facial components rather than the facial regions that may not contribute significantly to generating facial expressions. Our proposed method is evaluated using four publicly available databases in subject-independent and cross-database tasks and outperforms state-of-the-art methods.

Submitted 22 May, 2017; originally announced May 2017.

Comments: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Journal ref: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

arXiv:1703.06995 [pdf, other]

doi 10.1109/FG.2017.99

Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields

Authors: Behzad Hasani, Mohammad H. Mahoor

Abstract: Automated Facial Expression Recognition (FER) has been a challenging task for decades. Many of the existing works use hand-crafted features such as LBP, HOG, LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as Support Vector Machines for expression recognition. These methods often require rigorous hyperparameter tuning to achieve good results. Recently, Deep Neural Networks (DNNs) have been shown to outperform traditional methods in visual object recognition. In this paper, we propose a two-part network consisting of a DNN-based architecture followed by a Conditional Random Field (CRF) module for facial expression recognition in videos. The first part captures the spatial relation within facial images using convolutional layers followed by three Inception-ResNet modules and two fully-connected layers. To capture the temporal relation between the image frames, we use a linear chain CRF in the second part of our network. We evaluate our proposed network on three publicly available databases, viz. CK+, MMI, and FERA. Experiments are performed in subject-independent and cross-database manners. Our experimental results show that cascading the deep network architecture with the CRF module considerably increases the recognition of facial expressions in videos; in particular, it outperforms the state-of-the-art methods in the cross-database experiments and yields comparable results in the subject-independent experiments.

Submitted 24 April, 2017; v1 submitted 20 March, 2017; originally announced March 2017.

Comments: To appear in 12th IEEE Conference on Automatic Face and Gesture Recognition Workshop

Journal ref: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)

arXiv:1703.01397 [pdf]

Machine Learning Applications in Estimating Transformer Loss of Life

Authors: Alireza Majzoobi, Mohsen Mahoor, Amin Khodaei

Abstract: Transformer life assessment and failure diagnostics have always been important problems for electric utility companies. Ambient temperature and load profile are the main factors which affect aging of the transformer insulation, and consequently, the transformer lifetime. The IEEE Std. C57.91-1995 provides a model for calculating the transformer loss of life based on ambient temperature and the transformer's loading. In this paper, this standard is used to develop a data-driven static model for hourly estimation of the transformer loss of life. Among various machine learning methods for developing this static model, the Adaptive Network-Based Fuzzy Inference System (ANFIS) is selected. Numerical simulations demonstrate the effectiveness and the accuracy of the proposed ANFIS method compared with other relevant machine learning based methods to solve this problem.

Submitted 4 March, 2017; originally announced March 2017.

Comments: IEEE Power and Energy Society General Meeting, 2017

arXiv:1612.08780 [pdf]

An FFT-based Synchronization Approach to Recognize Human Behaviors using STN-LFP Signal

Authors: Hosein M. Golshan, Adam O. Hebb, Sara J. Hanrahan, Joshua Nedrud, Mohammad H. Mahoor

Abstract: Classification of human behavior is key to developing closed-loop Deep Brain Stimulation (DBS) systems, which may be able to decrease the power consumption and side effects of the existing systems. Recent studies have shown that the Local Field Potential (LFP) signals from both Subthalamic Nuclei (STN) of the brain can be used to recognize human behavior. Since the DBS leads implanted in each STN can collect three bipolar signals, the selection of a suitable pair of LFPs that achieves optimal recognition performance is still an open problem to address. Considering the presence of synchronized aggregate activity in the basal ganglia, this paper presents an FFT-based synchronization approach to automatically select a relevant pair of LFPs and use the pair together with an SVM-based MKL classifier for behavior recognition purposes. Our experiments on five subjects show the superiority of the proposed approach compared to other methods used for behavior classification.

Submitted 27 December, 2016; originally announced December 2016.

Comments: IEEE Conf on ICASSP 2017

arXiv:1607.07987 [pdf]

doi 10.1109/EMBC.2016.7590878

A Multiple Kernel Learning Approach for Human Behavioral Task Classification using STN-LFP Signal

Authors: Hosein M. Golshan, Adam O. Hebb, Sara J. Hanrahan, Joshua Nedrud, Mohammad H. Mahoor

Abstract: Deep Brain Stimulation (DBS) has gained increasing attention as an effective method to mitigate Parkinson's disease (PD) disorders. Existing DBS systems are open-loop, such that the system parameters are not adjusted automatically based on the patient's behavior. Classification of human behavior is an important step in the design of the next generation of DBS systems that are closed-loop. This paper presents a classification approach to recognize such behavioral tasks using the subthalamic nucleus (STN) Local Field Potential (LFP) signals. In our approach, we use the time-frequency representation (spectrogram) of the raw LFP signals recorded from left and right STNs as the feature vectors. Then these features are combined via Support Vector Machines (SVM) with a Multiple Kernel Learning (MKL) formulation. The MKL-based classification method is utilized to classify different tasks: button press, mouth movement, speech, and arm movement. Our experiments show that the lp-norm MKL significantly outperforms single kernel SVM-based classifiers in classifying behavioral tasks of five subjects even using signals acquired with a low sampling rate of 10 Hz. This leads to a lower computational cost.

Submitted 27 July, 2016; originally announced July 2016.

Comments: 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society

arXiv:1605.03639 [pdf, other]

doi 10.1109/CVPRW.2016.188

Facial Expression Recognition from World Wild Web

Authors: Ali Mollahosseini, Behzad Hassani, Michelle J. Salvador, Hojjat Abdollahi, David Chan, Mohammad H. Mahoor

Abstract: Recognizing facial expression in a wild setting has remained a challenging task in computer vision. The World Wide Web is a good source of facial images, most of which are captured in uncontrolled conditions. In fact, the Internet is a World Wild Web of facial images with expressions. This paper presents the results of a new study on collecting, annotating, and analyzing wild facial expressions from the web. Three search engines were queried using 1250 emotion-related keywords in six different languages and the retrieved images were mapped by two annotators to six basic expressions and neutral. Deep neural networks and noise modeling were used in three different training scenarios to find how accurately facial expressions can be recognized when trained on noisy images collected from the web using query terms (e.g. happy face, laughing man, etc.). The results of our experiments show that deep neural networks can recognize wild facial expressions with an accuracy of 82.12%.

Submitted 5 January, 2017; v1 submitted 11 May, 2016; originally announced May 2016.

arXiv:1511.06502 [pdf, other]

doi 10.1109/HUMANOIDS.2014.7041505

ExpressionBot: An Emotive Lifelike Robotic Face for Face-to-Face Communication

Authors: Ali Mollahosseini, Gabriel Graitzer, Eric Borts, Stephen Conyers, Richard M. Voyles, Ronald Cole, Mohammad H. Mahoor

Abstract: This article proposes an emotive lifelike robotic face, called ExpressionBot, that is designed to support verbal and non-verbal communication between the robot and humans, with the goal of closely modeling the dynamics of natural face-to-face communication. The proposed robotic head consists of two major components: 1) a hardware component that contains a small projector, a fish-eye lens, a custom-designed mask and a neck system with 3 degrees of freedom; 2) a facial animation system, projected onto the robotic mask, that is capable of presenting facial expressions, realistic eye movement, and accurate visual speech. We present three studies that compare Human-Robot Interaction with Human-Computer Interaction with a screen-based model of the avatar. The studies indicate that the robotic face is well accepted by users, with some advantages in recognition of facial expression and mutual eye gaze contact.

Submitted 20 November, 2015; originally announced November 2015.

Journal ref: 14th IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2014

arXiv:1511.06494 [pdf, other]

doi 10.1109/CVPRW.2013.129

Bidirectional Warping of Active Appearance Model

Authors: Ali Mollahosseini, Mohammad H. Mahoor

Abstract: Active Appearance Model (AAM) is a commonly used method for facial image analysis with applications in face identification and facial expression recognition. This paper proposes a new approach based on image alignment for AAM fitting called bidirectional warping. Previous approaches warp either the input image or the appearance template. We propose to warp both the input image, using incremental update by an affine transformation, and the appearance template, using an inverse compositional approach. Our experimental results on Multi-PIE face database show that the bidirectional approach outperforms state-of-the-art inverse compositional fitting approaches in extracting landmark points of faces with shape and pose variations.

Submitted 20 November, 2015; originally announced November 2015.

Journal ref: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

arXiv:1511.06491 [pdf, other]

doi 10.1109/ROMAN.2014.6926378

eBear: An Expressive Bear-Like Robot

Authors: Xiao Zhang, Ali Mollahosseini, Amir H. Kargar B., Evan Boucher, Richard M. Voyles, Rodney Nielsen, Mohammad H. Mahoor

Abstract: This paper presents an anthropomorphic robotic bear for the exploration of human-robot interaction including verbal and non-verbal communications. This robot is implemented with a hybrid face composed of a mechanical faceplate with 10 DOFs and an LCD-display-equipped mouth. The facial emotions of the bear are designed based on the description of the Facial Action Coding System as well as some animal-like gestures described by Darwin. The mouth movements are realized by synthesizing emotions with speech. User acceptance investigations have been conducted to evaluate the likability of these facial behaviors exhibited by the eBear. Multiple Kernel Learning is proposed to fuse different features for recognizing users' facial expressions. Our experimental results show that the developed bear-like robot can perceive basic facial expressions and provide emotive conveyance towards human beings.

Submitted 20 November, 2015; originally announced November 2015.

Journal ref: The 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014 RO-MAN

arXiv:1511.04110 [pdf, other]

doi 10.1109/WACV.2016.7477450

Going Deeper in Facial Expression Recognition using Deep Neural Networks

Authors: Ali Mollahosseini, David Chan, Mohammad H. Mahoor

Abstract: Automated Facial Expression Recognition (FER) has remained a challenging and interesting problem. Despite efforts made in developing various methods for FER, existing approaches traditionally lack generalizability when applied to unseen images or those that are captured in a wild setting. Most of the existing approaches are based on engineered features (e.g. HOG, LBPH, and Gabor) where the classifier's hyperparameters are tuned to give the best recognition accuracies across a single database, or a small collection of similar databases. Nevertheless, the results are not significant when they are applied to novel data. This paper proposes a deep neural network architecture to address the FER problem across multiple well-known standard face datasets. Specifically, our network consists of two convolutional layers, each followed by max pooling, and then four Inception layers. The network is a single component architecture that takes registered facial images as the input and classifies them into one of the six basic expressions or the neutral expression. We conducted comprehensive experiments on seven publicly available facial expression databases, viz. MultiPIE, MMI, CK+, DISFA, FERA, SFEW, and FER2013. The results of the proposed architecture are comparable to or better than the state-of-the-art methods and better than traditional convolutional neural networks in both accuracy and training time.

Submitted 12 November, 2015; originally announced November 2015.

Comments: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 2016 {Accepted in first round submission}

Journal ref: IEEE Winter Conference on Applications of Computer Vision (WACV), 2016

arXiv:1511.03603 [pdf, other]

doi 10.1109/EMBC.2014.6944375

Automatic Measurement of Physical Mobility in Get-Up-and-Go Test Using Kinect Sensor

Authors: Amir H. Kargar B., Ali Mollahosseini, Taylor Struemph, Wilson Pace, Rodney D. Nielsen, Mohammad H. Mahoor

Abstract: The Get-Up-and-Go Test is commonly used by physicians for assessing the physical mobility of the elderly. This paper presents a method for automatic analysis and classification of human gait in the Get-Up-and-Go Test using a Microsoft Kinect sensor. Two types of features are automatically extracted from the human skeleton data provided by the Kinect sensor. The first type of feature is related to the human gait (e.g., number of steps, step duration, and turning duration), whereas the other one describes the anatomical configuration (e.g., knee angles, leg angle, and distance between elbows). These features characterize the degree of human physical mobility. State-of-the-art machine learning algorithms (i.e. Bag of Words and Support Vector Machines) are used to classify the severity of gaits in 12 subjects with ages ranging between 65 and 90 enrolled in a pilot study. Our experimental results show that these features can discriminate between patients who have a high risk for falling and patients with a lower fall risk.

Submitted 11 November, 2015; originally announced November 2015.

Comments: Published in: Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE

Showing 1–49 of 49 results for author: Mahoor, M