Search | arXiv e-print repository

Mood as a Contextual Cue for Improved Emotion Inference

Authors: Soujanya Narayana, Ibrahim Radwan, Ramanathan Subramanian, Roland Goecke

Abstract: Psychological studies observe that emotions are rarely expressed in isolation and are typically influenced by the surrounding context. While recent studies effectively harness uni- and multimodal cues for emotion inference, hardly any study has considered the effect of long-term affect, or mood, on short-term emotion inference. This study (a) proposes time-continuous valence prediction from videos, fusing multimodal cues including mood and emotion-change (Δ) labels, (b) serially integrates spatial and channel attention for improved inference, and (c) demonstrates algorithmic generalisability with experiments on the EMMA and AffWild2 datasets. Empirical results affirm that utilising mood labels is highly beneficial for dynamic valence prediction. Comparing unimodal (training only with mood labels) and multimodal (training with mood and Δ labels) models shows that inference performance improves for the latter, conveying that both long- and short-term contextual cues are critical for time-continuous emotion inference.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 5 figures, 5 tables
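The abstract above mentions serially integrating spatial and channel attention. As a rough illustration only, the sketch below applies a parameter-free channel attention followed by a parameter-free spatial attention to a small feature map stored as nested lists (channels × height × width). The channel-then-spatial order, the mean-pooling-plus-sigmoid gates and the absence of learned weights are all assumptions made for brevity; this is not the authors' architecture.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Rescale each channel by sigmoid(mean activation of that channel)."""
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Rescale each (i, j) location by sigmoid(mean activation across channels)."""
    n_ch = len(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(sum(fmap[c][i][j] for c in range(n_ch)) / n_ch)
             for j in range(w)] for i in range(h)]
    return [[[fmap[c][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
            for c in range(n_ch)]


def serial_attention(fmap):
    """Serial arrangement (assumed order): channel attention, then spatial attention."""
    return spatial_attention(channel_attention(fmap))
```

Because both gates lie in (0, 1), the output keeps the input's shape and only attenuates activations; a learned variant would replace the mean pooling with small trainable layers.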

arXiv:2308.02173

Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning

Authors: Ravikiran Parameshwara, Ibrahim Radwan, Akshay Asthana, Iman Abbasnejad, Ramanathan Subramanian, Roland Goecke

Abstract: Whilst deep learning techniques have achieved excellent emotion prediction, they still require large amounts of labelled training data, which are (a) onerous and tedious to compile, and (b) prone to errors and biases. We propose Multi-Task Contrastive Learning for Affect Representation (MT-CLAR) for few-shot affect inference. MT-CLAR combines multi-task learning with a Siamese network trained via contrastive learning to infer, from a pair of expressive facial images, (a) the (dis)similarity between the facial expressions, and (b) the difference in valence and arousal levels of the two faces. We further extend the image-based MT-CLAR framework for automated video labelling where, given one or a few labelled video frames (termed the support-set), MT-CLAR labels the remainder of the video for valence and arousal. Experiments are performed on the AFEW-VA dataset with multiple support-set configurations; moreover, supervised learning on representations learnt via MT-CLAR is used for valence, arousal and categorical emotion prediction on the AffectNet and AFEW-VA datasets. The results show that valence and arousal predictions via MT-CLAR are closely comparable to the state-of-the-art (SOTA), and we significantly outperform SOTA with a support-set ≈6% of the size of the video dataset.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 10 pages, 6 figures, to be published in Proceedings of the 31st ACM International Conference on Multimedia (MM '23)
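To make the two MT-CLAR objectives concrete, here is a minimal sketch of a standard margin-based contrastive loss on a pair of Siamese embeddings, plus a squared-error term on the predicted (valence, arousal) difference. The margin, the weighting factor `alpha` and the function names are hypothetical choices for illustration; the paper's exact loss formulation is not reproduced here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Classic contrastive loss: pull similar pairs together, push
    dissimilar pairs at least `margin` apart."""
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


def multitask_loss(emb_a, emb_b, same, va_diff_pred, va_diff_true, alpha=1.0):
    """Hypothetical multi-task objective: contrastive term plus squared error
    on the predicted (valence, arousal) difference between the two faces."""
    reg = sum((p - t) ** 2 for p, t in zip(va_diff_pred, va_diff_true))
    return contrastive_loss(emb_a, emb_b, same) + alpha * reg
```

For example, `contrastive_loss([0, 0], [3, 4], same=True)` penalises the distance of 5 as 25.0, whereas the same pair labelled dissimilar costs nothing because it already exceeds the margin.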

arXiv:2307.12241

Explainable Depression Detection via Head Motion Patterns

Authors: Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke

Abstract: While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed kinemes, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the BlackDog and AVEC2013 datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.

Submitted 23 July, 2023; originally announced July 2023.
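Approach (b) above derives features from reconstruction errors against patterns learnt only from healthy controls. The sketch below shows that idea under a strong simplification: each head-motion segment is "reconstructed" by its closest healthy-control prototype, and the per-subject errors are summarised by mean, standard deviation and maximum. The nearest-prototype reconstruction and the choice of statistics are hypothetical stand-ins, not the paper's kineme model.

```python
import statistics


def reconstruction_error(segment, prototypes):
    """Squared error of reconstructing a segment by its closest prototype."""
    return min(sum((s - p) ** 2 for s, p in zip(segment, proto))
               for proto in prototypes)


def error_statistics(segments, prototypes):
    """Summarise a subject's reconstruction errors as a small feature vector."""
    errors = [reconstruction_error(seg, prototypes) for seg in segments]
    return {
        "mean": statistics.mean(errors),
        "std": statistics.pstdev(errors),
        "max": max(errors),
    }
```

The intuition is that segments resembling healthy-control patterns reconstruct well, so larger error statistics may flag atypical head motion; these features would then feed a downstream classifier.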

arXiv:2306.06979

A Weakly Supervised Approach to Emotion-change Prediction and Improved Mood Inference

Authors: Soujanya Narayana, Ibrahim Radwan, Ravikiran Parameshwara, Iman Abbasnejad, Akshay Asthana, Ramanathan Subramanian, Roland Goecke

Abstract: Whilst a majority of affective computing research focuses on inferring emotions, examining mood or understanding the mood-emotion interplay has received significantly less attention. Building on prior work, we (a) deduce and incorporate emotion-change (Δ) information for inferring mood, without resorting to annotated labels, and (b) attempt mood prediction for long-duration video clips, in alignment with the characterisation of mood. We generate the emotion-change (Δ) labels via metric learning from a pre-trained Siamese Network, and use these in addition to mood labels for mood classification. Experiments evaluating unimodal (training only using mood labels) vs multimodal (training using mood plus Δ labels) models show that mood prediction benefits from the incorporation of emotion-change information, emphasising the importance of modelling the mood-emotion interplay for effective mood inference.

Submitted 16 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Comments: 9 pages, 3 figures, 6 tables, published in IEEE International Conference on Affective Computing and Intelligent Interaction
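One simple way to turn a pre-trained Siamese embedding into weak emotion-change labels, without manual annotation, is to threshold the embedding distance between consecutive segments. The sketch below does exactly that; the thresholding rule, the binary labels and the `threshold` value are hypothetical and only illustrate the general idea of deriving Δ labels via metric learning, not the paper's procedure.

```python
import math


def delta_labels(embeddings, threshold):
    """Label each consecutive pair of segment embeddings as
    1 (emotion change) if their distance exceeds `threshold`, else 0."""
    labels = []
    for prev, curr in zip(embeddings, embeddings[1:]):
        dist = math.dist(prev, curr)  # Euclidean distance (Python 3.8+)
        labels.append(1 if dist > threshold else 0)
    return labels
```

A sequence of n segment embeddings yields n - 1 labels, which could then be used alongside mood labels during training.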

arXiv:2303.06632

Focus on Change: Mood Prediction by Learning Emotion Changes via Spatio-Temporal Attention

Authors: Soujanya Narayana, Ramanathan Subramanian, Ibrahim Radwan, Roland Goecke

Abstract: While emotion and mood are used interchangeably, they differ in terms of duration, intensity and attributes. Even as multiple psychology studies examine the mood-emotion relationship, mood prediction has barely been studied. Recent machine learning advances, such as attention mechanisms that focus on salient parts of the input data, have only been applied to infer emotions rather than mood. We perform mood prediction by incorporating both mood and emotion change information. We additionally explore spatial and temporal attention, and parallel/sequential arrangements of the spatial and temporal attention modules, to improve mood prediction performance. To examine the generalizability of the proposed method, we evaluate models trained on the AFEW dataset with EMMA. Experiments reveal that (a) emotion change information is inherently beneficial to mood prediction, and (b) prediction performance improves with the integration of sequential and parallel spatial-temporal attention modules.

Submitted 12 March, 2023; originally announced March 2023.

arXiv:2302.09817

Explainable Human-centered Traits from Head Motion and Facial Expression Dynamics

Authors: Surbhi Madan, Monika Gahalawat, Tanaya Guha, Roland Goecke, Ramanathan Subramanian

Abstract: We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits. We utilize elementary head-motion units named kinemes, atomic facial movements termed action units and speech features to estimate these human-centered traits. Empirical results confirm that kinemes and action units enable discovery of multiple trait-specific behaviors while also enabling explainability in support of the predictions. For fusing cues, we explore decision and feature-level fusion, and an additive attention-based fusion strategy which quantifies the relative importance of the three modalities for trait prediction. Examining various long short-term memory (LSTM) architectures for classification and regression on the MIT Interview and First Impressions Candidate Screening (FICS) datasets, we note that: (1) multimodal approaches outperform unimodal counterparts; (2) efficient trait predictions and plausible explanations are achieved with both unimodal and multimodal approaches; and (3) following the thin-slice approach, effective trait prediction is achieved even from two-second behavioral snippets.

Submitted 23 February, 2023; v1 submitted 20 February, 2023; originally announced February 2023.
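The additive attention-based fusion mentioned above can be illustrated with the textbook additive (Bahdanau-style) scoring rule: each modality embedding h_m gets a score v · tanh(W h_m), a softmax turns the scores into modality weights, and the fused vector is the weighted sum. The sketch assumes all modalities are already projected to the same dimension and uses hand-set `W` and `v` in place of learned parameters; it is a generic illustration, not the paper's fusion layer.

```python
import math


def additive_attention_fusion(modalities, W, v):
    """Fuse equal-length modality vectors with additive attention.

    score_m = v . tanh(W h_m); weights = softmax(scores); fused = sum_m w_m h_m
    """
    scores = []
    for h in modalities:
        hidden = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h))) for row in W]
        scores.append(sum(v_k * u_k for v_k, u_k in zip(v, hidden)))
    # Numerically stable softmax over modality scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(modalities[0])
    fused = [sum(w * h[d] for w, h in zip(weights, modalities)) for d in range(dim)]
    return fused, weights
```

The returned `weights` are what make this kind of fusion interpretable: they directly express the relative importance assigned to each modality for a given sample.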

arXiv:2210.00719

To Improve Is to Change: Towards Improving Mood Prediction by Learning Changes in Emotion

Authors: Soujanya Narayana, Ramanathan Subramanian, Ibrahim Radwan, Roland Goecke

Abstract: Although the terms mood and emotion are closely related and often used interchangeably, they are distinguished based on their duration, intensity and attribution. To date, hardly any computational models have (a) examined mood recognition, and (b) modelled the interplay between mood and emotional state in their analysis. In this paper, as a first step towards mood prediction, we propose a framework that utilises both dominant emotion (or mood) labels, and emotional change labels on the AFEW-VA database. Experiments evaluating unimodal (trained only using mood labels) and multimodal (trained with both mood and emotion change labels) convolutional neural networks confirm that incorporating emotional change information in the network training process can significantly improve the mood prediction performance, thus highlighting the importance of modelling emotion and mood simultaneously for improved performance in affective state recognition.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 5 pages, 5 figures

arXiv:2207.07297

Affective Computational Advertising Based on Perceptual Metrics

Authors: Soujanya Narayana, Shweta Jain, Harish Katti, Roland Goecke, Ramanathan Subramanian

Abstract: We present ACAD, an Affective Computational ADvertising framework expressly derived from perceptual metrics. Unlike advertising methods that either ignore the emotional nature of (most) programs and ads, or are based on axiomatic rules, the ACAD formulation incorporates findings from a user study examining the effect of within-program ad placements on ad perception. A linear program formulation seeking to achieve (a) genuine ad assessments and (b) maximal ad recall is then proposed. The effectiveness of the ACAD framework is confirmed via a validational user study, where ACAD-induced ad placements are found to be optimal with respect to objectives (a) and (b) against competing approaches.

Submitted 15 July, 2022; originally announced July 2022.

Comments: 13 pages, 13 figures

arXiv:2202.12936

Automated Parkinson's Disease Detection and Affective Analysis from Emotional EEG Signals

Authors: Ravikiran Parameshwara, Soujanya Narayana, Murugappan Murugappan, Ramanathan Subramanian, Ibrahim Radwan, Roland Goecke

Abstract: While Parkinson's disease (PD) is typically characterized by motor disorder, there is evidence of diminished emotion perception in PD patients. This study examines the utility of affective Electroencephalography (EEG) signals to understand emotional differences between PD vs Healthy Controls (HC), and for automated PD detection. Employing traditional machine learning and deep learning methods, we explore (a) dimensional and categorical emotion recognition, and (b) PD vs HC classification from emotional EEG signals. Our results reveal that PD patients comprehend arousal better than valence and, amongst emotion categories, comprehend fear, disgust and surprise less accurately and sadness most accurately. Mislabeling analyses confirm confounds among opposite-valence emotions with PD data. Emotional EEG responses also achieve near-perfect PD vs HC recognition. Cumulatively, our study demonstrates that (a) examining implicit responses alone enables (i) discovery of valence-related impairments in PD patients, and (ii) differentiation of PD from HC, and (b) emotional EEG analysis is an ecologically-valid, effective, facile and sustainable tool for PD diagnosis vis-à-vis self reports, expert assessments and resting-state analysis.

Submitted 20 February, 2022; originally announced February 2022.

Comments: 12 pages, 6 figures

arXiv:2006.12041

Characterizing Hirability via Personality and Behavior

Authors: Harshit Malik, Hersh Dhillon, Roland Goecke, Ramanathan Subramanian

Abstract: While personality traits have been extensively modeled as behavioral constructs, we model job hirability as a personality construct. On the First Impressions Candidate Screening (FICS) dataset, we examine relationships among personality and hirability measures. Modeling hirability as a discrete/continuous variable with the big-five personality traits as predictors, we utilize (a) apparent personality annotations, and (b) personality estimates obtained via audio, visual and textual cues for hirability prediction (HP). We also examine the efficacy of a two-step HP process involving (1) personality estimation from multimodal behavioral cues, followed by (2) HP from personality estimates. Interesting results from experiments performed on ≈5000 FICS videos are as follows. (1) For each of the text, audio and visual modalities, HP via the above two-step process is more effective than directly predicting from behavioral cues. Superior results are achieved when hirability is modeled as a continuous vis-à-vis categorical variable. (2) Among visual cues, eye and bodily information achieve performance comparable to face cues for predicting personality and hirability. (3) Explanatory analyses reveal the impact of multimodal behavior on personality impressions; e.g., Conscientiousness impressions are impacted by the use of cuss words (verbal behavior) and eye movements (non-verbal behavior), confirming prior observations.

Submitted 22 June, 2020; originally announced June 2020.

Comments: 9 pages

arXiv:1904.00887

Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks

Authors: Aamir Mustafa, Salman Khan, Munawar Hayat, Roland Goecke, Jianbing Shen, Ling Shao

Abstract: Deep neural networks are vulnerable to adversarial attacks, which can fool them by adding minuscule perturbations to the input images. The robustness of existing defenses suffers greatly under white-box attack settings, where an adversary has full knowledge about the network and can iterate several times to find strong perturbations. We observe that the main reason for the existence of such perturbations is the close proximity of different class samples in the learned feature space. This allows model decisions to be totally changed by adding an imperceptible perturbation in the inputs. To counter this, we propose to class-wise disentangle the intermediate feature representations of deep networks. Specifically, we force the features for each class to lie inside a convex polytope that is maximally separated from the polytopes of other classes. In this manner, the network is forced to learn distinct and distant decision regions for each class. We observe that this simple constraint on the features greatly enhances the robustness of learned models, even against the strongest white-box attacks, without degrading the classification performance on clean images. We report extensive evaluations in both black-box and white-box attack scenarios and show significant gains in comparison to state-of-the-art defenses.

Submitted 28 July, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

Comments: Accepted at ICCV 2019
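The class-wise separation idea above can be illustrated with a simplified center-based pull/push objective: each feature is pulled toward its own class center and pushed away (on average) from the other class centers. The function names, squared Euclidean distance and averaging over the other classes are assumptions for illustration; the paper's actual loss and its convex-polytope formulation are not reproduced here.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def separation_loss(features, labels, centers):
    """Simplified pull/push loss (hypothetical): lower is better.

    pull: squared distance from a feature to its own class center
    push: mean squared distance from that feature to all other class centers
    """
    k = len(centers)
    total = 0.0
    for f, y in zip(features, labels):
        pull = sq_dist(f, centers[y])
        push = sum(sq_dist(f, centers[j]) for j in range(k) if j != y) / (k - 1)
        total += pull - push
    return total / len(features)
```

With centers at (0, 0) and (10, 0), a class-0 feature sitting on its own center scores -100.0, while the same label placed on the other center scores 100.0, so minimising the loss drives classes apart in feature space.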

arXiv:1809.03313

A Global Alignment Kernel based Approach for Group-level Happiness Intensity Estimation

Authors: Xiaohua Huang, Abhinav Dhall, Roland Goecke, Matti Pietikainen, Guoying Zhao

Abstract: With the progress in automatic human behavior understanding, analysing the perceived affect of multiple people has received interest in the affective computing community. Unlike conventional facial expression analysis, this paper primarily focuses on analysing the behaviour of multiple people in an image. The proposed method is based on support vector regression with combined global alignment kernels (GAKs) to estimate the happiness intensity of a group of people. We first exploit Riesz-based volume local binary pattern (RVLBP) and deep convolutional neural network (CNN) based features for characterizing facial images. Furthermore, we propose to use the GAK for RVLBP and deep CNN features, respectively, to explicitly measure the similarity of two group-level images. Specifically, we exploit the global weight sort scheme to sort the face images from a group-level image according to their spatial weights, yielding an efficient data structure for the GAK. Lastly, we propose multiple kernel learning with three combination strategies to combine the two respective GAKs based on RVLBP and deep CNN features, thereby enhancing the discriminative ability of each GAK. Extensive experiments are performed on the challenging group-level happiness intensity database, HAPPEI. Our experimental results demonstrate that the proposed approach achieves promising performance for group happiness intensity analysis, when compared with recent state-of-the-art methods.

Submitted 3 September, 2018; originally announced September 2018.
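For readers unfamiliar with global alignment kernels, the sketch below computes a GAK between two sequences of feature vectors with the usual dynamic-programming recursion, summing over all monotone alignments rather than taking the single best one as DTW does. It uses Cuturi's triangular-free Gaussian local kernel g / (2 - g). Treating each group image as a sequence of sorted face features follows the abstract; the bandwidth `sigma` and everything else here are illustrative assumptions, not the paper's code.

```python
import math


def local_kernel(x, y, sigma):
    """exp(-phi) with phi = d^2/(2 sigma^2) + log(2 - exp(-d^2/(2 sigma^2)))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    g = math.exp(-d2 / (2.0 * sigma ** 2))
    return g / (2.0 - g)


def global_alignment_kernel(seq_x, seq_y, sigma=1.0):
    """Sum over all alignments of products of local kernels, via DP:
    M[i][j] = k(x_i, y_j) * (M[i-1][j] + M[i][j-1] + M[i-1][j-1])."""
    n, m = len(seq_x), len(seq_y)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = local_kernel(seq_x[i - 1], seq_y[j - 1], sigma)
            M[i][j] = k * (M[i - 1][j] + M[i][j - 1] + M[i - 1][j - 1])
    return M[n][m]
```

When all local kernels equal 1, the kernel simply counts alignments (three for two length-2 sequences), which is a handy sanity check. In practice the kernel is often normalised, or computed in log space for long sequences, before it is used in an SVR.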

arXiv:1808.07773

EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction

Authors: Abhinav Dhall, Amanjot Kaur, Roland Goecke, Tom Gedeon

Abstract: This paper details the sixth Emotion Recognition in the Wild (EmotiW) challenge. EmotiW 2018 is a grand challenge in the ACM International Conference on Multimodal Interaction 2018, Colorado, USA. The challenge aims at providing a common platform to researchers working in the affective computing community to benchmark their algorithms on 'in the wild' data. This year, EmotiW contains three sub-challenges: a) audio-video based emotion recognition; b) student engagement prediction; and c) group-level emotion recognition. The databases, protocols and baselines are discussed in detail.

Submitted 23 August, 2018; originally announced August 2018.

arXiv:1610.03640

Analyzing the Affect of a Group of People Using Multi-modal Framework

Authors: Xiaohua Huang, Abhinav Dhall, Xin Liu, Guoying Zhao, **gang Shi, Roland Goecke, Matti Pietikainen

Abstract: Millions of images on the web enable us to explore images from social events such as a family party; thus, it is of interest to understand and model the affect exhibited by a group of people in images. However, analysis of the affect expressed by multiple people is challenging due to varied indoor and outdoor settings, and interactions taking place between various numbers of people. A few existing works on Group-level Emotion Recognition (GER) have investigated face-level information. Due to the challenging environments, faces may not provide enough information for GER, and relatively few studies have investigated multi-modal GER. Therefore, we propose a novel multi-modal approach based on a new feature description for understanding the emotional state of a group of people in an image. In this paper, we first exploit three kinds of rich information in a group-level image: face, upper body and scene. Furthermore, in order to integrate multiple persons' information in a group-level image, we propose an information aggregation method to generate three features for face, upper body and scene, respectively. We fuse face, upper body and scene information for robustness of GER against the challenging environments. Extensive experiments are performed on two challenging group-level emotion databases to investigate the role of face, upper body and scene as well as the multi-modal framework. Experimental results demonstrate that our framework achieves very promising performance for GER.

Submitted 13 October, 2016; v1 submitted 12 October, 2016; originally announced October 2016.

Comments: 11 pages. Submitted to the IEEE Transactions on Cybernetics

arXiv:1512.06498

Harnessing the Deep Net Object Models for Enhancing Human Action Recognition

Authors: O. V. Ramana Murthy, Roland Goecke

Abstract: In this study, the influence of objects is investigated in the scenario of human action recognition with a large number of classes. We hypothesize that the objects humans interact with play a strong role in determining the action being performed. In particular, if the objects are non-moving, such as objects appearing in the background, features such as spatio-temporal interest points and dense trajectories may fail to detect them. Hence, we propose to detect objects statically in every frame using pre-trained object detectors. Trained deep network models are used as object detectors. Information from different layers, in conjunction with different encoding techniques, is extensively studied to obtain the richest feature vectors. This technique is observed to yield state-of-the-art performance on the HMDB51 and UCF101 datasets.

Submitted 23 December, 2015; v1 submitted 21 December, 2015; originally announced December 2015.

Comments: 6 pages. arXiv admin note: text overlap with arXiv:1411.4006 by other authors

arXiv:1512.01055

Occlusion-Aware Human Pose Estimation with Mixtures of Sub-Trees

Authors: Ibrahim Radwan, Abhinav Dhall, Roland Goecke

Abstract: In this paper, we study the problem of learning a model for human pose estimation as mixtures of compositional sub-trees in two layers of prediction. This involves estimating the pose of a sub-tree followed by identifying the relationships between sub-trees that are used to handle occlusions between different parts. The mixtures of the sub-trees are learnt utilising both geometric and appearance distances. The Chow-Liu (CL) algorithm is recursively applied to determine the inter-relations between the nodes and to build the structure of the sub-trees. These structures are used to learn the latent parameters of the sub-trees, and inference is done using a standard belief propagation technique. The proposed method handles occlusions during the inference process by identifying overlapping regions between different sub-trees and introducing a penalty term for overlapping parts. Experiments are performed on three different datasets: the Leeds Sports, Image Parse and UIUC People datasets. The results show that the proposed method is more robust to occlusions than state-of-the-art approaches.

Submitted 3 December, 2015; originally announced December 2015.

Comments: 12 pages, 5 figures and 3 Tables
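The Chow-Liu algorithm referenced above builds the tree-structured model that best approximates a joint distribution: it computes pairwise mutual information between variables and keeps a maximum-weight spanning tree over those weights. The sketch below is the plain, non-recursive textbook version on discrete samples using Kruskal's algorithm. The paper instead applies CL recursively with geometric and appearance distances, so treat this as background on the underlying algorithm only.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between discrete variables i and j."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    pi = Counter(s[i] for s in samples)
    pj = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


def chow_liu_tree(samples):
    """Maximum-weight spanning tree over pairwise mutual information (Kruskal)."""
    d = len(samples[0])
    edges = sorted(((mutual_information(samples, i, j), i, j)
                    for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

For samples where variables 0 and 1 are always equal and variable 2 is independent, the strongest edge (0, 1) is always kept, and a tree over three variables has exactly two edges.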

Showing 1–16 of 16 results for author: Goecke, R