-
Mood as a Contextual Cue for Improved Emotion Inference
Authors:
Soujanya Narayana,
Ibrahim Radwan,
Ramanathan Subramanian,
Roland Goecke
Abstract:
Psychological studies observe that emotions are rarely expressed in isolation and are typically influenced by the surrounding context. While recent studies effectively harness uni- and multimodal cues for emotion inference, hardly any study has considered the effect of long-term affect, or \emph{mood}, on short-term \emph{emotion} inference. This study (a) proposes time-continuous \emph{valence} prediction from videos, fusing multimodal cues including \emph{mood} and \emph{emotion-change} ($Δ$) labels, (b) serially integrates spatial and channel attention for improved inference, and (c) demonstrates algorithmic generalisability with experiments on the \emph{EMMA} and \emph{AffWild2} datasets. Empirical results affirm that utilising mood labels is highly beneficial for dynamic valence prediction. Comparing \emph{unimodal} (training only with mood labels) vs \emph{multimodal} (training with mood and $Δ$ labels) results, inference performance improves for the latter, conveying that both long and short-term contextual cues are critical for time-continuous emotion inference.
Submitted 13 February, 2024;
originally announced February 2024.
-
Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning
Authors:
Ravikiran Parameshwara,
Ibrahim Radwan,
Akshay Asthana,
Iman Abbasnejad,
Ramanathan Subramanian,
Roland Goecke
Abstract:
Whilst deep learning techniques have achieved excellent emotion prediction, they still require large amounts of labelled training data, which are (a) onerous and tedious to compile, and (b) prone to errors and biases. We propose Multi-Task Contrastive Learning for Affect Representation (\textbf{MT-CLAR}) for few-shot affect inference. MT-CLAR combines multi-task learning with a Siamese network trained via contrastive learning to infer from a pair of expressive facial images (a) the (dis)similarity between the facial expressions, and (b) the difference in valence and arousal levels of the two faces. We further extend the image-based MT-CLAR framework for automated video labelling where, given one or a few labelled video frames (termed \textit{support-set}), MT-CLAR labels the remainder of the video for valence and arousal. Experiments are performed on the AFEW-VA dataset with multiple support-set configurations; moreover, supervised learning on representations learnt via MT-CLAR is used for valence, arousal and categorical emotion prediction on the AffectNet and AFEW-VA datasets. The results show that valence and arousal predictions via MT-CLAR are very comparable to the state-of-the-art (SOTA), and we significantly outperform SOTA with a support-set $\approx$6\% the size of the video dataset.
Submitted 4 August, 2023;
originally announced August 2023.
-
Explainable Depression Detection via Head Motion Patterns
Authors:
Monika Gahalawat,
Raul Fernandez Rojas,
Tanaya Guha,
Ramanathan Subramanian,
Roland Goecke
Abstract:
While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed \emph{kinemes}, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the \emph{BlackDog} and \emph{AVEC2013} datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic \emph{thin-slices}, and a peak F1 of 0.72 over videos for AVEC2013.
Submitted 23 July, 2023;
originally announced July 2023.
-
A Weakly Supervised Approach to Emotion-change Prediction and Improved Mood Inference
Authors:
Soujanya Narayana,
Ibrahim Radwan,
Ravikiran Parameshwara,
Iman Abbasnejad,
Akshay Asthana,
Ramanathan Subramanian,
Roland Goecke
Abstract:
Whilst a majority of affective computing research focuses on inferring emotions, examining mood or understanding the \textit{mood-emotion interplay} has received significantly less attention. Building on prior work, we (a) deduce and incorporate emotion-change ($Δ$) information for inferring mood, without resorting to annotated labels, and (b) attempt mood prediction for long duration video clips, in alignment with the characterisation of mood. We generate the emotion-change ($Δ$) labels via metric learning from a pre-trained Siamese Network, and use these in addition to mood labels for mood classification. Experiments evaluating \textit{unimodal} (training only using mood labels) vs \textit{multimodal} (training using mood plus $Δ$ labels) models show that mood prediction benefits from the incorporation of emotion-change information, emphasising the importance of modelling the mood-emotion interplay for effective mood inference.
Submitted 16 August, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Focus on Change: Mood Prediction by Learning Emotion Changes via Spatio-Temporal Attention
Authors:
Soujanya Narayana,
Ramanathan Subramanian,
Ibrahim Radwan,
Roland Goecke
Abstract:
While emotion and mood are often used interchangeably, they differ in terms of duration, intensity and attributes. Even as multiple psychology studies examine the mood-emotion relationship, mood prediction has barely been studied. Recent machine learning advances, such as the attention mechanism for focusing on salient parts of the input data, have only been applied to infer emotions rather than mood. We perform mood prediction by incorporating both mood and emotion change information. We additionally explore spatial and temporal attention, and parallel/sequential arrangements of the spatial and temporal attention modules to improve mood prediction performance. To examine generalizability of the proposed method, we evaluate models trained on the AFEW dataset with EMMA. Experiments reveal that (a) emotion change information is inherently beneficial to mood prediction, and (b) prediction performance improves with the integration of sequential and parallel spatial-temporal attention modules.
Submitted 12 March, 2023;
originally announced March 2023.
-
Explainable Human-centered Traits from Head Motion and Facial Expression Dynamics
Authors:
Surbhi Madan,
Monika Gahalawat,
Tanaya Guha,
Roland Goecke,
Ramanathan Subramanian
Abstract:
We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits. We utilize elementary head-motion units named kinemes, atomic facial movements termed action units, and speech features to estimate these human-centered traits. Empirical results confirm that kinemes and action units enable discovery of multiple trait-specific behaviors while also enabling explainability in support of the predictions. For fusing cues, we explore decision and feature-level fusion, and an additive attention-based fusion strategy which quantifies the relative importance of the three modalities for trait prediction. Examining various long short-term memory (LSTM) architectures for classification and regression on the MIT Interview and First Impressions Candidate Screening (FICS) datasets, we note that: (1) Multimodal approaches outperform unimodal counterparts; (2) Efficient trait predictions and plausible explanations are achieved with both unimodal and multimodal approaches, and (3) Following the thin-slice approach, effective trait prediction is achieved even from two-second behavioral snippets.
Submitted 23 February, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
To Improve Is to Change: Towards Improving Mood Prediction by Learning Changes in Emotion
Authors:
Soujanya Narayana,
Ramanathan Subramanian,
Ibrahim Radwan,
Roland Goecke
Abstract:
Although the terms mood and emotion are closely related and often used interchangeably, they are distinguished based on their duration, intensity and attribution. To date, hardly any computational models have (a) examined mood recognition, and (b) modelled the interplay between mood and emotional state in their analysis. In this paper, as a first step towards mood prediction, we propose a framework that utilises both dominant emotion (or mood) labels, and emotional change labels on the AFEW-VA database. Experiments evaluating unimodal (trained only using mood labels) and multimodal (trained with both mood and emotion change labels) convolutional neural networks confirm that incorporating emotional change information in the network training process can significantly improve the mood prediction performance, thus highlighting the importance of modelling emotion and mood simultaneously for improved performance in affective state recognition.
Submitted 3 October, 2022;
originally announced October 2022.
-
Affective Computational Advertising Based on Perceptual Metrics
Authors:
Soujanya Narayana,
Shweta Jain,
Harish Katti,
Roland Goecke,
Ramanathan Subramanian
Abstract:
We present \textbf{ACAD}, an \textbf{a}ffective \textbf{c}omputational \textbf{ad}vertising framework expressly derived from perceptual metrics. Different from advertising methods which either ignore the emotional nature of (most) programs and ads, or are based on axiomatic rules, the ACAD formulation incorporates findings from a user study examining the effect of within-program ad placements on ad perception. A linear program formulation seeking to achieve (a) \emph{genuine} ad assessments and (b) \emph{maximal} ad recall is then proposed. Effectiveness of the ACAD framework is confirmed via a validational user study, where ACAD-induced ad placements are found to be optimal with respect to objectives (a) and (b) against competing approaches.
Submitted 15 July, 2022;
originally announced July 2022.
-
Automated Parkinson's Disease Detection and Affective Analysis from Emotional EEG Signals
Authors:
Ravikiran Parameshwara,
Soujanya Narayana,
Murugappan Murugappan,
Ramanathan Subramanian,
Ibrahim Radwan,
Roland Goecke
Abstract:
While Parkinson's disease (PD) is typically characterized by motor disorder, there is evidence of diminished emotion perception in PD patients. This study examines the utility of affective Electroencephalography (EEG) signals to understand emotional differences between PD vs Healthy Controls (HC), and for automated PD detection. Employing traditional machine learning and deep learning methods, we explore (a) dimensional and categorical emotion recognition, and (b) PD vs HC classification from emotional EEG signals. Our results reveal that PD patients comprehend arousal better than valence, and amongst emotion categories, \textit{fear}, \textit{disgust} and \textit{surprise} less accurately, and \textit{sadness} most accurately. Mislabeling analyses confirm confounds among opposite-valence emotions with PD data. Emotional EEG responses also achieve near-perfect PD vs HC recognition. Cumulatively, our study demonstrates that (a) examining \textit{implicit} responses alone enables (i) discovery of valence-related impairments in PD patients, and (ii) differentiation of PD from HC, and (b) emotional EEG analysis is an ecologically-valid, effective, facile and sustainable tool for PD diagnosis vis-à-vis self reports, expert assessments and resting-state analysis.
Submitted 20 February, 2022;
originally announced February 2022.
-
Characterizing Hirability via Personality and Behavior
Authors:
Harshit Malik,
Hersh Dhillon,
Roland Goecke,
Ramanathan Subramanian
Abstract:
While personality traits have been extensively modeled as behavioral constructs, we model \textbf{\textit{job hirability}} as a \emph{personality construct}. On the {\emph{First Impressions Candidate Screening}} (FICS) dataset, we examine relationships among personality and hirability measures. Modeling hirability as a discrete/continuous variable with the \emph{big-five} personality traits as predictors, we utilize (a) apparent personality annotations, and (b) personality estimates obtained via audio, visual and textual cues for hirability prediction (HP). We also examine the efficacy of a two-step HP process involving (1) personality estimation from multimodal behavioral cues, followed by (2) HP from personality estimates.
Interesting results from experiments performed on $\approx$5000 FICS videos are as follows. (1) For each of the \emph{text}, \emph{audio} and \emph{visual} modalities, HP via the above two-step process is more effective than directly predicting from behavioral cues. Superior results are achieved when hirability is modeled as a continuous vis-à-vis categorical variable. (2) Among visual cues, eye and bodily information achieve performance comparable to face cues for predicting personality and hirability. (3) Explanatory analyses reveal the impact of multimodal behavior on personality impressions; e.g., Conscientiousness impressions are impacted by the use of \emph{cuss words} (verbal behavior), and \emph{eye movements} (non-verbal behavior), confirming prior observations.
Submitted 22 June, 2020;
originally announced June 2020.
-
Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks
Authors:
Aamir Mustafa,
Salman Khan,
Munawar Hayat,
Roland Goecke,
Jianbing Shen,
Ling Shao
Abstract:
Deep neural networks are vulnerable to adversarial attacks, which can fool them by adding minuscule perturbations to the input images. The robustness of existing defenses suffers greatly under white-box attack settings, where an adversary has full knowledge about the network and can iterate several times to find strong perturbations. We observe that the main reason for the existence of such perturbations is the close proximity of different class samples in the learned feature space. This allows model decisions to be totally changed by adding an imperceptible perturbation in the inputs. To counter this, we propose to class-wise disentangle the intermediate feature representations of deep networks. Specifically, we force the features for each class to lie inside a convex polytope that is maximally separated from the polytopes of other classes. In this manner, the network is forced to learn distinct and distant decision regions for each class. We observe that this simple constraint on the features greatly enhances the robustness of learned models, even against the strongest white-box attacks, without degrading the classification performance on clean images. We report extensive evaluations in both black-box and white-box attack scenarios and show significant gains in comparison to state-of-the-art defenses.
Submitted 28 July, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
A Global Alignment Kernel based Approach for Group-level Happiness Intensity Estimation
Authors:
Xiaohua Huang,
Abhinav Dhall,
Roland Goecke,
Matti Pietikainen,
Guoying Zhao
Abstract:
With the progress in automatic human behavior understanding, analysing the perceived affect of multiple people has received interest in the affective computing community. Unlike conventional facial expression analysis, this paper primarily focuses on analysing the behaviour of multiple people in an image. The proposed method is based on support vector regression with combined global alignment kernels (GAKs) to estimate the happiness intensity of a group of people. We first exploit Riesz-based volume local binary pattern (RVLBP) and deep convolutional neural network (CNN) based features for characterizing facial images. Furthermore, we propose to use a GAK for each of the RVLBP and deep CNN features to explicitly measure the similarity of two group-level images. Specifically, we exploit the global weight sort scheme to sort the face images from a group-level image according to their spatial weights, yielding an efficient data structure for the GAK. Lastly, we propose multiple kernel learning based on three combination strategies for combining the two respective GAKs based on RVLBP and deep CNN features, thereby enhancing the discriminative ability of each GAK. Intensive experiments are performed on the challenging group-level happiness intensity database, namely HAPPEI. Our experimental results demonstrate that the proposed approach achieves promising performance for group happiness intensity analysis when compared with recent state-of-the-art methods.
Submitted 3 September, 2018;
originally announced September 2018.
-
EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction
Authors:
Abhinav Dhall,
Amanjot Kaur,
Roland Goecke,
Tom Gedeon
Abstract:
This paper details the sixth Emotion Recognition in the Wild (EmotiW) challenge. EmotiW 2018 is a grand challenge in the ACM International Conference on Multimodal Interaction 2018, Colorado, USA. The challenge aims at providing a common platform to researchers working in the affective computing community to benchmark their algorithms on `in the wild' data. This year EmotiW contains three sub-challenges: a) Audio-video based emotion recognition; b) Student engagement prediction; and c) Group-level emotion recognition. The databases, protocols and baselines are discussed in detail.
Submitted 23 August, 2018;
originally announced August 2018.
-
Analyzing the Affect of a Group of People Using Multi-modal Framework
Authors:
Xiaohua Huang,
Abhinav Dhall,
Xin Liu,
Guoying Zhao,
Jingang Shi,
Roland Goecke,
Matti Pietikainen
Abstract:
Millions of images on the web enable us to explore images from social events such as a family party; it is thus of interest to understand and model the affect exhibited by a group of people in images. However, analysis of the affect expressed by multiple people is challenging due to varied indoor and outdoor settings, and interactions taking place between various numbers of people. A few existing works on Group-level Emotion Recognition (GER) have investigated face-level information. Due to the challenging environments, faces may not provide enough information for GER. Relatively few studies have investigated multi-modal GER. Therefore, we propose a novel multi-modal approach based on a new feature description for understanding the emotional state of a group of people in an image. In this paper, we first exploit three kinds of rich information, namely face, upperbody and scene, in a group-level image. Furthermore, in order to integrate the information of multiple persons in a group-level image, we propose an information aggregation method to generate three features for face, upperbody and scene, respectively. We fuse face, upperbody and scene information to make GER robust against the challenging environments. Intensive experiments are performed on two challenging group-level emotion databases to investigate the role of face, upperbody and scene as well as the multi-modal framework. Experimental results demonstrate that our framework achieves very promising performance for GER.
Submitted 13 October, 2016; v1 submitted 12 October, 2016;
originally announced October 2016.
-
Harnessing the Deep Net Object Models for Enhancing Human Action Recognition
Authors:
O. V. Ramana Murthy,
Roland Goecke
Abstract:
In this study, the influence of objects is investigated in the scenario of human action recognition with a large number of classes. We hypothesize that the objects humans interact with have a strong say in determining the action being performed. In particular, if the objects are non-moving, such as objects appearing in the background, features such as spatio-temporal interest points and dense trajectories may fail to detect them. Hence, we propose to detect objects statically in every frame using pre-trained object detectors. Trained deep network models are used as object detectors. Information from different layers in conjunction with different encoding techniques is extensively studied to obtain the richest feature vectors. This technique is observed to yield state-of-the-art performance on the HMDB51 and UCF101 datasets.
Submitted 23 December, 2015; v1 submitted 21 December, 2015;
originally announced December 2015.
-
Occlusion-Aware Human Pose Estimation with Mixtures of Sub-Trees
Authors:
Ibrahim Radwan,
Abhinav Dhall,
Roland Goecke
Abstract:
In this paper, we study the problem of learning a model for human pose estimation as mixtures of compositional sub-trees in two layers of prediction. This involves estimating the pose of a sub-tree followed by identifying the relationships between sub-trees that are used to handle occlusions between different parts. The mixtures of the sub-trees are learnt utilising both geometric and appearance distances. The Chow-Liu (CL) algorithm is recursively applied to determine the inter-relations between the nodes and to build the structure of the sub-trees. These structures are used to learn the latent parameters of the sub-trees and the inference is done using a standard belief propagation technique. The proposed method handles occlusions during the inference process by identifying overlapping regions between different sub-trees and introducing a penalty term for overlapping parts. Experiments are performed on three different datasets: the Leeds Sports, Image Parse and UIUC People datasets. The results show the robustness of the proposed method to occlusions over the state-of-the-art approaches.
Submitted 3 December, 2015;
originally announced December 2015.