-
Mood as a Contextual Cue for Improved Emotion Inference
Authors:
Soujanya Narayana,
Ibrahim Radwan,
Ramanathan Subramanian,
Roland Goecke
Abstract:
Psychological studies observe that emotions are rarely expressed in isolation and are typically influenced by the surrounding context. While recent studies effectively harness uni- and multimodal cues for emotion inference, hardly any study has considered the effect of long-term affect, or \emph{mood}, on short-term \emph{emotion} inference. This study (a) proposes time-continuous \emph{valence} prediction from videos, fusing multimodal cues including \emph{mood} and \emph{emotion-change} ($Δ$) labels, (b) serially integrates spatial and channel attention for improved inference, and (c) demonstrates algorithmic generalisability with experiments on the \emph{EMMA} and \emph{AffWild2} datasets. Empirical results affirm that utilising mood labels is highly beneficial for dynamic valence prediction. Comparing \emph{unimodal} (training only with mood labels) vs \emph{multimodal} (training with mood and $Δ$ labels) results, inference performance improves for the latter, conveying that both long and short-term contextual cues are critical for time-continuous emotion inference.
Submitted 13 February, 2024;
originally announced February 2024.
-
EEG-based Cognitive Load Estimation of Acoustic Parameters for Data Sonification
Authors:
Gulshan Sharma,
Surbhi Madan,
Maneesh Bilalpur,
Abhinav Dhall,
Ramanathan Subramanian
Abstract:
Sonification is a data visualization technique which expresses data attributes via psychoacoustic parameters, which are non-speech audio signals used to convey information. This paper investigates the binary estimation of cognitive load induced by psychoacoustic parameters conveying the focus level of an astronomical image via Electroencephalogram (EEG) embeddings. Employing machine learning and deep learning methodologies, we demonstrate that EEG signals are reliable for (a) binary estimation of cognitive load, (b) isolating easy vs difficult visual-to-auditory perceptual mappings, and (c) capturing perceptual similarities among psychoacoustic parameters. Our key findings reveal that (1) EEG embeddings can reliably measure cognitive load, achieving a peak F1-score of 0.98; (2) Extreme focus levels are easier to detect via auditory mappings than intermediate ones, and (3) psychoacoustic parameters inducing comparable cognitive load levels tend to generate similar EEG encodings.
Submitted 16 January, 2024;
originally announced January 2024.
-
A Traffic Control Framework for Uncrewed Aircraft Systems
Authors:
Ananay Vikram Gupta,
Aaditya Prakash Kattekola,
Ansh Vikram Gupta,
Dacharla Venkata Abhiram,
Kamesh Namuduri,
Ravichandran Subramanian
Abstract:
The exponential growth of Advanced Air Mobility (AAM) services demands assurances of safety in the airspace. This research proposes a Traffic Control Framework (TCF) for developing digital flight rules for Uncrewed Aircraft Systems (UAS) flying in designated air corridors. The proposed TCF helps model, deploy, and test UAS control agents, regardless of their hardware configurations. This paper investigates the importance of digital flight rules in preventing collisions in the context of AAM. TCF is introduced as a platform for developing strategies for managing traffic towards enhanced autonomy in the airspace. It allows for assessment and evaluation of autonomous navigation, route planning, obstacle avoidance, and adaptive decision making for UAS. It also allows for the introduction and evaluation of advanced technologies such as Artificial Intelligence (AI) and Machine Learning (ML) in a simulation environment before deploying them in the real world. TCF can be used as a tool for comprehensive UAS traffic analysis, including KPI measurements. It offers flexibility for further testing and deployment, laying the foundation for improved airspace safety - a vital aspect of UAS technological advancement. Finally, this paper demonstrates the capabilities of the proposed TCF in managing UAS traffic at intersections and its impact on overall traffic flow in air corridors, noting the bottlenecks and the inverse relationship between safety and traffic volume.
Submitted 15 October, 2023;
originally announced October 2023.
-
MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings
Authors:
Surbhi Madan,
Rishabh Jain,
Gulshan Sharma,
Ramanathan Subramanian,
Abhinav Dhall
Abstract:
Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR
Submitted 19 September, 2023;
originally announced September 2023.
-
Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning
Authors:
Ravikiran Parameshwara,
Ibrahim Radwan,
Akshay Asthana,
Iman Abbasnejad,
Ramanathan Subramanian,
Roland Goecke
Abstract:
Whilst deep learning techniques have achieved excellent emotion prediction, they still require large amounts of labelled training data, which are (a) onerous and tedious to compile, and (b) prone to errors and biases. We propose Multi-Task Contrastive Learning for Affect Representation (\textbf{MT-CLAR}) for few-shot affect inference. MT-CLAR combines multi-task learning with a Siamese network trained via contrastive learning to infer from a pair of expressive facial images (a) the (dis)similarity between the facial expressions, and (b) the difference in valence and arousal levels of the two faces. We further extend the image-based MT-CLAR framework for automated video labelling where, given one or a few labelled video frames (termed \textit{support-set}), MT-CLAR labels the remainder of the video for valence and arousal. Experiments are performed on the AFEW-VA dataset with multiple support-set configurations; moreover, supervised learning on representations learnt via MT-CLAR is used for valence, arousal and categorical emotion prediction on the AffectNet and AFEW-VA datasets. The results show that valence and arousal predictions via MT-CLAR are very comparable to the state-of-the-art (SOTA), and we significantly outperform SOTA with a support-set $\approx$6\% the size of the video dataset.
Submitted 4 August, 2023;
originally announced August 2023.
-
A Virtual Reality Game to Improve Physical and Cognitive Acuity
Authors:
Blooma John,
Ramanathan Subramanian,
Jayan Chirayath Kurian
Abstract:
We present the Virtual Human Benchmark (VHB) game to evaluate and improve physical and cognitive acuity. VHB simulates in 3D the BATAK lightboard game, which is designed to improve physical reaction and hand-eye coordination, on the \textit{Oculus Rift} and \textit{Quest} headsets. The game comprises the \textit{reaction}, \textit{accumulator} and \textit{sequence} modes; along with the \textit{reaction} and \textit{accumulator} modes which mimic BATAK functionalities, the \textit{sequence} mode involves the user repeating a sequence of illuminated targets with increasing complexity to train visual memory and cognitive processing. A first version of the game (VHB v1) was evaluated against the real-world BATAK by 20 users, and their feedback was utilized to improve game design and obtain a second version (VHB v2). Another study to evaluate VHB v2 was conducted with 20 users, whose results confirmed that the design improvements enhanced game usability and user experience in multiple respects. Also, logging and visualization of performance data such as \textit{reaction time}, \textit{speed between targets} and \textit{completed sequence patterns} provide useful data for coaches/therapists monitoring sports/rehabilitation regimens.
Submitted 2 August, 2023;
originally announced August 2023.
-
Explainable Depression Detection via Head Motion Patterns
Authors:
Monika Gahalawat,
Raul Fernandez Rojas,
Tanaya Guha,
Ramanathan Subramanian,
Roland Goecke
Abstract:
While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed \emph{kinemes}, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the \emph{BlackDog} and \emph{AVEC2013} datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic \emph{thin-slices}, and a peak F1 of 0.72 over videos for AVEC2013.
Submitted 23 July, 2023;
originally announced July 2023.
-
Llama 2: Open Foundation and Fine-Tuned Chat Models
Authors:
Hugo Touvron,
Louis Martin,
Kevin Stone,
Peter Albert,
Amjad Almahairi,
Yasmine Babaei,
Nikolay Bashlykov,
Soumya Batra,
Prajjwal Bhargava,
Shruti Bhosale,
Dan Bikel,
Lukas Blecher,
Cristian Canton Ferrer,
Moya Chen,
Guillem Cucurull,
David Esiobu,
Jude Fernandes,
Jeremy Fu,
Wenyin Fu,
Brian Fuller,
Cynthia Gao,
Vedanuj Goswami,
Naman Goyal,
Anthony Hartshorn,
Saghar Hosseini
, et al. (43 additional authors not shown)
Abstract:
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
Submitted 19 July, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Grain and Grain Boundary Segmentation using Machine Learning with Real and Generated Datasets
Authors:
Peter Warren,
Nandhini Raju,
Abhilash Prasad,
Shajahan Hossain,
Ramesh Subramanian,
Jayanta Kapat,
Navin Manjooran,
Ranajay Ghosh
Abstract:
We report significantly improved accuracy of grain boundary segmentation using Convolutional Neural Networks (CNN) trained on a combination of real and generated data. Manual segmentation is accurate but time-consuming, and existing computational methods are faster but often inaccurate. To combat this dilemma, machine learning models can be used to achieve the accuracy of manual segmentation with the efficiency of a computational method. To build an extensive dataset, 316L stainless steel samples were additively manufactured, prepared, polished, and etched, and microstructure grain images were then systematically collected. Grain segmentation was conducted both via existing computational methods and manually (by hand) to create "real" training data. A Voronoi tessellation pattern combined with random synthetic noise and simulated defects is developed to create a novel artificial grain image fabrication method. This provided training data supplementation for data-intensive machine learning methods. The accuracy of the grain measurements from microstructure images segmented via computational methods and the machine learning methods proposed in this work is calculated and compared to provide benchmarks in grain segmentation. Over 400 images of the microstructure of stainless steel samples were manually segmented for machine learning training applications. These data and the artificial data are available on Kaggle.
Submitted 12 July, 2023;
originally announced July 2023.
-
A Weakly Supervised Approach to Emotion-change Prediction and Improved Mood Inference
Authors:
Soujanya Narayana,
Ibrahim Radwan,
Ravikiran Parameshwara,
Iman Abbasnejad,
Akshay Asthana,
Ramanathan Subramanian,
Roland Goecke
Abstract:
Whilst a majority of affective computing research focuses on inferring emotions, examining mood or understanding the \textit{mood-emotion interplay} has received significantly less attention. Building on prior work, we (a) deduce and incorporate emotion-change ($Δ$) information for inferring mood, without resorting to annotated labels, and (b) attempt mood prediction for long duration video clips, in alignment with the characterisation of mood. We generate the emotion-change ($Δ$) labels via metric learning from a pre-trained Siamese Network, and use these in addition to mood labels for mood classification. Experiments evaluating \textit{unimodal} (training only using mood labels) vs \textit{multimodal} (training using mood plus $Δ$ labels) models show that mood prediction benefits from the incorporation of emotion-change information, emphasising the importance of modelling the mood-emotion interplay for effective mood inference.
Submitted 16 August, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Focus on Change: Mood Prediction by Learning Emotion Changes via Spatio-Temporal Attention
Authors:
Soujanya Narayana,
Ramanathan Subramanian,
Ibrahim Radwan,
Roland Goecke
Abstract:
While emotion and mood are used interchangeably, they differ in terms of duration, intensity and attributes. Even as multiple psychology studies examine the mood-emotion relationship, mood prediction has barely been studied. Recent machine learning advances, such as the attention mechanism to focus on salient parts of the input data, have only been applied to infer emotions rather than mood. We perform mood prediction by incorporating both mood and emotion change information. We additionally explore spatial and temporal attention, and parallel/sequential arrangements of the spatial and temporal attention modules to improve mood prediction performance. To examine generalizability of the proposed method, we evaluate models trained on the AFEW dataset with EMMA. Experiments reveal that (a) emotion change information is inherently beneficial to mood prediction, and (b) prediction performance improves with the integration of sequential and parallel spatial-temporal attention modules.
Submitted 12 March, 2023;
originally announced March 2023.
-
Explainable Human-centered Traits from Head Motion and Facial Expression Dynamics
Authors:
Surbhi Madan,
Monika Gahalawat,
Tanaya Guha,
Roland Goecke,
Ramanathan Subramanian
Abstract:
We explore the efficacy of multimodal behavioral cues for explainable prediction of personality and interview-specific traits. We utilize elementary head-motion units named kinemes, atomic facial movements termed action units, and speech features to estimate these human-centered traits. Empirical results confirm that kinemes and action units enable discovery of multiple trait-specific behaviors while also enabling explainability in support of the predictions. For fusing cues, we explore decision and feature-level fusion, and an additive attention-based fusion strategy which quantifies the relative importance of the three modalities for trait prediction. Examining various long short-term memory (LSTM) architectures for classification and regression on the MIT Interview and First Impressions Candidate Screening (FICS) datasets, we note that: (1) Multimodal approaches outperform unimodal counterparts; (2) Efficient trait predictions and plausible explanations are achieved with both unimodal and multimodal approaches, and (3) Following the thin-slice approach, effective trait prediction is achieved even from two-second behavioral snippets.
Submitted 23 February, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
To Improve Is to Change: Towards Improving Mood Prediction by Learning Changes in Emotion
Authors:
Soujanya Narayana,
Ramanathan Subramanian,
Ibrahim Radwan,
Roland Goecke
Abstract:
Although the terms mood and emotion are closely related and often used interchangeably, they are distinguished based on their duration, intensity and attribution. To date, hardly any computational models have (a) examined mood recognition, and (b) modelled the interplay between mood and emotional state in their analysis. In this paper, as a first step towards mood prediction, we propose a framework that utilises both dominant emotion (or mood) labels, and emotional change labels on the AFEW-VA database. Experiments evaluating unimodal (trained only using mood labels) and multimodal (trained with both mood and emotion change labels) convolutional neural networks confirm that incorporating emotional change information in the network training process can significantly improve the mood prediction performance, thus highlighting the importance of modelling emotion and mood simultaneously for improved performance in affective state recognition.
Submitted 3 October, 2022;
originally announced October 2022.
-
Neural Encoding of Songs is Modulated by Their Enjoyment
Authors:
Gulshan Sharma,
Pankaj Pandey,
Ramanathan Subramanian,
Krishna. P. Miyapuram,
Abhinav Dhall
Abstract:
We examine user and song identification from neural (EEG) signals. Owing to perceptual subjectivity in human-media interaction, music identification from brain signals is a challenging task. We demonstrate that subjective differences in music perception aid user identification, but hinder song identification. In an attempt to address intrinsic complexities in music identification, we provide empirical evidence on the role of enjoyment in song recognition. Our findings reveal that considering song enjoyment as an additional factor can improve EEG-based song recognition.
Submitted 13 August, 2022;
originally announced August 2022.
-
Affective Computational Advertising Based on Perceptual Metrics
Authors:
Soujanya Narayana,
Shweta Jain,
Harish Katti,
Roland Goecke,
Ramanathan Subramanian
Abstract:
We present \textbf{ACAD}, an \textbf{a}ffective \textbf{c}omputational \textbf{ad}vertising framework expressly derived from perceptual metrics. Different from advertising methods which either ignore the emotional nature of (most) programs and ads, or are based on axiomatic rules, the ACAD formulation incorporates findings from a user study examining the effect of within-program ad placements on ad perception. A linear program formulation seeking to achieve (a) \emph{genuine} ad assessments and (b) \emph{maximal} ad recall is then proposed. Effectiveness of the ACAD framework is confirmed via a validational user study, where ACAD-induced ad placements are found to be optimal with respect to objectives (a) and (b) against competing approaches.
Submitted 15 July, 2022;
originally announced July 2022.
-
Automated Parkinson's Disease Detection and Affective Analysis from Emotional EEG Signals
Authors:
Ravikiran Parameshwara,
Soujanya Narayana,
Murugappan Murugappan,
Ramanathan Subramanian,
Ibrahim Radwan,
Roland Goecke
Abstract:
While Parkinson's disease (PD) is typically characterized by motor disorder, there is evidence of diminished emotion perception in PD patients. This study examines the utility of affective Electroencephalography (EEG) signals to understand emotional differences between PD patients and Healthy Controls (HC), and for automated PD detection. Employing traditional machine learning and deep learning methods, we explore (a) dimensional and categorical emotion recognition, and (b) PD vs HC classification from emotional EEG signals. Our results reveal that PD patients comprehend arousal better than valence, and amongst emotion categories, \textit{fear}, \textit{disgust} and \textit{surprise} less accurately, and \textit{sadness} most accurately. Mislabeling analyses confirm confounds among opposite-valence emotions with PD data. Emotional EEG responses also achieve near-perfect PD vs HC recognition. Cumulatively, our study demonstrates that (a) examining \textit{implicit} responses alone enables (i) discovery of valence-related impairments in PD patients, and (ii) differentiation of PD from HC, and (b) emotional EEG analysis is an ecologically-valid, effective, facile and sustainable tool for PD diagnosis vis-à-vis self reports, expert assessments and resting-state analysis.
Submitted 20 February, 2022;
originally announced February 2022.
-
Outlier-based Autism Detection using Longitudinal Structural MRI
Authors:
Devika K,
Venkata Ramana Murthy Oruganti,
Dwarikanath Mahapatra,
Ramanathan Subramanian
Abstract:
Diagnosis of Autism Spectrum Disorder (ASD) using clinical evaluation (cognitive tests) is challenging due to wide variations amongst individuals. Since no effective treatment exists, prompt and reliable ASD diagnosis can enable the effective preparation of treatment regimens. This paper proposes structural Magnetic Resonance Imaging (sMRI)-based ASD diagnosis via an outlier detection approach. To learn spatio-temporal patterns in structural brain connectivity, a Generative Adversarial Network (GAN) is trained exclusively with sMRI scans of healthy subjects. Given a stack of three adjacent slices as input, the GAN generator reconstructs the next three adjacent slices; the GAN discriminator then identifies ASD sMRI scan reconstructions as outliers. This model is compared against two other baselines -- a simpler UNet and a sophisticated Self-Attention GAN. Axial, coronal, and sagittal sMRI slices from the multi-site ABIDE II dataset are used for evaluation. Extensive experiments reveal that our ASD detection framework performs comparably with the state-of-the-art with far less training data. Furthermore, longitudinal data (two scans per subject over time) achieve 17-28% higher accuracy than cross-sectional data (one scan per subject). Among other findings, metrics employed for model training as well as reconstruction loss computation impact detection performance, and the coronal modality is found to best encode structural information for ASD detection.
Submitted 10 March, 2022; v1 submitted 20 February, 2022;
originally announced February 2022.
-
Expert and Crowd-Guided Affect Annotation and Prediction
Authors:
Ramanathan Subramanian,
Yan Yan,
Nicu Sebe
Abstract:
We employ crowdsourcing to acquire time-continuous affective annotations for movie clips, and refine noisy models trained from these crowd annotations incorporating expert information within a Multi-task Learning (MTL) framework. We propose a novel \textbf{e}xpert \textbf{g}uided MTL (EG-MTL) algorithm, which minimizes the loss with respect to both crowd and expert labels to learn a set of weights corresponding to each movie clip for which crowd annotations are acquired. We employ EG-MTL to solve two problems, namely, \textbf{\texttt{P1}}: where dynamic annotations acquired from both experts and crowdworkers for the \textbf{Validation} set are used to train a regression model with audio-visual clip descriptors as features, and predict dynamic arousal and valence levels on 5--15 second snippets derived from the clips; and \textbf{\texttt{P2}}: where a classification model trained on the \textbf{Validation} set using dynamic crowd and expert annotations (as features) and static affective clip labels is used for binary emotion recognition on the \textbf{Evaluation} set for which only dynamic crowd annotations are available. Observed experimental results confirm the effectiveness of the EG-MTL algorithm, which is reflected via improved arousal and valence estimation for \textbf{\texttt{P1}}, and higher recognition accuracy for \textbf{\texttt{P2}}.
Submitted 15 December, 2021;
originally announced December 2021.
-
Head Matters: Explainable Human-centered Trait Prediction from Head Motion Dynamics
Authors:
Surbhi Madan,
Monika Gahalawat,
Tanaya Guha,
Ramanathan Subramanian
Abstract:
We demonstrate the utility of elementary head-motion units termed kinemes for behavioral analytics to predict personality and interview traits. Transforming head-motion patterns into a sequence of kinemes facilitates discovery of latent temporal signatures characterizing the targeted traits, thereby enabling both efficient and explainable trait prediction. Utilizing kinemes and Facial Action Coding System (FACS) features to predict (a) OCEAN personality traits on the First Impressions Candidate Screening videos, and (b) interview traits on the MIT dataset, we note that: (1) a Long Short-Term Memory (LSTM) network trained with kineme sequences performs better than or similar to a Convolutional Neural Network (CNN) trained with facial images; (2) accurate predictions and explanations are achieved on combining FACS action units (AUs) with kinemes; and (3) prediction performance is affected by the time-length over which head and facial movements are observed.
Submitted 15 December, 2021;
originally announced December 2021.
-
FakeBuster: A DeepFakes Detection Tool for Video Conferencing Scenarios
Authors:
Vineet Mehta,
Parul Gupta,
Ramanathan Subramanian,
Abhinav Dhall
Abstract:
This paper proposes a new DeepFake detector, FakeBuster, for detecting impostors during video conferencing and manipulated faces on social media. FakeBuster is a standalone deep learning-based solution, which enables a user to detect if another person's video is manipulated or spoofed during a video conferencing-based meeting. This tool is independent of video conferencing solutions and has been tested with Zoom and Skype applications. It uses a 3D convolutional neural network for predicting video segment-wise fakeness scores. The network is trained on a combination of datasets such as DeeperForensics, DFDC, VoxCeleb, and deepfake videos created using locally captured (for video conferencing scenarios) images. This leads to different environments and perturbations in the dataset, which improves the generalization of the deepfake network.
Submitted 9 January, 2021;
originally announced January 2021.
-
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
Authors:
Samyak Jain,
Pradeep Yarlagadda,
Shreyank Jyoti,
Shyamgopal Karthik,
Ramanathan Subramanian,
Vineet Gandhi
Abstract:
We propose the ViNet architecture for audio-visual saliency prediction. ViNet is a fully convolutional encoder-decoder architecture. The encoder uses visual features from a network trained for action recognition, and the decoder infers a saliency map via trilinear interpolation and 3D convolutions, combining features from multiple hierarchies. The overall architecture of ViNet is conceptually simple; it is causal and runs in real-time (60 fps). ViNet does not use audio as input and still outperforms the state-of-the-art audio-visual saliency prediction models on nine different datasets (three visual-only and six audio-visual datasets). ViNet also surpasses human performance on the CC, SIM and AUC metrics for the AVE dataset, and to our knowledge, it is the first network to do so. We also explore a variation of ViNet architecture by augmenting audio features into the decoder. To our surprise, upon sufficient training, the network becomes agnostic to the input audio and provides the same output irrespective of the input. Interestingly, we also observe similar behaviour in the previous state-of-the-art models \cite{tsiami2020stavis} for audio-visual saliency prediction. Our findings contrast with previous works on deep learning-based audio-visual saliency prediction, suggesting a clear avenue for future explorations incorporating audio in a more effective manner. The code and pre-trained models are available at https://github.com/samyak0210/ViNet.
Submitted 7 August, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
Gender and Emotion Recognition from Implicit User Behavior Signals
Authors:
Maneesh Bilalpur,
Seyed Mostafa Kia,
Mohan Kankanhalli,
Ramanathan Subramanian
Abstract:
This work explores the utility of implicit behavioral cues, namely, Electroencephalogram (EEG) signals and eye movements for gender recognition (GR) and emotion recognition (ER) from psychophysical behavior. Specifically, the examined cues are acquired via low-cost, off-the-shelf sensors. 28 users (14 male) recognized emotions from unoccluded (no mask) and partially occluded (eye or mouth masked) emotive faces; their EEG responses contained gender-specific differences, while their eye movements were characteristic of the perceived facial emotions. Experimental results reveal that (a) reliable GR and ER are achievable with EEG and eye features, (b) differential cognitive processing of negative emotions is observed for females, and (c) eye gaze-based gender differences manifest under partial face occlusion, as typified by the eye and mouth mask conditions.
Submitted 23 June, 2020;
originally announced June 2020.
-
Characterizing Hirability via Personality and Behavior
Authors:
Harshit Malik,
Hersh Dhillon,
Roland Goecke,
Ramanathan Subramanian
Abstract:
While personality traits have been extensively modeled as behavioral constructs, we model \textbf{\textit{job hirability}} as a \emph{personality construct}. On the {\emph{First Impressions Candidate Screening}} (FICS) dataset, we examine relationships among personality and hirability measures. Modeling hirability as a discrete/continuous variable with the \emph{big-five} personality traits as predictors, we utilize (a) apparent personality annotations, and (b) personality estimates obtained via audio, visual and textual cues for hirability prediction (HP). We also examine the efficacy of a two-step HP process involving (1) personality estimation from multimodal behavioral cues, followed by (2) HP from personality estimates.
Interesting results from experiments performed on $\approx$~5000 FICS videos are as follows. (1) For each of the \emph{text}, \emph{audio} and \emph{visual} modalities, HP via the above two-step process is more effective than directly predicting from behavioral cues. Superior results are achieved when hirability is modeled as a continuous vis-à-vis categorical variable. (2) Among visual cues, eye and bodily information achieve performance comparable to face cues for predicting personality and hirability. (3) Explanatory analyses reveal the impact of multimodal behavior on personality impressions; e.g., Conscientiousness impressions are impacted by the use of \emph{cuss words} (verbal behavior), and \emph{eye movements} (non-verbal behavior), confirming prior observations.
Submitted 22 June, 2020;
originally announced June 2020.
-
The eyes know it: FakeET -- An Eye-tracking Database to Understand Deepfake Perception
Authors:
Parul Gupta,
Komal Chugh,
Abhinav Dhall,
Ramanathan Subramanian
Abstract:
We present \textbf{FakeET}-- an eye-tracking database to understand human visual perception of \emph{deepfake} videos. Given that the principal purpose of deepfakes is to deceive human observers, FakeET is designed to understand and evaluate the ease with which viewers can detect synthetic video artifacts. FakeET contains viewing patterns compiled from 40 users via the \emph{Tobii} desktop eye-tracker for 811 videos from the \textit{Google Deepfake} dataset, with a minimum of two viewings per video. Additionally, EEG responses acquired via the \emph{Emotiv} sensor are also available. The compiled data confirms (a) distinct eye movement characteristics for \emph{real} vs \emph{fake} videos; (b) utility of the eye-track saliency maps for spatial forgery localization and detection, and (c) Error Related Negativity (ERN) triggers in the EEG responses, and the ability of the \emph{raw} EEG signal to distinguish between \emph{real} and \emph{fake} videos.
Submitted 18 June, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization
Authors:
Komal Chugh,
Parul Gupta,
Abhinav Dhall,
Ramanathan Subramanian
Abstract:
We propose detection of deepfake videos based on the dissimilarity between the audio and visual modalities, termed the Modality Dissonance Score (MDS). We hypothesize that manipulation of either modality will lead to disharmony between the two modalities, e.g., loss of lip-sync, unnatural facial and lip movements, etc. MDS is computed as an aggregate of dissimilarity scores between audio and visual segments in a video. Discriminative features are learnt for the audio and visual channels in a chunk-wise manner, employing the cross-entropy loss for individual modalities, and a contrastive loss that models inter-modality similarity. Extensive experiments on the DFDC and DeepFake-TIMIT datasets show that our approach outperforms the state-of-the-art by up to 7%. We also demonstrate temporal forgery localization, and show how our technique identifies the manipulated video segments.
Submitted 20 March, 2021; v1 submitted 29 May, 2020;
originally announced May 2020.
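As a rough illustration of the Modality Dissonance Score described in the abstract above, the sketch below aggregates per-chunk dissimilarities between audio and visual embeddings. The abstract only says that MDS is "an aggregate of dissimilarity scores"; the use of Euclidean distance, the mean as the aggregate, and the function names `chunk_dissimilarity` and `modality_dissonance_score` are all assumptions made for illustration, and the embeddings are taken as given rather than learnt.

```python
import math


def chunk_dissimilarity(audio_vec, visual_vec):
    """Euclidean distance between one audio and one visual chunk embedding.

    Assumption: the paper's exact dissimilarity measure is not given in the
    abstract; Euclidean distance is used here purely as a stand-in.
    """
    if len(audio_vec) != len(visual_vec):
        raise ValueError("audio and visual embeddings must have equal length")
    return math.sqrt(sum((a - v) ** 2 for a, v in zip(audio_vec, visual_vec)))


def modality_dissonance_score(audio_chunks, visual_chunks):
    """Average the chunk-wise dissimilarities over a whole video.

    Assumption: the mean is used as the aggregate.
    """
    if not audio_chunks or len(audio_chunks) != len(visual_chunks):
        raise ValueError("need the same non-zero number of chunks per modality")
    scores = [chunk_dissimilarity(a, v)
              for a, v in zip(audio_chunks, visual_chunks)]
    return sum(scores) / len(scores)


# Toy embeddings (hypothetical): the first chunk disagrees, the second matches.
audio = [[0.0, 0.0], [1.0, 0.0]]
visual = [[3.0, 4.0], [1.0, 0.0]]
score = modality_dissonance_score(audio, visual)
```

Under these assumptions a higher score suggests stronger audio-visual disagreement, and the per-chunk values could in principle be inspected to see which segments disagree most, in the spirit of the temporal localization the abstract mentions.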
-
Approximating MIS over equilateral $B_1$-VPG graphs
Authors:
Abhiruk Lahiri,
Joydeep Mukherjee,
C. R. Subramanian
Abstract:
We present an approximation algorithm for the maximum independent set (MIS) problem over the class of equilateral $B_1$-VPG graphs. These are intersection graphs of $L$-shaped planar objects with both arms of each object being equal. We obtain a $36(\log 2d)$-approximate algorithm running in $O(n(\log n)^2)$ time for this problem, where $d$ is the ratio $d_{max}/d_{min}$ and $d_{max}$ and $d_{min}$ denote respectively the maximum and minimum length of any arm in the input equilateral $L$-representation of the graph. In particular, we obtain an $O(1)$-factor approximation of MIS for $B_1$-VPG graphs for which the ratio $d$ is bounded by a constant. In fact, the algorithm can be generalized to an $O(n(\log n)^2)$ time and $36(\log 2d_x)(\log 2d_y)$-approximate MIS algorithm over arbitrary $B_1$-VPG graphs. Here, $d_x$ and $d_y$ denote respectively the analogues of $d$ when restricted to only horizontal and vertical arms of members of the input. This is an improvement over the previously best $n^ε$-approximate algorithm \cite{FoxP} (for some fixed $ε>0$), unless the ratio $d$ is exponentially large in $n$. In particular, $O(1)$-approximation of MIS is achieved for graphs with $\max\{d_x,d_y\}=O(1)$.
Submitted 17 December, 2019;
originally announced December 2019.
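The two remarks in the abstract above about the ratio $d$ follow from short calculations, sketched here. The logarithm is assumed to be base 2, which the abstract does not state; $c \ge 1$ is a hypothetical constant bound on $d$.

```latex
% Assumption: \log denotes \log_2; c >= 1 is a hypothetical constant bound on d.
d \le c \;\Longrightarrow\; 36\log(2d) \le 36\log(2c) = O(1),
\qquad
36\log(2d) < n^{\varepsilon}
\iff
d < \tfrac{1}{2}\, 2^{\,n^{\varepsilon}/36}.
```

The second equivalence is why the stated bound beats an $n^{\varepsilon}$ ratio unless $d$ grows exponentially in a power of $n$.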
-
Can Machine Learning Identify Governing Laws For Dynamics in Complex Engineered Systems ? : A Study in Chemical Engineering
Authors:
Renganathan Subramanian,
Shweta Singh
Abstract:
Machine learning has recently been used to identify the governing equations for dynamics in physical systems. The promising results from applications on systems such as fluid dynamics and chemical kinetics inspire further investigation of these methods on complex engineered systems. Dynamics of these systems play a crucial role in design and operations. Hence, it would be advantageous to learn about the mechanisms that may be driving the complex dynamics of systems. In this work, we address the open question of the applicability and usefulness of a novel machine learning approach in identifying the governing dynamical equations for engineered systems. We focused on the distillation column, which is a ubiquitous unit operation in chemical engineering and demonstrates complex dynamics, i.e., its dynamics are a combination of heuristics and fundamental physical laws. We tested the method of Sparse Identification of Non-Linear Dynamics (SINDy) because of its ability to produce white-box models with terms that can be used for physical interpretation of dynamics. Time series data for the dynamics were generated from simulation of a distillation column using ASPEN Dynamics. One promising result was the reduction of the number of equations for dynamic simulation from 1000s in ASPEN to only 13 - one for each state variable. Prediction accuracy was high on test data from the system within the perturbation range; however, outside the perturbation range the equations did not perform well. In terms of physical law extraction, some terms were interpretable as related to Fick's law of diffusion (with concentration terms) and Henry's law (with ratio of concentration and pressure terms). While some terms were interpretable, we conclude that more research is needed on combining engineering systems with machine learning approaches to improve understanding of governing laws for unknown dynamics.
Submitted 18 July, 2019;
originally announced July 2019.
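To make the SINDy idea from the abstract above concrete, here is a minimal pure-Python sketch of sequentially thresholded least squares, the sparse regression commonly associated with SINDy. The abstract does not say which optimizer or candidate library the authors used, so the toy system $dx/dt = -2x$, the library of terms $[1, x, x^2]$, the threshold, and the helper names `solve_least_squares` and `sparse_regression` are all assumptions for illustration, not the paper's setup.

```python
def solve_least_squares(rows, targets):
    """Least-squares fit via the normal equations and Gaussian elimination."""
    n = len(rows[0])
    # Augmented system [A^T A | A^T b].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * t for r, t in zip(rows, targets))]
           for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[k][j] -= factor * aug[col][j]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - rest) / aug[i][i]
    return w


def sparse_regression(library, derivatives, threshold, n_iter=10):
    """Sequentially thresholded least squares over candidate terms."""
    n_terms = len(library[0])
    coefs = solve_least_squares(library, derivatives)
    for _ in range(n_iter):
        # Keep only terms whose coefficient is large enough, then refit.
        active = [j for j, c in enumerate(coefs) if abs(c) >= threshold]
        coefs = [0.0] * n_terms
        if not active:
            break
        reduced = [[row[j] for j in active] for row in library]
        for j, c in zip(active, solve_least_squares(reduced, derivatives)):
            coefs[j] = c
    return coefs


# Toy data (hypothetical): samples of dx/dt = -2x, candidate terms [1, x, x^2].
xs = [0.1 * k for k in range(1, 21)]
library = [[1.0, x, x * x] for x in xs]
derivatives = [-2.0 * x for x in xs]
coefs = sparse_regression(library, derivatives, threshold=0.1)
```

On this noise-free toy data the constant and quadratic terms are pruned and only the linear term survives, with a coefficient close to -2, which is the kind of white-box equation the abstract refers to. Real simulation data would need numerically estimated derivatives and a better-conditioned solver than the normal equations used here.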
-
Recognition of Advertisement Emotions with Application to Computational Advertising
Authors:
Abhinav Shukla,
Shruti Shriya Gullapuram,
Harish Katti,
Mohan Kankanhalli,
Stefan Winkler,
Ramanathan Subramanian
Abstract:
Advertisements (ads) often contain strong affective content to capture viewer attention and convey an effective message to the audience. However, most computational affect recognition (AR) approaches examine ads via the text modality, and only limited work has been devoted to decoding ad emotions from audiovisual or user cues. This work (1) compiles an affective ad dataset capable of evoking coherent emotions across users; (2) explores the efficacy of content-centric convolutional neural network (CNN) features for AR vis-à-vis handcrafted audio-visual descriptors; (3) examines user-centric ad AR from Electroencephalogram (EEG) responses acquired during ad-viewing, and (4) demonstrates how better affect predictions facilitate effective computational advertising as determined by a study involving 18 users. Experiments reveal that (a) CNN features outperform audiovisual descriptors for content-centric AR; (b) EEG features are able to encode ad-induced emotions better than content-based features; (c) Multi-task learning performs best among a slew of classification algorithms to achieve optimal AR, and (d) Pursuant to (b), EEG features also enable optimized ad insertion onto streamed video, as compared to content-based or manual insertion techniques in terms of ad memorability and overall user experience.
Submitted 3 April, 2019;
originally announced April 2019.
-
Investigating the generalizability of EEG-based Cognitive Load Estimation Across Visualizations
Authors:
Viral Parekh,
Maneesh Bilalpur,
Sharavan Kumar,
Stefan Winkler,
C V Jawahar,
Ramanathan Subramanian
Abstract:
We examine if EEG-based cognitive load (CL) estimation is generalizable across the character, spatial pattern, bar graph and pie chart-based visualizations for the n-back task. CL is estimated via two recent approaches: (a) Deep convolutional neural network, and (b) Proximal support vector machines. Experiments reveal that CL estimation suffers across visualizations, motivating the need for effective machine learning techniques to benchmark visual interface usability for a given analytic task.
Submitted 12 September, 2018;
originally announced September 2018.
-
EEG-based Evaluation of Cognitive Workload Induced by Acoustic Parameters for Data Sonification
Authors:
Maneesh Bilalpur,
Mohan Kankanhalli,
Stefan Winkler,
Ramanathan Subramanian
Abstract:
Data Visualization has been receiving growing attention recently, with ubiquitous smart devices designed to render information in a variety of ways. However, while evaluations of visual tools for their interpretability and intuitiveness have been commonplace, not much research has been devoted to other forms of data rendering, e.g., sonification. This work is the first to automatically estimate the cognitive load induced by different acoustic parameters considered for sonification in prior studies. We examine cognitive load via (a) perceptual data-sound mapping accuracies of users for the different acoustic parameters, (b) cognitive workload impressions explicitly reported by users, and (c) their implicit EEG responses compiled during the mapping task. Our main findings are that (i) low cognitive load-inducing (i.e., more intuitive) acoustic parameters correspond to higher mapping accuracies, (ii) EEG spectral power analysis reveals higher $α$ band power for low cognitive load parameters, implying a congruent relationship between explicit and implicit user responses, and (iii) Cognitive load classification with EEG features achieves a peak F1-score of 0.64, confirming that reliable workload estimation is achievable with user EEG data compiled using wearable sensors.
Submitted 18 August, 2018;
originally announced August 2018.
-
Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements
Authors:
Abhinav Shukla,
Harish Katti,
Mohan Kankanhalli,
Ramanathan Subramanian
Abstract:
Emotion evoked by an advertisement plays a key role in influencing brand recall and eventual consumer choices. Automatic ad affect recognition has several useful applications. However, the use of content-based feature representations does not give insights into how affect is modulated by aspects such as the ad scene setting, salient object attributes and their interactions. Neither do such approaches inform us on how humans prioritize visual information for ad understanding. Our work addresses these lacunae by decomposing video content into detected objects, coarse scene structure, object statistics and actively attended objects identified via eye-gaze. We measure the importance of each of these information channels by systematically incorporating related information into ad affect prediction models. Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure better encode affective information as compared to individual scene objects or conspicuous background elements.
Submitted 14 August, 2018;
originally announced August 2018.
-
Watch to Edit: Video Retargeting using Gaze
Authors:
Kranthi Kumar,
Moneish Kumar,
Vineet Gandhi,
Ramanathan Subramanian
Abstract:
We present a novel approach to optimally retarget videos for varied displays with differing aspect ratios by preserving salient scene content discovered via eye tracking. Our algorithm performs editing with cut, pan and zoom operations by optimizing the path of a cropping window within the original video while seeking to (i) preserve salient regions, and (ii) adhere to the principles of cinematography. Our approach is (a) content agnostic as the same methodology is employed to re-edit a wide-angle video recording or a close-up movie sequence captured with a static or moving camera, and (b) independent of video length and can in principle re-edit an entire movie in one shot. Our algorithm consists of two steps. The first step employs gaze transition cues to detect time stamps where new cuts are to be introduced in the original video via dynamic programming. A subsequent step optimizes the cropping window path (to create pan and zoom effects), while accounting for the original and new cuts. The cropping window path is designed to include maximum gaze information, and is composed of piecewise constant, linear and parabolic segments. It is obtained via L1-regularized convex optimization which ensures a smooth viewing experience. We test our approach on a wide variety of videos and demonstrate significant improvement over the state-of-the-art, both in terms of computational complexity and qualitative aspects. A study performed with 16 users confirms that our approach results in a superior viewing experience as compared to gaze driven re-editing and letterboxing methods, especially for wide-angle static camera recordings.
Submitted 27 June, 2018;
originally announced July 2018.
-
AVEID: Automatic Video System for Measuring Engagement In Dementia
Authors:
Viral Parekh,
Pin Sym Foong,
Shendong Zhao,
Ramanathan Subramanian
Abstract:
Engagement in dementia is typically measured using behavior observational scales (BOS) that are tedious and involve intensive manual labor to annotate, and are therefore not easily scalable. We propose AVEID, a low cost and easy-to-use video-based engagement measurement tool to determine the engagement level of a person with dementia (PwD) during digital interaction. We show that the objective behavioral measures computed via AVEID correlate well with subjective expert impressions for the popular MPES and OME BOS, confirming its viability and effectiveness. Moreover, AVEID measures can be obtained for a variety of engagement designs, thereby facilitating large-scale studies with PwD populations.
Submitted 21 December, 2017;
originally announced December 2017.
-
An EEG-based Image Annotation System
Authors:
Viral Parekh,
Ramanathan Subramanian,
Dipanjan Roy,
C. V. Jawahar
Abstract:
The success of deep learning in computer vision has greatly increased the need for annotated image datasets. We propose an EEG (Electroencephalogram)-based image annotation system. While humans can recognize objects in 20-200 milliseconds, the need to manually label images results in a low annotation throughput. Our system employs brain signals captured via a consumer EEG device to achieve an annotation rate of up to 10 images per second. We exploit the P300 event-related potential (ERP) signature to identify target images during a rapid serial visual presentation (RSVP) task. We further perform unsupervised outlier removal to achieve an F1-score of 0.88 on the test set. The proposed system does not depend on category-specific EEG signatures enabling the annotation of any new image category without any model pre-training.
Submitted 7 November, 2017;
originally announced November 2017.
-
Evaluating Content-centric vs User-centric Ad Affect Recognition
Authors:
Abhinav Shukla,
Shruti Shriya Gullapuram,
Harish Katti,
Karthik Yadati,
Mohan Kankanhalli,
Ramanathan Subramanian
Abstract:
Despite the fact that advertisements (ads) often include strongly emotional content, very little work has been devoted to affect recognition (AR) from ads. This work explicitly compares content-centric and user-centric ad AR methodologies, and evaluates the impact of enhanced AR on computational advertising via a user study. Specifically, we (1) compile an affective ad dataset capable of evoking coherent emotions across users; (2) explore the efficacy of content-centric convolutional neural network (CNN) features for encoding emotions, and show that CNN features outperform low-level emotion descriptors; (3) examine user-centered ad AR by analyzing Electroencephalogram (EEG) responses acquired from eleven viewers, and find that EEG signals encode emotional information better than content descriptors; (4) investigate the relationship between objective AR and subjective viewer experience while watching an ad-embedded online video stream based on a study involving 12 users. To our knowledge, this is the first work to (a) expressly compare user vs content-centered AR for ads, and (b) study the relationship between modeling of ad emotions and its impact on a real-life advertising application.
Submitted 6 September, 2017;
originally announced September 2017.
-
Affect Recognition in Ads with Application to Computational Advertising
Authors:
Abhinav Shukla,
Shruti Shriya Gullapuram,
Harish Katti,
Karthik Yadati,
Mohan Kankanhalli,
Ramanathan Subramanian
Abstract:
Advertisements (ads) often include strongly emotional content to leave a lasting impression on the viewer. This work (i) compiles an affective ad dataset capable of evoking coherent emotions across users, as determined from the affective opinions of five experts and 14 annotators; (ii) explores the efficacy of convolutional neural network (CNN) features for encoding emotions, and observes that CNN features outperform low-level audio-visual emotion descriptors upon extensive experimentation; and (iii) demonstrates how enhanced affect prediction facilitates computational advertising, and leads to better viewing experience while watching an online video stream embedded with ads based on a study involving 17 users. We model ad emotions based on subjective human opinions as well as objective multimodal features, and show how effectively modeling ad emotions can positively impact a real-life application.
Submitted 6 September, 2017;
originally announced September 2017.
-
Gender and Emotion Recognition with Implicit User Signals
Authors:
Maneesh Bilalpur,
Seyed Mostafa Kia,
Manisha Chawla,
Tat-Seng Chua,
Ramanathan Subramanian
Abstract:
We examine the utility of implicit user behavioral signals captured using low-cost, off-the-shelf devices for anonymous gender and emotion recognition. A user study designed to examine male and female sensitivity to facial emotions confirms that females recognize (especially negative) emotions more quickly and accurately than males, mirroring prior findings. Implicit viewer responses in the form of EEG brain signals and eye movements are then examined for the existence of (a) emotion and gender-specific patterns from event-related potentials (ERPs) and fixation distributions and (b) emotion and gender discriminability. Experiments reveal that (i) gender and emotion-specific differences are observable from ERPs, (ii) multiple similarities exist between explicit responses gathered from users and their implicit behavioral signals, and (iii) significantly above-chance ($\approx$70%) gender recognition is achievable on comparing emotion-specific EEG responses, with gender differences encoded best for anger and disgust. Also, fairly modest valence (positive vs negative emotion) recognition is achieved with EEG and eye-based features.
Submitted 29 August, 2017;
originally announced August 2017.
-
Discovering Gender Differences in Facial Emotion Recognition via Implicit Behavioral Cues
Authors:
Maneesh Bilalpur,
Seyed Mostafa Kia,
Tat-Seng Chua,
Ramanathan Subramanian
Abstract:
We examine the utility of implicit behavioral cues in the form of EEG brain signals and eye movements for gender recognition (GR) and emotion recognition (ER). Specifically, the examined cues are acquired via low-cost, off-the-shelf sensors. We asked 28 viewers (14 female) to recognize emotions from unoccluded (no mask) as well as partially occluded (eye and mouth masked) emotive faces. Obtained experimental results reveal that (a) reliable GR and ER are achievable with EEG and eye features, (b) differential cognitive processing, especially for negative emotions, is observed for males and females, and (c) some of these cognitive differences manifest under partial face occlusion, as typified by the eye and mouth mask conditions.
Submitted 29 August, 2017;
originally announced August 2017.
-
Software-Defined Network Controlled Switching between Millimeter Wave and Terahertz Small Cells
Authors:
Angela Sara Cacciapuoti,
Ramanathan Subramanian,
Kaushik Roy Chowdhury,
Marcello Caleffi
Abstract:
Small cells are a cost-effective way to reliably expand network coverage and provide significantly increased capacity for end users. The ultra-high bandwidth available at millimeter wave (mmWave) and Terahertz (THz) frequencies can effectively realize short-range wireless access links in small cells, enabling potential use cases such as driver-less cars, data backhauling and ultra-high-definition infotainment services. This paper describes a new software defined network (SDN) framework for vehicles equipped with transceivers capable of dynamically switching between THz and mmWave bands. We present a novel SDN-controlled admission policy that preferentially hands off between the mmWave and THz small cells, accommodates asymmetric uplink/downlink traffic, performs error recovery and handles distinct link states that arise due to motion along practical vehicular paths. We then analytically derive the resulting capacity of such a small cell network by accounting for the channel characteristics unique to both these spectrum bands, relative distance and the contact times between a given transceiver pair. We then formulate the optimal procedure for scheduling multiple vehicles at a given infrastructure tower, with regard to practical road congestion scenarios. The search for the optimal schedule is shown to be an NP-hard problem. Hence, we design a computationally feasible polynomial-time scheduling algorithm that runs at the SDN controller and compare its performance against the optimal procedure and random access. Additionally, we present a simulation-based case study for the use case of data center backhauling in Boston city to showcase the benefits of our approach.
Submitted 9 February, 2017;
originally announced February 2017.
-
Evaluating Crowdsourcing Participants in the Absence of Ground-Truth
Authors:
Ramanathan Subramanian,
Romer Rosales,
Glenn Fung,
Jennifer Dy
Abstract:
Given a supervised/semi-supervised learning scenario where multiple annotators are available, we consider the problem of identification of adversarial or unreliable annotators.
Submitted 30 May, 2016;
originally announced May 2016.
-
High-Level System Design of IEEE 802.11b Standard-Compliant Link Layer for MATLAB-Based SDR
Authors:
Ramanathan Subramanian,
Benjamin Drozdenko,
Eric Doyle,
Rameez Ahmed,
Miriam Leeser,
Kaushik R. Chowdhury
Abstract:
Software defined radio (SDR) allows unprecedented levels of flexibility by transitioning the radio communication system from a rigid hardware platform to a more user-controlled software paradigm. However, it can still be time-consuming to design and implement such SDRs as they typically require thorough knowledge of the operating environment and careful tuning of the program. In this work, our contribution is the design of a bidirectional transceiver that runs on the commonly used USRP platform and is implemented in MATLAB using standard tools like MATLAB Coder and MEX to speed up the processing steps. We outline strategies on how to create a state-action based design, wherein the same node switches between transmitter and receiver functions. Our design allows optimal selection of the parameters towards meeting the timing requirements set forth by various processing blocks associated with a DBPSK physical layer and CSMA/CA/ACK MAC layer so that all operations remain functionally compliant with the IEEE 802.11b standard for the 1 Mbps specification. The code base of the system is enabled through the Communications System Toolbox and incorporates channel sensing and exponential random back-off for contention resolution. The current work provides an experimental testbed that enables creation of new MAC protocols starting from the fundamental IEEE 802.11b standard. Our design approach guarantees consistent performance of the bidirectional link, and the three-node experimental results demonstrate the robustness of the system in mitigating packet collisions and enforcing fairness among nodes, making it a feasible framework for higher-layer protocol design.
Submitted 26 April, 2016;
originally announced April 2016.
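The exponential random back-off mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the function names are hypothetical, and the contention-window bounds (31 and 1023 slots) are assumed here as the values commonly quoted for the IEEE 802.11b DSSS physical layer, not taken from the abstract.

```python
import random

# Assumed contention-window bounds in slots; not stated in the abstract.
CW_MIN = 31
CW_MAX = 1023


def contention_window(retries, cw_min=CW_MIN, cw_max=CW_MAX):
    """Contention window after `retries` failed attempts.

    The window roughly doubles on every retry and is capped at cw_max.
    """
    return min((cw_min + 1) * (2 ** retries) - 1, cw_max)


def backoff_slots(retries, rng=random):
    """Draw a uniformly random back-off counter in [0, CW] slots."""
    return rng.randint(0, contention_window(retries))
```

A node would call `backoff_slots(retries)` after sensing a busy channel or missing an ACK, then count the result down while the channel stays idle before retransmitting.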
-
PET: An Eye-tracking Dataset for Animal-centric PASCAL Object Classes
Authors:
Syed Omer Gilani,
Ramanathan Subramanian,
Yan Yan,
David Melcher,
Nicu Sebe,
Stefan Winkler
Abstract:
We present the Pascal animal classes Eye Tracking database. Our database comprises eye movement recordings compiled from forty users for the bird, cat, cow, dog, horse and sheep {trainval} sets from the VOC 2012 image set. Different from recent eye-tracking databases such as \cite{kiwon_cvpr13_gaze,PapadopoulosCKF14}, a salient aspect of PET is that it contains eye movements recorded for both the free-viewing and visual search task conditions. While some differences in terms of overall gaze behavior and scanning patterns are observed between the two conditions, a very similar number of fixations are observed on target objects for both conditions. As a utility application, we show how feature pooling around fixated locations enables enhanced (animal) object classification accuracy.
Submitted 6 April, 2016;
originally announced April 2016.
-
Selecting wavelengths for least squares range estimation
Authors:
Assad Akhlaq,
Robby McKilliam,
Ramanan Subramanian,
Andre Pollok
Abstract:
We consider the problem of estimating the distance, or range, between two locations by measuring the phase of multiple sinusoidal signals transmitted between the locations. Traditional estimators developed for optical interferometry include the beat wavelength and excess fractions methods. More recently, estimators based on the Chinese remainder theorem (CRT) and least squares have appeared. Recent research suggests the least squares estimator to be most accurate in many cases. The accuracy of all of these range estimators depends upon the wavelengths chosen. This leads to the problem of selecting wavelengths that maximise accuracy. Procedures for selecting wavelengths for the beat wavelength and excess fractions methods have previously been described, but procedures for the CRT and least squares estimators are yet to be developed. In this paper we develop an algorithm to automatically select wavelengths for use with the least squares range estimator. The algorithm minimises an optimisation criterion connected with the mean square error. Interesting properties of a particular class of lattices simplify the criterion, allowing minimisation by depth-first search. Monte-Carlo simulations indicate that wavelengths that minimise the criterion can result in considerably more accurate range estimates than wavelengths selected by ad hoc means.
Submitted 4 February, 2016;
originally announced February 2016.
-
Basis construction for range estimation by phase unwrapping
Authors:
Assad Akhlaq,
R. G. McKilliam,
R. Subramanian
Abstract:
We consider the problem of estimating the distance, or range, between two locations by measuring the phase of a sinusoidal signal transmitted between the locations. This method is only capable of unambiguously measuring range within an interval of length equal to the wavelength of the signal. To address this problem signals of multiple different wavelengths can be transmitted. The range can then be measured within an interval of length equal to the least common multiple of these wavelengths. Estimation of the range requires solution of a problem from computational number theory called the closest lattice point problem. Algorithms to solve this problem require a basis for this lattice. Constructing a basis is non-trivial and an explicit construction has only been given in the case that the wavelengths can be scaled to pairwise relatively prime integers. In this paper we present an explicit construction of a basis without this assumption on the wavelengths. This is important because the accuracy of the range estimator depends upon the wavelengths. Simulations indicate that significant improvement in accuracy can be achieved by using wavelengths that cannot be scaled to pairwise relatively prime integers.
Submitted 9 June, 2015;
originally announced August 2015.
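The abstract above states that a single wavelength only fixes the range within one wavelength, while several wavelengths extend the unambiguous interval to their least common multiple. A minimal, noise-free Python sketch of that statement follows. It is an illustration only, not the paper's lattice-based estimator: the integer wavelengths (4 and 6, deliberately not relatively prime) and the function names are hypothetical.

```python
from math import lcm  # available in Python 3.9+


def wrapped_measurements(r, wavelengths):
    """Noise-free phase observations, expressed as the range modulo each wavelength."""
    return tuple(r % w for w in wavelengths)


def unambiguous_interval(wavelengths):
    """Length of the interval within which the range is identifiable."""
    return lcm(*wavelengths)


wavelengths = (4, 6)                      # hypothetical integer wavelengths
interval = unambiguous_interval(wavelengths)

# Two ranges that differ by the interval length give identical observations.
same = wrapped_measurements(5, wavelengths) == wrapped_measurements(5 + interval, wavelengths)

# Every integer range inside the interval gives a distinct observation.
distinct = len({wrapped_measurements(r, wavelengths) for r in range(interval)}) == interval
```

With measurement noise the observations no longer match exactly, which is where the closest lattice point problem described in the abstract comes in.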
-
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Authors:
Xavier Alameda-Pineda,
Jacopo Staiano,
Ramanathan Subramanian,
Ligia Batrinca,
Elisa Ricci,
Bruno Lepri,
Oswald Lanz,
Nicu Sebe
Abstract:
Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging, as crowdedness and extreme occlusions make it difficult to extract behavioral cues such as target locations, speaking activity and head/body pose. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts, presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.
Submitted 23 June, 2015;
originally announced June 2015.
-
A novel approach for mobility management in LTE femtocells
Authors:
Pantha Ghosal,
Shouman Barua,
Ramprasad Subramanian,
Shiqi Xing,
Kumbesan Sandrasegaran
Abstract:
LTE is an emerging wireless data communication technology to provide broadband ubiquitous Internet access. Femtocells have been included in 3GPP since Release 8 to enhance indoor network coverage and capacity. The main challenge of mobility management in the hierarchical LTE structure is to guarantee efficient handover from, to, and between femtocells. This paper focuses on different types of handover and compares the performance of different decision algorithms. Furthermore, a speed-based handover algorithm for the macro-femto scenario is proposed, with simulation results.
Submitted 10 November, 2014;
originally announced November 2014.
-
Information Loss due to Finite Block Length in a Gaussian Line Network: An Improved Bound
Authors:
Ramanan Subramanian,
Badri Vellambi,
Ingmar Land
Abstract:
A bound on the maximum information transmission rate through a cascade of Gaussian links is presented. The network model consists of a source node attempting to send a message drawn from a finite alphabet to a sink, through a cascade of Additive White Gaussian Noise links each having an input power constraint. Intermediate nodes are allowed to perform arbitrary encoding/decoding operations, but the block length and the encoding rate are fixed. The bound presented in this paper is fundamental and depends only on the design parameters, namely the network size, block length, transmission rate, and signal-to-noise ratio.
Submitted 26 January, 2013;
originally announced January 2013.
-
On the error performance of the $A_n$ lattices
Authors:
Robby McKilliam,
Ramanan Subramanian,
Emanuele Viterbo,
I. Vaughan L. Clarkson
Abstract:
We consider the root lattice $A_n$ and derive explicit formulae for the moments of its Voronoi cell. We then show that these formulae enable accurate prediction of the error probability of lattice codes constructed from $A_n$.
Submitted 27 November, 2011;
originally announced November 2011.
-
Robust Multi biometric Recognition Using Face and Ear Images
Authors:
Nazmeen Bibi Boodoo,
R. K. Subramanian
Abstract:
This study investigates the use of the ear as a biometric for authentication and shows experimental results obtained on a newly created dataset of 420 images. Images are passed to a quality module in order to reduce the False Rejection Rate. The Principal Component Analysis (eigen ear) approach was used, obtaining a 90.7 percent recognition rate. Recognition results improve when the ear biometric is fused with the face biometric. The fusion is done at the decision level, achieving a recognition rate of 96 percent.
Submitted 4 December, 2009;
originally announced December 2009.