Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation

Abstract: Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing. This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable emotion recognition models that are less likely to encode sensitive demographic information.

Submitted 5 September, 2023; originally announced September 2023.

Comments: PhD Thesis

arXiv:2104.08806 [pdf, other]

Best Practices for Noise-Based Augmentation to Improve the Performance of Deployable Speech-Based Emotion Recognition Systems

Authors: Mimansa Jaiswal, Emily Mower Provost

Abstract: Speech emotion recognition is an important component of any human centered system. But speech characteristics produced and perceived by a person can be influenced by a multitude of reasons, both desirable such as emotion, and undesirable such as noise. To train robust emotion recognition models, we need a large, yet realistic data distribution, but emotion datasets are often small and hence are augmented with noise. Often noise augmentation makes one important assumption, that the prediction label should remain the same in presence or absence of noise, which is true for automatic speech recognition but not necessarily true for perception based tasks. In this paper we make three novel contributions. We validate through crowdsourcing that the presence of noise does change the annotation label and hence may alter the original ground truth label. We then show how disregarding this knowledge and assuming consistency in ground truth labels propagates to downstream evaluation of ML models, both for performance evaluation and robustness testing. We end the paper with a set of recommendations for noise augmentations in speech emotion recognition datasets.

Submitted 31 August, 2023; v1 submitted 18 April, 2021; originally announced April 2021.

arXiv:1910.13212 [pdf, other]

Privacy Enhanced Multimodal Neural Representations for Emotion Recognition

Authors: Mimansa Jaiswal, Emily Mower Provost

Abstract: Many mobile applications and virtual conversational agents now aim to recognize and adapt to emotions. To enable this, data are transmitted from users' devices and stored on central servers. Yet, these data contain sensitive information that could be used by mobile applications without user's consent or, maliciously, by an eavesdropping adversary. In this work, we show how multimodal representations trained for a primary task, here emotion recognition, can unintentionally leak demographic information, which could override a selected opt-out option by the user. We analyze how this leakage differs in representations obtained from textual, acoustic, and multimodal data. We use an adversarial learning paradigm to unlearn the private information present in a representation and investigate the effect of varying the strength of the adversarial component on the primary task and on the privacy metric, defined here as the inability of an attacker to predict specific demographic information. We evaluate this paradigm on multiple datasets and show that we can improve the privacy metric while not significantly impacting the performance on the primary task. To the best of our knowledge, this is the first work to analyze how the privacy metric differs across modalities and how multiple privacy concerns can be tackled while still maintaining performance on emotion recognition.

Submitted 29 October, 2019; originally announced October 2019.

Comments: 8 pages

arXiv:1910.05115 [pdf, ps, other]

Identifying Mood Episodes Using Dialogue Features from Clinical Interviews

Authors: Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin McInnis, Emily Mower Provost

Abstract: Bipolar disorder, a severe chronic mental illness characterized by pathological mood swings from depression to mania, requires ongoing symptom severity tracking to both guide and measure treatments that are critical for maintaining long-term health. Mental health professionals assess symptom severity through semi-structured clinical interviews. During these interviews, they observe their patients' spoken behaviors, including both what the patients say and how they say it. In this work, we move beyond acoustic and lexical information, investigating how higher-level interactive patterns also change during mood episodes. We then perform a secondary analysis, asking if these interactive patterns, measured through dialogue features, can be used in conjunction with acoustic features to automatically recognize mood episodes. Our results show that it is beneficial to consider dialogue features when analyzing and building automated systems for predicting and monitoring mood.

Submitted 24 March, 2022; v1 submitted 28 September, 2019; originally announced October 2019.

arXiv:1908.08979 [pdf, other]

doi 10.1145/3340555.3353731

Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning

Authors: Mimansa Jaiswal, Zakaria Aldeneh, Emily Mower Provost

Abstract: Various psychological factors affect how individuals express emotions. Yet, when we collect data intended for use in building emotion recognition systems, we often try to do so by creating paradigms that are designed just with a focus on eliciting emotional behavior. Algorithms trained with these types of data are unlikely to function outside of controlled environments because our emotions naturally change as a function of these other factors. In this work, we study how the multimodal expressions of emotion change when an individual is under varying levels of stress. We hypothesize that stress produces modulations that can hide the true underlying emotions of individuals and that we can make emotion recognition algorithms more generalizable by controlling for variations in stress. To this end, we use adversarial networks to decorrelate stress modulations from emotion representations. We study how stress alters acoustic and lexical emotional predictions, paying special attention to how modulations due to stress affect the transferability of learned emotion recognition models across domains. Our results show that stress is indeed encoded in trained emotion classifiers and that this encoding varies across levels of emotions and across the lexical and acoustic modalities. Our results also show that emotion recognition models that control for stress during training have better generalizability when applied to new domains, compared to models that do not control for stress during training. We conclude that it is necessary to consider the effect of extraneous psychological factors when building and testing emotion recognition models.

Submitted 23 August, 2019; originally announced August 2019.

Comments: 10 pages, ICMI 2019

arXiv:1908.01901 [pdf, other]

Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information

Authors: Charles B. Delahunt, Mayoore S. Jaiswal, Matthew P. Horning, Samantha Janko, Clay M. Thompson, Sourabh Kulhare, Liming Hu, Travis Ostbye, Grace Yun, Roman Gebrehiwot, Benjamin K. Wilson, Earl Long, Stephane Proux, Dionicia Gamboa, Peter Chiodini, Jane Carter, Mehul Dhorda, David Isaboke, Bernhards Ogutu, Wellington Oyibo, Elizabeth Villasis, Kyaw Myo Tun, Christine Bachman, David Bell, Courosh Mehanian

Abstract: Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are close to sufficiently accurate for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.

Submitted 11 September, 2022; v1 submitted 5 August, 2019; originally announced August 2019.

Comments: 16 pages, 13 figures

MSC Class: 68T10; ACM Class: I.5.0

arXiv:1903.11672 [pdf, other]

MuSE-ing on the Impact of Utterance Ordering On Crowdsourced Emotion Annotations

Authors: Mimansa Jaiswal, Zakaria Aldeneh, Cristian-Paul Bara, Yuanhang Luo, Mihai Burzo, Rada Mihalcea, Emily Mower Provost

Abstract: Emotion recognition algorithms rely on data annotated with high quality labels. However, emotion expression and perception are inherently subjective. There is generally not a single annotation that can be unambiguously declared "correct". As a result, annotations are colored by the manner in which they were collected. In this paper, we conduct crowdsourcing experiments to investigate this impact on both the annotations themselves and on the performance of these algorithms. We focus on one critical question: the effect of context. We present a new emotion dataset, Multimodal Stressed Emotion (MuSE), and annotate the dataset using two conditions: randomized, in which annotators are presented with clips in random order, and contextualized, in which annotators are presented with clips in order. We find that contextual labeling schemes result in annotations that are more similar to a speaker's own self-reported labels and that labels generated from randomized schemes are most easily predictable by automated systems.

Submitted 27 March, 2019; originally announced March 2019.

Comments: 5 pages, ICASSP 2019

arXiv:1707.02866 [pdf, other]

doi 10.1109/TSP.2017.2726990

On a registration-based approach to sensor network localization

Authors: Rajat Sanyal, Monika Jaiswal, Kunal Narayan Chaudhury

Abstract: We consider a registration-based approach for localizing sensor networks from range measurements. This is based on the assumption that one can find overlapping cliques spanning the network. That is, for each sensor, one can identify geometric neighbors for which all inter-sensor ranges are known. Such cliques can be efficiently localized using multidimensional scaling. However, since each clique is localized in some local coordinate system, we are required to register them in a global coordinate system. In other words, our approach is based on transforming the localization problem into a problem of registration. In this context, the main contributions are as follows. First, we describe an efficient method for partitioning the network into overlapping cliques. Second, we study the problem of registering the localized cliques, and formulate a necessary rigidity condition for uniquely recovering the global sensor coordinates. In particular, we present a method for efficiently testing rigidity, and a proposal for augmenting the partitioned network to enforce rigidity. A recently proposed semidefinite relaxation of global registration is used for registering the cliques. We present simulation results on random and structured sensor networks to demonstrate that the proposed method compares favourably with state-of-the-art methods in terms of run-time, accuracy, and scalability.

Submitted 8 November, 2017; v1 submitted 6 July, 2017; originally announced July 2017.

Journal ref: IEEE Transactions on Signal Processing, vol. 65, no. 20, pp. 5357-5367, 2017

Showing 1–8 of 8 results for author: Jaiswal, M