Exploring Emotion Expression Recognition in Older Adults Interacting with a Virtual Coach
Authors:
Cristina Palmero,
Mikel deVelasco,
Mohamed Amine Hmani,
Aymen Mtibaa,
Leila Ben Letaifa,
Pau Buch-Cardona,
Raquel Justo,
Terry Amorese,
Eduardo González-Fraile,
Begoña Fernández-Ruanova,
Jofre Tenorio-Laranga,
Anna Torp Johansen,
Micaela Rodrigues da Silva,
Liva Jenny Martinussen,
Maria Stylianou Korsnes,
Gennaro Cordasco,
Anna Esposito,
Mounim A. El-Yacoubi,
Dijana Petrovska-Delacrétaz,
M. Inés Torres,
Sergio Escalera
Abstract:
The EMPATHIC project aimed to design an emotionally expressive virtual coach capable of engaging healthy seniors to improve well-being and promote independent aging. One of the core aspects of the system is its human sensing capabilities, allowing for the perception of emotional states to provide a personalized experience. This paper outlines the development of the emotion expression recognition module of the virtual coach, encompassing data collection, annotation design, and a first methodological approach, all tailored to the project requirements. With the latter, we investigate the role of various modalities, individually and combined, for discrete emotion expression recognition in this context: speech from audio, and facial expressions, gaze, and head dynamics from video. The collected corpus includes users from Spain, France, and Norway, and was annotated separately for the audio and video channels with distinct emotional labels, allowing for a performance comparison across cultures and label types. Results confirm the informative power of the modalities studied for the emotional categories considered, with multimodal methods generally outperforming others (around 68% accuracy with audio labels and 72-74% with video labels). The findings are expected to contribute to the limited literature on emotion recognition applied to older adults in conversational human-machine interaction.
Submitted 9 November, 2023;
originally announced November 2023.