
Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue

Authors: Simone Alghisi, Massimo Rizzoli, Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi

Abstract: We study the limitations of Large Language Models (LLMs) for the task of response generation in human-machine dialogue. Several techniques have been proposed in the literature for different dialogue types (e.g., Open-Domain). However, the evaluations of these techniques have been limited in terms of base LLMs, dialogue types and evaluation metrics. In this work, we extensively analyze different LLM adaptation techniques when applied to different dialogue types. We have selected two base LLMs, Llama-2 and Mistral, and four dialogue types: Open-Domain, Knowledge-Grounded, Task-Oriented, and Question Answering. We evaluate the performance of in-context learning and fine-tuning techniques across datasets selected for each dialogue type. We assess the impact of incorporating external knowledge to ground the generation in two scenarios: Retrieval-Augmented Generation (RAG) and gold knowledge. We adopt consistent evaluation and explainability criteria for automatic metrics and human evaluation protocols. Our analysis shows that there is no universal best technique for adapting large language models, as the efficacy of each technique depends on both the base LLM and the specific type of dialogue. Last but not least, the assessment of the best adaptation technique should include human evaluation to avoid false expectations and outcomes derived from automatic metrics.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2404.08700

DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs

Authors: Seyed Mahed Mousavi, Simone Alghisi, Giuseppe Riccardi

Abstract: LLMs acquire knowledge from massive data snapshots collected at different timestamps. Their knowledge is then commonly evaluated using static benchmarks. However, factual knowledge is generally subject to time-sensitive changes, and static benchmarks cannot address those cases. We present an approach to dynamically evaluate the knowledge in LLMs and their time sensitivity against Wikidata, a publicly available, up-to-date knowledge graph. We evaluate the time-sensitive knowledge in twenty-four private and open-source LLMs, as well as the effectiveness of four editing methods in updating outdated facts. Our results show that 1) outdatedness is a critical problem across state-of-the-art LLMs; 2) LLMs output inconsistent answers when prompted with slight variations of the question prompt; and 3) the performance of state-of-the-art knowledge editing algorithms is very limited, as they cannot reduce the cases of outdatedness and output inconsistency.

Submitted 12 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2401.02297

Are LLMs Robust for Spoken Dialogues?

Authors: Seyed Mahed Mousavi, Gabriel Roccabruna, Simone Alghisi, Massimo Rizzoli, Mirco Ravanelli, Giuseppe Riccardi

Abstract: Large Pre-Trained Language Models have demonstrated state-of-the-art performance in different downstream tasks, including dialogue state tracking and end-to-end response generation. Nevertheless, most of the publicly available datasets and benchmarks on task-oriented dialogues focus on written conversations. Consequently, the robustness of the developed models to spoken interactions is unknown. In this work, we have evaluated the performance of LLMs for spoken task-oriented dialogues on the DSTC11 test sets. Due to the lack of proper spoken dialogue datasets, we have automatically transcribed a development set of spoken dialogues with a state-of-the-art ASR engine. We have characterized the ASR-error types and their distributions and simulated these errors in a large dataset of dialogues. We report the intrinsic (perplexity) and extrinsic (human evaluation) performance of fine-tuned GPT-2 and T5 models in two subtasks, response generation and dialogue state tracking, respectively. The results show that LLMs are not robust to spoken noise by default; however, fine-tuning/training such models on a proper dataset of spoken task-oriented dialogues (TODs) can result in more robust performance.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2308.02665

Let's Give a Voice to Conversational Agents in Virtual Reality

Authors: Michele Yin, Gabriel Roccabruna, Abhinav Azad, Giuseppe Riccardi

Abstract: The dialogue experience with conversational agents can be greatly enhanced with multimodal and immersive interactions in virtual reality. In this work, we present an open-source architecture with the goal of simplifying the development of conversational agents operating in virtual environments. The architecture offers the possibility of plugging in conversational agents of different domains and adding custom or cloud-based Speech-To-Text and Text-To-Speech models to make the interaction voice-based. Using this architecture, we present two conversational prototypes operating in the digital health domain, developed in Unity for both non-immersive displays and VR headsets.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2305.17422

doi 10.18653/v1/2023.wassa-1.9

Understanding Emotion Valence is a Joint Deep Learning Task

Authors: Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi

Abstract: The valence analysis of speakers' utterances or written posts helps to understand the activation and variations of the emotional state throughout the conversation. More recently, the concept of Emotion Carriers (EC) has been introduced to explain the emotion felt by the speaker and its manifestations. In this work, we investigate the natural inter-dependency of valence and ECs via a multi-task learning approach. We experiment with Pre-trained Language Models (PLM) for single-task, two-step, and joint settings for the valence and EC prediction tasks. We compare and evaluate the performance of generative (GPT-2) and discriminative (BERT) architectures in each setting. We observed that providing the ground truth label of one task improves the prediction performance of the models in the other task. We further observed that the discriminative model achieves the best trade-off of valence and EC prediction tasks in the joint prediction setting. As a result, we attain a single model that performs both tasks, thus saving computational resources at training and inference time.

Submitted 31 October, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

arXiv:2305.15908

doi 10.18653/v1/2023.nlp4convai-1.1

Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps?

Authors: Seyed Mahed Mousavi, Simone Caldarella, Giuseppe Riccardi

Abstract: Longitudinal Dialogues (LD) are the most challenging type of conversation for human-machine dialogue systems. LDs include the recollections of events, personal thoughts, and emotions specific to each individual in a sparse sequence of dialogue sessions. Dialogue systems designed for LDs should interact with each user individually over multiple sessions and long periods of time (e.g., weeks), and engage them in personal dialogues to elaborate on their feelings, thoughts, and real-life events. In this paper, we study the task of response generation in LDs. We evaluate whether general-purpose Pre-trained Language Models (PLM) are appropriate for this purpose. We fine-tune two PLMs, GePpeTto (GPT-2) and iT5, using a dataset of LDs. We experiment with different representations of the personal knowledge extracted from LDs for grounded response generation, including the graph representation of the mentioned events and participants. We evaluate the performance of the models via automatic metrics and the contribution of the knowledge via the Integrated Gradients technique. We categorize the natural language generation errors via human evaluations of contextualization, appropriateness, and engagement of the user.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2302.07748

doi 10.18653/v1/2023.wnu-1.1

What's New? Identifying the Unfolding of New Events in Narratives

Authors: Seyed Mahed Mousavi, Shohei Tanaka, Gabriel Roccabruna, Koichiro Yoshino, Satoshi Nakamura, Giuseppe Riccardi

Abstract: Narratives are a rich source of events unfolding over time and context. Automatic understanding of these events provides a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel, challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. An event is categorized as new with respect to the discourse context and to whether it can be inferred through commonsense reasoning. We annotated a publicly available corpus of narratives with the new events at sentence level using human annotators. We present the annotation protocol and study the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.

Submitted 8 August, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2206.08835

What can Speech and Language Tell us About the Working Alliance in Psychotherapy

Authors: Sebastian P. Bayerl, Gabriel Roccabruna, Shammur Absar Chowdhury, Tommaso Ciulli, Morena Danieli, Korbinian Riedhammer, Giuseppe Riccardi

Abstract: We are interested in the problem of conversational analysis and its application to the health domain. Cognitive Behavioral Therapy is a structured approach in psychotherapy that allows the therapist to help the patient identify and modify maladaptive thoughts, behaviors, or actions. This cooperative effort, which has a relevant influence on therapeutic outcomes, can be evaluated using the Working Alliance Inventory Observer-rated Shortened (WAI), a 12-item inventory covering task, goal, and relationship. In this work, we investigate the relation between this alliance inventory and the spoken conversations (sessions) between the patient and the psychotherapist. We have delivered eight weeks of e-therapy, collected the audio and video call sessions, and manually transcribed them. The spoken conversations have been annotated and evaluated with WAI ratings by professional therapists. We have investigated speech and language features and their association with WAI items. The feature types include turn dynamics, lexical entrainment, and conversational descriptors extracted from the speech and language signals. Our findings provide strong evidence that a subset of these features are strong indicators of working alliance. To the best of our knowledge, this is the first study to exploit speech and language for characterising working alliance.

Submitted 27 June, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted at Interspeech 2022

arXiv:2112.06603

Detecting Emotion Carriers by Combining Acoustic and Lexical Representations

Authors: Sebastian P. Bayerl, Aniruddha Tammewar, Korbinian Riedhammer, Giuseppe Riccardi

Abstract: Personal narratives (PN) - spoken or written - are recollections of facts, people, events, and thoughts from one's own experience. Emotion recognition and sentiment analysis tasks are usually defined at the utterance or document level. However, in this work, we focus on Emotion Carriers (EC), defined as the segments (speech or text) that best explain the emotional state of the narrator ("loss of father", "made me choose"). Once extracted, such ECs can provide a richer representation of the user state to improve natural language understanding and dialogue modeling. Previous work has shown that ECs can be identified using lexical features. However, spoken narratives should provide a richer description of the context and the users' emotional state. In this paper, we leverage word-based acoustic and textual embeddings as well as early and late fusion techniques for the detection of ECs in spoken narratives. For the acoustic word-level representations, we use Residual Neural Networks (ResNet) pretrained on separate speech emotion corpora and fine-tuned to detect ECs. Experiments with different fusion and system combination strategies show that late fusion leads to significant improvements for this task.

Submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted at ASRU 2021 https://asru2021.org/

arXiv:2111.13208

Evaluation of Interpretability for Deep Learning algorithms in EEG Emotion Recognition: A case study in Autism

Authors: Juan Manuel Mayor-Torres, Sara Medina-DeVilliers, Tessa Clarkson, Matthew D. Lerner, Giuseppe Riccardi

Abstract: Current Explainable Artificial Intelligence (XAI) models have shown an evident and quantified lack of reliability in measuring feature relevance when statistically entangled features are used to train deep classifiers. There has been an increase in the application of Deep Learning in clinical trials to predict early diagnosis of neuro-developmental disorders, such as Autism Spectrum Disorder (ASD). However, the inclusion of more reliable saliency maps to obtain more trustworthy and interpretable metrics from neural activity features is still insufficiently mature for practical applications in diagnostics or clinical trials. Moreover, in ASD research, the inclusion of deep classifiers that use neural measures to predict viewed facial emotions is relatively unexplored. Therefore, in this study we propose the evaluation of a Convolutional Neural Network (CNN) for electroencephalography (EEG)-based facial emotion recognition decoding, complemented with a novel RemOve-And-Retrain (ROAR) methodology to recover highly relevant features used in the classifier. Specifically, we compare well-known relevance maps such as Layer-Wise Relevance Propagation (LRP), PatternNet, Pattern-Attribution, and Smooth-Grad Squared. This study is the first to consolidate a more transparent feature-relevance calculation for successful EEG-based facial emotion recognition using a within-subject-trained CNN in typically developed and ASD individuals.

Submitted 23 February, 2023; v1 submitted 25 November, 2021; originally announced November 2021.

arXiv:2107.10790

Interpretable SincNet-based Deep Learning for Emotion Recognition from EEG brain activity

Authors: Juan Manuel Mayor-Torres, Mirco Ravanelli, Sara E. Medina-DeVilliers, Matthew D. Lerner, Giuseppe Riccardi

Abstract: Machine learning methods, such as deep learning, show promising results in the medical domain. However, the lack of interpretability of these algorithms may hinder their applicability to medical decision support systems. This paper studies an interpretable deep learning technique called SincNet. SincNet is a convolutional neural network that efficiently learns customized band-pass filters through trainable sinc functions. In this study, we use SincNet to analyze the neural activity of individuals with Autism Spectrum Disorder (ASD), who experience characteristic differences in neural oscillatory activity. In particular, we propose a novel SincNet-based neural network for detecting emotions in ASD patients using EEG signals. The learned filters can be easily inspected to detect which part of the EEG spectrum is used for predicting emotions. We found that our system automatically learns the high-$\alpha$ (9-13 Hz) and $\beta$ (13-30 Hz) band suppression often present in individuals with ASD. This result is consistent with recent neuroscience studies on emotion recognition, which found an association between these band suppressions and the behavioral deficits observed in individuals with ASD. The improved interpretability of SincNet is achieved without sacrificing performance in emotion recognition.

Submitted 18 July, 2021; originally announced July 2021.
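The sinc-based band-pass filter mentioned in the SincNet abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two cutoffs are fixed inputs here (in SincNet they are the trainable parameters), the 128 Hz EEG sampling rate and the 257-tap kernel length are arbitrary choices, and the truncated kernel is tapered with a Hamming window. The helper names `sinc_bandpass` and `gain_at` are hypothetical.

```python
import numpy as np


def sinc_bandpass(f1_hz, f2_hz, fs_hz, kernel_size=257):
    """Band-pass FIR kernel built as the difference of two ideal low-pass sinc filters.

    In SincNet the cutoffs f1/f2 are learned; in this sketch they are fixed inputs.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    if not 0 <= f1_hz < f2_hz <= fs_hz / 2:
        raise ValueError("cutoffs must satisfy 0 <= f1 < f2 <= fs/2")
    # Symmetric time axis centred on sample 0.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cutoffs expressed in cycles per sample.
    f1, f2 = f1_hz / fs_hz, f2_hz / fs_hz
    # np.sinc(x) = sin(pi*x) / (pi*x), so 2*f*np.sinc(2*f*n) is an ideal low-pass
    # filter with cutoff f; subtracting two of them leaves the band [f1, f2].
    ideal = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Hamming window tapers the truncated kernel to reduce ripple and leakage.
    return ideal * np.hamming(kernel_size)


def gain_at(kernel, freq_hz, fs_hz):
    """Magnitude of the kernel's frequency response at a single frequency."""
    n = np.arange(len(kernel))
    return abs(np.sum(kernel * np.exp(-2j * np.pi * freq_hz / fs_hz * n)))


# Hypothetical example: a filter covering the high-alpha band (9-13 Hz),
# assuming EEG sampled at 128 Hz.
h = sinc_bandpass(9.0, 13.0, fs_hz=128.0)
print(round(gain_at(h, 11.0, 128.0), 2))  # inside the band
print(round(gain_at(h, 40.0, 128.0), 2))  # far outside the band
```

Because each filter is fully described by its two cutoff frequencies, reading them off after training shows directly which EEG band the filter passes, which is the kind of inspection the abstract relies on for interpretability.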

arXiv:2102.10121

doi 10.1103/PhysRevA.103.042417

Exploring the relationship between the faithfulness and entanglement of two qubits

Authors: Gabriele Riccardi, Daniel E. Jones, Xiao-Dong Yu, Otfried Gühne, Brian T. Kirby

Abstract: A conceptually simple and experimentally prevalent class of entanglement witnesses, known as fidelity witnesses, detects entanglement via a state's fidelity with a pure reference state. While existence proofs guarantee that a suitable witness can be constructed for every entangled state, such assurances do not apply to fidelity witnesses. Recent results have found that entangled states that cannot be detected by a fidelity witness, known as unfaithful states, are exceedingly common among bipartite states. In this paper, we show that even among two-qubit states, the simplest of all entangled states, unfaithful states can be created through a suitable application of decoherence and filtering to a Bell state. We also show that faithfulness is not monotonic with respect to entanglement, as measured by the concurrence. Finally, we experimentally verify our predictions using polarization-entangled photons and specifically demonstrate a situation where an unfaithful state is brought to faithfulness at the expense of further reducing the entanglement of the state.

Submitted 26 February, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

Comments: 9 pages, 6 figures

Journal ref: Phys. Rev. A 103, 042417 (2021)
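The two quantities contrasted in the abstract above, detection by a fidelity witness and entanglement measured by the concurrence, can be computed for a two-qubit density matrix roughly as in this sketch. It is illustrative only and is not the paper's procedure: it uses the standard fidelity witness W = αI − |ψ⟩⟨ψ|, with α the largest squared Schmidt coefficient of the reference state |ψ⟩, and Wootters' concurrence formula. The reference state is fixed to the Bell state |Φ+⟩ and the test state is a Bell state mixed with white noise, both assumptions chosen for simplicity; the helper names are hypothetical. Because only one reference state is tried, a non-negative witness value here does not by itself show that a state is unfaithful, which would require checking every pure reference state.

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
# Bell state |Phi+> = (|00> + |11>) / sqrt(2), in the basis |00>, |01>, |10>, |11>.
PHI_PLUS = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)


def concurrence(rho):
    """Wootters concurrence of a 4x4 two-qubit density matrix."""
    yy = np.kron(SIGMA_Y, SIGMA_Y)
    rho_tilde = yy @ rho.conj() @ yy  # spin-flipped state
    # Eigenvalues of rho @ rho_tilde are real and non-negative in exact arithmetic;
    # drop tiny imaginary parts and clip negative round-off before the square root.
    eig = np.clip(np.linalg.eigvals(rho @ rho_tilde).real, 0.0, None)
    lam = np.sort(np.sqrt(eig))[::-1]  # decreasing order
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])


def fidelity_witness_value(rho, psi=PHI_PLUS):
    """Tr(W rho) for W = alpha*I - |psi><psi|; a negative value certifies entanglement."""
    schmidt = np.linalg.svd(psi.reshape(2, 2), compute_uv=False)
    alpha = np.max(schmidt) ** 2  # largest squared Schmidt coefficient
    fidelity = np.real(psi.conj() @ rho @ psi)
    return alpha - fidelity


def noisy_bell(p):
    """Hypothetical test state: |Phi+> mixed with white noise (a Werner state)."""
    return p * np.outer(PHI_PLUS, PHI_PLUS.conj()) + (1 - p) * np.eye(4) / 4


for p in (0.2, 0.6):
    rho = noisy_bell(p)
    print(p, round(concurrence(rho), 3), round(fidelity_witness_value(rho), 3))
```

For this family the concurrence is (3p − 1)/2 and the witness value is 1/2 − (1 + 3p)/4, so at p = 0.6 the concurrence is 0.4 and the witness value is −0.2 (entanglement detected), while at p = 0.2 the state is separable. Both thresholds coincide at p = 1/3, so these noisy Bell states are faithful; the abstract's point is that adding filtering to the decoherence can break this correspondence.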

arXiv:2011.02213

doi 10.1088/1475-7516/2022/04/034

QUBIC I: Overview and Science Program

Authors: J. -Ch. Hamilton, L. Mousset, E. S. Battistelli, M. -A. Bigot-Sazy, P. Chanial, R. Charlassier, G. D'Alessandro, P. de Bernardis, M. De Petris, M. M. Gamboa Lerena, L. Grandsire, S. Lau, S. Marnieros, S. Masi, A. Mennella, C. O'Sullivan, M. Piat, G. Riccardi, C. Scóccola, M. Stolpovskiy, A. Tartari, S. A. Torchinsky, F. Voisin, M. Zannoni, P. Ade, et al. (105 additional authors not shown)

Abstract: The Q & U Bolometric Interferometer for Cosmology (QUBIC) is a novel kind of polarimeter optimized for the measurement of the B-mode polarization of the Cosmic Microwave Background (CMB), which is one of the major challenges of observational cosmology. The signal is expected to be of the order of a few tens of nK, prone to instrumental systematic effects and polluted by various astrophysical foregrounds which can only be controlled through multichroic observations. QUBIC is designed to address these observational issues with a novel approach that combines the advantages of interferometry in terms of control of instrumental systematic effects with those of bolometric detectors in terms of wide-band, background-limited sensitivity. The QUBIC synthesized beam has a frequency-dependent shape that makes it possible to produce maps of the CMB polarization in multiple sub-bands within the two physical bands of the instrument (150 and 220 GHz). These features make QUBIC complementary to other instruments and particularly well suited to characterize and remove Galactic foreground contamination. In this article, the first of a series of eight, we give an overview of the QUBIC instrument design and the main results of the calibration campaign, and present the scientific program of QUBIC, including not only the measurement of primordial B-modes but also the measurement of Galactic foregrounds.
We give forecasts for typical observations and measurements: with three years of integration on the sky and assuming perfect foreground removal as well as stable atmospheric conditions from our site in Argentina, our simulations show that we can achieve a statistical sensitivity to the effective tensor-to-scalar ratio (including primordial and foreground B-modes) of $\sigma(r)=0.015$.

Submitted 26 August, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

Comments: 34 pages, 16 figures, accepted for publication by JCAP. Overview paper for a series of 8 QUBIC articles in a special JCAP edition dedicated to QUBIC

arXiv:2008.07481

Emotion Carrier Recognition from Personal Narratives

Authors: Aniruddha Tammewar, Alessandra Cervone, Giuseppe Riccardi

Abstract: Personal Narratives (PN) - recollections of facts, events, and thoughts from one's own experience - are often used in everyday conversations. So far, PNs have mainly been explored for tasks such as valence prediction or emotion classification (e.g., happy, sad). However, these tasks might overlook more fine-grained information that could prove relevant for understanding PNs. In this work, we propose a novel task for Narrative Understanding: Emotion Carrier Recognition (ECR). Emotion carriers, the text fragments that carry the emotions of the narrator (e.g., loss of a grandpa, high school reunion), provide a fine-grained description of the emotional state. We explore the task of ECR in a corpus of PNs manually annotated with emotion carriers and investigate different machine learning models for the task. We propose evaluation strategies for ECR, including metrics that can be appropriate for different tasks.

Submitted 24 June, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: To be published at INTERSPEECH 2021, Brno, Czechia

arXiv:2007.12610

doi 10.1088/1367-2630/ab990c

Exploring classical correlations in noise to recover quantum information using local filtering

Authors: Daniel E. Jones, Brian T. Kirby, Gabriele Riccardi, Cristian Antonelli, Michael Brodsky

Abstract: A general quantum channel consisting of a decohering and a filtering element carries one qubit of an entangled photon pair. As we apply a local filter to the other qubit, some mutual quantum information between the two qubits is restored, depending on the properties of the noise mixed into the signal. We demonstrate a drastic difference between channels with bit-flip and phase-flip noise and further suggest a scheme for maximal recovery of the quantum information.

Submitted 24 July, 2020; originally announced July 2020.

Journal ref: New J. Phys. 22, 073037 (2020)

arXiv:2006.10157

Is this Dialogue Coherent? Learning from Dialogue Acts and Entities

Authors: Alessandra Cervone, Giuseppe Riccardi

Abstract: In this work, we investigate the human perception of coherence in open-domain dialogues. In particular, we address the problem of annotating and modeling the coherence of next-turn candidates while considering the entire history of the dialogue. First, we create the Switchboard Coherence (SWBD-Coh) corpus, a dataset of human-human spoken dialogues annotated with turn coherence ratings, where ratings for next-turn candidate utterances are provided considering the full dialogue context. Our statistical analysis of the corpus indicates how turn coherence perception is affected by the distribution patterns of previously introduced entities and by the Dialogue Acts (DA) used. Second, we experiment with different architectures to model entities, Dialogue Acts, and their combination, and evaluate their performance in predicting human coherence ratings on SWBD-Coh. We find that models combining both DA and entity information yield the best performance for both response selection and turn coherence rating.

Submitted 17 June, 2020; originally announced June 2020.

Comments: Accepted at SIGDIAL 2020

arXiv:2002.12196

Annotation of Emotion Carriers in Personal Narratives

Authors: Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi

Abstract: We are interested in the problem of understanding personal narratives (PN) - spoken or written - recollections of facts, events, and thoughts. In PN, emotion carriers are the speech or text segments that best explain the emotional state of the user. Such segments may include entities, verb or noun phrases. Advanced automatic understanding of PNs requires not only the prediction of the user's emotional state but also the identification of which events (e.g., "the loss of a relative" or "the visit of grandpa") or people (e.g., "the old group of high school mates") carry the emotion manifested during the personal recollection. This work proposes and evaluates an annotation model for identifying emotion carriers in spoken personal narratives. Compared to other text genres such as news and microblogs, spoken PNs are particularly challenging because a narrative is usually unstructured, involving multiple sub-events and characters as well as thoughts and associated emotions perceived by the narrator. In this work, we experiment with annotating emotion carriers from speech transcriptions in the Ulm State-of-Mind in Speech (USoMS) corpus, a dataset of German PNs. We believe this resource could be used for experiments in the automatic extraction of emotion carriers from PN, a task that could provide further advancements in narrative understanding.

Submitted 15 May, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: published in LREC 2020 http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.188.pdf

arXiv:1911.01371

doi 10.18653/v1/W19-3211

Affective Behaviour Analysis of On-line User Interactions: Are On-line Support Groups more Therapeutic than Twitter?

Authors: Giuliano Tortoreto, Evgeny A. Stepanov, Alessandra Cervone, Mateusz Dubiel, Giuseppe Riccardi

Abstract: The increasing prevalence of mental health problems has coincided with the growing popularity of health-related social networking sites. Despite their therapeutic potential, On-line Support Groups (OSGs) can also have negative effects on patients. In this work, we propose a novel methodology that uses Natural Language Processing (NLP) techniques to automatically verify the presence of therapeutic factors on social networking websites. The methodology is evaluated on online asynchronous multi-party conversations collected from an OSG and from Twitter. The results indicate that therapeutic factors occur more frequently in OSG conversations than in Twitter conversations. Moreover, the analysis of OSG conversations reveals that the users of that platform are supportive, and that interactions are likely to improve their emotional state. We believe that our method provides a stepping stone towards the automatic analysis of the emotional states of users of online platforms. Possible applications of the method include providing guidelines that highlight the potential implications of using such platforms for users' mental health, and supporting the analysis of their impact on specific individuals.

Submitted 4 November, 2019; originally announced November 2019.

arXiv:1908.04092

Active Annotation: bootstrapping annotation lexicon and guidelines for supervised NLU learning

Authors: Federico Marinelli, Alessandra Cervone, Giuliano Tortoreto, Evgeny A. Stepanov, Giuseppe Di Fabbrizio, Giuseppe Riccardi

Abstract: Natural Language Understanding (NLU) models are typically trained in a supervised learning framework. In the case of intent classification, the predicted labels are predefined by the designed annotation schema, and labelling is a laborious process in which annotators manually inspect each utterance and assign the corresponding label. We propose an Active Annotation (AA) approach that combines an unsupervised learning method in the embedding space, a human-in-the-loop verification process, and linguistic insights to create lexicons that can be open categories and adapted over time. In particular, annotators define the y-label space on the fly during annotation through an iterative process, without needing prior knowledge of the input data. We evaluate the proposed annotation paradigm in a real use-case NLU scenario. Results show that our Active Annotation paradigm produces accurate, higher-quality training data, with an annotation speed an order of magnitude higher than that of the traditional human-only baseline annotation methodology.

Submitted 12 August, 2019; originally announced August 2019.

Comments: 4 pages

MSC Class: 68Uxx

Journal ref: INTERSPEECH 2019

arXiv:1905.11806

An Incremental Turn-Taking Model For Task-Oriented Dialog Systems

Authors: Andrei C. Coman, Koichiro Yoshino, Yukitoshi Murase, Satoshi Nakamura, Giuseppe Riccardi

Abstract: In a human-machine dialog scenario, deciding the appropriate time for the machine to take the turn is an open research problem. In contrast, humans engaged in conversation are able to decide in a timely manner when to interrupt the speaker, for competitive or non-competitive reasons. In state-of-the-art turn-by-turn dialog systems, the decision on the next dialog action is taken at the end of the utterance. In this paper, we propose a token-by-token prediction of the dialog state from incremental transcriptions of the user utterance. To identify the point of maximal understanding in an ongoing utterance, we a) implement an incremental Dialog State Tracker (iDST) that is updated on a token basis, b) re-label the Dialog State Tracking Challenge 2 (DSTC2) dataset, and c) adapt it to the incremental turn-taking experimental scenario. The re-labeling assigns a binary value to each token in the user utterance, marking the appropriate point for taking the turn. Finally, we implement an incremental Turn Taking Decider (iTTD) that is trained on these new labels for the turn-taking decision. We show that the proposed model outperforms a deterministic handcrafted turn-taking algorithm.

Submitted 11 July, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

Comments: Accepted to INTERSPEECH 2019
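The per-token re-labeling described in the turn-taking abstract above can be illustrated with a small Python sketch. It assumes, hypothetically, that the appropriate point for taking the turn is the first token at which the incrementally tracked dialog state already equals the state of the complete utterance. The keyword tracker `toy_tracker` and its lexicon are invented stand-ins for the paper's iDST, and `label_turn_taking_points` is a hypothetical helper, not the authors' code.

```python
from typing import Callable, Dict, List

# A dialog state maps slot names to values, e.g. {"food": "italian"}.
State = Dict[str, str]

# Hypothetical keyword lexicon standing in for a trained incremental tracker.
TOY_LEXICON = {
    "cheap": ("pricerange", "cheap"),
    "italian": ("food", "italian"),
    "north": ("area", "north"),
}


def toy_tracker(prefix: List[str]) -> State:
    """Hypothetical tracker: spots lexicon keywords in the tokens seen so far."""
    state: State = {}
    for token in prefix:
        if token in TOY_LEXICON:
            slot, value = TOY_LEXICON[token]
            state[slot] = value
    return state


def label_turn_taking_points(
    tokens: List[str], tracker: Callable[[List[str]], State]
) -> List[int]:
    """Label 0 before, and 1 from, the first prefix whose state equals the full-utterance state."""
    final_state = tracker(tokens)
    labels: List[int] = []
    reached = False
    for i in range(len(tokens)):
        # Re-run the tracker on the growing prefix, one token at a time.
        if not reached and tracker(tokens[: i + 1]) == final_state:
            reached = True
        labels.append(1 if reached else 0)
    return labels


utterance = "i want cheap italian food please".split()
print(label_turn_taking_points(utterance, toy_tracker))  # [0, 0, 0, 1, 1, 1]
```

In this sketch the labels latch to 1 once the point is reached, so a decider could be trained to fire on the first 1; whether the paper's labels follow exactly this rule is not stated in the abstract.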

arXiv:1905.05701

doi 10.21437/Interspeech.2019-2489

Modeling user context for valence prediction from narratives

Authors: Aniruddha Tammewar, Alessandra Cervone, Eva-Maria Messner, Giuseppe Riccardi

Abstract: Automated prediction of valence, one key feature of a person's emotional state, from individuals' personal narratives may provide crucial information for mental healthcare (e.g. early diagnosis of mental diseases, supervision of disease course, etc.). In the Interspeech 2018 ComParE Self-Assessed Affect challenge, the task of valence prediction was framed as a three-class classification problem using 8-second fragments from individuals' narratives. As such, the task did not allow for exploring the contextual information of the narratives. In this work, we investigate the intrinsic information from multiple narratives recounted by the same individual in order to predict their current state of mind. Furthermore, with generalizability in mind, we focus our experiments exclusively on textual information, as the public availability of audio narratives is limited compared to text. Our hypothesis is that context modeling might provide insights about emotion-triggering concepts (e.g. events, people, places) mentioned in the narratives that are linked to an individual's state of mind. We explore multiple machine learning techniques to model narratives. We find that the models are able to capture inter-individual differences, leading to more accurate predictions of an individual's emotional state than predictions based on single narratives.

Submitted 14 July, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

Comments: To be published in Interspeech 2019

Journal ref: Interspeech 2019

arXiv:1807.10661

Concept Tagging for Natural Language Understanding: Two Decadelong Algorithm Development

Authors: Jacopo Gobbi, Evgeny Stepanov, Giuseppe Riccardi

Abstract: Concept tagging is a type of structured learning needed for natural language understanding (NLU) systems. In this task, meaning labels from a domain ontology are assigned to word sequences. In this paper, we review the algorithms developed over the last twenty-five years. We perform a comparative evaluation of generative, discriminative and deep learning methods on two public datasets. We report on the statistical variability of the performance measurements. The third contribution is the release of a repository of the algorithms, datasets and recipes for NLU evaluation.

Submitted 27 July, 2018; originally announced July 2018.

Comments: 5 pages
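Concept tagging, as summarized above, assigns ontology labels to word sequences. A common way to cast this as per-token structured learning is the IOB (beginning/inside/outside) encoding, sketched below in Python. The encoding choice, the helper `spans_to_iob`, and the ATIS-style labels in the example are assumptions for illustration; the abstract does not specify the representation used in the paper's comparison.

```python
from typing import List, Tuple

# A concept span: (start token index, end index exclusive, concept label).
Span = Tuple[int, int, str]


def spans_to_iob(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode concept spans as one B-/I-/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end, concept in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span ({start}, {end}) is outside the sentence")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another span")
        tags[start] = f"B-{concept}"  # first token of the concept
        for i in range(start + 1, end):
            tags[i] = f"I-{concept}"  # continuation tokens of the same concept
    return tags


# Hypothetical ATIS-style example: two city concepts in one request.
tokens = "show flights from new york to denver".split()
spans = [(3, 5, "fromloc.city_name"), (6, 7, "toloc.city_name")]
for token, tag in zip(tokens, spans_to_iob(len(tokens), spans)):
    print(token, tag)
```

With this encoding, generative, discriminative, and neural taggers can all be trained on the same token-level targets, which is one reason it is a frequent choice for comparative evaluations.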

arXiv:1806.08044

Coherence Models for Dialogue

Authors: Alessandra Cervone, Evgeny Stepanov, Giuseppe Riccardi

Abstract: Coherence across multiple turns is a major challenge for state-of-the-art dialogue models. Arguably the most successful approach to automatically learning text coherence is the entity grid, which models patterns in the distribution of entities across multiple sentences of a text. Originally applied to the evaluation of automatic summaries and to the news genre, this model has, among its many extensions, also been successfully used to assess dialogue coherence. Nevertheless, neither the original grid nor its extensions model intents, a crucial aspect that has been widely studied in the literature in connection with dialogue structure. We propose to augment the original grid document representation for dialogue with the intentional structure of the conversation. Our models outperform the original grid representation on both text discrimination and insertion, the two main standard tasks for coherence assessment, across three different dialogue datasets, confirming that intents play a key role in modelling dialogue coherence.

Submitted 20 June, 2018; originally announced June 2018.

Comments: Interspeech 2018
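To make the entity grid idea above concrete, here is a simplified Python sketch: each dialogue turn becomes a row, each entity a column, and the grid is summarized by the relative frequencies of cell transitions read down each column. This is a presence-only variant (the original grid also records grammatical roles such as subject and object), the helper names are hypothetical, and the intent augmentation proposed in the paper is not reproduced here.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Set, Tuple

# Grid: entity -> one cell per turn.
Grid = Dict[str, List[str]]


def entity_grid(turns: List[Set[str]]) -> Grid:
    """One column per entity, one cell per turn: 'X' if mentioned, '-' if absent."""
    entities = sorted(set().union(*turns))
    return {e: ["X" if e in turn else "-" for turn in turns] for e in entities}


def transition_features(grid: Grid, length: int = 2) -> Dict[Tuple[str, ...], float]:
    """Relative frequency of every length-n cell sequence read down the grid columns."""
    counts: Counter = Counter()
    for column in grid.values():
        for i in range(len(column) - length + 1):
            counts[tuple(column[i : i + length])] += 1
    total = sum(counts.values())
    # Report every possible transition, including unseen ones, as a fixed-size vector.
    return {t: counts[t] / total if total else 0.0 for t in product("X-", repeat=length)}


# Hypothetical dialogue: the entities mentioned in each of three turns.
turns = [{"flight", "boston"}, {"flight"}, {"hotel"}]
for transition, prob in transition_features(entity_grid(turns)).items():
    print("".join(transition), round(prob, 3))
```

The resulting fixed-length vector is the kind of representation a ranker can use to prefer an original dialogue over a permuted one in the discrimination task.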

arXiv:1806.04327

ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents

Authors: Stefano Mezza, Alessandra Cervone, Giuliano Tortoreto, Evgeny A. Stepanov, Giuseppe Riccardi

Abstract: Dialogue Act (DA) tagging is crucial for spoken language understanding systems, as it provides a general representation of speakers' intents, not bound to a particular dialogue system. Unfortunately, publicly available data sets with DA annotation are all based on different annotation schemes and are thus incompatible with each other. Moreover, their schemes often do not cover all aspects necessary for open-domain human-machine interaction. In this paper, we propose a methodology to map several publicly available corpora to a subset of the ISO standard, in order to create a large task-independent training corpus for DA classification. We show the feasibility of using this corpus to train a domain-independent DA tagger by testing it on out-of-domain conversational data, and we argue for the importance of training on multiple corpora to achieve robustness across different DA categories.

Submitted 12 June, 2018; originally announced June 2018.
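The corpus harmonization step described in the DA tagging abstract above amounts to relabeling each source corpus with a shared tag set. A minimal Python sketch follows. The Switchboard-DAMSL tags on the left are real tags from that scheme, but the mapping to ISO 24617-2 communicative functions, the helper `harmonize`, and the policy of discarding utterances whose tag falls outside the chosen subset are illustrative assumptions, not the mapping published in the paper.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative mapping from a few Switchboard-DAMSL tags to ISO 24617-2
# communicative functions; NOT the mapping used in the paper.
SWDA_TO_ISO: Dict[str, str] = {
    "sd": "inform",                 # statement, non-opinion
    "sv": "inform",                 # statement, opinion
    "qy": "propositionalQuestion",  # yes/no question
    "qw": "setQuestion",            # wh-question
    "ny": "confirm",                # "yes" answer
}


def harmonize(
    utterances: List[Tuple[str, str]], mapping: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Relabel (text, source_tag) pairs; drop utterances whose tag has no mapping."""
    harmonized: List[Tuple[str, str]] = []
    for text, tag in utterances:
        iso_tag: Optional[str] = mapping.get(tag)
        if iso_tag is not None:
            harmonized.append((text, iso_tag))
    return harmonized


corpus = [("I live in Boston", "sd"), ("Do you like it there?", "qy"), ("uh-huh", "b")]
print(harmonize(corpus, SWDA_TO_ISO))
# [('I live in Boston', 'inform'), ('Do you like it there?', 'propositionalQuestion')]
```

Applying one such table per source corpus and concatenating the outputs yields a single training set over a common label space; the backchannel tag `b` is dropped here only because this toy table does not cover it.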

arXiv:1711.06095

Depression Severity Estimation from Multiple Modalities

Authors: Evgeny Stepanov, Stephane Lathuiliere, Shammur Absar Chowdhury, Arindam Ghosh, Radu-Laurentiu Vieriu, Nicu Sebe, Giuseppe Riccardi

Abstract: Depression is a major debilitating disorder that can affect people of all ages. With a continuous increase in the number of annual cases of depression, there is a need for automatic techniques that detect the presence and extent of depression. In this AVEC challenge, we explore different modalities (speech, language, and visual features extracted from the face) to design and develop automatic methods for the detection of depression. In the psychology literature, the PHQ-8 questionnaire is well established as a tool for measuring the severity of depression. In this paper, we aim to automatically predict PHQ-8 scores from features extracted from the different modalities. We show that visual features extracted from facial landmarks obtain the best performance in estimating the PHQ-8 results, with a mean absolute error (MAE) of 4.66 on the development set. Behavioral characteristics from speech yield an MAE of 4.73, and language features a slightly higher MAE of 5.17. On the test set, our Turn Features derived from audio transcriptions achieve the best performance, with an MAE of 4.11 (corresponding to an RMSE of 4.94), which makes our system the winner of the AVEC 2017 depression sub-challenge.

Submitted 10 November, 2017; originally announced November 2017.

Comments: 8 pages, 1 figure
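The AVEC results above are reported as mean absolute error (MAE) and root mean squared error (RMSE). The Python sketch below computes both. Because RMSE squares each error before averaging, it weights large errors more heavily and is never smaller than MAE, which is consistent with the reported MAE of 4.11 and RMSE of 4.94. The score values in the example are invented, not taken from the paper.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: the square root of the average of (y - y_hat)^2."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy PHQ-8 scores (the questionnaire ranges from 0 to 24); values are invented.
true_scores = [10, 3, 15, 20]
pred_scores = [8, 6, 15, 12]
print(mae(true_scores, pred_scores))   # 3.25
print(rmse(true_scores, pred_scores))  # sqrt(19.25), about 4.39
```

In the toy data, the single large error of 8 points pulls RMSE well above MAE, illustrating why the two metrics are often reported together.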

arXiv:1705.04839

doi 10.1016/j.csl.2017.12.003

Annotating and Modeling Empathy in Spoken Conversations

Authors: Firoj Alam, Morena Danieli, Giuseppe Riccardi

Abstract: Empathy, as defined in behavioral sciences, expresses the ability of human beings to recognize, understand and react to the emotions, attitudes and beliefs of others. The lack of an operational definition of empathy makes it difficult to measure. In this paper, we address two related problems in automatic affective behavior analysis: the design of the annotation protocol and the automatic recognition of empathy from spoken conversations. We propose and evaluate an annotation scheme for empathy inspired by the modal model of emotions. The annotation scheme was evaluated on a corpus of real-life, dyadic spoken conversations. In the context of behavioral analysis, we designed an automatic segmentation and classification system for empathy. Given the different speech and language levels of representation at which empathy may be communicated, we investigated features derived from the lexical and acoustic spaces. The feature development process was designed to support both the fusion and the automatic selection of relevant features from a high-dimensional space. The automatic classification system was evaluated on call center conversations, where it performed significantly better than the baseline.

Submitted 29 December, 2017; v1 submitted 13 May, 2017; originally announced May 2017.

Comments: Journal of Computer Speech and Language

ACM Class: I.2; I.2.7

Showing 1–26 of 26 results for author: Riccardi, G