Showing 1–50 of 215 results for author: Schuller, B

  1. arXiv:2407.02751  [pdf, other]

    cs.CL cs.AI

    Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset

    Authors: Rui Liu, Haolin Zuo, Zheng Lian, Xiaofen Xing, Björn W. Schuller, Haizhou Li

    Abstract: Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history, while inferring the emotions and intents simultaneously for the current utterance. MC-EIU is enabling technology for many human-computer interfaces. However, there is a lack of available datasets in terms of annotation, modality, lang…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 26 pages, 8 figures, 12 tables, NeurIPS 2024 Dataset and Benchmark Track

  2. arXiv:2407.01143  [pdf, other]

    cs.SD cs.AI eess.AS

    Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition

    Authors: Oliver Schrüfer, Manuel Milling, Felix Burkhardt, Florian Eyben, Björn Schuller

    Abstract: Uncertainty Quantification (UQ) is an important building block for the reliable use of neural networks in real-world scenarios, as it can be a useful tool in identifying faulty predictions. Speech emotion recognition (SER) models can suffer from particularly many sources of uncertainty, such as the ambiguity of emotions, Out-of-Distribution (OOD) data or, in general, poor recording conditions. Rel…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: accepted for Interspeech 2024, 5 pages

  3. arXiv:2406.17667  [pdf, other]

    cs.SD cs.CL eess.AS

    This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach

    Authors: Lukas Christ, Shahin Amiriparian, Friederike Hawighorst, Ann-Kathrin Schill, Angelo Boutalikakis, Lorenz Graf-Vlachy, Andreas König, Björn W. Schuller

    Abstract: Flattery is an important aspect of human communication that facilitates social bonding, shapes perceptions, and influences behavior through strategic compliments and praise, leveraging the power of speech to build rapport effectively. Its automatic detection can thus enhance the naturalness of human-AI interactions. To meet this need, we present a novel audio textual dataset comprising 20 hours of…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  4. arXiv:2406.15119  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech Emotion Recognition under Resource Constraints with Data Distillation

    Authors: Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller

    Abstract: Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment…

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.10275  [pdf, other]

    cs.CL

    ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets

    Authors: Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller

    Abstract: Foundation models have shown great promise in speech emotion recognition (SER) by leveraging their pre-trained representations to capture emotion patterns in speech signals. To further enhance SER performance across various languages and domains, we propose a novel twofold approach. First, we gather EmoSet++, a comprehensive multi-lingual, multi-cultural speech emotion corpus with 37 datasets, 150…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: accepted at INTERSPEECH 2024

    MSC Class: 68T10 ACM Class: I.2

  6. arXiv:2406.08517  [pdf, other]

    eess.AS cs.SD

    DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition

    Authors: Xin Jing, Luyang Zhang, Jiangjian Xie, Alexander Gebhard, Alice Baird, Bjoern Schuller

    Abstract: In ornithology, it's widely acknowledged that bird species display diverse dialects in their calls across different regions. Consequently, computational methods to identify bird species solely through their calls face significant challenges. There is growing interest in understanding the impact of species-specific dialects on the effectiveness of bird sp…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: accepted by Interspeech 2024

  7. arXiv:2406.07753  [pdf, ps, other]

    cs.AI cs.CL

    The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition

    Authors: Shahin Amiriparian, Lukas Christ, Alexander Kathan, Maurice Gerczuk, Niklas Müller, Steffen Klug, Lukas Stappen, Andreas König, Erik Cambria, Björn Schuller, Simone Eulitz

    Abstract: The Multimodal Sentiment Analysis Challenge (MuSe) 2024 addresses two contemporary multimodal affect and sentiment analysis problems: In the Social Perception Sub-Challenge (MuSe-Perception), participants will predict 16 different social attributes of individuals such as assertiveness, dominance, likability, and sincerity based on the provided audio-visual data. The Cross-Cultural Humor Detection…

    Submitted 11 June, 2024; originally announced June 2024.

    MSC Class: 68T10 ACM Class: I.2

  8. arXiv:2406.07203  [pdf, other]

    cs.SD eess.AS

    ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks

    Authors: Xin Jing, Andreas Triantafyllopoulos, Björn Schuller

    Abstract: Contrastive language-audio pretraining (CLAP) has recently emerged as a method for making audio analysis more generalisable. Specifically, CLAP-style models are able to 'answer' a diverse set of language queries, extending the capabilities of audio models beyond a closed set of labels. However, CLAP relies on a large set of (audio, query) pairs for pretraining. While such sets are available for ge…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2406.06665  [pdf, other]

    cs.CL

    Enrolment-based personalisation for improving individual-level fairness in speech emotion recognition

    Authors: Andreas Triantafyllopoulos, Björn Schuller

    Abstract: The expression of emotion is highly individualistic. However, contemporary speech emotion recognition (SER) systems typically rely on population-level models that adopt a 'one-size-fits-all' approach for predicting emotion. Moreover, standard evaluation practices measure performance also on the population level, thus failing to characterise how models work across different speakers. In the present…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  10. arXiv:2406.06401  [pdf, other]

    cs.CL

    INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition

    Authors: Andreas Triantafyllopoulos, Anton Batliner, Simon Rampp, Manuel Milling, Björn Schuller

    Abstract: We revisit the INTERSPEECH 2009 Emotion Challenge -- the first ever speech emotion recognition (SER) challenge -- and evaluate a series of deep learning models that are representative of the major advances in SER research in the time since then. We start by training each model using a fixed set of hyperparameters, and further fine-tune the best-performing models of that initial setup with a grid s…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  11. arXiv:2406.06355  [pdf, other]

    cs.CL

    Sustained Vowels for Pre- vs Post-Treatment COPD Classification

    Authors: Andreas Triantafyllopoulos, Anton Batliner, Wolfgang Mayr, Markus Fendler, Florian Pokorny, Maurice Gerczuk, Shahin Amiriparian, Thomas Berghaus, Björn Schuller

    Abstract: Chronic obstructive pulmonary disease (COPD) is a serious inflammatory lung disease affecting millions of people around the world. Due to an obstructed airflow from the lungs, it also becomes manifest in patients' vocal behaviour. Of particular importance is the detection of an exacerbation episode, which marks an acute phase and often requires hospitalisation and treatment. Previous work has show…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  12. arXiv:2406.06339  [pdf, ps, other]

    cs.SD eess.AS

    Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines

    Authors: Philipp Wagner, Andreas Triantafyllopoulos, Alexander Gebhard, Björn Schuller

    Abstract: In recent decades, running has become an increasingly popular pastime activity due to its accessibility, ease of practice, and anticipated health benefits. However, the risk of running-related injuries is substantial for runners of different experience levels. Several common forms of injuries result from overuse -- extending beyond the recommended running time and intensity. Recently, audio-based…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at EUSIPCO 2024

  13. arXiv:2406.06332  [pdf, other]

    cs.SD eess.AS

    An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats

    Authors: Andreas Triantafyllopoulos, Alexander Gebhard, Manuel Milling, Simon Rampp, Björn Schuller

    Abstract: Prior work in computational bioacoustics has mostly focused on the detection of animal presence in a particular habitat. However, animal sounds contain much richer information than mere presence; among others, they encapsulate the interactions of those animals with other members of their species. Studying these interactions is almost impossible in a naturalistic setting, as the ground truth is oft…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted at EUSIPCO 2024

  14. arXiv:2406.02251  [pdf, other]

    cs.CL cs.AI

    Modeling Emotional Trajectories in Written Stories Utilizing Transformers and Weakly-Supervised Learning

    Authors: Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller

    Abstract: Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience. Automatically modeling emotional trajectories in stories has thus attracted considerable scholarly interest. However, as most existing works have been limited to unsupervised dictionary-based approaches, there is no benchmark for this task. We address this gap by…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings. arXiv admin note: text overlap with arXiv:2212.11382

  15. arXiv:2405.13206  [pdf, other]

    cs.CV

    Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding

    Authors: Rong Gao, Xin Liu, Bohao Xing, Zitong Yu, Bjorn W. Schuller, Heikki Kälviäinen

    Abstract: In this work, we focus on a special group of human body language -- the micro-gesture (MG), which differs from the range of ordinary illustrative gestures in that they are not intentional behaviors performed to convey information to others, but rather unintentional behaviors driven by inner feelings. This characteristic introduces two novel challenges regarding micro-gestures that are worth rethin…

    Submitted 21 May, 2024; originally announced May 2024.

  16. arXiv:2405.03953  [pdf, other]

    cs.SD eess.AS

    Intelligent Cardiac Auscultation for Murmur Detection via Parallel-Attentive Models with Uncertainty Estimation

    Authors: Zixing Zhang, Tao Pang, Jing Han, Björn W. Schuller

    Abstract: Heart murmurs are a common manifestation of cardiovascular diseases and can provide crucial clues to early cardiac abnormalities. While most current research methods primarily focus on the accuracy of models, they often overlook other important aspects such as the interpretability of machine learning algorithms and the uncertainty of predictions. This paper introduces a heart murmur detection meth…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  17. arXiv:2405.03952  [pdf, other]

    cs.SD cs.CL eess.AS

    HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech

    Authors: Zhongren Dong, Zixing Zhang, Weixiang Xu, Jing Han, Jianjun Ou, Björn W. Schuller

    Abstract: Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches highly rely on the Transformer architectures due to its efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: published at ICASSP 2024

  18. arXiv:2404.19363  [pdf, other]

    cs.CL

    Expressivity and Speech Synthesis

    Authors: Andreas Triantafyllopoulos, Björn W. Schuller

    Abstract: Imbuing machines with the ability to talk has been a longtime pursuit of artificial intelligence (AI) research. From the very beginning, the community has not only aimed to synthesise high-fidelity speech that accurately conveys the semantic meaning of an utterance, but also to colour it with inflections that cover the same range of affective expressions that humans are capable of. After many year…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Invited contribution. Under review

  19. arXiv:2404.17113  [pdf, other]

    cs.LG cs.HC

    MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing dataset size and building more effective architectures. However, due to various reasons (such as complex environments and inaccurate annotations), current systems are hard to meet the demands of practical applications. Therefor…

    Submitted 23 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  20. arXiv:2404.12132  [pdf, other]

    cs.SD cs.CL eess.AS

    Enhancing Suicide Risk Assessment: A Speech-Based Automated Approach in Emergency Medicine

    Authors: Shahin Amiriparian, Maurice Gerczuk, Justina Lutz, Wolfgang Strube, Irina Papazova, Alkomiet Hasan, Alexander Kathan, Björn W. Schuller

    Abstract: The delayed access to specialized psychiatric assessments and care for patients at risk of suicidal tendencies in emergency departments creates a notable gap in timely intervention, hindering the provision of adequate mental health support during critical situations. To address this, we present a non-invasive, speech-based approach for automatic suicide risk assessment. For our study, we have coll…

    Submitted 18 April, 2024; originally announced April 2024.

    ACM Class: I.2

  21. arXiv:2403.14083  [pdf, other]

    cs.SD cs.LG eess.AS

    emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition

    Authors: Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Berrak Sisman, Bjorn W. Schuller, Carlos Busso

    Abstract: Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessments. Fortunately, Neural Architecture Search (NAS) provides a pot…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE Transactions on Affective Computing on February 19, 2024. arXiv admin note: text overlap with arXiv:2305.14402

  22. arXiv:2403.14006  [pdf, other]

    cs.CL cs.AI

    On Prompt Sensitivity of ChatGPT in Affective Computing

    Authors: Mostafa M. Amin, Björn W. Schuller

    Abstract: Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing. However, accessing these emerging capabilities is facilitated through prompt engineering. Despite the existence of some prompting techniques, the field is still rapidly evolving and many prompting ideas still require investigation. In this work, we introduc…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 2 Tables, 1 Figure, preprint submission to ACII 2024

  23. arXiv:2402.01227  [pdf, other]

    cs.SD cs.AI cs.HC eess.AS

    STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

    Authors: Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller

    Abstract: Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has…

    Submitted 2 February, 2024; originally announced February 2024.

  24. arXiv:2401.12925  [pdf, other]

    cs.SD eess.AS

    Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition

    Authors: Yan Zhao, Jincen Wang, Cheng Lu, Sunan Li, Björn Schuller, Yuan Zong, Wenming Zheng

    Abstract: Cross-corpus speech emotion recognition (SER) aims to transfer emotional knowledge from a labeled source corpus to an unlabeled corpus. However, prior methods require access to source data during adaptation, which is unattainable in real-life scenarios due to data privacy protection concerns. This paper tackles a more practical task, namely source-free cross-corpus SER, where a pre-trained source…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  25. arXiv:2401.10536  [pdf, other]

    cs.CL

    Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition

    Authors: Yong Wang, Cheng Lu, Hailun Lian, Yan Zhao, Björn Schuller, Yuan Zong, Wenming Zheng

    Abstract: Swin-Transformer has demonstrated remarkable success in computer vision by leveraging its hierarchical feature representation based on Transformer. In speech signals, emotional information is distributed across different scales of speech features, e.g., word, phrase, and utterance. Drawing above inspiration, this paper presents a hierarchical speech Transformer with shifted windows to aggregate…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  26. arXiv:2401.09752  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation

    Authors: Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Björn Schuller, Wenming Zheng

    Abstract: In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers. Consequently, when the trained model is confronted with data from new speakers, its performance tends to degrade. To address the issue, we propose a Dynamic Joint Distribu…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  27. arXiv:2312.06270  [pdf, other]

    eess.AS cs.SD

    Testing Speech Emotion Recognition Machine Learning Models

    Authors: Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller

    Abstract: Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on the basis of a few available datasets per task. Tasks could include arousal, valence, dominance, emotional categories, or tone of voice. Those models are mainly evaluated in terms of correlation or recall, and always show some errors in their predictions. The errors manifest…

    Submitted 11 December, 2023; originally announced December 2023.

  28. arXiv:2310.14225  [pdf, other]

    cs.CL

    Customising General Large Language Models for Specialised Emotion Recognition Tasks

    Authors: Liyizhe Peng, Zixing Zhang, Tao Pang, Jing Han, Huan Zhao, Hao Chen, Björn W. Schuller

    Abstract: The advent of large language models (LLMs) has gained tremendous attention over the past year. Previous studies have shown the astonishing performance of LLMs not only in other tasks but also in emotion recognition in terms of accuracy, universality, explanation, robustness, few/zero-shot learning, and others. Leveraging the capability of LLMs inevitably becomes an essential solution for emotion r…

    Submitted 22 October, 2023; originally announced October 2023.

  29. arXiv:2309.16369  [pdf, other]

    cs.SD cs.LG eess.AS

    Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification

    Authors: Manuel Milling, Andreas Triantafyllopoulos, Iosif Tsangko, Simon David Noel Rampp, Björn Wolfgang Schuller

    Abstract: The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the acoustic scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensi…

    Submitted 15 January, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication

  30. arXiv:2309.15024  [pdf, other]

    cs.SD cs.LG eess.AS

    Synthia's Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio

    Authors: Chia-Hsin Lin, Charles Jones, Björn W. Schuller, Harry Coppock

    Abstract: Despite significant advancements in deep learning for vision and natural language, unsupervised domain adaptation in audio remains relatively unexplored. We, in part, attribute this to the lack of an appropriate benchmark dataset. To address this gap, we present Synthia's melody, a novel audio data generation framework capable of simulating an infinite variety of 4-second melodies with user-specif…

    Submitted 26 September, 2023; originally announced September 2023.

  31. arXiv:2309.09832  [pdf, other]

    cs.CL cs.AI

    Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits

    Authors: Xiangheng He, Junjie Chen, Björn W. Schuller

    Abstract: Multi-task learning (MTL) aims to improve the performance of a primary task by jointly learning with related auxiliary tasks. Traditional MTL methods select tasks randomly during training. However, both previous studies and our results suggest that such a random selection of tasks may not be helpful, and can even be harmful to performance. Therefore, new strategies for task selection and assignmen…

    Submitted 11 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  32. Exploring Meta Information for Audio-based Zero-shot Bird Classification

    Authors: Alexander Gebhard, Andreas Triantafyllopoulos, Teresa Bez, Lukas Christ, Alexander Kathan, Björn W. Schuller

    Abstract: Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This study investigates how meta-information can improve zero-shot audio classification, utilising bird species as an example case study due to the availability of rich…

    Submitted 11 June, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  33. arXiv:2309.03244  [pdf, other]

    eess.IV cs.CV cs.LG

    EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation

    Authors: Nikolai Körber, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder, Björn Schuller

    Abstract: We introduce EGIC, an enhanced generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. EGIC is based on two novel building blocks: i) OASIS-C, a conditional pre-trained semantic segmentation-guided discriminator, which provides both spatially and semantically-aware gradient feedback to the generator, conditioned on the latent imag…

    Submitted 14 March, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: revised version

  34. arXiv:2308.13911  [pdf, other]

    cs.AI cs.CL

    A Wide Evaluation of ChatGPT on Affective Computing Tasks

    Authors: Mostafa M. Amin, Rui Mao, Erik Cambria, Björn W. Schuller

    Abstract: With the rise of foundation models, a new artificial intelligence paradigm has emerged, by simply using general purpose foundation models with prompting to solve problems instead of training a separate machine learning model for each problem. Such models have been shown to have emergent properties of solving problems that they were not initially trained on. The studies for the effectiveness of suc…

    Submitted 26 August, 2023; originally announced August 2023.

    Comments: 8 pages with references, 2 tables

  35. arXiv:2308.12792  [pdf, other]

    cs.SD eess.AS

    Sparks of Large Audio Models: A Survey and Outlook

    Authors: Siddique Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, Heriberto Cuayáhuitl, Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller

    Abstract: This survey paper provides a comprehensive overview of the recent advancements and challenges in applying large language models to the field of audio signal processing. Audio processing, with its diverse signal representations and a wide range of sources--from human voices to musical instruments and environmental sounds--poses challenges distinct from those found in traditional Natural Language Pr…

    Submitted 21 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: Under review, Repo URL: https://github.com/EmulationAI/awesome-large-audio-models

  36. arXiv:2308.11773  [pdf]

    cs.CL cs.CY cs.SD eess.AS q-bio.QM

    Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model

    Authors: Yuezhou Zhang, Amos A Folarin, Judith Dineley, Pauline Conde, Valeria de Angel, Shaoxiong Sun, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Petroula Laiou, Heet Sankesara, Linglong Qian, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Björn W. Schuller, Srinivasan Vairavan, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, et al. (3 additional authors not shown)

    Abstract: Language use has been shown to correlate with depression, but large-scale validation is needed. Traditional methods like clinic studies are expensive. So, natural language processing has been employed on social media to predict depression, but limitations remain-lack of validated labels, biased user samples, and no context. Our study identified 29 topics in 3919 smartphone-collected speech recordi…

    Submitted 5 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  37. arXiv:2308.11578  [pdf, other]

    cs.CL cs.AI cs.LG

    Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models

    Authors: Zixing Zhang, Liyizhe Peng, Tao Pang, Jing Han, Huan Zhao, Bjorn W. Schuller

    Abstract: After the inception of emotion recognition or affective computing, it has increasingly become an active research topic due to its broad applications. Over the past couple of decades, emotion recognition models have gradually migrated from statistically shallow models to neural network-based deep models, which can significantly boost the performance of emotion recognition models and consistently ac…

    Submitted 21 August, 2023; originally announced August 2023.

  38. arXiv:2307.06090  [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Aid in Annotating Speech Emotional Data? Uncovering New Frontiers

    Authors: Siddique Latif, Muhammad Usama, Mohammad Ibrahim Malik, Björn W. Schuller

    Abstract: Despite recent advancements in speech emotion recognition (SER) models, state-of-the-art deep learning (DL) approaches face the challenge of the limited availability of annotated data. Large language models (LLMs) have revolutionised our understanding of natural language, introducing emergent properties that broaden comprehension in language, speech, and vision. This paper examines the potential o…

    Submitted 19 June, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted in IEEE Computational Intelligence Magazine

  39. arXiv:2307.04648  [pdf, other]

    cs.CL cs.AI

    Can ChatGPT's Responses Boost Traditional Natural Language Processing?

    Authors: Mostafa M. Amin, Erik Cambria, Björn W. Schuller

    Abstract: The employment of foundation models is steadily expanding, especially with the launch of ChatGPT and the release of other foundation models. These models have shown the potential of emerging capabilities to solve problems, without being particularly trained to solve. A previous work demonstrated these emerging capabilities in affective computing tasks; the performance quality was similar to tradit…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: 9 pages, 2 Tables, 1 Figure

  40. arXiv:2307.02132  [pdf, other]

    cs.SD eess.AS

    Going Retro: Astonishingly Simple Yet Effective Rule-based Prosody Modelling for Speech Synthesis Simulating Emotion Dimensions

    Authors: Felix Burkhardt, Uwe Reichel, Florian Eyben, Björn Schuller

    Abstract: We introduce two rule-based models to modify the prosody of speech synthesis in order to modulate the emotion to be expressed. The prosody modulation is based on speech synthesis markup language (SSML) and can be used with any commercial speech synthesizer. The models as well as the optimization result are evaluated against human emotion annotations. Results indicate that with a very simple method…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: accepted at 34th ESSV 2023, Munich 2023

  41. arXiv:2306.16962  [pdf, other]

    cs.SD eess.AS

    Speech-based Age and Gender Prediction with Transformers

    Authors: Felix Burkhardt, Johannes Wagner, Hagen Wierstorf, Florian Eyben, Björn Schuller

    Abstract: We report on the curation of several publicly available datasets for age and gender prediction. Furthermore, we present experiments to predict age and gender with models based on a pre-trained wav2vec 2.0. Depending on the dataset, we achieve an MAE between 7.1 years and 10.8 years for age, and at least 91.1% ACC for gender (female, male, child). Compared to a modelling approach built on handcraft…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: 5 pages, submitted to 15th ITG Conference on Speech Communication

  42. arXiv:2305.17137  [pdf, other]

    cs.AI cs.LG

    Integrating Generative Artificial Intelligence in Intelligent Vehicle Systems

    Authors: Lukas Stappen, Jeremy Dillmann, Serena Striegel, Hans-Jörg Vögel, Nicolas Flores-Herr, Björn W. Schuller

    Abstract: This paper aims to serve as a comprehensive guide for researchers and practitioners, offering insights into the current state, potential applications, and future research directions for generative artificial intelligence and foundation models within the context of intelligent vehicles. As the automotive industry progressively integrates AI, generative artificial intelligence technologies hold the…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: under review

  43. arXiv:2305.14402  [pdf, other]

    cs.SD cs.LG eess.AS

    Enhancing Speech Emotion Recognition Through Differentiable Architecture Search

    Authors: Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Berrak Sisman, Björn Schuller

    Abstract: Speech Emotion Recognition (SER) is a critical enabler of emotion-aware communication in human-computer interactions. Recent advancements in Deep Learning (DL) have substantially enhanced the performance of SER models through increased model complexity. However, designing optimal DL architectures requires prior experience and experimental evaluations. Encouragingly, Neural Architecture Search (NAS…

    Submitted 18 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 5 pages, 4 figures

  44. arXiv:2305.14023  [pdf, other]

    cs.SD eess.AS

    Happy or Evil Laughter? Analysing a Database of Natural Audio Samples

    Authors: Aljoscha Düsterhöft, Felix Burkhardt, Björn W. Schuller

    Abstract: We conducted a data collection on the basis of the Google AudioSet database by selecting a subset of the samples annotated with "laughter". The selection criterion was to present a communicative act with a clear connotation of being either positive (laughing with) or negative (being laughed at). On the basis of this annotated data, we performed two experiments: on the one hand, we manually…

    Submitted 23 May, 2023; originally announced May 2023.

  45. arXiv:2305.13195  [pdf, other]

    cs.SD eess.AS

    U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech

    Authors: Xin Jing, Yi Chang, Zijiang Yang, Jiangjian Xie, Andreas Triantafyllopoulos, Bjoern W. Schuller

    Abstract: Deep learning has led to considerable advances in text-to-speech synthesis. Most recently, the adoption of Score-based Generative Models (SGMs), also known as Diffusion Probabilistic Models (DPMs), has gained traction due to their ability to produce high-quality synthesized speech in neural speech synthesis systems. In SGMs, the U-Net architecture and its variants have long dominated as the…

    Submitted 22 May, 2023; originally announced May 2023.

  46. arXiv:2305.11413  [pdf, other]

    cs.SD eess.AS

    A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model

    Authors: Ibrahim Malik, Siddique Latif, Raja Jurdak, Björn Schuller

    Abstract: In this paper, we propose to utilise diffusion models for data augmentation in speech emotion recognition (SER). In particular, we present an effective approach to utilise improved denoising diffusion probabilistic models (IDDPM) to generate synthetic emotional data. We condition the IDDPM with the textual embedding from bidirectional encoder representations from transformers (BERT) to generate hi…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  47. arXiv:2305.09485   

    econ.GN cs.LG

    Executive Voiced Laughter and Social Approval: An Explorative Machine Learning Study

    Authors: Niklas Mueller, Steffen Klug, Andreas Koenig, Alexander Kathan, Lukas Christ, Bjoern Schuller, Shahin Amiriparian

    Abstract: We study voiced laughter in executive communication and its effect on social approval. Integrating research on laughter, affect-as-information, and infomediaries' social evaluations of firms, we hypothesize that voiced laughter in executive communication positively affects social approval, defined as audience perceptions of affinity towards an organization. We surmise that the effect of laughter i…

    Submitted 20 May, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Method section needs to be updated

  48. arXiv:2305.03369  [pdf, other]

    cs.LG cs.AI cs.CL cs.MM

    The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation

    Authors: Lukas Christ, Shahin Amiriparian, Alice Baird, Alexander Kathan, Niklas Müller, Steffen Klug, Chris Gagne, Panagiotis Tzirakis, Eva-Maria Meßner, Andreas König, Alan Cowen, Erik Cambria, Björn W. Schuller

    Abstract: MuSe 2023 is a set of shared tasks addressing three different contemporary multimodal affect and sentiment analysis problems: In the Mimicked Emotions Sub-Challenge (MuSe-Mimic), participants predict three continuous emotion targets. This sub-challenge utilises the Hume-Vidmimic dataset comprising user-generated videos. For the Cross-Cultural Humour Detection Sub-Challenge (MuSe-Humour), an…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Baseline paper for the 4th Multimodal Sentiment Analysis Challenge (MuSe) 2023, a workshop at ACM Multimedia 2023

  49. arXiv:2304.14882  [pdf, other]

    cs.SD cs.LG eess.AS

    The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

    Authors: Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié

    Abstract: The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenge, requests and complaints need to be detected. We describe the Sub-Challenges, baseline feature extraction, and classi…

    Submitted 1 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: 5 pages, part of the ACM Multimedia 2023 Grand Challenge "The ACM Multimedia 2023 Computational Paralinguistics Challenge (ComParE 2023)". arXiv admin note: text overlap with arXiv:2205.06799

    MSC Class: 68 ACM Class: I.2.7; I.5.0; J.3

  50. arXiv:2304.08981  [pdf, other]

    cs.CL cs.CV

    MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia. The challenge focuses on system robustness and consists of three distinct tracks: (1) MER-MULTI, where participants are required to recognize both discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to test videos for modality robustness evaluation; (3) MER-SEMI, which provi…

    Submitted 14 September, 2023; v1 submitted 18 April, 2023; originally announced April 2023.