-
PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
Authors:
Lucas Goncalves,
Prashant Mathur,
Chandrashekhar Lavania,
Metehan Cekic,
Marcello Federico,
Kyu J. Han
Abstract:
Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks. However, the growth is not attributed solely to models and benchmarks. Universally accepted evaluation metrics also play an important role in advancing the field. While there are many metrics available to evaluate audio and visual content separately, there is a lack of metrics that offer a quantitative and interpretable measure of audio-visual synchronization for videos "in the wild". To address this gap, we first created a large-scale human-annotated dataset (100+ hrs) representing nine types of synchronization errors in audio-visual content and how humans perceive them. We then developed a PEAVS (Perceptual Evaluation of Audio-Visual Synchrony) score, a novel automatic metric with a 5-point scale that evaluates the quality of audio-visual synchronization. We validate PEAVS using a newly generated dataset, achieving a Pearson correlation of 0.79 at the set level and 0.54 at the clip level when compared to human labels. In our experiments, we observe a relative gain of 50% over a natural extension of Fréchet-based metrics for audio-visual synchrony, confirming PEAVS's efficacy in objectively modeling subjective perceptions of audio-visual synchronization for videos "in the wild".
Submitted 10 April, 2024;
originally announced April 2024.
-
PASO -- Astronomy and Space Situational Awareness in a Dark Sky Destination
Authors:
Domingos Barbosa,
Bruno Coelho,
Miguel Bergano,
Constança Alves,
Alexandre C. M. Correia,
Luís Cupido,
José Freitas,
Luís Gonçalves,
Bruce Grossan,
Anna Guerman,
Allan K. de Almeida Jr.,
Dalmiro Maia,
Bruno Morgado,
João Pandeirada,
Valério Ribeiro,
Gonçalo Rosa,
George Smoot,
Timothée Vaillant,
Thyrso Villela,
Carlos Alexandre Wuensche
Abstract:
The Pampilhosa da Serra Space Observatory (PASO) is located in the center of the continental Portuguese territory, in the heart of a Dark Sky destination certified by the Starlight Foundation (Aldeias do Xisto), and has been an instrumental asset to advance science, education and astrotourism certifications. PASO hosts astronomy and Space Situational Awareness (SSA) activities, including a node of the Portuguese Space Surveillance & Tracking (SST) infrastructure network, with instruments such as a space radar currently in its test phase using the GEM radiotelescope, a double Wide Field of View (WFOV) telescope system, and an EUSST optical sensor telescope. These instruments allow surveillance of satellites and space debris in LEO, MEO and GEO orbits. The WFOV telescope offers spectroscopy capabilities, enabling light curve analysis and monitoring of cosmic sources. Instruments for Space Weather are being considered for installation to monitor solar activity and expand the range of SSA services.
Submitted 5 April, 2024;
originally announced April 2024.
-
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Authors:
Haibin Wu,
Huang-Cheng Chou,
Kai-Wei Chang,
Lucas Goncalves,
Jiawei Du,
Jyh-Shing Roger Jang,
Chi-Chun Lee,
Hung-Yi Lee
Abstract:
Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to enhance open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase to leverage 15 state-of-the-art speech self-supervised learning models (SSLMs) for exhaustive evaluation across six open-source SER datasets. EMO-SUPERB streamlines result sharing via an online leaderboard, fostering collaboration within a community-driven benchmark and thereby enhancing the development of SER. On average, 2.58% of annotations are annotated using natural language. SER relies on classification models and is unable to process natural languages, leading to the discarding of these valuable annotations. We prompt ChatGPT to mimic annotators, comprehend natural language annotations, and subsequently re-label the data. By utilizing labels generated by ChatGPT, we consistently achieve an average relative gain of 3.08% across all settings.
Submitted 12 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks
Authors:
Lucas Goncalves,
Seong-Gyun Leem,
Wei-Cheng Lin,
Berrak Sisman,
Carlos Busso
Abstract:
Most current audio-visual emotion recognition models lack the flexibility needed for deployment in practical applications. We envision a multimodal system that works even when only one modality is available and can be implemented interchangeably for either predicting emotional attributes or recognizing categorical emotions. Achieving such flexibility in a multimodal emotion recognition system is difficult due to the inherent challenges in accurately interpreting and integrating varied data sources. It is also a challenge to robustly handle missing or partial information while allowing a direct switch between regression and classification tasks. This study proposes a versatile audio-visual learning (VAVL) framework for handling unimodal and multimodal systems for emotion regression and emotion classification tasks. We implement an audio-visual framework that can be trained even when paired audio and visual data is not available for part of the training set (i.e., only audio or only video is present). We achieve this effective representation learning with audio-visual shared layers, residual connections over shared layers, and a unimodal reconstruction task. Our experimental results reveal that our architecture significantly outperforms strong baselines on both the CREMA-D and MSP-IMPROV corpora. Notably, VAVL attains a new state-of-the-art performance in the emotional attribute prediction task on the MSP-IMPROV corpus. Code available at: https://github.com/ilucasgoncalves/VAVL
Submitted 11 May, 2023;
originally announced May 2023.
-
Sistema de sensoriamento sem fio aplicavel a deteccao de incendios florestais (Wireless sensing system applicable to forest fire detection)
Authors:
Lucas Santos Goncalves,
Celso Barbosa Carvalho
Abstract:
In this research work, a hardware and software system is developed that uses wireless sensors to monitor environmental variables such as temperature, gas concentration and luminosity, in order to detect the existence of forest fires. LoRa technology was used for the wireless sensor network, with a communication range that can reach, on average, up to 5 km in urban areas and 10 km in rural areas. The developed system also has an integrated web application (dashboard) that collects data in real time from the wireless sensors, which together form the sensor module, also called the device. This data is then presented on a map associated with the position of each sensor module. The developed system was tested in practical experiments that used flames, gases and lighting to simulate the occurrence of fires. The tests demonstrated the feasibility of the developed hardware/software system in detecting fires in the simulated scenarios. The research is therefore promising and may advance in the future to the detection of real fires.
Submitted 23 November, 2021;
originally announced November 2021.