Open-Source Conversational AI with SpeechBrain 1.0

Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, et al. (5 additional authors not shown)

Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.

Submitted 29 June, 2024; originally announced July 2024.

Comments: Submitted to JMLR (Machine Learning Open Source Software)

arXiv:2406.13006 [pdf, other]

Weighted Sum of Segmented Correlation: An Efficient Method for Spectra Matching in Hyperspectral Images

Authors: Sampriti Soor, Priyanka Kumari, B. S. Daya Sagar, Amba Shetty

Abstract: Matching a target spectrum with known spectra in a spectral library is a common method for material identification in hyperspectral imaging research. Hyperspectral spectra exhibit precise absorption features across different wavelength segments, and the unique shapes and positions of these absorptions create distinct spectral signatures for each material, aiding in their identification. Therefore, only the specific positions can be considered for material identification. This study introduces the Weighted Sum of Segmented Correlation method, which calculates correlation indices between various segments of a library and a test spectrum, and derives a matching index, favoring positive correlations and penalizing negative correlations using assigned weights. The effectiveness of this approach is evaluated for mineral identification in hyperspectral images from both Earth and Martian surfaces.
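The abstract describes the matching index only in words, so the following is a minimal Python sketch of the general idea rather than the authors' formula. The equal-width segmentation, the Pearson correlation per segment, the weight values `w_pos` and `w_neg`, and the normalization by segment count are all assumptions made for illustration.

```python
import numpy as np


def segmented_correlations(test, ref, n_segments):
    """Pearson correlation between matching segments of two spectra.

    Assumption: segments are equal-width splits of the band axis.
    """
    test_parts = np.array_split(np.asarray(test, dtype=float), n_segments)
    ref_parts = np.array_split(np.asarray(ref, dtype=float), n_segments)
    corrs = []
    for t, r in zip(test_parts, ref_parts):
        if t.std() == 0 or r.std() == 0:
            # Correlation is undefined for a flat segment; treat it as neutral.
            corrs.append(0.0)
        else:
            corrs.append(float(np.corrcoef(t, r)[0, 1]))
    return np.array(corrs)


def matching_index(test, ref, n_segments=4, w_pos=1.0, w_neg=2.0):
    """Weighted sum of segment correlations (hypothetical weighting).

    Positive correlations are scaled by w_pos, negative ones by w_neg,
    so w_neg > w_pos penalizes segments where the shapes disagree.
    """
    c = segmented_correlations(test, ref, n_segments)
    weighted = np.where(c >= 0, w_pos * c, w_neg * c)
    return float(weighted.sum() / n_segments)


# Synthetic stand-ins for a library spectrum and two test spectra.
bands = np.linspace(0.0, 4.0 * np.pi, 40)
library_spectrum = np.sin(bands)
same_shape = 0.5 * library_spectrum + 0.2   # scaled and offset copy
opposite_shape = -library_spectrum          # inverted features

score_same = matching_index(same_shape, library_spectrum)
score_opposite = matching_index(opposite_shape, library_spectrum)
```

In this sketch a test spectrum would be compared against every library entry and assigned to the entry with the highest index; with `w_neg` larger than `w_pos`, a single strongly anti-correlated segment lowers the score more than a well-matched segment raises it.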

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted in IEEE IGARSS 2024 conference

arXiv:2312.15429 [pdf, other]

Multi-RIS Communication Systems: Asymptotic analysis of best RIS selection for i.n.i.d. Random Variables using Extreme Value Theory

Authors: Srinivas Sagar, Sheetal Kalyani

Abstract: This paper investigates the performance of multiple reconfigurable intelligent surfaces (multi-RIS) communication systems where the RIS link with the highest signal-to-noise-ratio (SNR) is selected at the destination. In practice, all the RISs will not have the same number of reflecting elements. Hence, selecting the RIS link with the highest SNR will involve characterizing the distribution of the maximum of independent, non-identically distributed (i.n.i.d.) SNR random variables (RVs). Using extreme value theory (EVT), we derive the asymptotic distribution of the normalized maximum of i.n.i.d. non-central chi-square (NCCS) distributed SNR RVs with one degree of freedom (d.o.f) and then extend the results for k-th order statistics. Using these asymptotic results, the outage capacity and average throughput expressions are derived for the multi-RIS system. The results for independent and identically distributed (i.i.d.) SNR RVs are then derived as a special case of i.n.i.d. RVs. All the derivations are validated through extensive Monte Carlo simulations, and their utility is discussed.
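As a rough illustration of the best-link selection described above, the Python sketch below compares a Monte Carlo estimate of the probability that the maximum SNR falls below a threshold with the exact finite-sample product of the individual CDFs. It does not reproduce the paper's asymptotic EVT result. Modelling each link SNR as `(Z + mu)**2` with `Z ~ N(0, 1)` is one way to obtain a non-central chi-square variable with one degree of freedom; the per-link means in `link_means` and the threshold are hypothetical values, not taken from the paper.

```python
import math

import numpy as np


def nccs1_cdf(x, mu):
    """CDF of (Z + mu)**2 with Z ~ N(0, 1): non-central chi-square, 1 d.o.f."""
    if x <= 0:
        return 0.0
    root = math.sqrt(x)

    def std_normal_cdf(v):
        return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

    return std_normal_cdf(root - mu) - std_normal_cdf(-root - mu)


def outage_best_link(threshold, mus):
    """P(max_i SNR_i <= threshold) for independent links.

    Independence makes the CDF of the maximum the product of the link CDFs.
    """
    p = 1.0
    for mu in mus:
        p *= nccs1_cdf(threshold, mu)
    return p


def simulate_outage(threshold, mus, n_trials=200_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = np.random.default_rng(seed)
    mus = np.asarray(mus, dtype=float)
    # One row per trial, one column per RIS link (i.n.i.d. via different mu).
    snr = (rng.standard_normal((n_trials, mus.size)) + mus) ** 2
    return float(np.mean(snr.max(axis=1) <= threshold))


link_means = [0.5, 1.0, 1.5, 2.0]  # hypothetical, one per RIS link
threshold = 4.0                    # hypothetical SNR threshold

analytic = outage_best_link(threshold, link_means)
empirical = simulate_outage(threshold, link_means)
```

The two values should agree to within Monte Carlo error. The product form is exact for any number of links, whereas an asymptotic approximation of the kind the abstract describes would replace it with a limiting distribution as the number of links grows.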

Submitted 24 December, 2023; originally announced December 2023.

arXiv:2306.04054 [pdf, other]

RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain

Authors: Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff Korbayova, Josef van Genabith

Abstract: Despite the recent advancements in speech recognition, there are still difficulties in accurately transcribing conversational and emotional speech in noisy and reverberant acoustic environments. This poses a particular challenge in the search and rescue (SAR) domain, where transcribing conversations among rescue team members is crucial to support real-time decision-making. The scarcity of speech data and associated background noise in SAR scenarios make it difficult to deploy robust speech recognition systems. To address this issue, we have created and made publicly available a German speech dataset called RescueSpeech. This dataset includes real speech recordings from simulated rescue exercises. Additionally, we have released competitive training recipes and pre-trained models. Our study highlights that the performance attained by state-of-the-art methods in this challenging scenario is still far from reaching an acceptable level.

Submitted 25 September, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2201.05975 [pdf]

IRHA: An Intelligent RSSI based Home automation System

Authors: Samsil Arefin Mozumder, A S M Sharifuzzaman Sagar

Abstract: Human existence is getting more sophisticated and better in many areas due to remarkable advances in the fields of automation. Automated systems are favored over manual ones in the current environment. Home Automation is becoming more popular in this scenario, as people are drawn to the concept of a home environment that can automatically satisfy users' requirements. The key challenges in an intelligent home are intelligent decision making, location-aware service, and compatibility for all users of different ages and physical conditions. Existing solutions address just one or two of these challenges, but smart home automation that is robust, intelligent, location-aware, and predictive is needed to satisfy the user's demand. This paper presents a location-aware intelligent RSSI-based home automation system (IRHA) that uses Wi-Fi signals to detect the user's location and control the appliances automatically. The fingerprinting method is used to map the Wi-Fi signals for different rooms, and the machine learning method, such as Decision Tree, is used to classify the signals for different rooms. The machine learning models are then implemented in the ESP32 microcontroller board to classify the rooms based on the real-time Wi-Fi signal, and then the result is sent to the main control board through the ESP32 MAC communication protocol to control the appliances automatically. The proposed method has achieved 97% accuracy in classifying the users' location.

Submitted 16 January, 2022; originally announced January 2022.

Comments: This article is submitted to the 2nd International Conference on Ubiquitous Computing and Intelligent Information Systems for possible presentation

Showing 1–5 of 5 results for author: Sagar, S