Showing 1–15 of 15 results for author: Squartini, S

  1. arXiv:2310.01688  [pdf, other]

    eess.AS cs.CL cs.SD

    One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition

    Authors: Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini

    Abstract: This paper presents a novel framework for joint speaker diarization (SD) and automatic speech recognition (ASR), named SLIDAR (sliding-window diarization-augmented recognition). SLIDAR can process arbitrary length inputs and can handle any number of speakers, effectively solving "who spoke what, when" concurrently. SLIDAR leverages a sliding window approach and consists of an end-to-end diarizat…

    Submitted 2 October, 2023; originally announced October 2023.

  2. arXiv:2307.16809  [pdf, other]

    eess.AS

    An enhanced system for the detection and active cancellation of snoring signals

    Authors: Valeria Bruschi, Michela Cantarini, Luca Serafini, Stefano Nobili, Stefania Cecchi, Stefano Squartini

    Abstract: Snoring is a common disorder that affects people's social and marital lives. The annoyance caused by snoring can be partially solved with active noise control systems. In this context, the present work aims at introducing an enhanced system based on the use of a convolutional recurrent neural network for snoring activity detection and a delayless subband approach for active snoring cancellation. T…

    Submitted 31 July, 2023; originally announced July 2023.

  3. arXiv:2307.15611  [pdf, other]

    eess.AS

    A Time-Frequency Generative Adversarial based method for Audio Packet Loss Concealment

    Authors: Carlo Aironi, Samuele Cornell, Luca Serafini, Stefano Squartini

    Abstract: Packet loss is a major cause of voice quality degradation in VoIP transmissions with serious impact on intelligibility and user experience. This paper describes a system based on a generative adversarial approach, which aims to repair the lost fragments during the transmission of audio streams. Inspired by the powerful image-to-image translation capability of Generative Adversarial Networks (GANs)…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted at EUSIPCO - 31st European Signal Processing Conference, 2023

  4. arXiv:2306.13734  [pdf, other]

    eess.AS cs.CL cs.SD

    The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

    Authors: Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

    Abstract: The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems. We introduce the CHiME-7 distant ASR (DASR) task, within the 7th CHiME challenge. This task comprises joint ASR and diarization in far-field settings with multiple, and possibly heterogeneous, recording devices. Different from previous challenges, we evaluate…

    Submitted 14 July, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  5. arXiv:2305.18074  [pdf, other]

    eess.AS cs.SD eess.SP

    An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings

    Authors: Luca Serafini, Samuele Cornell, Giovanni Morrone, Enrico Zovato, Alessio Brutti, Stefano Squartini

    Abstract: We performed an experimental review of current diarization systems for the conversational telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms belonging to clustering-based, end-to-end neural diarization (EEND), and speech separation guided diarization (SSGD) paradigms. We studied the inference-time computational requirements and diarization accuracy on fou…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 52 pages, 10 figures

  6. End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations

    Authors: Giovanni Morrone, Samuele Cornell, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

    Abstract: Recent works show that speech separation guided diarization (SSGD) is an increasingly promising direction, mainly thanks to the recent progress in speech separation. It performs diarization by first separating the speakers and then applying voice activity detection (VAD) on each separated stream. In this work we conduct an in-depth study of SSGD in the conversational telephone speech (CTS) domain,…

    Submitted 22 May, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: 16 pages, 7 figures

    Journal ref: Speech Communication 161 (2024) 103081

  7. arXiv:2205.15700  [pdf, other]

    eess.AS

    Conversational Speech Separation: an Evaluation Study for Streaming Applications

    Authors: Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini

    Abstract: Continuous speech separation (CSS) is a recently proposed framework which aims at separating each speaker from an input mixture signal in a streaming fashion. Hereafter we perform an evaluation study on practical design considerations for a CSS system, addressing important aspects which have been neglected in recent works. In particular, we focus on the trade-off between separation performance, co…

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: Audio Engineering Society Convention 152, May 2022, The Hague, Netherlands

  8. arXiv:2204.02306  [pdf, other]

    eess.AS

    Low-Latency Speech Separation Guided Diarization for Telephone Conversations

    Authors: Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

    Abstract: In this paper, we carry out an analysis on the use of speech separation guided diarization (SSGD) in telephone conversations. SSGD performs diarization by separating the speakers' signals and then applying voice activity detection on each estimated speaker signal. In particular, we compare two low-latency speech separation models. Moreover, we show a post-processing algorithm that significantly red…

    Submitted 27 October, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted for Presentation at IEEE Spoken Language Technology Workshop (SLT) 2022

  9. arXiv:2111.04614  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Learning Filterbanks for End-to-End Acoustic Beamforming

    Authors: Samuele Cornell, Manuel Pariente, François Grondin, Stefano Squartini

    Abstract: Recent work on monaural source separation has shown that performance can be increased by using fully learned filterbanks with short windows. On the other hand it is widely known that, for conventional beamforming techniques, performance increases with long analysis windows. This applies also to most hybrid neural beamforming methods which rely on a deep neural network (DNN) to estimate the spatial…

    Submitted 19 February, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: accepted at ICASSP 2022

  10. arXiv:2110.02077  [pdf, other]

    eess.AS cs.SD

    Deep Optimization of Parametric IIR Filters for Audio Equalization

    Authors: Giovanni Pepe, Leonardo Gabrielli, Stefano Squartini, Carlo Tripodi, Nicolò Strozzi

    Abstract: This paper describes a novel Deep Learning method for the design of IIR parametric filters for automatic audio equalization. A simple and effective neural architecture, named BiasNet, is proposed to determine the IIR equalizer parameters. An output denormalization technique is used to obtain accurate tuning of the IIR filters' center frequency, quality factor and gain. All layers involved in the pr…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: submitted to IEEE/ACM TASLP on 12 May 2021

    MSC Class: 68T07 (Primary) 14C20 (Secondary)

    ACM Class: I.2.0; F.2.1

  11. arXiv:2104.02819  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Learning to Rank Microphones for Distant Speech Recognition

    Authors: Samuele Cornell, Alessio Brutti, Marco Matassoni, Stefano Squartini

    Abstract: Fully exploiting ad-hoc microphone networks for distant speech recognition is still an open issue. Empirical evidence shows that being able to select the best microphone leads to significant improvements in recognition without any additional effort on front-end processing. Current channel selection techniques either rely on signal, decoder or posterior-based features. Signal-based features are ine…

    Submitted 13 April, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

  12. arXiv:1911.02388  [pdf, other]

    eess.AS cs.LG cs.SD

    The Speed Submission to DIHARD II: Contributions & Lessons Learned

    Authors: Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras

    Abstract: This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team. Besides describing the system, which considerably outperformed the challenge baselines, we also focus on the lessons learned from numerous approaches that we tried for single and multi-channel systems. We present several components of our diarization syst…

    Submitted 6 November, 2019; originally announced November 2019.

  13. arXiv:1904.01916  [pdf, other]

    cs.SD eess.AS

    End-to-end Binaural Sound Localisation from the Raw Waveform

    Authors: Paolo Vecchiotti, Ning Ma, Stefano Squartini, Guy J. Brown

    Abstract: A novel end-to-end binaural sound localisation approach is proposed which estimates the azimuth of a sound source directly from the waveform. Instead of employing hand-crafted features commonly employed for binaural sound localisation, such as the interaural time and level difference, our end-to-end system approach uses a convolutional neural network (CNN) to extract specific features from the wav…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted by ICASSP 2019

  14. Polyphonic Sound Event Detection by using Capsule Neural Networks

    Authors: Fabio Vesperini, Leonardo Gabrielli, Emanuele Principi, Stefano Squartini

    Abstract: Artificial sound event detection (SED) has the aim to mimic the human ability to perceive and understand what is happening in the surroundings. Nowadays, Deep Learning offers valuable techniques for this goal such as Convolutional Neural Networks (CNNs). The Capsule Neural Network (CapsNet) architecture has been recently introduced in the image processing field with the intent to overcome some of…

    Submitted 30 January, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

  15. arXiv:1809.05483  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    A Multi-Stage Algorithm for Acoustic Physical Model Parameters Estimation

    Authors: Leonardo Gabrielli, Stefano Tomassetti, Stefano Squartini, Carlo Zinato, Stefano Guaiana

    Abstract: One of the challenges in computational acoustics is the identification of models that can simulate and predict the physical behavior of a system generating an acoustic signal. Whenever such models are used for commercial applications an additional constraint is the time-to-market, making automation of the sound design process desirable. In previous works, a computational sound design approach has…

    Submitted 12 February, 2019; v1 submitted 14 September, 2018; originally announced September 2018.