Showing 1–11 of 11 results for author: Sarfjoo, S

Searching in archive cs.
  1. arXiv:2212.08489  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

    Authors: Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

    Abstract: In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable perfo…

    Submitted 17 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted in ICASSP 2023

    ACM Class: I.2.7

    Journal ref: ICASSP 2023

  2. arXiv:2212.07164  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator

    Authors: Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Karel Vesely

    Abstract: This paper describes a simple yet efficient repetition-based modular system for speeding up air-traffic controllers (ATCos) training. E.g., a human pilot is still required in EUROCONTROL's ESCAPE lite simulator (see https://www.eurocontrol.int/simulator/escape) during ATCo training. However, this need can be substituted by an automatic system that could act as a pilot. In this paper, we aim to dev…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Presented at Sesar Innovation Days 2022. https://www.sesarju.eu/sesarinnovationdays

  3. arXiv:2211.04054  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

    Authors: Juan Zuluaga-Gomez, Karel Veselý, Igor Szöke, Alexander Blatt, Petr Motlicek, Martin Kocour, Mickael Rigault, Khalid Choukri, Amrutha Prasad, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Claudia Cevenini, Pavel Kolčárek, Allan Tart, Jan Černocký, Dietrich Klakow

    Abstract: Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried between an air traffic controller (ATCO) and pilots via very-h…

    Submitted 15 June, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: Manuscript under review; The code is available at: https://github.com/idiap/atco2-corpus

  4. arXiv:2203.16822  [pdf, other]

    eess.AS cs.CL cs.LG

    How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

    Authors: Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Saeed Sarfjoo, Petr Motlicek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan

    Abstract: Recent work on self-supervised pre-training focus on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine-tuned on downstream tasks e.g., automatic speech recognition (ASR). Yet, few works investigated the impact on performance when the data properties substantially differ between the pre-training and fine-tuning phases, termed d…

    Submitted 17 October, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: To be published in the 2022 IEEE Spoken Language Technology Workshop (SLT) (SLT 2022)

  5. arXiv:2202.03725  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    A two-step approach to leverage contextual data: speech recognition in air-traffic communications

    Authors: Iuliia Nigmatulina, Juan Zuluaga-Gomez, Amrutha Prasad, Seyyed Saeed Sarfjoo, Petr Motlicek

    Abstract: Automatic Speech Recognition (ASR), as the assistance of speech communication between pilots and air-traffic controllers, can significantly reduce the complexity of the task and increase the reliability of transmitted information. ASR application can lead to a lower number of incidents caused by misunderstanding and improve air traffic management (ATM) efficiency. Evidently, high accuracy predicti…

    Submitted 8 February, 2022; originally announced February 2022.

    Comments: 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. arXiv admin note: text overlap with arXiv:2108.12156

    Journal ref: ICASSP 2022

  6. arXiv:2110.05781  [pdf, other]

    eess.AS cs.CL cs.LG

    BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications

    Authors: Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke

    Abstract: Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are used later to extract ATC named entities, e.g., aircraft callsigns. One common challenge is speech activity detection (SAD) and speaker diarization (SD). In the failure condition, two or more segments remain in the same recording, jeopardizin…

    Submitted 14 October, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: To be published in the 2022 IEEE Spoken Language Technology Workshop (SLT) (SLT 2022)

  7. arXiv:2108.12175  [pdf, other]

    cs.CL cs.LG eess.AS

    Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

    Authors: Amrutha Prasad, Juan Zuluaga-Gomez, Petr Motlicek, Saeed Sarfjoo, Iuliia Nigmatulina, Oliver Ohneiser, Hartmut Helmke

    Abstract: Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data into one set. This is motivated by the fact that pilot's voice communications are more scarce than ATCOs. Due to this data imbalance and other reasons (e.g., varying acoustic conditions), the speech from ATCOs is usually recognized more accurately than from pilots…

    Submitted 14 December, 2022; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: Presented at Sesar Innovation Days - 2022. See https://www.sesarju.eu/sesarinnovationdays

  8. arXiv:2010.12277  [pdf, other]

    cs.SD eess.AS

    Speech Activity Detection Based on Multilingual Speech Recognition System

    Authors: Seyyed Saeed Sarfjoo, Srikanth Madikeri, Petr Motlicek

    Abstract: To better model the contextual information and increase the generalization ability of Speech Activity Detection (SAD) system, this paper leverages a multi-lingual Automatic Speech Recognition (ASR) system to perform SAD. Sequence discriminative training of Acoustic Model (AM) using Lattice-Free Maximum Mutual Information (LF-MMI) loss function, effectively extracts the contextual information of th…

    Submitted 11 April, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Submitted to Interspeech 2021

  9. arXiv:2006.02093  [pdf, other]

    cs.SI cs.SD eess.AS

    Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data

    Authors: Mael Fabien, Seyyed Saeed Sarfjoo, Petr Motlicek, Srikanth Madikeri

    Abstract: Criminal investigations mostly rely on the collection of speech conversational data in order to identify speakers and build or enrich an existing criminal network. Social network analysis tools are then applied to identify the most central characters and the different communities within the network. We introduce two candidate datasets for criminal conversational data, Crime Scene Investigation (CS…

    Submitted 21 September, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

  10. arXiv:1911.03952  [pdf, other]

    cs.SD eess.AS

    Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

    Authors: Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi

    Abstract: Nowadays vast amounts of speech data are recorded from low-quality recorder devices such as smartphones, tablets, laptops, and medium-quality microphones. The objective of this research was to study the automatic generation of high-quality speech from such low-quality device-recorded speech, which could then be applied to many speech-generation tasks. In this paper, we first introduce our new devi…

    Submitted 20 November, 2019; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: This study was conducted during an internship of the first author at NII, Japan in 2017

  11. arXiv:1608.02272  [pdf, other]

    cs.SD cs.CL

    Incorporation of Speech Duration Information in Score Fusion of Speaker Recognition Systems

    Authors: Ali Khodabakhsh, Seyyed Saeed Sarfjoo, Umut Uludag, Osman Soyyigit, Cenk Demiroglu

    Abstract: In recent years identity-vector (i-vector) based speaker verification (SV) systems have become very successful. Nevertheless, environmental noise and speech duration variability still have a significant effect on degrading the performance of these systems. In many real-life applications, duration of recordings are very short; as a result, extracted i-vectors cannot reliably represent the attribute…

    Submitted 7 August, 2016; originally announced August 2016.