Showing 1–12 of 12 results for author: Trmal, J

  1. arXiv:2110.12561  [pdf, other]

    cs.SD eess.AS

    Lhotse: a speech data representation library for the modern deep learning ecosystem

    Authors: Piotr Żelasko, Daniel Povey, Jan "Yenda" Trmal, Sanjeev Khudanpur

    Abstract: Speech data is notoriously difficult to work with due to a variety of codecs, lengths of recordings, and meta-data formats. We present Lhotse, a speech data representation library that draws upon lessons learned from Kaldi speech recognition toolkit and brings its concepts into the modern deep learning ecosystem. Lhotse provides a common JSON description format with corresponding Python classes an…

    Submitted 24 October, 2021; originally announced October 2021.

    Comments: Accepted for presentation at NeurIPS 2021 Data-Centric AI (DCAI) Workshop

  2. arXiv:2106.06909  [pdf, other]

    cs.SD cs.CL eess.AS

    GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

    Authors: Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

    Abstract: This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous sp…

    Submitted 13 June, 2021; originally announced June 2021.

  3. arXiv:2103.17122  [pdf, ps, other]

    eess.AS cs.CR cs.SD

    Adversarial Attacks and Defenses for Speech Recognition Systems

    Authors: Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

    Abstract: The ubiquitous presence of machine learning systems in our lives necessitates research into their vulnerabilities and appropriate countermeasures. In particular, we investigate the effectiveness of adversarial attacks and defenses against automatic speech recognition (ASR) systems. We select two ASR models - a thoroughly studied DeepSpeech model and a more recent Espresso framework Transformer enc…

    Submitted 31 March, 2021; originally announced March 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  4. arXiv:2004.09249  [pdf, other]

    cs.SD cs.CL eess.AS

    CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

    Authors: Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

    Abstract: Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. Speech material is the same as the previous C…

    Submitted 2 May, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

  5. arXiv:2001.09239  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Multi-task self-supervised learning for Robust Speech Recognition

    Authors: Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio

    Abstract: Despite the growing interest in unsupervised learning, extracting meaningful knowledge from unlabelled audio remains an open challenge. To take a step in this direction, we recently proposed a problem-agnostic speech encoder (PASE), that combines a convolutional encoder followed by multiple neural networks, called workers, tasked to solve self-supervised problems (i.e., ones that do not require ma…

    Submitted 17 April, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    Comments: In Proc. of ICASSP 2020

  6. arXiv:1910.12299  [pdf, other]

    cs.CL cs.SD eess.AS

    Induced Inflection-Set Keyword Search in Speech

    Authors: Oliver Adams, Matthew Wiesner, Jan Trmal, Garrett Nicolai, David Yarowsky

    Abstract: We investigate the problem of searching for a lexeme-set in speech by searching for its inflectional variants. Experimental results indicate how lexeme-set search performance changes with the number of hypothesized inflections, while ablation experiments highlight the relative importance of different components in the lexeme-set search pipeline and the value of using curated inflectional paradigms…

    Submitted 21 May, 2020; v1 submitted 27 October, 2019; originally announced October 2019.

    Comments: To appear in SIGMORPHON 2020

  7. arXiv:1909.13447  [pdf]

    eess.AS cs.CL cs.SD

    DiPCo -- Dinner Party Corpus

    Authors: Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas

    Abstract: We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment. The corpus was created by recording multiple groups of four Amazon employee volunteers having a natural conversation in English around a dining table. The participants were recorded by a single-channel close-talk microphone and by five far-field 7-microphone array devices position…

    Submitted 30 September, 2019; originally announced September 2019.

  8. arXiv:1807.06204  [pdf, other]

    cs.CL

    Low-Resource Contextual Topic Identification on Speech

    Authors: Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

    Abstract: In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-langua…

    Submitted 28 September, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

    Comments: Accepted for publication at 2018 IEEE Workshop on Spoken Language Technology (SLT)

  9. arXiv:1803.10609  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

    Authors: Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal

    Abstract: The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning. This paper introduces the 5th CHiME Challenge, which considers the task of distant multi-microphone conversational ASR in real home environments. Speech material was elicited using a dinne…

    Submitted 28 March, 2018; originally announced March 2018.

  10. arXiv:1802.08731  [pdf, other]

    cs.CL

    Automatic Speech Recognition and Topic Identification for Almost-Zero-Resource Languages

    Authors: Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur

    Abstract: Automatic speech recognition (ASR) systems often need to be developed for extremely low-resource languages to serve end-uses such as audio content categorization and search. While universal phone recognition is natural to consider when no transcribed speech is available to train an ASR system in a language, adapting universal phone models using very small amounts (minutes rather than hours) of tra…

    Submitted 18 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: Accepted for publication at Interspeech 2018

  11. arXiv:1706.00321  [pdf, other]

    cs.CL

    Using of heterogeneous corpora for training of an ASR system

    Authors: Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee

    Abstract: The paper summarizes the development of the LVCSR system built as a part of the Pashto speech-translation system at the SCALE (Summer Camp for Applied Language Exploration) 2015 workshop on "Speech-to-text-translation for low-resource languages". The Pashto language was chosen as a good "proxy" low-resource language, exhibiting multiple phenomena which make the speech-recognition and speech-to…

    Submitted 1 June, 2017; originally announced June 2017.

  12. arXiv:1703.07476  [pdf, other]

    cs.CL

    Topic Identification for Speech without ASR

    Authors: Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur

    Abstract: Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs. However, under resource-limited conditions, the manually transcribed speech required to develop standard ASR systems can be severely limited or unavailable. In this paper, we investigate alternative unsupervised s…

    Submitted 11 July, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: 5 pages, 2 figures; accepted for publication at Interspeech 2017