Showing 1–3 of 3 results for author: van Hout, J

  1. arXiv:2007.15916  [pdf]

    cs.CL cs.CV

    Evaluating Automatically Generated Phoneme Captions for Images

    Authors: Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg

    Abstract: Image2Speech is the relatively new task of generating a spoken description of an image. This paper presents an investigation into the evaluation of this task. For this, first an Image2Speech system was implemented which generates image captions consisting of phoneme sequences. This system outperformed the original Image2Speech system on the Flickr8k corpus. Subsequently, these phoneme captions wer…

    Submitted 31 July, 2020; originally announced July 2020.

    Comments: Accepted at Interspeech 2020

  2. arXiv:1902.10828  [pdf, ps, other]

    eess.AS cs.SD

    The VOiCES from a Distance Challenge 2019 Evaluation Plan

    Authors: Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Colleen Richey, Aaron Lawson, Maria Alejandra Barrios

    Abstract: The "VOiCES from a Distance Challenge 2019" is designed to foster research in the area of speaker recognition and automatic speech recognition (ASR) with the special focus on single channel distant/far-field audio, under noisy conditions. The main objectives of this challenge are to: (i) benchmark state-of-the-art technology in the area of speaker recognition and automatic speech recognition (ASR)…

    Submitted 27 February, 2019; originally announced February 2019.

    Comments: Special Session for Interspeech 2019

  3. arXiv:1804.05053  [pdf, other]

    cs.SD eess.AS

    Voices Obscured in Complex Environmental Settings (VOICES) corpus

    Authors: Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeff Hetherly, Cory Stephenson, Karl Ni

    Abstract: This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical appro…

    Submitted 15 May, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

    Comments: Submitted to Interspeech 2018