Showing 1–3 of 3 results for author: Srinivasan, T

  1. Multimodal Speech Recognition for Language-Guided Embodied Agents

    Authors: Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason

    Abstract: Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions. While Automatic Speech Recognition (ASR) models can bridge the input gap, erroneous ASR transcripts can hurt the agents' ability to complete tasks. In this work, we propose training a multimodal ASR model to reduce errors in transcribing spoken instructio…

    Submitted 9 October, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: 5 pages, 5 figures

    Journal ref: Proceedings of Interspeech 2023, 1608-1612

  2. arXiv:2002.05639 [pdf, other]

    cs.CL cs.MM eess.AS

    Looking Enhances Listening: Recovering Missing Speech Using Images

    Authors: Tejas Srinivasan, Ramon Sanabria, Florian Metze

    Abstract: Speech is understood better by using visual context; for this reason, there have been many attempts to use images to adapt automatic speech recognition (ASR) systems. Current work, however, has shown that visually adapted ASR models only use images as a regularization signal, while completely ignoring their semantic content. In this paper, we present a set of experiments where we show the utility…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: Accepted to ICASSP 2020

  3. arXiv:1907.00477 [pdf, other]

    cs.CL cs.SD eess.AS

    Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions

    Authors: Tejas Srinivasan, Ramon Sanabria, Florian Metze

    Abstract: Multimodal learning allows us to leverage information from multiple sources (visual, acoustic and text), similar to our experience of the real world. However, it is currently unclear to what extent auxiliary modalities improve performance over unimodal models, and under what circumstances the auxiliary modalities are useful. We examine the utility of the auxiliary visual context in Multimodal Auto…

    Submitted 28 December, 2019; v1 submitted 30 June, 2019; originally announced July 2019.

    Comments: Accepted to How2 Workshop, ICML 2019