Showing 1–9 of 9 results for author: Sunder, V

  1. arXiv:2310.11486

    eess.AS cs.AI cs.LG

    End-to-End real time tracking of children's reading with pointer network

    Authors: Vishal Sunder, Beulah Karrolla, Eric Fosler-Lussier

    Abstract: In this work, we explore how a real time reading tracker can be built efficiently for children's voices. While previously proposed reading trackers focused on ASR-based cascaded approaches, we propose a fully end-to-end model making it less prone to lags in voice tracking. We employ a pointer network that directly learns to predict positions in the ground truth text conditioned on the streaming sp…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 5 pages, 3 figures

  2. arXiv:2204.05188

    cs.CL cs.SD eess.AS

    Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

    Authors: Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang J. Kuo, Brian Kingsbury

    Abstract: Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) have been primarily due to effective pretraining of speech representations. One such pretraining paradigm is the distillation of semantic knowledge from state-of-the-art text-based models like BERT to speech encoder neural networks. This work is a step towards doing the same in a much more efficient and fine-grained manner whe…

    Submitted 1 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: 5 pages, 2 figures

  3. arXiv:2204.05183

    cs.CL cs.SD eess.AS

    Building an ASR Error Robust Spoken Virtual Patient System in a Highly Class-Imbalanced Scenario Without Speech Data

    Authors: Vishal Sunder, Prashant Serai, Eric Fosler-Lussier

    Abstract: A Virtual Patient (VP) is a powerful tool for training medical students to take patient histories, where responding to a diverse set of spoken questions is essential to simulate natural conversations with a student. The performance of such a Spoken Language Understanding system (SLU) can be adversely affected by both the presence of Automatic Speech Recognition (ASR) errors in the test data and a…

    Submitted 1 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: 5 pages, 3 figures

  4. arXiv:2204.05169

    cs.CL cs.AI

    Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding

    Authors: Vishal Sunder, Samuel Thomas, Hong-Kwang J. Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier

    Abstract: Dialog history plays an important role in spoken language understanding (SLU) performance in a dialog system. For end-to-end (E2E) SLU, previous work has used dialog history in text form, which makes the model dependent on a cascaded automatic speech recognizer (ASR). This rescinds the benefits of an E2E system which is intended to be compact and robust to ASR errors. In this paper, we propose a h…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: 5 pages, 1 figure

  5. arXiv:2103.12258

    cs.CL cs.LG

    Hallucination of speech recognition errors with sequence to sequence learning

    Authors: Prashant Serai, Vishal Sunder, Eric Fosler-Lussier

    Abstract: Automatic Speech Recognition (ASR) is an imperfect process that results in certain mismatches in ASR output text when compared to plain written text or transcriptions. When plain text data is to be used to train systems for spoken language understanding or ASR, a proven strategy to reduce said mismatch and prevent degradations is to hallucinate what the ASR outputs would be given a gold transcrip…

    Submitted 31 March, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Submitted to IEEE/ACM Transactions on Audio Speech and Language Processing

  6. arXiv:2010.15090

    cs.CL cs.LG

    Handling Class Imbalance in Low-Resource Dialogue Systems by Combining Few-Shot Classification and Interpolation

    Authors: Vishal Sunder, Eric Fosler-Lussier

    Abstract: Utterance classification performance in low-resource dialogue systems is constrained by an inevitably high degree of data imbalance in class labels. We present a new end-to-end pairwise learning framework that is designed specifically to tackle this phenomenon by inducing a few-shot classification capability in the utterance representations and augmenting data through an interpolation of utterance…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: 5 pages, 4 figures, 3 tables

  7. arXiv:1906.02427

    cs.AI cs.LG cs.LO

    One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis

    Authors: Vishal Sunder, Ashwin Srinivasan, Lovekesh Vig, Gautam Shroff, Rohit Rahul

    Abstract: Our interest in this paper is in meeting a rapidly growing industrial demand for information extraction from images of documents such as invoices, bills, receipts, etc. In practice, users are able to provide only a very small number of example images labeled with the information that needs to be extracted. We adopt a novel two-level neuro-deductive approach where (a) we use pre-trained deep neural netwo…

    Submitted 6 June, 2019; originally announced June 2019.

    Comments: 11 pages, appears in the 13th International Workshop on Neural-Symbolic Learning and Reasoning at IJCAI 2019

  8. arXiv:1809.07066

    cs.LG cs.AI cs.MA stat.ML

    Prosocial or Selfish? Agents with different behaviors for Contract Negotiation using Reinforcement Learning

    Authors: Vishal Sunder, Lovekesh Vig, Arnab Chatterjee, Gautam Shroff

    Abstract: We present an effective technique for training deep learning agents capable of negotiating on a set of clauses in a contract agreement using a simple communication protocol. We use Multi Agent Reinforcement Learning to train both agents simultaneously as they negotiate with each other in the training environment. We also model selfish and prosocial behavior to varying degrees in these agents. Empi…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: Proceedings of the 11th International Workshop on Automated Negotiations (held in conjunction with IJCAI 2018)

  9. arXiv:1804.01000

    cs.CL cs.AI

    CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge An Ensemble of Deep and Shallow Learning to predict the Quality of Product Titles

    Authors: Karamjit Singh, Vishal Sunder

    Abstract: We present an approach where two different models (Deep and Shallow) are trained separately on the data and a weighted average of the outputs is taken as the final result. For the Deep approach, we use different combinations of models like Convolution Neural Network, pretrained word2vec embeddings and LSTMs to get representations which are then used to train a Deep Neural Network. For Clarity pred…

    Submitted 1 April, 2018; originally announced April 2018.