Showing 1–9 of 9 results for author: Pasad, A

  1. arXiv:2406.10083

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  2. arXiv:2406.08619

    cs.CL cs.LG eess.AS

    Self-Supervised Speech Representations are More Phonetic than Semantic

    Authors: Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe

    Abstract: Self-supervised speech models (S3Ms) have become an effective backbone for speech applications. Various analyses suggest that S3Ms encode linguistic properties. In this work, we seek a more fine-grained analysis of the word-level linguistic properties encoded in S3Ms. Specifically, we curate a novel dataset of near homophone (phonetically similar) and synonym (semantically similar) word pairs and…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024. Source code at https://github.com/juice500ml/phonetic_semantic_probing

  3. arXiv:2307.00162

    cs.CL cs.LG eess.AS

    What Do Self-Supervised Speech Models Know About Words?

    Authors: Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu

    Abstract: Many self-supervised speech models (S3Ms) have been introduced over the last few years, improving performance and data efficiency on various speech tasks. However, these empirical successes alone do not give a complete picture of what is learned during pre-training. Recent work has begun analyzing how S3Ms encode certain properties, such as phonetic and speaker information, but we still lack a pro…

    Submitted 31 January, 2024; v1 submitted 30 June, 2023; originally announced July 2023.

    Comments: Pre-MIT Press publication version

  4. arXiv:2212.10525

    cs.CL eess.AS

    SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

    Authors: Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community, but have not received as much attention as lower-level tasks like speech and speaker recognition. In particular, there are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers. Recent work has begun to introduce suc…

    Submitted 15 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023 (long paper)

  5. arXiv:2211.03929

    cs.CL cs.LG cs.SD eess.AS

    Comparative layer-wise analysis of self-supervised speech models

    Authors: Ankita Pasad, Bowen Shi, Karen Livescu

    Abstract: Many self-supervised speech models, varying in their pre-training objective, input modality, and pre-training data, have been proposed in the last few years. Despite impressive successes on downstream tasks, we still have a limited understanding of the properties encoded by the models and the differences across models. In this work, we examine the intermediate representations for a variety of rece…

    Submitted 16 March, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023. Code: https://github.com/ankitapasad/layerwise-analysis

  6. arXiv:2112.07648

    cs.CL cs.LG cs.SD eess.AS

    On the Use of External Data for Spoken Named Entity Recognition

    Authors: Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han

    Abstract: Spoken language understanding (SLU) tasks involve mapping from speech audio signals to semantic labels. Given the complexity of such tasks, good performance might be expected to require large labeled datasets, which are difficult to collect for each new task and domain. However, recent advances in self-supervised speech representations have made it feasible to consider learning SLU models with lim…

    Submitted 8 July, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: Accepted at NAACL 2022. Codebase available at https://github.com/asappresearch/spoken-ner

  7. arXiv:2111.10367

    cs.CL cs.LG cs.SD eess.AS

    SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech

    Authors: Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han

    Abstract: Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in higher-level spoken language understanding tasks, including using end-to-end models, but there are fewer annotated datasets for such tasks. At the same time, rece…

    Submitted 29 July, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

    Comments: Updated preprint for SLUE Benchmark v0.2; Toolkit link https://github.com/asappresearch/slue-toolkit

  8. arXiv:2107.04734

    cs.CL cs.LG eess.AS

    Layer-wise Analysis of a Self-supervised Speech Representation Model

    Authors: Ankita Pasad, Ju-Chieh Chou, Karen Livescu

    Abstract: Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of t…

    Submitted 3 December, 2022; v1 submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted to ASRU 2021. Code: https://github.com/ankitapasad/layerwise-analysis

  9. arXiv:1904.10947

    cs.CL cs.SD eess.AS

    On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval

    Authors: Ankita Pasad, Bowen Shi, Herman Kamper, Karen Livescu

    Abstract: Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision. In real-world low-resource settings, however, we often have access to some transcribed speech. We study whether and how visual grounding is useful in the presence of varying amounts of textual supervision. In particular, we consider the task…

    Submitted 30 August, 2019; v1 submitted 24 April, 2019; originally announced April 2019.