Showing 1–4 of 4 results for author: Meghanani, A

  1. arXiv:2406.09153  [pdf, other]

    cs.CL cs.SD eess.AS

    LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks

    Authors: Amit Meghanani, Thomas Hain

    Abstract: Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is challenging and computationally expensive. Recent attempts have been made to address this issue with cost-effective self-supervised fine-tuning (SSFT) approaches. C…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2403.08738  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations

    Authors: Amit Meghanani, Thomas Hain

    Abstract: Acoustic word embeddings (AWEs) are vector representations of spoken words. An effective method for obtaining AWEs is the Correspondence Auto-Encoder (CAE). In the past, the CAE method has been associated with traditional MFCC features. Representations obtained from self-supervised learning (SSL)-based speech models such as HuBERT and Wav2vec2 are outperforming MFCC in many downstream tasks. H…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted to EACL 2024 Main Conference, Long paper

  3. arXiv:2403.06260  [pdf, other]

    cs.CL cs.SD eess.AS

    SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations

    Authors: Amit Meghanani, Thomas Hain

    Abstract: There is a growing interest in cost-effective self-supervised fine-tuning (SSFT) of self-supervised learning (SSL)-based speech models to obtain task-specific representations. These task-specific representations are used for robust performance on various downstream tasks by fine-tuning on the labelled data. This work presents a cost-effective SSFT method named Self-supervised Correspondence (SCORE…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted at ICASSP 2024

  4. arXiv:1812.02447  [pdf, other]

    eess.AS cs.SD

    Pitch-synchronous DCT features: A pilot study on speaker identification

    Authors: Amit Meghanani, A G Ramakrishnan

    Abstract: We propose a new feature, namely, pitch-synchronous discrete cosine transform (PS-DCT), for the task of speaker identification. These features are obtained directly from the voiced segments of the speech signal, without any pre-emphasis or windowing. The feature vectors are vector quantized to create one separate codebook for each speaker during training. The performance of the PS-DCT features is s…

    Submitted 6 December, 2018; originally announced December 2018.