Showing 1–11 of 11 results for author: Manocha, P

  1. arXiv:2310.09388  [pdf, other]

    eess.AS cs.LG cs.SD

    CORN: Co-Trained Full- And No-Reference Speech Quality Assessment

    Authors: Pranay Manocha, Donald Williamson, Adam Finkelstein

    Abstract: Perceptual evaluation constitutes a crucial aspect of various audio-processing tasks. Full reference (FR) or similarity-based metrics rely on high-quality reference recordings, to which lower-quality or corrupted versions of the recording may be compared for evaluation. In contrast, no-reference (NR) metrics evaluate a recording without relying on a reference. Both the FR and NR approaches exhibit…

    Submitted 8 January, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  2. arXiv:2304.01448  [pdf, other]

    eess.AS

    TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

    Authors: Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

    Abstract: Measuring quality and intelligibility of a speech signal is usually a critical step in development of speech processing systems. To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed. Through this paper, we introduce tools and a set of models to estimate such known metrics using deep neural networks. These models are made availa…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: ICASSP 2023

  3. arXiv:2206.13411  [pdf, other]

    eess.AS cs.SD

    Audio Similarity is Unreliable as a Proxy for Audio Quality

    Authors: Pranay Manocha, Zeyu Jin, Adam Finkelstein

    Abstract: Many audio processing tasks require perceptual assessment. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. Most applications incorporate full reference or other similarity-based metrics (e.g. PESQ) that depend on a clean reference. Researchers have relied on such metrics to evaluate and compare various proposed methods, often conclu…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  4. arXiv:2206.12297  [pdf, other]

    eess.AS cs.SD

    SAQAM: Spatial Audio Quality Assessment Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Audio quality assessment is critical for assessing the perceptual realism of sounds. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. For AR&VR, good perceived sound quality and localizability of sources are among the key elements to ensure complete immersion of the user. Our work introduces SAQAM which uses a multi-task learning fra…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  5. arXiv:2206.12285  [pdf, other]

    eess.AS cs.SD

    Speech Quality Assessment through MOS using Non-Matching References

    Authors: Pranay Manocha, Anurag Kumar

    Abstract: Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way to assess the quality of speech signals. However, several recent attempts to automatically estimate MOS using deep learning approaches lack robustness and generalization capabilities, limiting their use in real-world applications. In this work, we present a novel framework, NORESQA-MOS, for estimating the MOS of a…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  6. arXiv:2203.03022  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    HEAR: Holistic Evaluation of Audio Representations

    Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

    Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, in…

    Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

  7. arXiv:2109.08125  [pdf, other]

    eess.AS cs.SD

    NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

    Authors: Pranay Manocha, Buye Xu, Anurag Kumar

    Abstract: The perceptual task of speech quality assessment (SQA) is a challenging task for machines to do. Objective SQA methods that rely on the availability of the corresponding clean reference have been the primary go-to approaches for SQA. Clearly, these methods fail in real-world scenarios where the ground truth clean references are not available. In recent years, non-intrusive methods that train neura…

    Submitted 18 October, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

  8. DPLM: A Deep Perceptual Spatial-Audio Localization Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, we tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general pur…

    Submitted 28 May, 2021; originally announced May 2021.

  9. arXiv:2102.05109  [pdf, other]

    eess.AS cs.LG cs.SD

    CDPAM: Contrastive learning for perceptual audio similarity

    Authors: Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein

    Abstract: Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbatio…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: Dataset, code and sound examples can be found at https://github.com/pranaymanocha/PerceptualAudio/tree/master/cdpam

  10. arXiv:2001.04460  [pdf, other]

    eess.AS cs.SD

    A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

    Authors: Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin

    Abstract: Many audio processing tasks require perceptual assessment. The "gold standard" of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a…

    Submitted 18 May, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: Dataset, code and sound examples can be found at https://pixl.cs.princeton.edu/pubs/Manocha_2020_ADP/

  11. arXiv:1710.10974  [pdf, other]

    cs.SD cs.IR eess.AS

    Content-based Representations of audio using Siamese neural networks

    Authors: Pranay Manocha, Rohan Badlani, Anurag Kumar, Ankit Shah, Benjamin Elizalde, Bhiksha Raj

    Abstract: In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. This problem is similar to the problem of query by example of audio, which aims to retrieve media samples from a database, which are similar to the user-provided example. We propose a novel approach which encodes the audio into…

    Submitted 15 February, 2018; v1 submitted 30 October, 2017; originally announced October 2017.