Showing 1–46 of 46 results for author: Cutler, R

  1. arXiv:2403.06324  [pdf, other]

    cs.NI cs.MM

    ACM MMSys 2024 Bandwidth Estimation in Real Time Communications Challenge

    Authors: Sami Khairy, Gabriel Mittag, Vishak Gopal, Francis Y. Yan, Zhixiong Niu, Ezra Ameri, Scott Inglis, Mehrsa Golestaneh, Ross Cutler

    Abstract: The quality of experience (QoE) delivered by video conferencing systems to end users depends in part on correctly estimating the capacity of the bottleneck link between the sender and the receiver over time. Bandwidth estimation for real-time communications (RTC) remains a significant challenge, primarily due to the continuously evolving heterogeneous network architectures and technologies. From t…

    Submitted 15 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  2. arXiv:2402.16927  [pdf, ps, other]

    cs.SD eess.AS

    The ICASSP 2024 Audio Deep Packet Loss Concealment Challenge

    Authors: Lorenz Diener, Solomiya Branets, Ando Saabas, Ross Cutler

    Abstract: Audio packet loss concealment is the hiding of gaps in VoIP audio streams caused by network packet loss. With the ICASSP 2024 Audio Deep Packet Loss Concealment Grand Challenge, we build on the success of the previous Audio PLC Challenge held at INTERSPEECH 2022. We evaluate models on an overall harder dataset, and use the new ITU-T P.804 evaluation procedure to more closely evaluate the performan…

    Submitted 26 February, 2024; originally announced February 2024.

  3. arXiv:2401.14444  [pdf, other]

    cs.SD cs.AI cs.CV eess.AS

    ICASSP 2024 Speech Signal Improvement Challenge

    Authors: Nicolae Catalin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets

    Abstract: The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. This marks our second challenge, building upon the success from the previous ICASSP 2023 Grand Challenge. We enhance the competition by introducing a dataset synthesizer, enabling all participating teams to start at a higher baseli…

    Submitted 25 January, 2024; originally announced January 2024.

  4. arXiv:2309.13481  [pdf, other]

    cs.NI cs.LG

    Real-time Bandwidth Estimation from Offline Expert Demonstrations

    Authors: Aashish Gottipati, Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler

    Abstract: In this work, we tackle the problem of bandwidth estimation (BWE) for real-time communication systems; however, in contrast to previous works, we leverage the vast efforts of prior heuristic-based BWE methods and synergize these approaches with deep learning-based techniques. Our work addresses challenges in generalizing to unseen network dynamics and extracting rich representations from prior exp…

    Submitted 23 September, 2023; originally announced September 2023.

  5. arXiv:2309.12553  [pdf, other]

    eess.AS cs.SD

    ICASSP 2023 Acoustic Echo Cancellation Challenge

    Authors: Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Evgenii Indenbom, Nicolae-Catalin Ristea, Jegor Gužvin, Hannes Gamper, Sebastian Braun, Robert Aichner

    Abstract: The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important area of speech enhancement and is still a top issue in audio communication. This is the fourth AEC challenge and it is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic + buffering latency to 20ms, as…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2202.13290, arXiv:2009.04972

  6. arXiv:2309.08295  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD

    A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

    Authors: Ilya Gurvich, Ido Leichter, Dharmendar Reddy Palle, Yossi Asher, Alon Vinnikov, Igor Abramovski, Vishak Gopal, Ross Cutler, Eyal Krupka

    Abstract: We introduce a distinctive real-time, causal, neural network-based active speaker detection system optimized for low-power edge computing. This system drives a virtual cinematography module and is deployed on a commercial device. The system uses data originating from a microphone array and a 360-degree camera. Our network requires only 127 MFLOPs per participant, for a meeting with 14 participants…

    Submitted 15 September, 2023; originally announced September 2023.

  7. arXiv:2309.07385  [pdf, other]

    eess.AS cs.SD

    Multi-dimensional Speech Quality Assessment in Crowdsourcing

    Authors: Babak Naderi, Ross Cutler, Nicolae-Catalin Ristea

    Abstract: Subjective speech quality assessment is the gold standard for evaluating speech enhancement processing and telecommunication systems. The commonly used standard ITU-T Rec. P.800 defines how to measure speech quality in lab environments, and ITU-T Rec. P.808 extended it for crowdsourcing. ITU-T Rec. P.835 extends P.800 to measure the quality of speech in the presence of noise. ITU-T Rec. P.804 targ…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.06566

  8. arXiv:2309.07376  [pdf, other]

    eess.IV cs.MM

    VCD: A Video Conferencing Dataset for Video Compression

    Authors: Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam, Yasaman Hosseinkashi, Henrik Turbell, Albert Sadovnikov, Quan Zhou

    Abstract: Commonly used datasets for evaluating video codecs are all very high quality and not representative of video typically used in video conferencing scenarios. We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing. VCD includes a wide variety of camera qualities and spatial and temporal information…

    Submitted 13 November, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

  9. arXiv:2309.00769  [pdf, other]

    eess.IV cs.CV

    Full Reference Video Quality Assessment for Machine Learning-Based Video Codecs

    Authors: Abrar Majeedi, Babak Naderi, Yasaman Hosseinkashi, Juhee Cho, Ruben Alvarez Martinez, Ross Cutler

    Abstract: Machine learning-based video codecs have made significant progress in the past few years. A critical area in the development of ML-based video codecs is an accurate evaluation metric that does not require an expensive and slow subjective test. We show that existing evaluation metrics that were designed and trained on DSP-based video codecs are not highly correlated to subjective opinion when used…

    Submitted 1 September, 2023; originally announced September 2023.

  10. arXiv:2306.03177  [pdf, other]

    cs.SD cs.CV eess.AS

    DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation

    Authors: Evgenii Indenbom, Nicolae-Catalin Ristea, Ando Saabas, Tanel Parnamaa, Jegor Guzvin, Ross Cutler

    Abstract: Acoustic echo cancellation (AEC), noise suppression (NS) and dereverberation (DR) are an integral part of modern full-duplex communication systems. As the demand for teleconferencing systems increases, addressing these tasks is required for an effective and efficient online meeting experience. Most prior research proposes solutions for these tasks separately, combining them with digital signal pro…

    Submitted 5 June, 2023; originally announced June 2023.

  11. arXiv:2305.15127  [pdf, other]

    cs.SD eess.AS

    PLCMOS -- a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms

    Authors: Lorenz Diener, Marju Purin, Sten Sootla, Ando Saabas, Robert Aichner, Ross Cutler

    Abstract: Speech quality assessment is a problem for every researcher working on models that produce or process speech. Human subjective ratings, the gold standard in speech quality assessment, are expensive and time-consuming to acquire in a quantity that is sufficient to get reliable data, while automated objective metrics show a low correlation with gold standard ratings. This paper presents PLCMOS, a no…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: to appear: INTERSPEECH 2023, associated model release: https://aka.ms/PLCMOS

  12. arXiv:2304.00652  [pdf, other]

    cs.HC

    Meeting effectiveness and inclusiveness: large-scale measurement, identification of key features, and prediction in real-world remote meetings

    Authors: Yasaman Hosseinkashi, Lev Tankelevitch, Jamie Pool, Ross Cutler, Chinmaya Madan

    Abstract: Workplace meetings are vital to organizational collaboration, yet relatively little progress has been made toward measuring meeting effectiveness and inclusiveness at scale. The recent rise in remote and hybrid meetings represents an opportunity to do so via computer-mediated communication (CMC) systems. Here, we share the results of an effective and inclusive meetings survey embedded within a CMC…

    Submitted 29 January, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

  13. arXiv:2303.12761  [pdf, other]

    eess.IV cs.LG

    LSTM-based Video Quality Prediction Accounting for Temporal Distortions in Videoconferencing Calls

    Authors: Gabriel Mittag, Babak Naderi, Vishak Gopal, Ross Cutler

    Abstract: Current state-of-the-art video quality models, such as VMAF, give excellent prediction results by comparing the degraded video with its reference video. However, they do not consider temporal distortions (e.g., frame freezes or skips) that occur during videoconferencing calls. In this paper, we present a data-driven approach for modeling such distortions automatically by training an LSTM with subj…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted at ICASSP 2023

  14. arXiv:2303.11510  [pdf, other]

    cs.SD eess.AS

    ICASSP 2023 Deep Noise Suppression Challenge

    Authors: Harishchandra Dubey, Ashkan Aazami, Vishak Gopal, Babak Naderi, Sebastian Braun, Ross Cutler, Alex Ju, Mehdi Zohourian, Min Tang, Hannes Gamper, Mehrsa Golestaneh, Robert Aichner

    Abstract: Deep Speech Enhancement Challenge is the 5th edition of deep noise suppression (DNS) challenges organized at ICASSP 2023 Signal Processing Grand Challenges. DNS challenges were organized during 2019-2023 to stimulate research in deep speech enhancement (DSE). Previous DNS challenges were organized at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. From prior editions, we learnt t…

    Submitted 8 May, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 6 pages, 1 figure. arXiv admin note: text overlap with arXiv:2202.13288

  15. arXiv:2210.13334  [pdf, other]

    cs.CL cs.AI cs.LG

    Real-time Speech Interruption Analysis: From Cloud to Client Deployment

    Authors: Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler

    Abstract: Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic. One major issue with remote meetings is that it is challenging for remote participants to interrupt and speak. We have recently developed the first speech interruption analysis model, which detects failed speech interruptions,…

    Submitted 24 October, 2022; originally announced October 2022.

  16. arXiv:2204.05222  [pdf, other]

    cs.SD eess.AS

    INTERSPEECH 2022 Audio Deep Packet Loss Concealment Challenge

    Authors: Lorenz Diener, Sten Sootla, Solomiya Branets, Ando Saabas, Robert Aichner, Ross Cutler

    Abstract: Audio Packet Loss Concealment (PLC) is the hiding of gaps in audio streams caused by data transmission failures in packet switched networks. This is a common problem, and of increasing importance as end-to-end VoIP telephony and teleconference systems become the default and ever more widely used form of communication in business as well as in personal usage. This paper presents the INTERSPEECH 202…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: 4 pages + 1 page references, 1 figure, 2 tables. Submitted to INTERSPEECH 2022

  17. arXiv:2203.16032  [pdf, other]

    cs.SD eess.AS

    ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

    Authors: Gaoxiong Yi, Wei Xiao, Yiming Xiao, Babak Naderi, Sebastian Möller, Wafaa Wardah, Gabriel Mittag, Ross Cutler, Zhuohuang Zhang, Donald S. Williamson, Fei Chen, Fuzheng Yang, Shidong Shang

    Abstract: With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laborato…

    Submitted 31 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  18. arXiv:2202.13290  [pdf, other]

    eess.AS cs.SD

    ICASSP 2022 Acoustic Echo Cancellation Challenge

    Authors: Ross Cutler, Ando Saabas, Tanel Parnamaa, Marju Purin, Hannes Gamper, Sebastian Braun, Karsten Sørensen, Robert Aichner

    Abstract: The ICASSP 2022 Acoustic Echo Cancellation Challenge is intended to stimulate research in acoustic echo cancellation (AEC), which is an important area of speech enhancement and still a top issue in audio communication. This is the third AEC challenge and it is enhanced by including mobile scenarios, adding speech recognition rate in the challenge goal metrics, and making the default sample rate 48…

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.04972

  19. arXiv:2202.13288  [pdf, other]

    eess.AS cs.SD

    ICASSP 2022 Deep Noise Suppression Challenge

    Authors: Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner

    Abstract: The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. This is the 4th DNS challenge, with the previous editions held at INTERSPEECH 2020, ICASSP 2021, and INTERSPEECH 2021. We open-source datasets and test sets for researchers to train their deep noise suppression models, as well as a subjective e…

    Submitted 26 February, 2022; originally announced February 2022.

  20. arXiv:2110.04391  [pdf, other]

    eess.AS cs.CR cs.SD

    Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement

    Authors: Xavier Gitiaux, Aditya Khant, Ebrahim Beyrami, Chandan Reddy, Jayant Gupchup, Ross Cutler

    Abstract: Noise suppression models running in production environments are commonly trained on publicly available datasets. However, this approach leads to regressions due to the lack of training/testing on representative customer data. Moreover, due to privacy reasons, developers cannot listen to customer content. This 'ears-off' situation motivates augmenting existing datasets in a privacy-preserving manne…

    Submitted 3 April, 2023; v1 submitted 8 October, 2021; originally announced October 2021.

  21. arXiv:2110.04378  [pdf, other]

    eess.AS cs.LG cs.SD

    Performance optimizations on deep noise suppression models

    Authors: Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler

    Abstract: We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model. While deep learning approaches have been remarkably successful in enhancing audio quality, their increased complexity inhibits their deployment in real-time applications. We achieve up to a 7.25X inference speedup over the baseline, with a smooth model…

    Submitted 8 October, 2021; originally announced October 2021.

  22. arXiv:2110.04331  [pdf, ps, other]

    eess.AS cs.SD

    MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

    Authors: Chandan K. A. Reddy, Vishak Gopal, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner

    Abstract: With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo. Accurate real-time detection of music events can help to improve the user experience. In this paper, we present MusicNet, a compact neural model for detecting background music in the real-time communications pipeline. In video meetings, music frequently co-oc…

    Submitted 15 April, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

  23. arXiv:2110.03010  [pdf, other]

    eess.AS cs.SD

    AECMOS: A speech quality assessment metric for echo impairment

    Authors: Marju Purin, Sten Sootla, Mateja Sponza, Ando Saabas, Ross Cutler

    Abstract: Traditionally, the quality of acoustic echo cancellers is evaluated using intrusive speech quality assessment measures such as ERLE and PESQ, or by carrying out subjective laboratory tests. Unfortunately, the former are not well correlated with human subjective measures, while the latter are time and resource consuming to carry out. We provide a new tool for speech quality…

    Submitted 27 January, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

  24. arXiv:2110.01763  [pdf, other]

    eess.AS cs.SD

    DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

    Authors: Chandan K A Reddy, Vishak Gopal, Ross Cutler

    Abstract: Human subjective evaluation is the gold standard to evaluate speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. We have recently developed a non-intrusive speech quality metric called Deep Noise Suppression Mean Opinion Score (DNSMOS) using the scores from ITU-T Rec. P.808 subjective evaluation. The P.808 scores reflect the overall q…

    Submitted 4 February, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2010.15258

  25. arXiv:2104.04371  [pdf, other]

    cs.MM eess.AS

    Speech Quality Assessment in Crowdsourcing: Comparison Category Rating Method

    Authors: Babak Naderi, Sebastian Möller, Ross Cutler

    Abstract: Traditionally, Quality of Experience (QoE) for a communication system is evaluated through a subjective test. The most common test method for speech QoE is the Absolute Category Rating (ACR), in which participants listen to a set of stimuli, processed by the underlying test conditions, and rate their perceived quality for each stimulus on a specific scale. The Comparison Category Rating (CCR) is a…

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted for QoMEX2021

  26. Meeting Effectiveness and Inclusiveness in Remote Collaboration

    Authors: Ross Cutler, Yasaman Hosseinkashi, Jamie Pool, Senja Filipi, Robert Aichner, Yuan Tu, Johannes Gehrke

    Abstract: A primary goal of remote collaboration tools is to provide effective and inclusive meetings for all participants. To study meeting effectiveness and meeting inclusiveness, we first conducted a large-scale email survey (N=4,425; after filtering N=3,290) at a large technology company (pre-COVID-19); using this data we derived a multivariate model of meeting effectiveness and show how it correlates w…

    Submitted 19 February, 2021; originally announced February 2021.

  27. arXiv:2101.01902  [pdf, other]

    cs.SD cs.LG eess.AS

    Interspeech 2021 Deep Noise Suppression Challenge

    Authors: Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

    Abstract: The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, wh…

    Submitted 4 April, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.06122

  28. arXiv:2011.12715  [pdf, other]

    cs.AI cs.LG cs.NI cs.SE

    Resonance: Replacing Software Constants with Context-Aware Models in Real-time Communication

    Authors: Jayant Gupchup, Ashkan Aazami, Yaran Fan, Senja Filipi, Tom Finley, Scott Inglis, Marcus Asteborg, Luke Caroll, Rajan Chari, Markus Cozowicz, Vishak Gopal, Vinod Prakash, Sasikanth Bendapudi, Jack Gerrits, Eric Lau, Huazhou Liu, Marco Rossi, Dima Slobodianyk, Dmitri Birjukov, Matty Cooper, Nilesh Javar, Dmitriy Perednya, Sriram Srinivasan, John Langford, Ross Cutler, et al. (1 additional author not shown)

    Abstract: Large software systems tune hundreds of 'constants' to optimize their runtime performance. These values are commonly derived through intuition, lab tests, or A/B tests. A 'one-size-fits-all' approach is often sub-optimal as the best value depends on runtime context. In this paper, we provide an experimental approach to replace constants with learned contextual functions for Skype - a widely used r…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: Workshop on ML for Systems at NeurIPS 2020, Accepted

    Journal ref: ML for Systems, NeurIPS 2020

  29. arXiv:2010.15258  [pdf, other]

    cs.SD cs.LG eess.AS

    DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

    Authors: Chandan K A Reddy, Vishak Gopal, Ross Cutler

    Abstract: Human subjective evaluation is the gold standard to evaluate speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. The no-reference approaches correlate poorly with human ratings and are not widely adopted in the re…

    Submitted 10 February, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2020

  30. arXiv:2010.13200  [pdf, other]

    eess.AS cs.SD

    Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing

    Authors: Babak Naderi, Ross Cutler

    Abstract: The quality of the speech communication systems, which include noise suppression algorithms, are typically evaluated in laboratory experiments according to the ITU-T Rec. P.835, in which participants rate background noise, speech signal, and overall quality separately. This paper introduces an open-source toolkit for conducting subjective quality evaluation of noise suppressed speech in crowdsourc…

    Submitted 16 April, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

  31. arXiv:2010.13063  [pdf, other]

    eess.AS cs.SD

    Crowdsourcing approach for subjective evaluation of echo impairment

    Authors: Ross Cutler, Babak Naderi, Markus Loide, Sten Sootla, Ando Saabas

    Abstract: The quality of acoustic echo cancellers (AECs) in real-time communication systems is typically evaluated using objective metrics like ERLE and PESQ, and less commonly with lab-based subjective tests like ITU-T Rec. P.831. We will show that these objective measures are not well correlated to subjective measures. We then introduce an open-source crowdsourcing approach for subjective evaluation of ec…

    Submitted 27 February, 2022; v1 submitted 25 October, 2020; originally announced October 2020.

  32. arXiv:2009.04972  [pdf, other]

    eess.AS cs.SD

    ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results

    Authors: Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Markus Loide, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan

    Abstract: The ICASSP 2021 Acoustic Echo Cancellation Challenge is intended to stimulate research in the area of acoustic echo cancellation (AEC), which is an important part of speech enhancement and still a top issue in audio communication and conferencing systems. Many recent AEC studies report good performance on synthetic datasets where the train and test samples come from the same underlying distributio…

    Submitted 30 October, 2020; v1 submitted 10 September, 2020; originally announced September 2020.

  33. arXiv:2007.14598  [pdf, other]

    eess.AS cs.SD

    DNN No-Reference PSTN Speech Quality Prediction

    Authors: Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner

    Abstract: Classic public switched telephone networks (PSTN) are often a black box for VoIP network providers, as they have no access to performance indicators, such as delay or packet loss. Only the degraded output speech signal can be used to monitor the speech quality of these networks. However, the current state-of-the-art speech quality models are not reliable enough to be used for live monitoring. One…

    Submitted 29 July, 2020; originally announced July 2020.

  34. arXiv:2006.12793  [pdf, other]

    cs.CY cs.AI cs.SI

    Lumos: A Library for Diagnosing Metric Regressions in Web-Scale Applications

    Authors: Jamie Pool, Ebrahim Beyrami, Vishak Gopal, Ashkan Aazami, Jayant Gupchup, Jeff Rowland, Binlong Li, Pritesh Kanani, Ross Cutler, Johannes Gehrke

    Abstract: Web-scale applications can ship code on a daily to weekly cadence. These applications rely on online metrics to monitor the health of new releases. Regressions in metric values need to be detected and diagnosed as early as possible to reduce the disruption to users and product owners. Regressions in metrics can surface due to a variety of reasons: genuine product regressions, changes in user popul…

    Submitted 23 June, 2020; originally announced June 2020.

  35. arXiv:2005.13981  [pdf]

    eess.AS cs.LG cs.SD

    The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

    Authors: Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

    Abstract: The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. While the performanc…

    Submitted 18 October, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:2001.08662

  36. An Open source Implementation of ITU-T Recommendation P.808 with Validation

    Authors: Babak Naderi, Ross Cutler

    Abstract: The ITU-T Recommendation P.808 provides a crowdsourcing approach for conducting a subjective assessment of speech quality using the Absolute Category Rating (ACR) method. We provide an open-source implementation of the ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extended our implementation to include Degradation Category Ratings (DCR) and Comparison Category Ratings (CCR)…

    Submitted 16 May, 2020; originally announced May 2020.

  37. arXiv:2002.03977  [pdf]

    eess.AS cs.LG cs.MM stat.ML

    Multimodal active speaker detection and virtual cinematography for video conferencing

    Authors: Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle

    Abstract: Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting and zooming of a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video. We describe a new automated ASD and VC that performs within 0.3 MOS of an expe…

    Submitted 24 May, 2022; v1 submitted 10 February, 2020; originally announced February 2020.

  38. arXiv:2001.10601  [pdf, other]

    eess.AS cs.SD

    Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement

    Authors: Yangyang Xia, Sebastian Braun, Chandan K. A. Reddy, Harishchandra Dubey, Ross Cutler, Ivan Tashev

    Abstract: This paper investigates several aspects of training a RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on a RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two…

    Submitted 12 February, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

  39. arXiv:2001.08662  [pdf]

    cs.SD cs.LG eess.AS

    The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

    Authors: Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

    Abstract: The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. Many publications report r…

    Submitted 19 April, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: Details about Deep Noise Suppression Challenge

  40. arXiv:1912.02222  [pdf, other]

    cs.NI cs.LG

    Reinforcement learning for bandwidth estimation and congestion control in real-time communications

    Authors: Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke

    Abstract: Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remains a difficult problem, despite many years of research. Achieving high quality of experience (QoE) for end users requires continual updates due to changing network architectures and technologies. In this paper, we apply reinforcement learning for the first time to the problem of real-…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: Workshop on ML for Systems at NeurIPS 2019

  41. arXiv:1909.08050  [pdf]

    cs.SD cs.LG eess.AS

    A scalable noisy speech dataset and online subjective test framework

    Authors: Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke

    Abstract: Background noise is a major source of quality impairments in Voice over Internet Protocol (VoIP) and Public Switched Telephone Network (PSTN) calls. Recent work shows the efficacy of deep learning for noise suppression, but the datasets have been relatively small compared to those used in other domains (e.g., ImageNet) and the associated evaluations have been more focused. In order to better facil…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: InterSpeech 2019

  42. arXiv:1907.01742  [pdf]

    cs.SD cs.LG eess.AS

    Supervised Classifiers for Audio Impairments with Noisy Labels

    Authors: Chandan K A Reddy, Ross Cutler, Johannes Gehrke

    Abstract: Voice-over-Internet-Protocol (VoIP) calls are prone to various speech impairments due to environmental and network conditions resulting in bad user experience. A reliable audio impairment classifier helps to identify the cause for bad audio quality. The user feedback after the call can act as the ground truth labels for training a supervised classifier on a large audio dataset. However, the labels…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: To appear in INTERSPEECH 2019

  43. Trustworthy Experimentation Under Telemetry Loss

    Authors: Jayant Gupchup, Yasaman Hosseinkashi, Pavel Dmitriev, Daniel Schneider, Ross Cutler, Andrei Jefremov, Martin Ellis

    Abstract: Failure to accurately measure the outcomes of an experiment can lead to bias and incorrect conclusions. Online controlled experiments (aka AB tests) are increasingly being used to make decisions to improve websites as well as mobile and desktop applications. We argue that loss of telemetry data (during upload or post-processing) can skew the results of experiments, leading to loss of statistical p…

    Submitted 21 January, 2019; originally announced March 2019.

    Comments: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018

  44. arXiv:1903.06908  [pdf, other]

    eess.AS cs.SD

    Non-intrusive speech quality assessment using neural networks

    Authors: Anderson R. Avila, Hannes Gamper, Chandan Reddy, Ross Cutler, Ivan Tashev, Johannes Gehrke

    Abstract: Estimating the perceived quality of an audio signal is critical for many multimedia and audio processing systems. Providers strive to offer optimal and reliable services in order to increase the user quality of experience (QoE). In this work, we present an investigation of the applicability of neural networks for non-intrusive audio quality assessment. We propose three neural network-based approac…

    Submitted 16 March, 2019; originally announced March 2019.

    Comments: Accepted at ICASSP 2019

  45. arXiv:1808.06152  [pdf, other]

    stat.ME cs.LG stat.ML

    On Design of Problem Token Questions in Quality of Experience Surveys

    Authors: Jayant Gupchup, Ebrahim Beyrami, Martin Ellis, Yasaman Hosseinkashi, Sam Johnson, Ross Cutler

    Abstract: User surveys for Quality of Experience (QoE) are a critical source of information. In addition to the common "star rating" used to estimate Mean Opinion Score (MOS), more detailed survey questions (problem tokens) about specific areas provide valuable insight into the factors impacting QoE. This paper explores two aspects of the problem token questionnaire design. First, we study the bias introduc…

    Submitted 18 August, 2018; originally announced August 2018.

  46. Analysis of Problem Tokens to Rank Factors Impacting Quality in VoIP Applications

    Authors: Jayant Gupchup, Yasaman Hosseinkashi, Martin Ellis, Sam Johnson, Ross Cutler

    Abstract: User-perceived quality-of-experience (QoE) in internet telephony systems is commonly evaluated using subjective ratings computed as a Mean Opinion Score (MOS). In such systems, while user MOS can be tracked on an ongoing basis, it does not give insight into which factors of a call induced any perceived degradation in QoE -- it does not tell us what caused a user to have a sub-optimal experience. F…

    Submitted 25 March, 2018; originally announced May 2018.

    Journal ref: Quality of Multimedia Experience (QoMEX), 2017 Ninth International Conference