Showing 1–9 of 9 results for author: Matsoukas, S

Searching in archive eess.

  1. arXiv:2206.13476  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework

    Authors: Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas

    Abstract: Acoustic events are sounds with well-defined spectro-temporal characteristics which can be associated with the physical objects generating them. Acoustic scenes are collections of such acoustic events in no specific temporal order. Given this natural linkage between events and scenes, a common belief is that the ability to classify events must help in the classification of scenes. This has led to…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted at ISCA Interspeech 2022

  2. arXiv:2203.11997  [pdf, other]

    cs.SD cs.LG eess.AS

    Federated Self-Supervised Learning for Acoustic Event Classification

    Authors: Meng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization. Federated learning (FL) is a compelling framework that decouples data collection and model training to enhance customer privacy. In this work, we investigate the feasibility of applying FL to improve AEC performance while no customer data can be directly uploade…

    Submitted 22 March, 2022; originally announced March 2022.

  3. arXiv:2102.06357  [pdf, other]

    cs.SD cs.LG eess.AS

    Contrastive Unsupervised Learning for Speech Emotion Recognition

    Authors: Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang

    Abstract: Speech emotion recognition (SER) is a key technology to enable more natural human-machine communication. However, SER has long suffered from a lack of public large-scale labeled datasets. To circumvent this problem, we investigate how unsupervised representation learning on unlabeled datasets can benefit SER. We show that the contrastive predictive coding (CPC) method can learn salient representat…

    Submitted 12 February, 2021; originally announced February 2021.

  4. arXiv:2010.06659  [pdf, other]

    eess.AS cs.LG cs.SD

    Towards Data-efficient Modeling for Wake Word Spotting

    Authors: Yixin Gao, Yuriy Mishchenko, Anish Shah, Spyros Matsoukas, Shiv Vitaladevuni

    Abstract: Wake word (WW) spotting is challenging in far-field not only because of the interference in signal transmission but also the complexity in acoustic environments. Traditional WW model training requires large amount of in-domain WW-specific data with substantial human annotations therefore it is hard to build WW models without such data. In this paper we present data-efficient solutions to address t…

    Submitted 13 October, 2020; originally announced October 2020.

    Journal ref: Proc. ICASSP 2020

  5. arXiv:2002.09143  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Few-shot acoustic event detection via meta-learning

    Authors: Bowen Shi, Ming Sun, Krishna C. Puvvada, Chieh-Chi Kao, Spyros Matsoukas, Chao Wang

    Abstract: We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables detection of new events with very limited labeled data. Compared to other research areas like computer vision, few-shot learning for audio recognition has been under-studied. We formulate few-shot AED problem and explore different ways of utilizing traditional supervised methods for this setting as well as a…

    Submitted 21 February, 2020; originally announced February 2020.

    Comments: ICASSP 2020

  6. arXiv:1907.00873  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Compression of Acoustic Event Detection Models With Quantized Distillation

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems. Recently deep neural network significantly advances this field and reduces detection errors to a large scale. However how to efficiently execute deep models in AED has received much less attention. Meanwhile state-of-the-art AED models are based on lar…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Interspeech 2019

  7. arXiv:1905.00855  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models. Our experimental results show this combined compression approach is very effective. For a three-layer long short-term memory (LSTM) based AED model, the original model size can be r…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2018 CDNNRIA workshop

  8. arXiv:1904.12926  [pdf, other]

    eess.AS cs.LG cs.SD

    Semi-supervised Acoustic Event Detection based on tri-training

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: This paper presents our work of training acoustic event detection (AED) models using unlabeled dataset. Recent acoustic event detectors are based on large-scale neural networks, which are typically trained with huge amounts of labeled data. Labels for acoustic events are expensive to obtain, and relevant acoustic event audios can be limited, especially for rare events. In this paper we leverage an…

    Submitted 29 April, 2019; originally announced April 2019.

    Comments: 5 pages

  9. arXiv:1808.02504  [pdf, other]

    cs.CL eess.AS

    Device-directed Utterance Detection

    Authors: Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

    Abstract: In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as well as enabling wake-word free follow-up queries. Consider the example interaction: "Computer, play music", "Computer, reduce the volume". In this interaction,…

    Submitted 7 August, 2018; originally announced August 2018.

    Comments: Interspeech 2018 (accepted)