
Showing 1–10 of 10 results for author: Rozgic, V

  1. arXiv:2303.10351  [pdf, other]

    cs.SD eess.AS

    Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints

    Authors: Guan-Ting Lin, Qingming Tang, Chieh-Chi Kao, Viktor Rozgic, Chao Wang

    Abstract: Acoustic Event Classification (AEC) has been widely used in devices such as smart speakers and mobile phones for home safety or accessibility support. As AEC models run on more and more devices with diverse computation resource constraints, it has become increasingly expensive to develop models that are tuned to achieve the optimal accuracy/computation trade-off for each given computation resource constra…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  2. arXiv:2203.11997  [pdf, other]

    cs.SD cs.LG eess.AS

    Federated Self-Supervised Learning for Acoustic Event Classification

    Authors: Meng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization. Federated learning (FL) is a compelling framework that decouples data collection and model training to enhance customer privacy. In this work, we investigate the feasibility of applying FL to improve AEC performance while no customer data can be directly uploade…

    Submitted 22 March, 2022; originally announced March 2022.

  3. arXiv:2201.11826  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

    Authors: Ayoub Ghriss, Bo Yang, Viktor Rozgic, Elizabeth Shriberg, Chao Wang

    Abstract: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion aware". We generate targets for the sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the a…

    Submitted 27 January, 2022; originally announced January 2022.

    Comments: ICASSP 2022

    ACM Class: I.2.7

  4. arXiv:2102.06357  [pdf, other]

    cs.SD cs.LG eess.AS

    Contrastive Unsupervised Learning for Speech Emotion Recognition

    Authors: Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang

    Abstract: Speech emotion recognition (SER) is a key technology to enable more natural human-machine communication. However, SER has long suffered from a lack of public large-scale labeled datasets. To circumvent this problem, we investigate how unsupervised representation learning on unlabeled datasets can benefit SER. We show that the contrastive predictive coding (CPC) method can learn salient representat…

    Submitted 12 February, 2021; originally announced February 2021.

  5. arXiv:1907.00873  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Compression of Acoustic Event Detection Models With Quantized Distillation

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: Acoustic Event Detection (AED), which aims to detect categories of events based on audio signals, has found application in many intelligent systems. Recently, deep neural networks have significantly advanced this field and greatly reduced detection errors. However, how to efficiently execute deep models in AED has received much less attention. Meanwhile, state-of-the-art AED models are based on lar…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Interspeech 2019

  6. arXiv:1906.10198  [pdf, other]

    cs.CL cs.SD eess.AS

    Multimodal and Multi-view Models for Emotion Recognition

    Authors: Gustavo Aguilar, Viktor Rozgić, Weiran Wang, Chao Wang

    Abstract: Studies on emotion recognition (ER) show that combining lexical and acoustic information results in more robust and accurate models. The majority of the studies focus on settings where both modalities are available in training and evaluation. However, in practice, this is not always the case; getting ASR output may represent a bottleneck in a deployment pipeline due to computational complexity or…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  7. arXiv:1905.00855  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: In this paper, we present a compression approach based on the combination of low-rank matrix factorization and quantization training, to reduce complexity for neural network based acoustic event detection (AED) models. Our experimental results show this combined compression approach is very effective. For a three-layer long short-term memory (LSTM) based AED model, the original model size can be r…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2018 CDNNRIA workshop

  8. arXiv:1904.12926  [pdf, other]

    eess.AS cs.LG cs.SD

    Semi-supervised Acoustic Event Detection based on tri-training

    Authors: Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang

    Abstract: This paper presents our work on training acoustic event detection (AED) models using unlabeled data. Recent acoustic event detectors are based on large-scale neural networks, which are typically trained with huge amounts of labeled data. Labels for acoustic events are expensive to obtain, and relevant acoustic event audio can be limited, especially for rare events. In this paper we leverage an…

    Submitted 29 April, 2019; originally announced April 2019.

    Comments: 5 pages

  9. arXiv:1705.06709  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Learning Spatiotemporal Features for Infrared Action Recognition with 3D Convolutional Neural Networks

    Authors: Zhuolin Jiang, Viktor Rozgic, Sancar Adali

    Abstract: Infrared (IR) imaging has the potential to enable more robust action recognition systems compared to visible spectrum cameras due to lower sensitivity to lighting conditions and appearance variability. While the action recognition task on videos collected from visible spectrum imaging has received much attention, action recognition in IR videos is significantly less explored. Our objective is to e…

    Submitted 18 May, 2017; originally announced May 2017.

  10. arXiv:1602.01168  [pdf, ps, other]

    cs.CV cs.LG cs.MM cs.NE stat.ML

    Learning Discriminative Features via Label Consistent Neural Network

    Authors: Zhuolin Jiang, Yaming Wang, Larry Davis, Walt Andrews, Viktor Rozgic

    Abstract: Deep Convolutional Neural Networks (CNNs) enforce supervised information only at the output layer, and hidden layers are trained by back-propagating the prediction error from the output layer without explicit supervision. We propose a supervised feature learning approach, Label Consistent Neural Network, which enforces direct supervision in late hidden layers. We associate each neuron in a hidden…

    Submitted 4 June, 2016; v1 submitted 2 February, 2016; originally announced February 2016.