Showing 1–7 of 7 results for author: Kulis, B

Searching in archive eess.
  1. arXiv:2403.14048  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data

    Authors: Alice Baird, Rachel Manzelli, Panagiotis Tzirakis, Chris Gagne, Haoqi Li, Sadie Allen, Sander Dieleman, Brian Kulis, Shrikanth S. Narayanan, Alan Cowen

    Abstract: The NeurIPS 2023 Machine Learning for Audio Workshop brings together machine learning (ML) experts from various audio domains. There are several valuable audio-driven ML tasks, from speech emotion recognition to audio event detection, but the community is sparse compared to other ML areas, e.g., computer vision or natural language processing. A major limitation with audio is the available data; wi…

    Submitted 20 March, 2024; originally announced March 2024.

  2. arXiv:2401.06897  [pdf, other]

    eess.AS

    Maximum-Entropy Adversarial Audio Augmentation for Keyword Spotting

    Authors: Zuzhao Ye, Gregory Ciccarelli, Brian Kulis

    Abstract: Data augmentation is a key tool for improving the performance of deep networks, particularly when there is limited labeled data. In some fields, such as computer vision, augmentation methods have been extensively studied; however, for speech and audio data, there are relatively fewer methods developed. Using adversarial learning as a starting point, we develop a simple and effective augmentation s…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: 5 pages, 2 figures

  3. Latency Control for Keyword Spotting

    Authors: Christin Jose, Joseph Wang, Grant P. Strimel, Mohammad Omar Khursheed, Yuriy Mishchenko, Brian Kulis

    Abstract: Conversational agents commonly utilize keyword spotting (KWS) to initiate voice interaction with the user. For user experience and privacy considerations, existing approaches to KWS largely focus on accuracy, which can often come at the expense of introduced latency. To address this tradeoff, we propose a novel approach to control KWS model latency and which generalizes to any loss function withou…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: Proceedings of INTERSPEECH

  4. arXiv:2109.14725  [pdf, other]

    cs.LG cs.SD eess.AS

    Tiny-CRNN: Streaming Wakeword Detection In A Low Footprint Setting

    Authors: Mohammad Omar Khursheed, Christin Jose, Rajath Kumar, Gengshen Fu, Brian Kulis, Santosh Kumar Cheekatmalla

    Abstract: In this work, we propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection, and augment them with scaled dot product attention. We find that, compared to Convolutional Neural Network models, False Accepts in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using models based on the Tiny-CRNN architectu…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2011.12941

    ACM Class: I.2.0

  5. arXiv:2105.07512  [pdf, other]

    cs.CV cs.LG eess.IV

    Substitutional Neural Image Compression

    Authors: Xiao Wang, Wei Jiang, Wei Wang, Shan Liu, Brian Kulis, Peter Chin

    Abstract: We describe Substitutional Neural Image Compression (SNIC), a general approach for enhancing any neural image compression model, that requires no data or additional tuning of the trained model. It boosts compression performance toward a flexible distortion metric and enables bit-rate control using a single model instance. The key idea is to replace the image to be compressed with a substitutional…

    Submitted 16 May, 2021; originally announced May 2021.

  6. arXiv:2011.12941  [pdf, other]

    eess.AS

    Small Footprint Convolutional Recurrent Networks for Streaming Wakeword Detection

    Authors: Mohammad Omar Khursheed, Christin Jose, Rajath Kumar, Gengshen Fu, Brian Kulis, Santosh Kumar Cheekatmalla

    Abstract: In this work, we propose small footprint Convolutional Recurrent Neural Network models applied to the problem of wakeword detection and augment them with scaled dot product attention. We find that false accepts compared to Convolutional Neural Network models in a 250k parameter budget can be reduced by 25% with a 10% reduction in parameter size by using CRNNs, and we can get up to 32% improvement…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  7. arXiv:1806.09905  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Conditioning Deep Generative Raw Audio Models for Structured Automatic Music

    Authors: Rachel Manzelli, Vijay Thakkar, Ali Siahkamari, Brian Kulis

    Abstract: Existing automatic music generation approaches that feature deep learning can be broadly classified into two types: raw audio models and symbolic models. Symbolic models, which train and generate at the note level, are currently the more prevalent approach; these models can capture long-range dependencies of melodic structure, but fail to grasp the nuances and richness of raw audio generations. Ra…

    Submitted 26 June, 2018; originally announced June 2018.

    Comments: Presented at the ISMIR 2018 Conference