
Showing 1–5 of 5 results for author: Szurley, J

Searching in archive eess.
  1. arXiv:2202.08532  [pdf, other]

    eess.AS cs.AI cs.LG cs.NE cs.SD

    Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition

    Authors: Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko

    Abstract: In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples. We focus on a rigorous and empirical "closed-model adversarial robustness" setting (e.g., on-device or cloud applications). The adversarial noise is only generated by closed-model optimization (e.g., evolutionary and zeroth-order estimation) without ac…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: Accepted to ICASSP 2022

  2. arXiv:1906.06355  [pdf, other]

    eess.AS cs.SD

    Perceptual Based Adversarial Audio Attacks

    Authors: Joseph Szurley, J. Zico Kolter

    Abstract: Recent work has shown the possibility of adversarial attacks on automatic speech recognition (ASR) systems. However, in the vast majority of work in this area, the attacks have been executed only in the digital space, or have involved short phrases and static room settings. In this paper, we demonstrate a physically realizable audio adversarial attack. We base our approach specifically on a psychoacou…

    Submitted 14 June, 2019; originally announced June 2019.

  3. arXiv:1712.09680  [pdf, other]

    cs.SD eess.AS

    A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging

    Authors: Juncheng Li, Yun Wang, Joseph Szurley, Florian Metze, Samarjit Das

    Abstract: The lack of strong labels has severely limited the state-of-the-art fully supervised audio tagging systems to be scaled to larger datasets. Meanwhile, audio-visual learning models based on unlabeled videos have been successfully applied to audio tagging, but they are inevitably resource hungry and require a long time to train. In this work, we propose a light-weight, multimodal framework for enviro…

    Submitted 1 March, 2018; v1 submitted 27 December, 2017; originally announced December 2017.

    Comments: 5 pages, 3 figures, Accepted and to appear at ICASSP 2018

  4. arXiv:1712.09673  [pdf, other]

    cs.SD eess.AS

    Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection

    Authors: Shao-Yen Tseng, Juncheng Li, Yun Wang, Joseph Szurley, Florian Metze, Samarjit Das

    Abstract: State-of-the-art audio event detection (AED) systems rely on supervised learning using strongly labeled data. However, this dependence severely limits scalability to large-scale datasets where fine resolution annotations are too expensive to obtain. In this paper, we propose a small-footprint multiple instance learning (MIL) framework for multi-class AED using weakly annotated labels. The proposed…

    Submitted 26 March, 2018; v1 submitted 27 December, 2017; originally announced December 2017.

    Comments: 5 pages, 3 figures

  5. arXiv:1712.09668  [pdf, other]

    cs.SD eess.AS

    Eventness: Object Detection on Spectrograms for Temporal Localization of Audio Events

    Authors: Phuong Pham, Juncheng Li, Joseph Szurley, Samarjit Das

    Abstract: In this paper, we introduce the concept of Eventness for audio event detection, which can, in part, be thought of as an analogue to Objectness from computer vision. The key observation behind the eventness concept is that audio events reveal themselves as 2-dimensional time-frequency patterns with specific textures and geometric structures in spectrograms. These time-frequency patterns can then be…

    Submitted 19 February, 2018; v1 submitted 27 December, 2017; originally announced December 2017.

    Comments: 5 pages, 3 figures, accepted to ICASSP 2018