Showing 1–8 of 8 results for author: Stamenovic, M

Searching in archive eess.
  1. arXiv:2403.14246

    eess.AS cs.AI

    CATSE: A Context-Aware Framework for Causal Target Sound Extraction

    Authors: Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam

    Abstract: Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE m…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO 2024

  2. arXiv:2403.12182

    eess.AS

    Latent CLAP Loss for Better Foley Sound Synthesis

    Authors: Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic

    Abstract: Foley sound generation, the art of creating audio for multimedia, has recently seen notable advancements through text-conditioned latent diffusion models. These systems use multimodal text-audio representation models, such as Contrastive Language-Audio Pretraining (CLAP), whose objective is to map corresponding audio and text prompts into a joint embedding space. AudioLDM, a text-to-audio model, w…

    Submitted 18 March, 2024; originally announced March 2024.

  3. arXiv:2309.08144

    cs.SD cs.LG eess.AS

    Two-Step Knowledge Distillation for Tiny Speech Enhancement

    Authors: Rayan Daod Nathoo, Mikolaj Kegler, Marko Stamenovic

    Abstract: Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved via distilling knowledge from a large teacher into a smaller student model. In this work, we propose a novel two-step approach for tiny speech enhancement model distillation. In contrast to the standard approach of a weighted mixture of distillation and supervised losses, we firstly…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Under review ICASSP 2024

  4. CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment

    Authors: Yuchen Liu, Li-Chia Yang, Alex Pawlicki, Marko Stamenovic

    Abstract: Speech quality assessment has been a critical component in many voice communication related applications such as telephony and online conferencing. Traditional intrusive speech quality assessment requires the clean reference of the degraded utterance to provide an accurate quality measurement. This requirement limits the usability of these methods in real-world scenarios. On the other hand, non-in…

    Submitted 4 November, 2022; originally announced November 2022.

  5. arXiv:2211.02542

    eess.AS cs.CL cs.LG cs.SD eess.SP

    Self-Supervised Learning for Speech Enhancement through Synthesis

    Authors: Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang

    Abstract: Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative speech synthesis, where the system's output is synthesized by a neural vocoder after an inherently lossy feature-denoising step. In this paper, we propose a denoisin…

    Submitted 4 November, 2022; originally announced November 2022.

  6. arXiv:2111.02351

    cs.SD cs.LG eess.AS

    Weight, Block or Unit? Exploring Sparsity Tradeoffs for Speech Enhancement on Tiny Neural Accelerators

    Authors: Marko Stamenovic, Nils L. Westhausen, Li-Chia Yang, Carl Jensen, Alex Pawlicki

    Abstract: We explore network sparsification strategies with the aim of compressing neural speech enhancement (SE) down to an optimal configuration for a new generation of low-power microcontroller-based neural accelerators (microNPUs). We examine three unique sparsity structures: weight pruning, block pruning and unit pruning; and discuss their benefits and drawbacks when applied to SE. We focus on the int…

    Submitted 9 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: To appear in NeurIPS 2021 Efficient Natural Language and Speech Processing Workshop as oral-spotlight presentation

  7. arXiv:2005.11138

    eess.AS cs.LG cs.SD stat.ML

    TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

    Authors: Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough

    Abstract: Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs). However, large RNNs limit practical deployment in hearing aid hardware (HW) form-factors, which are battery powered and run on resource-constrained microcontroller units (MCUs) with limited memory capacity and compute capability. In this work, we use model compression techn…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: First four authors contributed equally. For audio samples, see https://github.com/BoseCorp/efficient-neural-speech-enhancement

  8. arXiv:2005.10294

    eess.AS cs.LG cs.SD stat.ML

    Towards Cover Song Detection with Siamese Convolutional Neural Networks

    Authors: Marko Stamenovic

    Abstract: A cover song, by definition, is a new performance or recording of a previously recorded, commercially released song. It may be by the original artist themselves or a different artist altogether and can vary from the original in unpredictable ways including key, arrangement, instrumentation, timbre and more. In this work we propose a novel approach to learning audio representations for the task of…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: Code available at https://github.com/markostam/coversongs-dual-convnet

    Journal ref: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018