
Showing 1–3 of 3 results for author: Irvin, B

Searching in archive eess.
  1. arXiv:2403.14246 [pdf, other]

    eess.AS cs.AI

    CATSE: A Context-Aware Framework for Causal Target Sound Extraction

    Authors: Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam

    Abstract: Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE m…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Submitted to EUSIPCO 2024

  2. arXiv:2403.12182 [pdf, other]

    eess.AS

    Latent CLAP Loss for Better Foley Sound Synthesis

    Authors: Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic

    Abstract: Foley sound generation, the art of creating audio for multimedia, has recently seen notable advancements through text-conditioned latent diffusion models. These systems use multimodal text-audio representation models, such as Contrastive Language-Audio Pretraining (CLAP), whose objective is to map corresponding audio and text prompts into a joint embedding space. AudioLDM, a text-to-audio model, w…

    Submitted 18 March, 2024; originally announced March 2024.

  3. arXiv:2211.02542 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD eess.SP

    Self-Supervised Learning for Speech Enhancement through Synthesis

    Authors: Bryce Irvin, Marko Stamenovic, Mikolaj Kegler, Li-Chia Yang

    Abstract: Modern speech enhancement (SE) networks typically implement noise suppression through time-frequency masking, latent representation masking, or discriminative signal prediction. In contrast, some recent works explore SE via generative speech synthesis, where the system's output is synthesized by a neural vocoder after an inherently lossy feature-denoising step. In this paper, we propose a denoisin…

    Submitted 4 November, 2022; originally announced November 2022.