
Showing 1–5 of 5 results for author: Goodwin, M M

Searching in archive cs.
  1. arXiv:2402.06683  [pdf, other]

    eess.AS cs.LG cs.SD

    Sound Source Separation Using Latent Variational Block-Wise Disentanglement

    Authors: Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin

    Abstract: While neural network approaches have made significant strides in resolving classical signal processing problems, it is often the case that hybrid approaches that draw insight from both signal processing and neural networks produce more complete solutions. In this paper, we present a hybrid classical digital signal processing/deep neural network (DSP/DNN) approach to source separation (SS) highligh…

    Submitted 8 February, 2024; originally announced February 2024.

  2. arXiv:2310.07032  [pdf, other]

    cs.SD cs.LG eess.AS eess.SY

    Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing

    Authors: Karim Helwani, Erfan Soltanmohammadi, Michael M. Goodwin

    Abstract: Improving the interpretability of deep neural networks has recently gained increased attention, especially when the power of deep learning is leveraged to solve problems in physics. Interpretability helps us understand a model's ability to generalize and reveal its limitations. In this paper, we introduce a causal interpretable deep structure for modeling dynamic systems. Our proposed model makes…

    Submitted 10 October, 2023; originally announced October 2023.

  3. arXiv:2309.14521  [pdf, other]

    eess.AS cs.SD

    NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping

    Authors: Jan Büthe, Ahmed Mustafa, Jean-Marc Valin, Karim Helwani, Michael M. Goodwin

    Abstract: Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. Compared to that, DNN-based methods deliver higher quality but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this…

    Submitted 12 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: final version, accepted at ICASSP 2024

  4. arXiv:2302.11768  [pdf, other]

    eess.AS cs.SD

    A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement

    Authors: Zhepei Wang, Ritwik Giri, Devansh Shah, Jean-Marc Valin, Michael M. Goodwin, Paris Smaragdis

    Abstract: In this study, we present an approach to train a single speech enhancement network that can perform both personalized and non-personalized speech enhancement. This is achieved by incorporating a frame-wise conditioning input that specifies the type of enhancement output. To improve the quality of the enhanced output and mitigate oversuppression, we experiment with re-weighting frames by the presen…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  5. arXiv:2203.15092  [pdf, other]

    eess.AS cs.LG cs.SD

    Improved singing voice separation with chromagram-based pitch-aware remixing

    Authors: Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin, Michael M. Goodwin, Arvindh Krishnaswamy

    Abstract: Singing voice separation aims to separate music into vocals and accompaniment components. One of the major constraints for the task is the limited amount of training data with separated vocals. Data augmentation techniques such as random source mixing have been shown to make better use of existing data and mildly improve model performance. We propose a novel data augmentation technique, chromagram…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: To appear at ICASSP 2022, 5 pages, 1 figure