
Showing 1–10 of 10 results for author: Helwani, K

  1. arXiv:2402.06683

    eess.AS cs.LG cs.SD

    Sound Source Separation Using Latent Variational Block-Wise Disentanglement

    Authors: Karim Helwani, Masahito Togami, Paris Smaragdis, Michael M. Goodwin

    Abstract: While neural network approaches have made significant strides in resolving classical signal processing problems, it is often the case that hybrid approaches that draw insight from both signal processing and neural networks produce more complete solutions. In this paper, we present a hybrid classical digital signal processing/deep neural network (DSP/DNN) approach to source separation (SS) highligh…

    Submitted 8 February, 2024; originally announced February 2024.

  2. arXiv:2402.00337

    eess.AS

    Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure

    Authors: Masahito Togami, Jean-Marc Valin, Karim Helwani, Ritwik Giri, Umut Isik, Michael M. Goodwin

    Abstract: We introduce a real-time, multichannel speech enhancement algorithm which maintains the spatial cues of stereo recordings including two speech sources. Recognizing that each source has unique spatial information, our method utilizes a dual-path structure, ensuring the spatial cues remain unaffected during enhancement by applying source-specific common-band gain. This method also seamlessly integra…

    Submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted for ICASSP 2024, 5 pages

  3. arXiv:2310.07032

    cs.SD cs.LG eess.AS eess.SY

    Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic System Identification with Application to Audio Processing

    Authors: Karim Helwani, Erfan Soltanmohammadi, Michael M. Goodwin

    Abstract: Improving the interpretability of deep neural networks has recently gained increased attention, especially when the power of deep learning is leveraged to solve problems in physics. Interpretability helps us understand a model's ability to generalize and reveal its limitations. In this paper, we introduce a causal interpretable deep structure for modeling dynamic systems. Our proposed model makes…

    Submitted 10 October, 2023; originally announced October 2023.

  4. arXiv:2309.14521

    eess.AS cs.SD

    NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping

    Authors: Jan Büthe, Ahmed Mustafa, Jean-Marc Valin, Karim Helwani, Michael M. Goodwin

    Abstract: Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. Compared to that, DNN-based methods deliver higher quality but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this…

    Submitted 12 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: final version, accepted at ICASSP 2024

  5. arXiv:2305.18552

    cs.LG cs.NE

    Learning Linear Groups in Neural Networks

    Authors: Emmanouil Theodosis, Karim Helwani, Demba Ba

    Abstract: Employing equivariance in neural networks leads to greater parameter efficiency and improved generalization performance through the encoding of domain knowledge in the architecture; however, the majority of existing approaches require an a priori specification of the desired symmetries. We present a neural network architecture, Linear Group Networks (LGNs), for learning linear groups acting on the…

    Submitted 29 May, 2023; originally announced May 2023.

  6. arXiv:2202.01784

    cs.SD cs.LG eess.AS

    Robust Audio Anomaly Detection

    Authors: Wo Jae Lee, Karim Helwani, Arvindh Krishnaswamy, Srikanth Tenneti

    Abstract: We propose an outlier robust multivariate time series model which can be used for detecting previously unseen anomalous sounds based on noisy training data. The presented approach doesn't assume the presence of labeled anomalies in the training dataset and uses a novel deep neural network architecture to learn the temporal dynamics of the multivariate time series at multiple resolutions while bein…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: Accepted paper at RobustML Workshop@ICLR 2021

    Journal ref: RobustML Workshop - ICLR 2021

  7. arXiv:2102.05245

    eess.AS

    Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet

    Authors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy

    Abstract: Speech enhancement algorithms based on deep learning have greatly surpassed their traditional counterparts and are now being considered for the task of removing acoustic echo from hands-free communication systems. This is a challenging problem due to both real-world constraints like loudspeaker non-linearities, and to limited compute capabilities in some communication systems. In this work, we pro…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: Accepted for ICASSP 2021, 5 pages

  8. arXiv:2102.05151

    cs.SD cs.LG eess.AS

    Enhancing Audio Augmentation Methods with Consistency Learning

    Authors: Turab Iqbal, Karim Helwani, Arvindh Krishnaswamy, Wenwu Wang

    Abstract: Data augmentation is an inexpensive way to increase training data diversity and is commonly achieved via transformations of existing data. For tasks such as classification, there is a good case for learning representations of the data that are invariant to such transformations, yet this is not explicitly enforced by classification losses such as the cross-entropy loss. This paper investigates the…

    Submitted 19 April, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: Accepted to 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

  9. arXiv:2008.04470

    eess.AS cs.LG cs.NE cs.SD stat.ML

    PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

    Authors: Umut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy

    Abstract: Neural network applications generally benefit from larger-sized models, but for current speech enhancement models, larger scale networks often suffer from decreased robustness to the variety of real-world use cases beyond what is encountered in training data. We introduce several innovations that lead to better large neural networks for speech enhancement. The novel PoCoNet architecture is a convo…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: 5 pages, 3 figures, INTERSPEECH 2020

  10. arXiv:2008.04259

    eess.AS

    A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

    Authors: Jean-Marc Valin, Umut Isik, Neerad Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy

    Abstract: Over the past few years, speech enhancement methods based on deep learning have greatly surpassed traditional methods based on spectral subtraction and spectral estimation. Many of these new techniques operate directly in the short-time Fourier transform (STFT) domain, resulting in a high computational complexity. In this work, we propose PercepNet, an efficient approach that relies on human p…

    Submitted 27 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Proc. INTERSPEECH 2020, 5 pages