Search | arXiv e-print repository

Speech enhancement deep-learning architecture for efficient edge processing

Authors: Monisankha Pal, Arvind Ramanathan, Ted Wada, Ashutosh Pandey

Abstract: Deep learning has become a de facto method of choice for speech enhancement tasks with significant improvements in speech quality. However, real-time processing with reduced size and computations for low-power edge devices drastically degrades speech quality. Recently, transformer-based architectures have greatly reduced the memory requirements and provided ways to improve the model performance th… ▽ More Deep learning has become a de facto method of choice for speech enhancement tasks with significant improvements in speech quality. However, real-time processing with reduced size and computations for low-power edge devices drastically degrades speech quality. Recently, transformer-based architectures have greatly reduced the memory requirements and provided ways to improve the model performance through local and global contexts. However, the transformer operations remain computationally heavy. In this work, we introduce WaveUNet squeeze-excitation Res2 (WSR)-based metric generative adversarial network (WSR-MGAN) architecture that can be efficiently implemented on low-power edge devices for noise suppression tasks while maintaining speech quality. We utilize multi-scale features using Res2Net blocks that can be related to spectral content used in speech-processing tasks. In the generator, we integrate squeeze-excitation blocks (SEB) with multi-scale features for maintaining local and global contexts along with gated recurrent units (GRUs). The proposed approach is optimized through a combined loss function calculated over raw waveform, multi-resolution magnitude spectrogram, and objective metrics using a metric discriminator. Experimental results in terms of various objective metrics on VoiceBank+DEMAND and DNS-2020 challenge datasets demonstrate that the proposed speech enhancement (SE) approach outperforms the baselines and achieves state-of-the-art (SOTA) performance in the time domain. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:1907.05036 [pdf, ps, other]

doi 10.1527/tjsai.D-J13

Extension of Sinkhorn Method: Optimal Movement Estimation of Agents Moving at Constant Velocity

Authors: Daigo Okada, Naotoshi Nakamura, Takuya Wada, Ayako Iwasaki, Ryo Yamada

Abstract: In the field of bioimaging, an important part of analyzing the motion of objects is tracking. We propose a method that applies the Sinkhorn distance for solving the optimal transport problem to track objects. The advantage of this method is that it can flexibly incorporate various assumptions in tracking as a cost matrix. First, we extend the Sinkhorn distance from two dimensions to three dimensio… ▽ More In the field of bioimaging, an important part of analyzing the motion of objects is tracking. We propose a method that applies the Sinkhorn distance for solving the optimal transport problem to track objects. The advantage of this method is that it can flexibly incorporate various assumptions in tracking as a cost matrix. First, we extend the Sinkhorn distance from two dimensions to three dimensions. Using this three-dimensional distance, we compare the performance of two types of tracking technique, namely tracking that associates objects that are close to each other, which conventionally uses the nearest-neighbor method, and tracking that assumes that the object is moving at constant velocity, using three types of simulation data. The results suggest that when tracking objects moving at constant velocity, our method is superior to conventional nearest-neighbor tracking as long as the added noise is not excessively large. We show that the Sinkhorn method can be applied effectively to object tracking. Our simulation data analysis suggests that when objects are moving at constant velocity, our method, which sets acceleration as a cost, outperforms the traditional nearest-neighbor method in terms of tracking objects. To apply the proposed method to real bioimaging data, it is necessary to set an appropriate cost indicator based on the movement features. △ Less

Submitted 11 July, 2019; originally announced July 2019.

Comments: 12 pages, 7 figures, 2 tables

MSC Class: 92C55

Journal ref: Transactions of the Japanese Society for Artificial Intelligence 34.5 (2019) D-J13_1-7

arXiv:1804.09348 [pdf, other]

doi 10.1371/journal.pone.0237654

Generalized Gaussian Kernel Adaptive Filtering

Authors: Tomoya Wada, Kosuke Fukumori, Toshihisa Tanaka, Simone Fiori

Abstract: The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type… ▽ More The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-square-type rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to keep their symmetric positive-definiteness. Different from conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels with all different parameters, which makes such regressor more flexible. The kernel adaptive filtering algorithm is established together with a l1-regularized least squares to avoid overfitting and the increase of dimensionality of the dictionary. Experimental results confirm the validity of the proposed method. △ Less

Submitted 25 April, 2018; originally announced April 2018.

Showing 1–3 of 3 results for author: Wada, T