-
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
Authors:
Kentaro Seki,
Shinnosuke Takamichi,
Norihiro Takamune,
Yuki Saito,
Kanami Imamura,
Hiroshi Saruwatari
Abstract:
This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms, ignoring the stereo listening experience inherent in human hearing. Our baseline approach addresses this gap by integrating blind source separation (BSS), voice conversion (VC), and spatial mixing to handle multi-channel waveforms. Through experimental evaluations, we organize and identify the key challenges inherent in this task, such as maintaining audio quality and accurately preserving spatial information. Our results highlight the fundamental difficulties in balancing these aspects, providing a benchmark for future research in spatial voice conversion. The proposed method's code is publicly available to encourage further exploration in this domain.
Submitted 25 June, 2024;
originally announced June 2024.
-
Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation
Authors:
Yuto Ishikawa,
Kohei Konaka,
Tomohiko Nakamura,
Norihiro Takamune,
Hiroshi Saruwatari
Abstract:
Real-time speech extraction is an important challenge with various applications such as speech recognition in a human-like avatar/robot. In this paper, we propose the real-time extension of a speech extraction method based on independent low-rank matrix analysis (ILRMA) and rank-constrained spatial covariance matrix estimation (RCSCME). The RCSCME-based method is a multichannel blind speech extraction method that demonstrates superior speech extraction performance in diffuse noise environments. To improve the performance, we introduce spatial regularization into the ILRMA part of the RCSCME-based speech extraction and design two regularizers. Speech extraction experiments demonstrated that the proposed methods can function in real time and the designed regularizers improve the speech extraction performance.
Submitted 19 March, 2024;
originally announced March 2024.
-
NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction
Authors:
Koki Nishida,
Norihiro Takamune,
Rintaro Ikeshita,
Daichi Kitamura,
Hiroshi Saruwatari,
Tomohiro Nakatani
Abstract:
In this paper, we address the multichannel blind source extraction (BSE) of a single source in diffuse noise environments. To solve this problem even faster than by fast multichannel nonnegative matrix factorization (FastMNMF) and its variant, we propose a BSE method called NoisyILRMA, which is a modification of independent low-rank matrix analysis (ILRMA) to account for diffuse noise. NoisyILRMA can achieve considerably fast BSE by incorporating an algorithm developed for independent vector extraction. In addition, to improve the BSE performance of NoisyILRMA, we propose a mechanism to switch the source model with ILRMA-like nonnegative matrix factorization to a more expressive source model during optimization. In the experiment, we show that NoisyILRMA runs faster than a FastMNMF algorithm while maintaining the BSE performance. We also confirm that the switching mechanism improves the BSE performance of NoisyILRMA.
Submitted 22 June, 2023;
originally announced June 2023.
-
HumanDiffusion: diffusion model using perceptual gradients
Authors:
Yota Ueda,
Shinnosuke Takamichi,
Yuki Saito,
Norihiro Takamune,
Hiroshi Saruwatari
Abstract:
We propose HumanDiffusion, a diffusion model trained from humans' perceptual gradients to learn an acceptable range of data for humans (i.e., human-acceptable distribution). Conventional HumanGAN aims to model the human-acceptable distribution wider than the real-data distribution by training a neural network-based generator with human-based discriminators. However, HumanGAN training tends to converge to a meaningless distribution due to vanishing gradients or mode collapse and requires careful heuristics. In contrast, our HumanDiffusion learns the human-acceptable distribution through Langevin dynamics based on gradients of human perceptual evaluations. Our training iterates a process to diffuse real data to cover a wider human-acceptable distribution and can avoid the issues in the HumanGAN training. The evaluation results demonstrate that our HumanDiffusion can successfully represent the human-acceptable distribution without any heuristics for the training.
Submitted 21 June, 2023;
originally announced June 2023.
-
Algorithms of Sampling-Frequency-Independent Layers for Non-integer Strides
Authors:
Kanami Imamura,
Tomohiko Nakamura,
Norihiro Takamune,
Kohei Yatabe,
Hiroshi Saruwatari
Abstract:
In this paper, we propose algorithms for handling non-integer strides in sampling-frequency-independent (SFI) convolutional and transposed convolutional layers. The SFI layers have been developed for handling various sampling frequencies (SFs) by a single neural network. They are replaceable with their non-SFI counterparts and can be introduced into various network architectures. However, they could not handle some specific configurations when combined with non-SFI layers. For example, an SFI extension of Conv-TasNet, a standard audio source separation model, cannot handle some pairs of trained and target SFs because the strides of the SFI layers become non-integers. This problem cannot be solved by simple rounding or signal resampling, both of which result in significant performance degradation. To overcome this problem, we propose algorithms for handling non-integer strides by using windowed sinc interpolation. The proposed algorithms realize the continuous-time representations of features using the interpolation and enable us to sample instants with the desired stride. Experimental results on music source separation showed that the proposed algorithms outperformed the rounding- and signal-resampling-based methods at SFs lower than the trained SF.
Submitted 19 June, 2023;
originally announced June 2023.
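The abstract above mentions sampling a continuous-time representation of features at a non-integer stride via windowed sinc interpolation. The following is only a minimal sketch of that general idea, not the authors' algorithm: the function name `windowed_sinc_sample`, the Hann window, and the `half_width` parameter are assumptions chosen for illustration.

```python
import numpy as np

def windowed_sinc_sample(x, stride, half_width=8):
    """Sample a 1-D sequence at instants k * stride (stride may be non-integer).

    Hypothetical helper: Hann-windowed sinc interpolation of x, evaluated at
    the instants 0, stride, 2 * stride, ... that lie inside the index range.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    # Sampling instants, in units of the original sample index.
    t = np.arange(0.0, len(x) - 1 + 1e-9, stride)
    # Distance from every sampling instant to every original sample.
    d = t[:, None] - n[None, :]
    # Hann window supported on |d| <= half_width, zero elsewhere.
    window = np.where(np.abs(d) <= half_width,
                      0.5 * (1.0 + np.cos(np.pi * d / half_width)), 0.0)
    # np.sinc(d) is sin(pi * d) / (pi * d), the ideal band-limited kernel.
    kernel = np.sinc(d) * window
    return t, kernel @ x
```

With an integer stride of 1 the kernel reduces to (numerically) the identity, so the input is returned unchanged; with a stride such as 1.5 the values are interpolated between samples, with larger errors near the sequence edges where the kernel is truncated.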
-
Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis
Authors:
Sota Misawa,
Norihiro Takamune,
Tomohiko Nakamura,
Daichi Kitamura,
Hiroshi Saruwatari,
Masakazu Une,
Shoji Makino
Abstract:
Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for situations in which directional target speech and diffuse noise are mixed. In conventional RCSCME, independent low-rank matrix analysis (ILRMA) is used as the preprocessing method. We propose RCSCME using independent deeply learned matrix analysis (IDLMA), which is a supervised extension of ILRMA. In this method, IDLMA requires deep neural networks (DNNs) to separate the target speech and the noise. We use Denoiser, which is a single-channel speech enhancement DNN, in IDLMA to estimate not only the target speech but also the noise. We also propose noise self-supervised RCSCME, in which we estimate the noise-only time intervals using the output of Denoiser and design the prior distribution of the noise spatial covariance matrix for RCSCME. We confirm that the proposed methods outperform the conventional methods under several noise conditions.
Submitted 10 September, 2021;
originally announced September 2021.
-
Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models
Authors:
Takuya Hasumi,
Tomohiko Nakamura,
Norihiro Takamune,
Hiroshi Saruwatari,
Daichi Kitamura,
Yu Takahashi,
Kazunobu Kondo
Abstract:
Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art multichannel audio source separation methods using the source power estimation based on deep neural networks (DNNs). The DNN-based power estimation works well for sounds having timbres similar to the DNN training data. However, the sounds to which IDLMA is applied do not always have such timbres, and the timbral mismatch causes the performance degradation of IDLMA. To tackle this problem, we focus on a blind source separation counterpart of IDLMA, independent low-rank matrix analysis. It uses nonnegative matrix factorization (NMF) as the source model, which can capture source spectral components that only appear in the target mixture, using the low-rank structure of the source spectrogram as a clue. We thus extend the DNN-based source model to encompass the NMF-based source model on the basis of the product-of-expert concept, which we call the product of source models (PoSM). For the proposed PoSM-based IDLMA, we derive a computationally efficient parameter estimation algorithm based on an optimization principle called the majorization-minimization algorithm. Experimental evaluations show the effectiveness of the proposed method.
Submitted 2 September, 2021;
originally announced September 2021.
-
Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation
Authors:
Naoki Narisawa,
Rintaro Ikeshita,
Norihiro Takamune,
Daichi Kitamura,
Tomohiko Nakamura,
Hiroshi Saruwatari,
Tomohiro Nakatani
Abstract:
We address the determined audio source separation problem in the time-frequency domain. In independent deeply learned matrix analysis (IDLMA), it is assumed that the inter-frequency correlation of each source spectrum is zero, which is inappropriate for modeling nonstationary signals such as music signals. To account for the correlation between frequencies, independent positive semidefinite tensor analysis has been proposed. This unsupervised (blind) method, however, severely restricts the structure of frequency covariance matrices (FCMs) to reduce the number of model parameters. As an extension of these conventional approaches, we here propose a supervised method that models FCMs using deep neural networks (DNNs). It is difficult to directly infer FCMs using DNNs. Therefore, we also propose a new FCM model represented as a convex combination of a diagonal FCM and a rank-1 FCM. Our FCM model is flexible enough to not only consider inter-frequency correlation, but also capture the dynamics of time-varying FCMs of nonstationary signals. We infer the proposed FCMs using two DNNs: a DNN for power spectrum estimation and a DNN for time-domain signal estimation. An experimental result of separating music signals shows that the proposed method provides higher separation performance than IDLMA.
Submitted 10 June, 2021;
originally announced June 2021.
-
Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation
Authors:
Takuya Hasumi,
Tomohiko Nakamura,
Norihiro Takamune,
Hiroshi Saruwatari,
Daichi Kitamura,
Yu Takahashi,
Kazunobu Kondo
Abstract:
Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art supervised multichannel audio source separation methods. It blindly estimates the demixing filters on the basis of source independence, using the source model estimated by the deep neural network (DNN). However, since the ratios of the source to interferer signals vary widely among time-frequency (TF) slots, it is difficult to obtain reliable estimated power spectrograms of sources at all TF slots. In this paper, we propose an IDLMA extension, empirical Bayesian IDLMA (EB-IDLMA), by introducing a prior distribution of source power spectrograms and treating the source power spectrograms as latent random variables. This treatment allows us to implicitly consider the reliability of the estimated source power spectrograms for the estimation of demixing filters through the hyperparameters of the prior distribution estimated by the DNN. Experimental evaluations show the effectiveness of EB-IDLMA and the importance of introducing the reliability of the estimated source power spectrograms.
Submitted 7 June, 2021;
originally announced June 2021.
-
Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction
Authors:
Yuto Kondo,
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
Rank-constrained spatial covariance matrix estimation (RCSCME) is a state-of-the-art blind speech extraction method applied to cases where one directional target speech and diffuse noise are mixed. In this paper, we propose a new algorithmic extension of RCSCME. RCSCME complements a deficient one rank of the diffuse noise spatial covariance matrix, which cannot be estimated via preprocessing such as independent low-rank matrix analysis, and estimates the source model parameters simultaneously. In the conventional RCSCME, a direction of the deficient basis is fixed in advance and only the scale is estimated; however, the candidate of this deficient basis is not unique in general. In the proposed RCSCME model, the deficient basis itself can be accurately estimated as a vector variable by solving a vector optimization problem. Also, we derive new update rules based on the EM algorithm. We confirm that the proposed method outperforms conventional methods under several noise conditions.
Submitted 6 May, 2021;
originally announced May 2021.
-
Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Sub-Gaussian Distribution
Authors:
Keigo Kamo,
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
In this paper, we address a statistical model extension of multichannel nonnegative matrix factorization (MNMF) for blind source separation, and we propose a new parameter update algorithm used in the sub-Gaussian model. MNMF employs full-rank spatial covariance matrices and can simulate situations in which the reverberation is strong and the sources are not point sources. In conventional MNMF, spectrograms of observed signals are assumed to follow a multivariate Gaussian distribution. In this paper, first, to extend the MNMF model, we introduce the multivariate generalized Gaussian distribution as the multivariate sub-Gaussian distribution. Since the cost function of MNMF based on this multivariate sub-Gaussian model is difficult to minimize, we additionally introduce the joint-diagonalizability constraint in spatial covariance matrices to MNMF similarly to FastMNMF, and transform the cost function to the form to which we can apply the auxiliary functions to derive the valid parameter update rules. Finally, from blind source separation experiments, we show that the proposed method outperforms the conventional methods in source-separation accuracy.
Submitted 30 June, 2020;
originally announced July 2020.
-
Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution
Authors:
Tatsuki Kondo,
Kanta Fukushige,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari,
Rintaro Ikeshita,
Tomohiro Nakatani
Abstract:
In this paper, we address a blind source separation (BSS) problem and propose a new extended framework of independent positive semidefinite tensor analysis (IPSDTA). IPSDTA is a state-of-the-art BSS method that enables us to take interfrequency correlations into account, but the generative model is limited within the multivariate Gaussian distribution and its parameter optimization algorithm does not guarantee stable convergence. To resolve these problems, first, we propose to extend the generative model to a parametric multivariate Student's t distribution that can deal with various types of signal. Secondly, we derive a new parameter optimization algorithm that guarantees the monotonic nonincrease in the cost function, providing stable convergence. Experimental results reveal that the cost function in the conventional IPSDTA does not display monotonically nonincreasing properties. On the other hand, the proposed method guarantees the monotonic nonincrease in the cost function and outperforms the conventional ILRMA and IPSDTA in the source-separation performance.
Submitted 20 February, 2020;
originally announced February 2020.
-
Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process
Authors:
Keigo Kamo,
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
In this paper, we address a convolutive blind source separation (BSS) problem and propose a new extended framework of FastMNMF by introducing prior information for joint diagonalization of the spatial covariance matrix model. Recently, FastMNMF has been proposed as a fast version of multichannel nonnegative matrix factorization under the assumption that the spatial covariance matrices of multiple sources can be jointly diagonalized. However, its source-separation performance was not improved and the physical meaning of the joint-diagonalization process was unclear. To resolve these problems, we first reveal a close relationship between the joint-diagonalization process and the demixing system used in independent low-rank matrix analysis (ILRMA). Next, motivated by this fact, we propose a new regularized FastMNMF supported by ILRMA and derive convergence-guaranteed parameter update rules. From BSS experiments, we show that the proposed method outperforms the conventional FastMNMF in source-separation accuracy with almost the same computation time.
Submitted 3 February, 2020;
originally announced February 2020.
-
Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction
Authors:
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
In this paper, we propose new accelerated update rules for rank-constrained spatial covariance model estimation, which efficiently extracts a directional target source in diffuse background noise. The naive update rule requires heavy computation such as matrix inversion or matrix multiplication. We resolve this problem by expanding matrix inversion to reduce computational complexity; in the parameter update step, we need neither matrix inversion nor multiplication. In an experiment, we show that the proposed accelerated update rule achieves 87 times faster calculation than the naive one.
Submitted 6 August, 2019;
originally announced August 2019.
-
Efficient Full-Rank Spatial Covariance Estimation Using Independent Low-Rank Matrix Analysis for Blind Source Separation
Authors:
Yuki Kubo,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
In this paper, we propose a new algorithm that efficiently separates a directional source and diffuse background noise based on independent low-rank matrix analysis (ILRMA). ILRMA is one of the state-of-the-art techniques of blind source separation (BSS) and is based on a rank-1 spatial model. Although such a model does not hold for diffuse noise, ILRMA can accurately estimate the spatial parameters of the directional source. Motivated by this fact, we utilize these estimates to restore the lost spatial basis of diffuse noise, which can be considered as an efficient full-rank spatial covariance estimation. BSS experiments show the efficacy of the proposed method in terms of the computational cost and separation performance.
Submitted 18 June, 2019; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network
Authors:
Shinnosuke Takamichi,
Yuki Saito,
Norihiro Takamune,
Daichi Kitamura,
Hiroshi Saruwatari
Abstract:
This paper presents a deep neural network (DNN)-based phase reconstruction from amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram is often used for processing, and the corresponding phase spectrogram is reconstructed from the amplitude spectrogram on the basis of the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic speech. Addressing this problem, we introduce the von-Mises-distribution DNN for phase reconstruction. The DNN is a generative model having the von Mises distribution that can model distributions of a periodic variable such as a phase, and the model parameters of the DNN are estimated on the basis of the maximum likelihood criterion. Furthermore, we propose a group-delay loss for DNN training to make the predicted group delay close to a natural group delay. The experimental results demonstrate that 1) the trained DNN can predict group delay more accurately than the phases themselves, and 2) our phase reconstruction methods achieve better speech quality than the conventional Griffin-Lim method.
Submitted 10 July, 2018;
originally announced July 2018.
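The abstract above describes estimating model parameters under a von Mises distribution by maximum likelihood. As a rough sketch of what such a criterion looks like for a periodic variable, the function below computes the mean negative log-likelihood of observed phases given a predicted mean direction. The name `von_mises_nll` and the use of a single shared concentration `kappa` are assumptions for illustration; the paper's actual network outputs and loss details are not given in the abstract.

```python
import numpy as np

def von_mises_nll(phase, mu, kappa):
    """Mean negative log-likelihood of phases under a von Mises distribution.

    phase: observed phases in radians.
    mu:    predicted mean directions in radians (same shape as phase).
    kappa: concentration (>= 0), assumed here to be one shared scalar.
    """
    # log p(y) = kappa * cos(y - mu) - log(2 * pi * I0(kappa)),
    # where I0 is the modified Bessel function of the first kind, order 0.
    log_norm = np.log(2.0 * np.pi * np.i0(kappa))
    return float(np.mean(log_norm - kappa * np.cos(phase - mu)))
```

Because the likelihood depends on the phase only through `cos(phase - mu)`, the loss is unchanged when a phase is shifted by a multiple of 2π, which is the property that makes this distribution suitable for wrapped quantities.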
-
Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation
Authors:
Shinichi Mogami,
Hayato Sumino,
Daichi Kitamura,
Norihiro Takamune,
Shinnosuke Takamichi,
Hiroshi Saruwatari,
Nobutaka Ono
Abstract:
In this paper, we address a multichannel audio source separation task and propose a new efficient method called independent deeply learned matrix analysis (IDLMA). IDLMA estimates the demixing matrix in a blind manner and updates the time-frequency structures of each source using a pretrained deep neural network (DNN). Also, we introduce a complex Student's t-distribution as a generalized source generative model including both complex Gaussian and Cauchy distributions. Experiments are conducted using music signals with a training dataset, and the results show the validity of the proposed method in terms of separation accuracy and computational cost.
Submitted 27 June, 2018;
originally announced June 2018.
-
Independent Low-Rank Matrix Analysis Based on Parametric Majorization-Equalization Algorithm
Authors:
Yoshiki Mitsui,
Daichi Kitamura,
Norihiro Takamune,
Hiroshi Saruwatari,
Yu Takahashi,
Kazunobu Kondo
Abstract:
In this paper, we propose a new optimization method for independent low-rank matrix analysis (ILRMA) based on a parametric majorization-equalization algorithm. ILRMA is an efficient blind source separation technique that simultaneously estimates a spatial demixing matrix (spatial model) and the power spectrograms of each estimated source (source model). In ILRMA, since both models are alternately optimized by iterative update rules, the difference in the convergence speeds between these models often results in a poor local solution. To solve this problem, we introduce a new parameter that controls the convergence speed of the source model and find the best balance between the optimizations in the spatial and source models for ILRMA.
Submitted 4 October, 2017;
originally announced October 2017.
-
Independent Low-Rank Matrix Analysis Based on Complex Student's $t$-Distribution for Blind Audio Source Separation
Authors:
Shinichi Mogami,
Daichi Kitamura,
Yoshiki Mitsui,
Norihiro Takamune,
Hiroshi Saruwatari,
Nobutaka Ono
Abstract:
In this paper, we generalize a source generative model in a state-of-the-art blind source separation (BSS), independent low-rank matrix analysis (ILRMA). ILRMA is a unified method of frequency-domain independent component analysis and nonnegative matrix factorization and can provide better performance for audio BSS tasks. To further improve the performance and stability of the separation, we introduce an isotropic complex Student's $t$-distribution as a source generative model, which includes the isotropic complex Gaussian distribution used in conventional ILRMA. Experiments are conducted using both music and speech BSS tasks, and the results show the validity of the proposed method.
Submitted 16 August, 2017;
originally announced August 2017.