
Showing 1–19 of 19 results for author: Takamune, N

  1. arXiv:2406.17722  [pdf, other]

    cs.SD eess.AS

    Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals

    Authors: Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari

    Abstract: This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms, ignoring the stereo listening experience inherent in human hearing. Our baseline approach addresses this gap by integrating blind source separation (BSS), voice conve…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2403.12477  [pdf, other]

    cs.SD eess.AS

    Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation

    Authors: Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari

    Abstract: Real-time speech extraction is an important challenge with various applications such as speech recognition in a human-like avatar/robot. In this paper, we propose the real-time extension of a speech extraction method based on independent low-rank matrix analysis (ILRMA) and rank-constrained spatial covariance matrix estimation (RCSCME). The RCSCME-based method is a multichannel blind speech extrac…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures, accepted at HSCMA 2024

  3. arXiv:2306.12820  [pdf, other]

    cs.SD eess.AS

    NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction

    Authors: Koki Nishida, Norihiro Takamune, Rintaro Ikeshita, Daichi Kitamura, Hiroshi Saruwatari, Tomohiro Nakatani

    Abstract: In this paper, we address the multichannel blind source extraction (BSE) of a single source in diffuse noise environments. To solve this problem even faster than by fast multichannel nonnegative matrix factorization (FastMNMF) and its variant, we propose a BSE method called NoisyILRMA, which is a modification of independent low-rank matrix analysis (ILRMA) to account for diffuse noise. NoisyILRMA…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, accepted for European Signal Processing Conference 2023 (EUSIPCO 2023)

  4. arXiv:2306.12169  [pdf, other]

    cs.HC

    HumanDiffusion: diffusion model using perceptual gradients

    Authors: Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Hiroshi Saruwatari

    Abstract: We propose HumanDiffusion, a diffusion model trained from humans' perceptual gradients to learn an acceptable range of data for humans (i.e., human-acceptable distribution). Conventional HumanGAN aims to model the human-acceptable distribution wider than the real-data distribution by training a neural network-based generator with human-based discriminators. However, HumanGAN training tends t…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH

  5. Algorithms of Sampling-Frequency-Independent Layers for Non-integer Strides

    Authors: Kanami Imamura, Tomohiko Nakamura, Norihiro Takamune, Kohei Yatabe, Hiroshi Saruwatari

    Abstract: In this paper, we propose algorithms for handling non-integer strides in sampling-frequency-independent (SFI) convolutional and transposed convolutional layers. The SFI layers have been developed for handling various sampling frequencies (SFs) by a single neural network. They are replaceable with their non-SFI counterparts and can be introduced into various network architectures. However, they cou…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, accepted for European Signal Processing Conference 2023 (EUSIPCO 2023)

    Journal ref: European Signal Processing Conference, Sep. 2023, pp. 326--330

  6. arXiv:2109.04658  [pdf, ps, other]

    cs.SD eess.AS

    Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis

    Authors: Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino

    Abstract: Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for the situation that the directional target speech and the diffuse noise are mixed. In conventional RCSCME, independent low-rank matrix analysis (ILRMA) is used as the preprocessing method. We propose RCSCME using independent deeply learned matrix analysis (IDLMA), which is a supervised extension of ILRMA. In this method,…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: accepted for APSIPA2021

  7. arXiv:2109.00704  [pdf, ps, other]

    cs.SD eess.AS

    Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models

    Authors: Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo

    Abstract: Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art multichannel audio source separation methods using the source power estimation based on deep neural networks (DNNs). The DNN-based power estimation works well for sounds having timbres similar to the DNN training data. However, the sounds to which IDLMA is applied do not always have such timbres, and the timbral mism…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: 8 pages, 5 figures, accepted for Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA ASC 2021)

  8. arXiv:2106.05529  [pdf, other]

    cs.SD eess.AS

    Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation

    Authors: Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani

    Abstract: We address the determined audio source separation problem in the time-frequency domain. In independent deeply learned matrix analysis (IDLMA), it is assumed that the inter-frequency correlation of each source spectrum is zero, which is inappropriate for modeling nonstationary signals such as music signals. To account for the correlation between frequencies, independent positive semidefinite tensor…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 5 pages, 2 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)

  9. arXiv:2106.03492  [pdf, other]

    cs.SD eess.AS

    Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation

    Authors: Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo

    Abstract: Independent deeply learned matrix analysis (IDLMA) is one of the state-of-the-art supervised multichannel audio source separation methods. It blindly estimates the demixing filters on the basis of source independence, using the source model estimated by the deep neural network (DNN). However, since the ratios of the source to interferer signals vary widely among time-frequency (TF) slots, it is di…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 5 pages, 4 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)

  10. arXiv:2105.02491  [pdf, other]

    cs.SD eess.AS

    Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction

    Authors: Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: Rank-constrained spatial covariance matrix estimation (RCSCME) is a state-of-the-art blind speech extraction method applied to cases where one directional target speech and diffuse noise are mixed. In this paper, we propose a new algorithmic extension of RCSCME. RCSCME complements a deficient one rank of the diffuse noise spatial covariance matrix, which cannot be estimated via preprocessing such…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: 5 pages, 3 figures, ICASSP2021

  11. arXiv:2007.00416  [pdf, other]

    cs.SD eess.AS

    Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Sub-Gaussian Distribution

    Authors: Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: In this paper, we address a statistical model extension of multichannel nonnegative matrix factorization (MNMF) for blind source separation, and we propose a new parameter update algorithm used in the sub-Gaussian model. MNMF employs full-rank spatial covariance matrices and can simulate situations in which the reverberation is strong and the sources are not point sources. In conventional MNMF, sp…

    Submitted 30 June, 2020; originally announced July 2020.

    Comments: 5 pages, 3 figures, To appear in the Proceedings of the 28th European Signal Processing Conference (EUSIPCO 2020). arXiv admin note: text overlap with arXiv:2002.00579

  12. arXiv:2002.08582  [pdf, ps, other]

    cs.SD eess.AS eess.SP

    Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution

    Authors: Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani

    Abstract: In this paper, we address a blind source separation (BSS) problem and propose a new extended framework of independent positive semidefinite tensor analysis (IPSDTA). IPSDTA is a state-of-the-art BSS method that enables us to take interfrequency correlations into account, but the generative model is limited within the multivariate Gaussian distribution and its parameter optimization algorithm does…

    Submitted 20 February, 2020; originally announced February 2020.

    Comments: 5 pages, 3 figures, to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

  13. arXiv:2002.00579  [pdf, other]

    cs.SD eess.AS

    Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process

    Authors: Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: In this paper, we address a convolutive blind source separation (BSS) problem and propose a new extended framework of FastMNMF by introducing prior information for joint diagonalization of the spatial covariance matrix model. Recently, FastMNMF has been proposed as a fast version of multichannel nonnegative matrix factorization under the assumption that the spatial covariance matrices of multiple…

    Submitted 3 February, 2020; originally announced February 2020.

    Comments: 5 pages, 3 figures, To appear in the Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

  14. arXiv:1908.01964  [pdf, other]

    cs.SD eess.AS

    Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction

    Authors: Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: In this paper, we propose new accelerated update rules for rank-constrained spatial covariance model estimation, which efficiently extracts a directional target source in diffuse background noise. The naive update rule requires heavy computation such as matrix inversion or matrix multiplication. We resolve this problem by expanding matrix inversion to reduce computational complexity; in the parame…

    Submitted 6 August, 2019; originally announced August 2019.

    Comments: 7 pages, 3 figures, To appear in the Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2019 (APSIPA 2019)

  15. arXiv:1906.02482  [pdf, other]

    cs.SD eess.AS

    Efficient Full-Rank Spatial Covariance Estimation Using Independent Low-Rank Matrix Analysis for Blind Source Separation

    Authors: Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: In this paper, we propose a new algorithm that efficiently separates a directional source and diffuse background noise based on independent low-rank matrix analysis (ILRMA). ILRMA is one of the state-of-the-art techniques of blind source separation (BSS) and is based on a rank-1 spatial model. Although such a model does not hold for diffuse noise, ILRMA can accurately estimate the spatial paramete…

    Submitted 18 June, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: 5 pages, 3 figures, To appear in the Proceedings of the 27th European Signal Processing Conference (EUSIPCO 2019)

  16. arXiv:1807.03474  [pdf, ps, other]

    cs.SD eess.AS

    Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network

    Authors: Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari

    Abstract: This paper presents a deep neural network (DNN)-based phase reconstruction from amplitude spectrograms. In audio signal and speech processing, the amplitude spectrogram is often used for processing, and the corresponding phase spectrogram is reconstructed from the amplitude spectrogram on the basis of the Griffin-Lim method. However, the Griffin-Lim method causes unnatural artifacts in synthetic s…

    Submitted 10 July, 2018; originally announced July 2018.

    Comments: To appear in the Proc. of IWAENC2018

  17. arXiv:1806.10307  [pdf, other]

    eess.AS cs.SD

    Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation

    Authors: Shinichi Mogami, Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono

    Abstract: In this paper, we address a multichannel audio source separation task and propose a new efficient method called independent deeply learned matrix analysis (IDLMA). IDLMA estimates the demixing matrix in a blind manner and updates the time-frequency structures of each source using a pretrained deep neural network (DNN). Also, we introduce a complex Student's t-distribution as a generalized source g…

    Submitted 27 June, 2018; originally announced June 2018.

    Comments: 5 pages, 4 figures, To appear in the Proceedings of the 26th European Signal Processing Conference (EUSIPCO 2018)

  18. arXiv:1710.01589  [pdf, other]

    cs.SD eess.AS

    Independent Low-Rank Matrix Analysis Based on Parametric Majorization-Equalization Algorithm

    Authors: Yoshiki Mitsui, Daichi Kitamura, Norihiro Takamune, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo

    Abstract: In this paper, we propose a new optimization method for independent low-rank matrix analysis (ILRMA) based on a parametric majorization-equalization algorithm. ILRMA is an efficient blind source separation technique that simultaneously estimates a spatial demixing matrix (spatial model) and the power spectrograms of each estimated source (source model). In ILRMA, since both models are alternately…

    Submitted 4 October, 2017; originally announced October 2017.

    Comments: Preprint Manuscript of 2017 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP 2017)

  19. arXiv:1708.04795  [pdf, ps, other]

    cs.SD

    Independent Low-Rank Matrix Analysis Based on Complex Student's $t$-Distribution for Blind Audio Source Separation

    Authors: Shinichi Mogami, Daichi Kitamura, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono

    Abstract: In this paper, we generalize a source generative model in a state-of-the-art blind source separation (BSS), independent low-rank matrix analysis (ILRMA). ILRMA is a unified method of frequency-domain independent component analysis and nonnegative matrix factorization and can provide better performance for audio BSS tasks. To further improve the performance and stability of the separation, we intro…

    Submitted 16 August, 2017; originally announced August 2017.

    Comments: Preprint manuscript of 2017 IEEE International Workshop on Machine Learning for Signal Processing