Showing 1–19 of 19 results for author: Onoe, N

Searching in archive eess.

  1. arXiv:2404.08264  [pdf, other]

    cs.MM cs.CV eess.AS

    Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis

    Authors: Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono

    Abstract: Observations with distributed sensors are essential in analyzing a series of human and machine activities (referred to as 'events' in this paper) in complex and extensive real-world environments. This is because the information obtained from a single sensor is often missing or fragmented in such an environment; observations from multiple locations and modalities should be integrated to analyze eve…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, under review

  2. arXiv:2307.12232  [pdf, other]

    cs.SD eess.AS eess.SP

    Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase

    Authors: Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

    Abstract: We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and…

    Submitted 23 July, 2023; originally announced July 2023.

    Comments: Accepted to IEEE WASPAA 2023

  3. arXiv:2307.12231  [pdf, other]

    cs.SD cs.CL eess.AS

    Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

    Authors: Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

    Abstract: Neural speech separation has made remarkable progress and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end. In detail, we explore multi-channel separation methods, mask-based beamforming and comp…

    Submitted 23 July, 2023; originally announced July 2023.

    Comments: Accepted to IEEE WASPAA 2023

  4. arXiv:2302.10536  [pdf, other]

    cs.SD cs.AI eess.AS

    Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

    Authors: Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

    Abstract: The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another style without modifying the linguistic content of the signal. Most of the state-of-the-art approaches convert emotions for seen speaker-emotion combinations only. In this paper, we tackle the problem of converting the emotion of speakers whose only neutral data ar…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Demo Samples at https://demosamplesites.github.io/EVCUP/

  5. arXiv:2302.07928  [pdf, other]

    eess.AS cs.SD eess.SP

    Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

    Authors: Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono

    Abstract: This paper describes our submission to the Second Clarity Enhancement Challenge (CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy-reverberant environments with multiple interferers such as music and competing speakers. Our approach builds upon the powerful iterative neural/beamforming enhancement (iNeuBe) framework introduced in our recent work, and this p…

    Submitted 15 February, 2023; originally announced February 2023.

  6. arXiv:2210.10742  [pdf, other]

    cs.SD eess.AS

    End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation

    Authors: Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono

    Abstract: Self-supervised learning representation (SSLR) has demonstrated its significant effectiveness in automatic speech recognition (ASR), mainly with clean speech. Recent work pointed out the strength of integrating SSLR with single-channel speech enhancement for ASR in noisy environments. This paper further advances this integration by dealing with multi-channel input. We propose a novel end-to-end ar…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  7. arXiv:2209.00937  [pdf, other]

    eess.AS

    Inverse-free Online Independent Vector Analysis with Flexible Iterative Source Steering

    Authors: Taishi Nakashima, Nobutaka Ono

    Abstract: In this paper, we propose a new online independent vector analysis (IVA) algorithm for real-time blind source separation (BSS). In many BSS algorithms, the iterative projection (IP) has been used for updating the demixing matrix, a parameter to be estimated in BSS. However, it requires matrix inversion, which can be costly, particularly in online processing. To improve this situation, we introduce…

    Submitted 2 September, 2022; originally announced September 2022.

    Comments: 5 pages, 2 figures. Submitted to APSIPA 2022

  8. arXiv:2207.04357  [pdf, ps, other]

    cs.SD eess.AS

    Joint Analysis of Acoustic Scenes and Sound Events with Weakly labeled Data

    Authors: Shunsuke Tsubaki, Keisuke Imoto, Nobutaka Ono

    Abstract: Considering that acoustic scenes and sound events are closely related to each other, in some previous papers, a joint analysis of acoustic scenes and sound events utilizing multitask learning (MTL)-based neural networks was proposed. In conventional methods, a strongly supervised scheme is applied to sound event detection in MTL models, which requires strong labels of sound events in model trainin…

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: Accepted to IWAENC 2022

  9. arXiv:2206.13014  [pdf, other]

    eess.AS cs.SD eess.SP

    Joint Optimization of Sampling Rate Offsets Based on Entire Signal Relationship Among Distributed Microphones

    Authors: Yoshiki Masuyama, Kouei Yamaoka, Nobutaka Ono

    Abstract: In this paper, we propose to simultaneously estimate all the sampling rate offsets (SROs) of multiple devices. In a distributed microphone array, the SRO is inevitable, which deteriorates the performance of array signal processing. Most of the existing SRO estimation methods focused on synchronizing two microphones. When synchronizing more than two microphones, we select one reference microphone a…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: 5 pages, 2 figures, accepted by Interspeech 2022

  10. arXiv:2206.02187  [pdf, other]

    cs.CV cs.SD eess.AS

    M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

    Authors: Vishal Chudasama, Purbayan Kar, Ashish Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Naoyuki Onoe

    Abstract: Emotion Recognition in Conversations (ERC) is crucial in developing sympathetic human-machine interaction. In conversational videos, emotion can be present in multiple modalities, i.e., audio, video, and transcript. However, due to the inherent characteristics of these modalities, multi-modal ERC has always been considered a challenging undertaking. Existing ERC research focuses mainly on using te…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in the 5th Multimodal Learning and Applications (MULA) Workshop at CVPR 2022

  11. arXiv:2204.03173  [pdf, other]

    cs.LG cs.AI eess.SP

    Automated Sleep Staging via Parallel Frequency-Cut Attention

    Authors: Zheng Chen, Ziwei Yang, Lingwei Zhu, Wei Chen, Toshiyo Tamura, Naoaki Ono, MD Altaf-Ul-Amin, Shigehiko Kanaya, Ming Huang

    Abstract: This paper proposes a novel framework for automatically capturing the time-frequency nature of electroencephalogram (EEG) signals of human sleep based on the authoritative sleep medicine guidance. The framework consists of two parts: the first part extracts informative features by partitioning the input EEG spectrograms into a sequence of time-frequency patches. The second part is constituted by a…

    Submitted 12 January, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: 10 pages, 9 figures

  12. arXiv:2203.09723  [pdf, other]

    eess.SP eess.AS

    Estimation of Consistent Time Delays in Subsample via Auxiliary-Function-Based Iterative Updates

    Authors: Kouei Yamaoka, Yukoh Wakabayashi, Nobutaka Ono

    Abstract: In this paper, we propose a new algorithm for the estimation of multiple time delays (TDs). Since a TD is a fundamental spatial cue for sensor array signal processing techniques, many methods for estimating it have been studied. Most of them, including generalized cross correlation (CC)-based methods, focus on how to estimate a TD between two sensors. These methods can then be easily adapted for m…

    Submitted 23 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: 13 pages, 8 figures

  13. Joint Dereverberation and Separation with Iterative Source Steering

    Authors: Taishi Nakashima, Robin Scheibler, Masahito Togami, Nobutaka Ono

    Abstract: We propose a new algorithm for joint dereverberation and blind source separation (DR-BSS). Our work builds upon the ILRMA-T framework that applies a unified filter combining dereverberation and separation. One drawback of this framework is that it requires several matrix inversions, an operation inherently costly and with potential stability issues. We leverage the recently introduced iterative so…

    Submitted 31 May, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, accepted at ICASSP 2021

  14. arXiv:2004.03926  [pdf, other]

    eess.SP cs.SD eess.AS

    MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: In this work, we propose efficient algorithms for joint independent subspace analysis (JISA), an extension of independent component analysis that deals with parallel mixtures, where not all the components are independent. We derive an algorithmic framework for JISA based on the majorization-minimization (MM) optimization technique (JISA-MM). We use a well-known inequality for super-Gaussian source…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: 15 pages, 4 figures

  15. arXiv:1910.10654  [pdf, other]

    cs.SD eess.AS eess.SP

    Fast Independent Vector Extraction by Iterative SINR Maximization

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We propose fast independent vector extraction (FIVE), a new algorithm that blindly extracts a single non-Gaussian source from a Gaussian background. The algorithm iteratively computes beamforming weights maximizing the signal-to-interference-and-noise ratio for an approximate noise covariance matrix. We demonstrate that this procedure minimizes the negative log-likelihood of the input data accordi…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: 5 pages, 4 figures, Submitted to ICASSP 2020

  16. arXiv:1905.07880  [pdf, other]

    cs.SD eess.AS

    Independent Vector Analysis with more Microphones than Sources

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We extend frequency-domain blind source separation based on independent vector analysis to the case where there are more microphones than sources. The signal is modelled as non-Gaussian sources in a Gaussian background. The proposed algorithm is based on a parametrization of the demixing matrix decreasing the number of parameters to estimate. Furthermore, orthogonal constraints between the signal…

    Submitted 7 August, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: Accepted to WASPAA 2019, 5 pages, 3 figures

  17. Multi-modal Blind Source Separation with Microphones and Blinkies

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We propose a blind source separation algorithm that jointly exploits measurements by a conventional microphone array and an ad hoc array of low-rate sound power sensors called blinkies. While providing less information than microphones, blinkies circumvent some difficulties of microphone arrays in terms of manufacturing, synchronization, and deployment. The algorithm is derived from a joint probab…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted at IEEE ICASSP 2019, Brighton, UK. 5 pages. 3 figures

  18. arXiv:1808.08056  [pdf, other]

    eess.AS

    Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model

    Authors: Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Hiroaki Nakajima, Nobutaka Ono

    Abstract: Independent low-rank matrix analysis (ILRMA) is a fast and stable method for blind audio source separation. Conventional ILRMAs assume time-variant (super-)Gaussian source models, which can only represent signals that follow a super-Gaussian distribution. In this paper, we focus on ILRMA based on a generalized Gaussian distribution (GGD-ILRMA) and propose a new type of GGD-ILRMA that adopts a time…

    Submitted 24 August, 2018; originally announced August 2018.

    Comments: 8 pages, 5 figures, To appear in the Proceedings of APSIPA ASC 2018

  19. arXiv:1806.10307  [pdf, other]

    eess.AS cs.SD

    Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation

    Authors: Shinichi Mogami, Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono

    Abstract: In this paper, we address a multichannel audio source separation task and propose a new efficient method called independent deeply learned matrix analysis (IDLMA). IDLMA estimates the demixing matrix in a blind manner and updates the time-frequency structures of each source using a pretrained deep neural network (DNN). Also, we introduce a complex Student's t-distribution as a generalized source g…

    Submitted 27 June, 2018; originally announced June 2018.

    Comments: 5 pages, 4 figures, To appear in the Proceedings of the 26th European Signal Processing Conference (EUSIPCO 2018)