
Showing 1–23 of 23 results for author: Scheibler, R

Searching in archive eess.
  1. arXiv:2406.12194

    eess.AS cs.SD

    Universal Score-based Speech Enhancement with High Content Preservation

    Authors: Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu

    Abstract: We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions are three-fold. First, we make several modifications to the network architecture, improving training stability and final performance. Second, we intr…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 5 pages, 5 figures, accepted at Interspeech 2024

  2. arXiv:2406.04660

    eess.AS cs.SD

    URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

    Authors: Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian

    Abstract: The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research has limitations on the coverage of SE sub-tasks, data diversity and amount, and evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, to focus on the universality, robustness, and generaliza…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 figures, 3 tables. Accepted by Interspeech 2024. An extended version of the accepted manuscript with appendix

  3. arXiv:2310.17864

    eess.AS cs.SD

    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

    Authors: Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

    Abstract: TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's devel…

    Submitted 26 October, 2023; originally announced October 2023.

  4. arXiv:2303.06806

    eess.AS cs.CL cs.SD

    Neural Diarization with Non-autoregressive Intermediate Attractors

    Authors: Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

    Abstract: End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speaker labels simultaneously, it disregards output label dependency. In this work, we propose a novel EEND model that introduces the label dependency betw…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  5. arXiv:2210.17327

    eess.AS cs.LG cs.SD

    Diffusion-based Generative Speech Source Separation

    Authors: Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

    Abstract: We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a…

    Submitted 2 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2023

  6. arXiv:2207.09514

    eess.AS cs.CL cs.LG cs.SD

    ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

    Authors: Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe

    Abstract: This paper presents recent progress on integrating speech separation and enhancement (SSE) into the ESPnet toolkit. Compared with the previous ESPnet-SE work, numerous features have been added, including recent state-of-the-art speech enhancement models with their respective training and evaluation recipes. Importantly, a new interface has been designed to flexibly combine speech enhancement front…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: To appear in Interspeech 2022

  7. arXiv:2204.00218

    eess.AS cs.CL cs.SD

    End-to-End Multi-speaker ASR with Independent Vector Analysis

    Authors: Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian

    Abstract: We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition. We propose a frontend for joint source separation and dereverberation based on the independent vector analysis (IVA) paradigm. It uses the fast and stable iterative source steering algorithm together with a neural source model. The parameters from the ASR module and the neural source model are optimized…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022. 5 pages, 2 figures, 3 tables

  8. arXiv:2204.00210

    eess.AS

    Spatial Loss for Unsupervised Multi-channel Source Separation

    Authors: Kohei Saijo, Robin Scheibler

    Abstract: We propose a spatial loss for unsupervised multi-channel source separation. The proposed loss exploits the duality of direction of arrival (DOA) and beamforming: the steering and beamforming vectors should be aligned for the target source, but orthogonal for interfering ones. The spatial loss encourages consistency between the mixing and demixing systems from a classic DOA estimator and a neural s…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

  9. arXiv:2202.08456

    eess.AS cs.LG cs.SD

    MLP-ASR: Sequence-length agnostic all-MLP architectures for speech recognition

    Authors: Jin Sakuma, Tatsuya Komatsu, Robin Scheibler

    Abstract: We propose multi-layer perceptron (MLP)-based architectures suitable for variable length input. MLP-based architectures, recently proposed for image classification, can only be used for inputs of a fixed, pre-defined size. However, many types of data are naturally variable in length, for example, acoustic signals. We propose three approaches to extend MLP-based architectures for use with sequences…

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 8 pages, 4 figures

  10. arXiv:2110.06545

    eess.AS

    Independence-based Joint Dereverberation and Separation with Neural Source Model

    Authors: Kohei Saijo, Robin Scheibler

    Abstract: We propose an independence-based joint dereverberation and separation method with a neural source model. We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent vector analysis to joint dereverberation and separation. The network is trained in an end-to-end manner with a permutation invariant loss on the time-domain separ…

    Submitted 1 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Submitted to INTERSPEECH 2022

  11. arXiv:2110.06440

    eess.AS cs.SD eess.SP

    SDR -- Medium Rare with Fast Computations

    Authors: Robin Scheibler

    Abstract: We revisit the widely used bss eval metrics for source separation with an eye out for performance. We propose a fast algorithm fixing shortcomings of publicly available implementations. First, we show that the metrics are fully specified by the squared cosine of just two angles between estimate and reference subspaces. Second, large linear systems are involved. However, they are structured, and we…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2022

  12. arXiv:2106.01011

    eess.SP cs.SD eess.AS math.OC

    Refinement of Direction of Arrival Estimators by Majorization-Minimization Optimization on the Array Manifold

    Authors: Robin Scheibler, Masahito Togami

    Abstract: We propose a generalized formulation of direction of arrival estimation that includes many existing methods such as steered response power, subspace, coherent and incoherent, as well as speech sparsity-based methods. Unlike most conventional methods that rely exclusively on grid search, we introduce a continuous optimization algorithm to refine DOA estimates beyond the resolution of the initial gr…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 5 pages, 2 figures, 2 tables. Presented at IEEE ICASSP 2021

    Journal ref: Proc. IEEE ICASSP, pp. 436-440, June, 2021

  13. Joint Dereverberation and Separation with Iterative Source Steering

    Authors: Taishi Nakashima, Robin Scheibler, Masahito Togami, Nobutaka Ono

    Abstract: We propose a new algorithm for joint dereverberation and blind source separation (DR-BSS). Our work builds upon the IRLMA-T framework that applies a unified filter combining dereverberation and separation. One drawback of this framework is that it requires several matrix inversions, an operation inherently costly and with potential stability issues. We leverage the recently introduced iterative so…

    Submitted 31 May, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: 5 pages, 2 figures, accepted at ICASSP 2021

  14. arXiv:2011.05540

    eess.AS cs.SD eess.SP

    Surrogate Source Model Learning for Determined Source Separation

    Authors: Robin Scheibler, Masahito Togami

    Abstract: We propose to learn surrogate functions of universal speech priors for determined blind speech separation. Deep speech priors are highly desirable due to their high modelling power, but are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), since deriving the required surrogate function is not easy, nor always possible. Instead, we do away…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: 5 pages, 3 figures, 1 table. Submitted to ICASSP 2021

  15. arXiv:2009.05288

    eess.AS cs.SD eess.SP

    Generalized Minimal Distortion Principle for Blind Source Separation

    Authors: Robin Scheibler

    Abstract: We revisit the source image estimation problem from blind source separation (BSS). We generalize the traditional minimum distortion principle to maximum likelihood estimation with a model for the residual spectrograms. Because residual spectrograms typically contain other sources, we propose to use a mixed-norm model that lets us finely tune sparsity in time and frequency. We propose to carry out…

    Submitted 11 September, 2020; originally announced September 2020.

    Comments: 5 pages, 1 figure, 2 tables, Accepted at INTERSPEECH 2020

  16. arXiv:2008.10048

    eess.SP cs.SD eess.AS

    Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization

    Authors: Robin Scheibler

    Abstract: We propose a new algorithm for blind source separation (BSS) using independent vector analysis (IVA). This is an improvement over the popular auxiliary function based IVA (AuxIVA) with iterative projection (IP) or iterative source steering (ISS). We introduce iterative projection with adjustment (IPA), where we update one demixing filter and jointly adjust all the other sources along its current d…

    Submitted 18 May, 2021; v1 submitted 23 August, 2020; originally announced August 2020.

    Comments: 16 pages, 6 figures, 4 tables

    Journal ref: IEEE Transactions on Signal Processing, Vol. 69, pp. 2509–2524, April 2021

  17. arXiv:2006.02774

    cs.SD eess.AS

    A study on more realistic room simulation for far-field keyword spotting

    Authors: Eric Bezzam, Robin Scheibler, Cyril Cadoux, Thibault Gisselbrecht

    Abstract: We investigate the impact of more realistic room simulation for training far-field keyword spotting systems without fine-tuning on in-domain data. To this end, we study the impact of incorporating the following factors in the room impulse response (RIR) generation: air absorption, surface- and frequency-dependent coefficients of real materials, and stochastic ray tracing. Through an ablation study…

    Submitted 18 November, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

    Comments: 7 pages, 4 figures, accepted at APSIPA 2020, room impulse response generation code can be found at https://github.com/ebezzam/room-simulation

  18. arXiv:2004.03926

    eess.SP cs.SD eess.AS

    MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: In this work, we propose efficient algorithms for joint independent subspace analysis (JISA), an extension of independent component analysis that deals with parallel mixtures, where not all the components are independent. We derive an algorithmic framework for JISA based on the majorization-minimization (MM) optimization technique (JISA-MM). We use a well-known inequality for super-Gaussian source…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: 15 pages, 4 figures

  19. arXiv:1910.10654

    cs.SD eess.AS eess.SP

    Fast Independent Vector Extraction by Iterative SINR Maximization

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We propose fast independent vector extraction (FIVE), a new algorithm that blindly extracts a single non-Gaussian source from a Gaussian background. The algorithm iteratively computes beamforming weights maximizing the signal-to-interference-and-noise ratio for an approximate noise covariance matrix. We demonstrate that this procedure minimizes the negative log-likelihood of the input data accordi…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: 5 pages, 4 figures, Submitted to ICASSP 2020

  20. arXiv:1905.07880

    cs.SD eess.AS

    Independent Vector Analysis with more Microphones than Sources

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We extend frequency-domain blind source separation based on independent vector analysis to the case where there are more microphones than sources. The signal is modelled as non-Gaussian sources in a Gaussian background. The proposed algorithm is based on a parametrization of the demixing matrix decreasing the number of parameters to estimate. Furthermore, orthogonal constraints between the signal…

    Submitted 7 August, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: Accepted to WASPAA 2019, 5 pages, 3 figures

  21. Multi-modal Blind Source Separation with Microphones and Blinkies

    Authors: Robin Scheibler, Nobutaka Ono

    Abstract: We propose a blind source separation algorithm that jointly exploits measurements by a conventional microphone array and an ad hoc array of low-rate sound power sensors called blinkies. While providing less information than microphones, blinkies circumvent some difficulties of microphone arrays in terms of manufacturing, synchronization, and deployment. The algorithm is derived from a joint probab…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted at IEEE ICASSP 2019, Brighton, UK. 5 pages. 3 figures

  22. Separake: Source Separation with a Little Help From Echoes

    Authors: Robin Scheibler, Diego Di Carlo, Antoine Deleforge, Ivan Dokmanić

    Abstract: It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses, nor do we attempt to estimate them fully. We rather assume that we know the positions of a few virtual microphones…

    Submitted 17 November, 2017; originally announced November 2017.

  23. Pyroomacoustics: A Python package for audio room simulations and array processing algorithms

    Authors: Robin Scheibler, Eric Bezzam, Ivan Dokmanić

    Abstract: We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the imag…

    Submitted 11 October, 2017; originally announced October 2017.

    Comments: 5 pages, 5 figures, describes a software package
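Entry 10 (Independence-based Joint Dereverberation and Separation with Neural Source Model) says its network is trained with a permutation invariant loss on the time-domain separated signals. The truncated abstract does not say which per-pair loss is used, so what follows is only a minimal sketch of the general permutation invariant idea, assuming a mean squared error per pair. The helper names `mse` and `pit_loss` are hypothetical. Every assignment of estimates to references is scored, and the assignment with the lowest average loss defines the loss value.

```python
from itertools import permutations


def mse(x, y):
    """Mean squared error between two equal-length signals (assumed pair loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def pit_loss(estimates, references, pair_loss=mse):
    """Permutation invariant loss (hypothetical helper, not the paper's code).

    Tries every assignment of estimated signals to reference signals and
    returns (best_loss, best_perm), where best_perm[i] is the index of the
    estimate matched to reference i and best_loss is the average pair loss.
    """
    n = len(references)
    if n == 0 or len(estimates) != n:
        raise ValueError("need the same, non-zero number of estimates and references")
    best = None
    for perm in permutations(range(n)):
        # Average loss when estimate perm[i] is paired with reference i.
        loss = sum(pair_loss(estimates[p], references[i]) for i, p in enumerate(perm)) / n
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best
```

With estimates `[[0.0, 1.0], [1.0, 0.0]]` and references `[[1.0, 0.0], [0.0, 1.0]]`, the swapped assignment `(1, 0)` gives a loss of 0.0, so the network is not penalized for emitting sources in a different order. Exhaustive search over all n! permutations is cheap for two or three sources; for many sources, the Hungarian algorithm on the matrix of pairwise losses finds the same minimum.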
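Entry 11 (SDR -- Medium Rare with Fast Computations) states that the bss eval metrics are fully specified by squared cosines of angles between estimate and reference subspaces. In the simplest special case, a single reference and a zero-length distortion filter, the reference subspace is a line, and SDR reduces to the scale-invariant SDR, 10 * log10(cos²θ / (1 - cos²θ)), where θ is the angle between the estimate and the reference. The sketch below computes only that special case under those assumptions; it is not the paper's fast algorithm, and the function names are hypothetical.

```python
import math


def dot(x, y):
    """Inner product of two real-valued signals."""
    return sum(a * b for a, b in zip(x, y))


def si_sdr_db(estimate, reference):
    """Scale-invariant SDR in dB via the squared cosine of one angle.

    Projecting the estimate onto the reference line keeps a fraction cos^2
    of its energy as target and leaves 1 - cos^2 as error, so
    SDR = 10 * log10(cos^2 / (1 - cos^2)).
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    ee = dot(estimate, estimate)
    rr = dot(reference, reference)
    if ee == 0 or rr == 0:
        raise ValueError("signals must be non-zero")
    # Squared cosine of the angle between estimate and reference.
    cos2 = min(dot(estimate, reference) ** 2 / (ee * rr), 1.0)
    if cos2 == 1.0:
        return math.inf  # estimate is a scaled copy of the reference
    return 10 * math.log10(cos2 / (1.0 - cos2))
```

For example, the estimate `[1, 1, 0, 0]` against the reference `[1, 0, 0, 0]` gives 0 dB, since it carries equal target and error energy, and rescaling the estimate to `[2, 2, 0, 0]` leaves the value unchanged.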
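Entry 8 (Spatial Loss for Unsupervised Multi-channel Source Separation) builds on the premise that steering and beamforming vectors should be aligned for the target source but orthogonal for interfering ones. The sketch below illustrates only that geometric premise, not the spatial loss itself. It assumes a far-field uniform linear array, a delay-and-sum beamformer, and a speed of sound of 343 m/s; all names are hypothetical. With 4 microphones spaced 5 cm apart at 1715 Hz, the inter-microphone phase step for a source at endfire is 2π · 1715 · 0.05 / 343 = π/2, so the four phasors cancel and a beamformer steered to broadside rejects that source completely while passing the target with unit gain.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_vector(theta, freq, n_mics, spacing, c=SPEED_OF_SOUND):
    """Far-field steering vector of a uniform linear array.

    theta is the direction of arrival in radians from broadside; microphone
    m sits at position m * spacing along the array axis.
    """
    return [
        cmath.exp(-2j * math.pi * freq * m * spacing * math.sin(theta) / c)
        for m in range(n_mics)
    ]


def delay_and_sum_weights(theta, freq, n_mics, spacing):
    """Delay-and-sum beamformer steered to theta: w = a(theta) / M."""
    return [a / n_mics for a in steering_vector(theta, freq, n_mics, spacing)]


def beam_response(weights, steering):
    """|w^H a|: output magnitude for a unit-amplitude plane wave."""
    return abs(sum(w.conjugate() * a for w, a in zip(weights, steering)))


# Beamformer aimed at broadside (theta = 0).
w = delay_and_sum_weights(0.0, 1715.0, 4, 0.05)
target_gain = beam_response(w, steering_vector(0.0, 1715.0, 4, 0.05))
interferer_gain = beam_response(w, steering_vector(math.pi / 2, 1715.0, 4, 0.05))
```

The target gain evaluates to 1 (aligned vectors) and the interferer gain to zero up to floating-point rounding (orthogonal vectors). For other interferer angles the inner product is generally non-zero, which is why exact orthogonality is a design goal rather than something a fixed beamformer guarantees.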