Showing 1–14 of 14 results for author: Kleijn, W B

  1. arXiv:2303.12984  [pdf, other]

    cs.SD eess.AS

    LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models

    Authors: Teerapat Jenrungrot, Michael Chinen, W. Bastiaan Kleijn, Jan Skoglund, Zalán Borsos, Neil Zeghidour, Marco Tagliasacchi

    Abstract: We introduce LMCodec, a causal neural speech codec that provides high quality audio at very low bitrates. The backbone of the system is a causal convolutional codec that encodes audio into a hierarchy of coarse-to-fine tokens using residual vector quantization. LMCodec trains a Transformer language model to predict the fine tokens from the coarse ones in a generative fashion, allowing for the tran…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 5 pages, accepted to ICASSP 2023, project page: https://mjenrungrot.github.io/chrome-media-audio-papers/publications/lmcodec

  2. arXiv:2301.09198  [pdf, other]

    eess.AS cs.SD

    Estimation of Source and Receiver Positions, Room Geometry and Reflection Coefficients From a Single Room Impulse Response

    Authors: Wangyang Yu, W. Bastiaan Kleijn

    Abstract: We propose an algorithm to estimate source and receiver positions, room geometry and reflection coefficients from a single room impulse response simultaneously. It is based on a symmetry analysis of the room impulse response. The proposed method utilizes the times of arrivals of the direct path, first order reflections and second order reflections. The proposed method is robust to erroneous pulses…

    Submitted 22 January, 2023; originally announced January 2023.

  3. arXiv:2207.02262  [pdf, other]

    cs.SD cs.LG eess.AS

    Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

    Authors: Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

    Abstract: Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective rec…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: Proceedings of INTERSPEECH 2022

  4. arXiv:2204.02040  [pdf]

    cs.SD cs.CR eess.AS

    On the Relevance of Bandwidth Extension for Speaker Verification

    Authors: Marcos Faundez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn

    Abstract: In this paper, we consider the effect of a bandwidth extension of narrow-band speech signals (0.3-3.4 kHz) to 0.3-8 kHz on speaker verification. Using covariance matrix based verification systems together with detection error trade-off curves, we compare the performance between systems operating on narrow-band, wide-band (0-8 kHz), and bandwidth-extended speech. The experiments were conducted usin…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 4 pages published in 7th International Conference on Spoken Language Processing, September 16-20, 2002, Denver, Colorado, USA. arXiv admin note: text overlap with arXiv:2202.13865

    Journal ref: 7th International Conference on Spoken Language Processing (ICSLP2002), September 16-20, 2002

  5. arXiv:2202.13865  [pdf]

    cs.SD cs.LG eess.AS

    On the relevance of bandwidth extension for speaker identification

    Authors: Marcos Faundez-Zanuy, Mattias Nilsson, W. Bastiaan Kleijn

    Abstract: In this paper we discuss the relevance of bandwidth extension for speaker identification tasks. Mainly we want to study if it is possible to recognize voices that have been bandwidth extended. For this purpose, we created two different databases (microphonic and ISDN) of speech signals that were bandwidth extended from telephone bandwidth ([300, 3400] Hz) to full bandwidth ([100, 8000] Hz). We have…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: 4 pages

    Journal ref: 2002 11th European Signal Processing Conference, 2002, pp. 1-4

  6. arXiv:2102.11906  [pdf, other]

    eess.AS cs.SD

    Handling Background Noise in Neural Speech Generation

    Authors: Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, W. Bastiaan Kleijn, Jan Skoglund

    Abstract: Recent advances in neural-network based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing its use in practical applications. In this paper we examine the reason and discuss methods to overcome this issue. Placing a denoising preprocessing…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: 5 pages, 3 figures, presented at the Asilomar Conference on Signals, Systems, and Computers 2020

  7. arXiv:2102.09660  [pdf, other]

    eess.AS cs.SD

    Generative Speech Coding with Predictive Variance Regularization

    Authors: W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh

    Abstract: The recent emergence of machine-learning based generative models for speech suggests a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the distortions present in real-world input signals. We argue that this deterioration is due to the sensitivity of the maximum likelihood criterion to outliers and the in…

    Submitted 18 February, 2021; originally announced February 2021.

    MSC Class: 94 ACM Class: I.m

  8. arXiv:1912.08308  [pdf, other]

    eess.SP

    Distributed Network Privacy using Error Correcting Codes

    Authors: Matt O'Connor, W. Bastiaan Kleijn

    Abstract: Most current distributed processing research deals with improving the flexibility and convergence speed of algorithms for networks of finite size with no constraints on information sharing and no concept for expected levels of signal privacy. In this work we investigate the concept of data privacy in unbounded public networks, where linear codes are used to create hard limits on the number of node…

    Submitted 17 December, 2019; originally announced December 2019.

  9. arXiv:1909.04776  [pdf, other]

    eess.AS cs.SD

    Generative Speech Enhancement Based on Cloned Networks

    Authors: Michael Chinen, W. Bastiaan Kleijn, Felicia S. C. Lim, Jan Skoglund

    Abstract: We propose to implement speech enhancement by the regeneration of clean speech from a salient representation extracted from the noisy signal. The network that extracts salient features is trained using a set of weight-sharing clones of the extractor network. The clones receive mel-frequency spectra of different noisy versions of the same speech signal as input. By encouraging the outputs of the cl…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: Accepted WASPAA 2019

  10. arXiv:1908.07045  [pdf, other]

    eess.AS cs.SD

    Salient Speech Representations Based on Cloned Networks

    Authors: W. Bastiaan Kleijn, Felicia S. C. Lim, Michael Chinen, Jan Skoglund

    Abstract: We define salient features as features that are shared by signals that are defined as being equivalent by a system designer. The definition allows the designer to contribute qualitative information. We aim to find salient features that are useful as conditioning for generative networks. We extract salient features by jointly training a set of clones of an encoder network. Each network clone receiv…

    Submitted 19 August, 2019; originally announced August 2019.

    Comments: Interspeech 2019

  11. arXiv:1904.00869  [pdf, ps, other]

    eess.AS

    Room Geometry Estimation from Room Impulse Responses using Convolutional Neural Networks

    Authors: Wangyang Yu, W. Bastiaan Kleijn

    Abstract: We describe a new method to estimate the geometry of a room given room impulse responses. The method utilises convolutional neural networks to estimate the room geometry and uses the mean square error as the loss function. In contrast to existing methods, we do not require the position or distance of sources or receivers in the room. The method can be used with only a single room impulse response…

    Submitted 15 May, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

  12. arXiv:1807.11320  [pdf, other]

    cs.LG eess.SP stat.ML

    Kernel Density Estimation-Based Markov Models with Hidden State

    Authors: Gustav Eje Henter, Arne Leijon, W. Bastiaan Kleijn

    Abstract: We consider Markov models of stochastic processes where the next-step conditional distribution is defined by a kernel density estimator (KDE), similar to Markov forecast densities and certain time-series bootstrap schemes. The KDE Markov models (KDE-MMs) we discuss are nonlinear, nonparametric, fully probabilistic representations of stationary processes, based on techniques with strong asymptotic…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: 14 pages, 6 figures

    MSC Class: 62M10; 62G07 ACM Class: G.3

  13. arXiv:1803.06718  [pdf, other]

    eess.AS cs.SD

    Directional emphasis in ambisonics

    Authors: W. Bastiaan Kleijn

    Abstract: We describe an ambisonics enhancement method that increases the signal strength in specified directions at low computational cost. The method can be used in a static setup to emphasize the signal arriving from a particular direction or set of directions. It can also be used in an adaptive arrangement where it sharpens directionality and reduces the distortion in timbre associated with low-degree a…

    Submitted 24 May, 2018; v1 submitted 18 March, 2018; originally announced March 2018.

  14. arXiv:1712.01120  [pdf, other]

    eess.AS cs.SD eess.SP

    Wavenet based low rate speech coding

    Authors: W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters

    Abstract: Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative m…

    Submitted 1 December, 2017; originally announced December 2017.

    Comments: 5 pages, 2 figures