Search | arXiv e-print repository

arXiv:2005.12230

Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features

Authors: Atıl İlerialkan, Alptekin Temizel, Hüseyin Hacıhabiboğlu

Abstract: Acoustic features extracted from speech are widely used in problems such as biometric speaker identification and first-person activity detection. However, the use of speech for such purposes raises privacy issues as the content is accessible to the processing party. In this work, we propose a method for speaker and posture classification using intraspeech breathing sounds. Instantaneous magnitude features are extracted using the Hilbert-Huang transform (HHT) and fed into a CNN-GRU network for classification of recordings from the open intraspeech breathing sound dataset, BreathBase, that we collected for this study. Using intraspeech breathing sounds, 87% speaker classification and 98% posture classification accuracy were obtained.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 5 pages, 3 figures

arXiv:1907.11425

doi 10.1109/TASLP.2020.2975419

Localization Uncertainty in Time-Amplitude Stereophonic Reproduction

Authors: Enzo De Sena, Zoran Cvetkovic, Huseyin Hacihabiboglu, Marc Moonen, Toon van Waterschoot

Abstract: This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and compares them to those associated with free-field point-like sources. The comparison is carried out using a particular distance functional that replicates the increased uncertainty observed experimentally with inconsistent inter-aural time and level difference cues. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups and a listener in central and off-central positions. Results show that amplitude methods achieve a slightly lower localization uncertainty for a listener positioned exactly in the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving a lower localization uncertainty.

Submitted 6 September, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

Journal ref: IEEE/ACM Trans. Audio, Speech and Language Process., vol. 28, pp. 1000-1015, Feb. 2020

arXiv:1803.01339

Multiple Sound Source Localisation with Steered Response Power Density and Hierarchical Grid Refinement

Authors: Mert Burkay Coteli, Orhun Olgun, Huseyin Hacihabiboglu

Abstract: Estimation of the direction-of-arrival (DOA) of sound sources is an important step in sound field analysis. Rigid spherical microphone arrays allow the calculation of a compact spherical harmonic representation of the sound field. A basic method for analysing sound fields recorded using such arrays is steered response power (SRP) maps wherein the source DOA can be estimated as the steering direction that maximises the output power of a maximally-directive beam. This approach is computationally costly since it requires steering the beam in all possible directions. This paper presents an extension to SRP called steered response power density (SRPD) and an associated, signal-adaptive search method called hierarchical grid refinement (HiGRID) for reducing the number of steering directions needed for DOA estimation. The proposed method can localise coherent as well as incoherent sources while jointly providing the number of prominent sources in the scene. It is shown to be robust to reverberation and additive white noise. An evaluation of the proposed method using simulations and real recordings under highly reverberant conditions as well as a comparison with state-of-the-art methods are presented.

Submitted 4 March, 2018; originally announced March 2018.

Comments: 14 pages, 10 figures, 4 tables, submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing (03 March 2018)

Showing 1–3 of 3 results for author: Hacihabiboglu, H