-
Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features
Authors:
Atıl İlerialkan,
Alptekin Temizel,
Hüseyin Hacıhabiboğlu
Abstract:
Acoustic features extracted from speech are widely used in problems such as biometric speaker identification and first-person activity detection. However, the use of speech for such purposes raises privacy issues as the content is accessible to the processing party. In this work, we propose a method for speaker and posture classification using intraspeech breathing sounds. Instantaneous magnitude…
▽ More
Acoustic features extracted from speech are widely used in problems such as biometric speaker identification and first-person activity detection. However, the use of speech for such purposes raises privacy issues as the content is accessible to the processing party. In this work, we propose a method for speaker and posture classification using intraspeech breathing sounds. Instantaneous magnitude features are extracted using the Hilbert-Huang transform (HHT) and fed into a CNN-GRU network for classification of recordings from the open intraspeech breathing sound dataset, BreathBase, that we collected for this study. Using intraspeech breathing sounds, 87% speaker classification, and 98% posture classification accuracy were obtained.
△ Less
Submitted 25 May, 2020;
originally announced May 2020.
-
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction
Authors:
Enzo De Sena,
Zoran Cvetkovic,
Huseyin Hacihabiboglu,
Marc Moonen,
Toon van Waterschoot
Abstract:
This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and com…
▽ More
This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and compares them to those associated to free-field point-like sources. The comparison is carried out using a particular distance functional that replicates the increased uncertainty observed experimentally with inconsistent inter-aural time and level difference cues. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups and a listener in central and off-central positions. Results show that amplitude methods achieve a slightly lower localization uncertainty for a listener positioned exactly in the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving a lower localization uncertainty.
△ Less
Submitted 6 September, 2020; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Multiple Sound Source Localisation with Steered Response Power Density and Hierarchical Grid Refinement
Authors:
Mert Burkay Coteli,
Orhun Olgun,
Huseyin Hacihabiboglu
Abstract:
Estimation of the direction-of-arrival (DOA) of sound sources is an important step in sound field analysis. Rigid spherical microphone arrays allow the calculation of a compact spherical harmonic representation of the sound field. A basic method for analysing sound fields recorded using such arrays is steered response power (SRP) maps wherein the source DOA can be estimated as the steering directi…
▽ More
Estimation of the direction-of-arrival (DOA) of sound sources is an important step in sound field analysis. Rigid spherical microphone arrays allow the calculation of a compact spherical harmonic representation of the sound field. A basic method for analysing sound fields recorded using such arrays is steered response power (SRP) maps wherein the source DOA can be estimated as the steering direction that maximises the output power of a maximally-directive beam. This approach is computationally costly since it requires steering the beam in all possible directions. This paper presents an extension to SRP called steered response power density (SRPD) and an associated, signal-adaptive search method called hierarchical grid refinement (HiGRID) for reducing the number of steering directions needed for DOA estimation. The proposed method can localise coherent as well as incoherent sources while jointly providing the number of prominent sources in the scene. It is shown to be robust to reverberation and additive white noise. An evaluation of the proposed method using simulations and real recordings under highly reverberant conditions as well as a comparison with state- of-the-art methods are presented.
△ Less
Submitted 4 March, 2018;
originally announced March 2018.