-
Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features
Authors:
Atıl İlerialkan,
Alptekin Temizel,
Hüseyin Hacıhabiboğlu
Abstract:
Acoustic features extracted from speech are widely used in problems such as biometric speaker identification and first-person activity detection. However, the use of speech for such purposes raises privacy issues as the content is accessible to the processing party. In this work, we propose a method for speaker and posture classification using intraspeech breathing sounds. Instantaneous magnitude features are extracted using the Hilbert-Huang transform (HHT) and fed into a CNN-GRU network for classification of recordings from the open intraspeech breathing sound dataset, BreathBase, that we collected for this study. Using intraspeech breathing sounds, accuracies of 87% for speaker classification and 98% for posture classification were obtained.
Submitted 25 May, 2020;
originally announced May 2020.
-
Localization Uncertainty in Time-Amplitude Stereophonic Reproduction
Authors:
Enzo De Sena,
Zoran Cvetkovic,
Huseyin Hacihabiboglu,
Marc Moonen,
Toon van Waterschoot
Abstract:
This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, which is defined as how difficult it is for a listener to tell where a sound source is located. Towards this end, a computational model of localization uncertainty is proposed first. The model calculates inter-aural time and level difference cues, and compares them to those associated with free-field point-like sources. The comparison is carried out using a particular distance functional that replicates the increased uncertainty observed experimentally with inconsistent inter-aural time and level difference cues. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups and a listener in central and off-central positions. Results show that amplitude methods achieve a slightly lower localization uncertainty for a listener positioned exactly in the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving a lower localization uncertainty.
Submitted 6 September, 2020; v1 submitted 26 July, 2019;
originally announced July 2019.
-
Multiple Sound Source Localisation with Steered Response Power Density and Hierarchical Grid Refinement
Authors:
Mert Burkay Coteli,
Orhun Olgun,
Huseyin Hacihabiboglu
Abstract:
Estimation of the direction-of-arrival (DOA) of sound sources is an important step in sound field analysis. Rigid spherical microphone arrays allow the calculation of a compact spherical harmonic representation of the sound field. A basic method for analysing sound fields recorded using such arrays is steered response power (SRP) maps wherein the source DOA can be estimated as the steering direction that maximises the output power of a maximally-directive beam. This approach is computationally costly since it requires steering the beam in all possible directions. This paper presents an extension to SRP called steered response power density (SRPD) and an associated, signal-adaptive search method called hierarchical grid refinement (HiGRID) for reducing the number of steering directions needed for DOA estimation. The proposed method can localise coherent as well as incoherent sources while jointly providing the number of prominent sources in the scene. It is shown to be robust to reverberation and additive white noise. An evaluation of the proposed method using simulations and real recordings under highly reverberant conditions as well as a comparison with state-of-the-art methods are presented.
Submitted 4 March, 2018;
originally announced March 2018.
-
Efficient Synthesis of Room Acoustics via Scattering Delay Networks
Authors:
Enzo De Sena,
Huseyin Hacihabiboglu,
Zoran Cvetkovic,
Julius O. Smith III
Abstract:
An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, its computational complexity is one to two orders of magnitude lower, comparable to the computational complexity of a feedback delay network (FDN), and its memory requirements are negligible.
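For readers unfamiliar with the Sabine and Eyring equations mentioned in the abstract above, the following is a minimal sketch of the two classical reverberation-time estimates in their standard textbook form (metric units, constant 0.161 s/m). It is not taken from the paper; the function names, the single mean absorption coefficient, and the example room dimensions are assumptions chosen for illustration only.

```python
import math

# Standard metric constant (s/m) used in both textbook formulas.
SABINE_CONSTANT = 0.161


def sabine_rt60(volume_m3, surface_m2, mean_absorption):
    """Sabine estimate: RT60 = 0.161 * V / (S * a)."""
    return SABINE_CONSTANT * volume_m3 / (surface_m2 * mean_absorption)


def eyring_rt60(volume_m3, surface_m2, mean_absorption):
    """Eyring estimate: RT60 = 0.161 * V / (-S * ln(1 - a))."""
    return SABINE_CONSTANT * volume_m3 / (
        -surface_m2 * math.log(1.0 - mean_absorption)
    )


# Hypothetical 5 m x 4 m x 3 m room with a uniform absorption of 0.3.
length, width, height = 5.0, 4.0, 3.0
volume = length * width * height
surface = 2.0 * (length * width + length * height + width * height)

rt_sabine = sabine_rt60(volume, surface, 0.3)
rt_eyring = eyring_rt60(volume, surface, 0.3)
print(f"Sabine: {rt_sabine:.3f} s, Eyring: {rt_eyring:.3f} s")
```

For any absorption coefficient strictly between 0 and 1, the Eyring estimate is shorter than the Sabine estimate, and the two converge as absorption becomes small.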
Submitted 9 July, 2015; v1 submitted 19 February, 2015;
originally announced February 2015.