Search | arXiv e-print repository

arXiv:2005.12230

Speaker and Posture Classification using Instantaneous Intraspeech Breathing Features

Authors: Atıl İlerialkan, Alptekin Temizel, Hüseyin Hacıhabiboğlu

Abstract: Acoustic features extracted from speech are widely used in problems such as biometric speaker identification and first-person activity detection. However, the use of speech for such purposes raises privacy issues, as the content is accessible to the processing party. In this work, we propose a method for speaker and posture classification using intraspeech breathing sounds. Instantaneous magnitude features are extracted using the Hilbert-Huang transform (HHT) and fed into a CNN-GRU network for classification of recordings from BreathBase, an open intraspeech breathing sound dataset that we collected for this study. Using intraspeech breathing sounds, a speaker classification accuracy of 87% and a posture classification accuracy of 98% were obtained.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 5 pages, 3 figures
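The HHT first splits a signal into intrinsic mode functions (IMFs) with empirical mode decomposition (EMD) and then applies the Hilbert transform to each IMF. A minimal NumPy sketch of the second step only is shown below: it computes the instantaneous magnitude (envelope) of one component via the FFT-based analytic signal. EMD is omitted, the helper names `analytic_signal` and `instantaneous_magnitude` are hypothetical, and how the paper frames these magnitudes or feeds them to its CNN-GRU network is not reproduced here.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real 1-D array via the FFT.

    Negative-frequency bins are zeroed and positive ones doubled, the
    standard discrete construction of x + j * Hilbert{x}.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0  # keep the DC bin unchanged
    if n % 2 == 0:
        h[1:n // 2] = 2.0
        h[n // 2] = 1.0  # keep the Nyquist bin unchanged
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def instantaneous_magnitude(imf):
    """Instantaneous magnitude of one (assumed already extracted) IMF."""
    return np.abs(analytic_signal(imf))


# Usage: a 5-cycle cosine of amplitude 2 stands in for a single IMF.
n = 256
t = np.arange(n)
component = 2.0 * np.cos(2 * np.pi * 5 * t / n)
envelope = instantaneous_magnitude(component)
```

For a pure tone the envelope is flat at the tone's amplitude; for a breathing-sound IMF it would instead trace the component's amplitude modulation over time.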

arXiv:1907.11425

doi 10.1109/TASLP.2020.2975419

Localization Uncertainty in Time-Amplitude Stereophonic Reproduction

Authors: Enzo De Sena, Zoran Cvetkovic, Huseyin Hacihabiboglu, Marc Moonen, Toon van Waterschoot

Abstract: This article studies the effects of inter-channel time and level differences in stereophonic reproduction on perceived localization uncertainty, defined as how difficult it is for a listener to tell where a sound source is located. To this end, a computational model of localization uncertainty is first proposed. The model calculates inter-aural time and level difference cues and compares them to those associated with free-field point-like sources. The comparison is carried out using a distance functional that replicates the increased uncertainty observed experimentally when inter-aural time and level difference cues are inconsistent. The model is validated by formal listening tests, achieving a Pearson correlation of 0.99. The model is then used to predict localization uncertainty for stereophonic setups with a listener in central and off-central positions. Results show that amplitude methods achieve slightly lower localization uncertainty for a listener positioned exactly at the center of the sweet spot. As soon as the listener moves away from that position, the situation reverses, with time-amplitude methods achieving lower localization uncertainty.

Submitted 6 September, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

Journal ref: IEEE/ACM Trans. Audio, Speech and Language Process., vol. 28, pp. 1000-1015, Feb. 2020
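The model above works on inter-aural time difference (ITD) and inter-aural level difference (ILD) cues. One common, generic way to estimate these cues from a binaural signal pair is to take the ITD as the cross-correlation peak lag and the ILD as an energy ratio in dB, sketched below in NumPy. This is an illustrative assumption, not the paper's auditory model, and it does not implement the distance functional used to compute uncertainty; `itd_ild` is a hypothetical helper.

```python
import numpy as np


def itd_ild(left, right, fs, max_lag):
    """Estimate ITD (seconds) and ILD (dB) for equal-length channels.

    ITD is the lag k, within +/- max_lag samples, that maximizes the
    cross-correlation sum_n left[n] * right[n + k]; a positive ITD means
    the right channel lags the left. ILD is the left-to-right energy ratio.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = left.size
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:
            corr[i] = np.dot(left[:n - k], right[k:])
        else:
            corr[i] = np.dot(left[-k:], right[:n + k])
    itd = lags[np.argmax(corr)] / fs
    ild_db = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild_db


# Usage: the right channel is the left one delayed by 5 samples and halved.
rng = np.random.default_rng(0)
fs = 48000
left = rng.standard_normal(4800)
right = 0.5 * np.concatenate([np.zeros(5), left[:-5]])
itd, ild_db = itd_ild(left, right, fs, max_lag=40)
```

Here the ITD comes out as 5 samples (about 0.1 ms) and the ILD as roughly 6 dB, so both cues point toward the left, i.e. they are consistent. With real stereophonic reproduction the two cues can disagree, which is the situation the paper's model assigns higher uncertainty.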

arXiv:1803.01339

Multiple Sound Source Localisation with Steered Response Power Density and Hierarchical Grid Refinement

Authors: Mert Burkay Coteli, Orhun Olgun, Huseyin Hacihabiboglu

Abstract: Estimation of the direction-of-arrival (DOA) of sound sources is an important step in sound field analysis. Rigid spherical microphone arrays allow the calculation of a compact spherical harmonic representation of the sound field. A basic method for analysing sound fields recorded using such arrays is the steered response power (SRP) map, wherein the source DOA is estimated as the steering direction that maximises the output power of a maximally-directive beam. This approach is computationally costly, since it requires steering the beam in all possible directions. This paper presents an extension to SRP called steered response power density (SRPD) and an associated signal-adaptive search method called hierarchical grid refinement (HiGRID) that reduces the number of steering directions needed for DOA estimation. The proposed method can localise coherent as well as incoherent sources while jointly providing the number of prominent sources in the scene. It is shown to be robust to reverberation and additive white noise. An evaluation of the proposed method using simulations and real recordings under highly reverberant conditions, as well as a comparison with state-of-the-art methods, is presented.

Submitted 4 March, 2018; originally announced March 2018.

Comments: 14 pages, 10 figures, 4 tables, submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing (03 March 2018)
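The basic SRP approach described above, picking the steering direction that maximises beam output power, can be illustrated with an exhaustive grid search. The sketch below swaps in a narrowband delay-and-sum beam on a uniform linear array instead of the paper's spherical-harmonic beamformer, and it implements neither SRPD nor HiGRID. The array geometry, frequency, speed of sound, and helper names are all hypothetical choices for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_vector(positions, theta, freq):
    """Far-field plane-wave phases at mics on the x-axis for azimuth theta (rad)."""
    k = 2 * np.pi * freq / SPEED_OF_SOUND
    return np.exp(-1j * k * positions * np.cos(theta))


def srp_map(snapshot, positions, grid, freq):
    """Delay-and-sum output power |a(theta)^H x|^2 for every direction in grid."""
    return np.array([
        np.abs(np.vdot(steering_vector(positions, th, freq), snapshot)) ** 2
        for th in grid
    ])


# Usage: 8-mic line array with 4 cm spacing (below half a wavelength at 2 kHz).
positions = 0.04 * np.arange(8)
freq = 2000.0
grid = np.deg2rad(np.arange(0.0, 181.0, 1.0))  # exhaustive 1-degree grid
true_doa = np.deg2rad(60.0)
snapshot = steering_vector(positions, true_doa, freq)  # noise-free plane wave
power = srp_map(snapshot, positions, grid, freq)
estimate_deg = np.rad2deg(grid[np.argmax(power)])
```

Every grid point costs one beam evaluation, so a finer grid, or a full sphere of directions as with a spherical array, raises the cost proportionally; this is the expense that an adaptive search such as HiGRID aims to reduce.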

arXiv:1502.05751

doi 10.1109/TASLP.2015.2438547

Efficient Synthesis of Room Acoustics via Scattering Delay Networks

Authors: Enzo De Sena, Huseyin Hacihabiboglu, Zoran Cvetkovic, Julius O. Smith III

Abstract: An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of the Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, the computational complexity of the proposed reverberator is one to two orders of magnitude lower than that of IM and comparable to that of a feedback delay network (FDN), and its memory requirements are negligible.

Submitted 9 July, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, No. 9, September 2015
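The abstract above checks the reverberator's energy decay against the Sabine and Eyring equations. For reference, with room volume V (m^3), total surface area S (m^2) and a uniform absorption coefficient alpha, Sabine gives T60 = 0.161 V / (S alpha) and Eyring gives T60 = 0.161 V / (-S ln(1 - alpha)). The sketch below evaluates both for a rectangular room; the dimensions and alpha are illustrative assumptions, and the code does not model the scattering delay network itself.

```python
import math


def shoebox_reverb_times(length, width, height, alpha):
    """Sabine and Eyring T60 (s) for a rectangular room with uniform absorption alpha."""
    volume = length * width * height  # m^3
    surface = 2 * (length * width + length * height + width * height)  # m^2
    sabine = 0.161 * volume / (surface * alpha)
    eyring = 0.161 * volume / (-surface * math.log(1.0 - alpha))
    return sabine, eyring


sabine, eyring = shoebox_reverb_times(5.0, 4.0, 3.0, alpha=0.2)
print(f"Sabine T60 = {sabine:.3f} s, Eyring T60 = {eyring:.3f} s")
# prints: Sabine T60 = 0.514 s, Eyring T60 = 0.461 s
```

Since -ln(1 - alpha) > alpha for 0 < alpha < 1, Eyring always predicts a shorter decay than Sabine, and the gap widens as absorption increases.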

Showing 1–4 of 4 results for author: Hacıhabiboğlu, H