
Showing 1–15 of 15 results for author: Kitić, S

  1. arXiv:2305.03558  [pdf, other]

    eess.AS cs.SD eess.SP

    Blind identification of Ambisonic reduced room impulse response

    Authors: Srđan Kitić, Jérôme Daniel

    Abstract: The recently proposed Generalized Time-domain Velocity Vector (GTVV) is a generalization of the relative room impulse response in the spherical harmonic (aka Ambisonic) domain that allows for blind estimation of early-echo parameters: the directions and relative delays of individual reflections. However, the derived closed-form expression of GTVV requires a few assumptions to hold, the most important being that the…

    Submitted 6 November, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at the IEEE/ACM Transactions on Audio, Speech, and Language Processing

  2. arXiv:2203.05265  [pdf, other]

    eess.AS cs.SD

    Echo-enabled Direction-of-Arrival and range estimation of a mobile source in Ambisonic domain

    Authors: Jérôme Daniel, Srđan Kitić

    Abstract: Range estimation of a far field sound source in a reverberant environment is known to be a notoriously difficult problem, hence most localization methods are only capable of estimating the source's Direction-of-Arrival (DoA). In an earlier work, we have demonstrated that, under certain restrictive acoustic conditions and given the orientation of a reflecting surface, one can exploit the dominant a…

    Submitted 10 March, 2022; originally announced March 2022.

    Comments: Submitted

  3. arXiv:2110.06304  [pdf, other]

    eess.AS cs.SD eess.SP

    Generalized Time Domain Velocity Vector

    Authors: Srđan Kitić, Jérôme Daniel

    Abstract: We introduce and analyze the Generalized Time Domain Velocity Vector (GTVV), an extension of the previously presented acoustic multipath footprint extracted from Ambisonic recordings. GTVV is better adapted to adverse acoustic conditions, and enables efficient parameter estimation of multiple plane wave components in the recorded multichannel mixture. Experiments on simulated data confirm the pred…

    Submitted 19 May, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Submitted

  4. arXiv:2109.03465  [pdf, other]

    cs.SD cs.LG eess.AS

    A Survey of Sound Source Localization with Deep Learning Methods

    Authors: Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, Alexandre Guérin

    Abstract: This article is a survey on deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environments, where reverberation and diffuse noise are present. We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network…

    Submitted 17 June, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in The Journal of the Acoustical Society of America

  5. arXiv:2107.11066  [pdf, other]

    cs.SD eess.AS

    SALADnet: Self-Attentive multisource Localization in the Ambisonics Domain

    Authors: Pierre-Amaury Grumiaux, Srdan Kitic, Prerak Srivastava, Laurent Girin, Alexandre Guérin

    Abstract: In this work, we propose a novel self-attention based neural network for robust multi-speaker localization from Ambisonics recordings. Starting from a state-of-the-art convolutional recurrent neural network, we investigate the benefit of replacing the recurrent layers by self-attention encoders, inherited from the Transformer architecture. We evaluate these models on synthetic and real-world data,…

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: Accepted to Workshop on Applications of Signal Processing to Audio and Acoustics

  6. arXiv:2105.01897  [pdf, other]

    cs.SD eess.AS

    Improved feature extraction for CRNN-based multiple sound source localization

    Authors: Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin

    Abstract: In this work, we propose to extend a state-of-the-art multi-source localization system based on a convolutional recurrent neural network and Ambisonics signals. We significantly improve the performance of the baseline network by changing the layout between convolutional and pooling layers. We propose several configurations with more convolutional layers and smaller pooling sizes in-between, so tha…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 5 pages, 2 figures. Accepted to EUSIPCO 2021

  7. arXiv:2101.01977  [pdf, other]

    cs.SD eess.AS

    Multichannel CRNN for Speaker Counting: an Analysis of Performance

    Authors: Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin

    Abstract: Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speakers at each timestep is a prerequisite, or at least it can be a strong advantage, in addition to enabling low-latency processing. In a previous work…

    Submitted 6 January, 2021; originally announced January 2021.

    Comments: Presented at Forum Acusticum 2020

  8. arXiv:2006.02099  [pdf, other]

    eess.AS cs.SD

    Time Domain Velocity Vector for Retracing the Multipath Propagation

    Authors: Jérôme Daniel, Srđan Kitić

    Abstract: We propose a conceptually and computationally simple form of sound velocity that offers a readable view of the interference between direct and indirect sound waves. Unlike most approaches in the literature, it jointly exploits both active and reactive sound intensity measurements, as typically derived from a first order ambisonics recording. This representation has a potential both as a valuable t…

    Submitted 3 June, 2020; originally announced June 2020.

    Comments: Presented at ICASSP 2020

  9. arXiv:2006.01708  [pdf, other]

    eess.AS cs.SD eess.SP

    Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings

    Authors: Amélie Bosca, Alexandre Guérin, Lauréline Perotin, Srđan Kitić

    Abstract: We present a CNN architecture for speech enhancement from multichannel first-order Ambisonics mixtures. The data-dependent spatial filters, deduced from a mask-based approach, are used to help an automatic speech recognition engine cope with adverse conditions of reverberation and competing speakers. The mask predictions are provided by a neural network, fed with rough estimations of speech and no…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: Accepted for EUSIPCO 2020

  10. arXiv:2005.10228  [pdf, other]

    cs.SD eess.AS eess.SP

    Sparsity-based audio declipping methods: selected overview, new algorithms, and large-scale evaluation

    Authors: Clément Gaultier, Srđan Kitić, Rémi Gribonval, Nancy Bertin

    Abstract: Recent advances in audio declipping have substantially improved the state of the art. Yet, practitioners need guidelines to choose a method, and while existing benchmarks have been instrumental in advancing the field, larger-scale experiments are needed to guide such choices. First, we show that the clipping levels in existing small-scale benchmarks are moderate and…

    Submitted 30 November, 2020; v1 submitted 19 May, 2020; originally announced May 2020.

  11. arXiv:2003.07839  [pdf, other]

    cs.SD eess.AS

    High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features

    Authors: Pierre-Amaury Grumiaux, Srdjan Kitic, Laurent Girin, Alexandre Guérin

    Abstract: Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speakers at each timestep is a prerequisite, or at least it can be a strong advantage, in addition to enabling low-latency processing. For that purpose,…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: 5 pages, 1 figure

  12. arXiv:2001.08830  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Scattering Features for Multimodal Gait Recognition

    Authors: Srđan Kitić, Gilles Puy, Patrick Pérez, Philippe Gilberton

    Abstract: We consider the problem of identifying people on the basis of their walk (gait) pattern. Classical approaches to tackle this problem are based on, e.g., video recordings or piezoelectric sensors embedded in the floor. In this work, we rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively. The contribution of this work is twofold. First, we prop…

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: Published at IEEE GlobalSIP 2017

  13. arXiv:1910.10661  [pdf, other]

    cs.SD eess.AS eess.SP

    A Comparative Study of Multilateration Methods for Single-Source Localization in Distributed Audio

    Authors: Srđan Kitić, Clément Gaultier, Grégory Pallone

    Abstract: In this article we analyze the state of the art in multilateration - the family of localization methods enabled by the range difference observations. These methods are computationally efficient, signal-independent, and flexible with regard to the number of sensing nodes and their spatial arrangement. However, the multilateration problem does not admit a closed-form solution in the general case, a…

    Submitted 28 July, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: To appear at IWIS - The 1st International Workshop on the Internet of Sounds

  14. arXiv:1810.04080  [pdf, ps, other]

    cs.SD eess.AS

    TRAMP: Tracking by a Real-time AMbisonic-based Particle filter

    Authors: Srđan Kitić, Alexandre Guérin

    Abstract: This article presents a multiple sound source localization and tracking system, fed by the Eigenmike array. The First Order Ambisonics (FOA) format is used to build a pseudointensity-based spherical histogram, from which the source position estimates are deduced. These instantaneous estimates are processed by a well-known tracking system relying on a set of particle filters. While the novelty withi…

    Submitted 4 December, 2018; v1 submitted 9 October, 2018; originally announced October 2018.

    Comments: In Proceedings of the LOCATA Challenge Workshop - a satellite event of IWAENC 2018 (arXiv:1811.08482)

    Report number: LOCATAchallenge/2018/09

  15. arXiv:1711.11259  [pdf, other]

    cs.SD eess.AS

    A modeling and algorithmic framework for (non)social (co)sparse audio restoration

    Authors: Clément Gaultier, Nancy Bertin, Srđan Kitić, Rémi Gribonval

    Abstract: We propose a unified modeling and algorithmic framework for the audio restoration problem. It encompasses analysis sparse priors as well as more classical synthesis sparse priors, and regular sparsity as well as various forms of structured sparsity embodied by shrinkage operators (such as social shrinkage). The versatility of the framework is illustrated on two restoration scenarios: denoising, and de…

    Submitted 30 November, 2017; originally announced November 2017.