-
Sound Source Distance Estimation in Diverse and Dynamic Acoustic Conditions
Authors:
Saksham Singh Kushwaha,
Iran R. Roman,
Magdalena Fuentes,
Juan Pablo Bello
Abstract:
Localizing a moving sound source in the real world involves determining its direction-of-arrival (DOA) and distance relative to a microphone. Advances in DOA estimation have been driven by data-driven methods trained on large open-source datasets of microphone-array recordings made in diverse environments. In contrast, estimating a sound source's distance remains understudied. Existing approaches assume recordings from non-coincident microphones and rely on methods that are susceptible to differences in room reverberation. We present a CRNN that estimates the distance of moving sound sources across multiple datasets featuring diverse rooms, outperforming a recently published approach. We also characterize our model's performance as a function of sound source distance and of different training losses. This analysis shows that training works best with a loss that weights model errors as an inverse function of the sound source's true distance. Our study is the first to demonstrate that sound source distance estimation can be performed across diverse acoustic conditions using deep learning.
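The abstract does not give the exact form of the inverse-distance-weighted loss. The sketch below assumes one plausible form, a relative absolute error in which each error |d_pred - d_true| is divided by d_true, and compares it with plain mean absolute error. The function names `inverse_distance_weighted_mae` and `mae` are hypothetical, and the small `eps` term is an assumption added only to avoid division by zero.

```python
def mae(pred, true):
    """Plain mean absolute error: every metre of error counts the same."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def inverse_distance_weighted_mae(pred, true, eps=1e-6):
    """Assumed form of an inverse-distance-weighted loss (hypothetical).

    Each absolute error is divided by the true distance, so a 0.5 m error
    on a source 1 m away is penalized five times more than the same error
    on a source 5 m away.
    """
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    total = 0.0
    for p, t in zip(pred, true):
        total += abs(p - t) / (t + eps)  # weight = 1 / true distance
    return total / len(pred)


pred = [1.5, 5.5]  # predicted distances in metres (toy values)
true = [1.0, 5.0]  # ground-truth distances in metres (toy values)
print(mae(pred, true))                            # both errors weighted equally
print(inverse_distance_weighted_mae(pred, true))  # near-field error dominates
```

With these toy values both predictions are off by 0.5 m, so plain MAE is 0.5, while the weighted version is about 0.3 because the error on the 5 m source is scaled down by a factor of five. This illustrates why such a loss shifts the model's emphasis toward nearby sources.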
Submitted 17 September, 2023;
originally announced September 2023.
-
A Multimodal Prototypical Approach for Unsupervised Sound Classification
Authors:
Saksham Singh Kushwaha,
Magdalena Fuentes
Abstract:
In the context of environmental sound classification, the adaptability of systems is key: which sound classes are interesting depends on the context and the user's needs. Recent advances in text-to-audio retrieval allow for zero-shot audio classification, but performance remains limited compared to supervised models. This work proposes a multimodal prototypical approach that exploits local audio-text embeddings to provide more relevant answers to audio queries, improving the adaptability of sound detection in the wild. We do this by first using text to query a nearby community of audio embeddings that best characterizes each query sound, and selecting each group's centroid as its prototype. Second, we compare unseen audio to these prototypes for classification. We perform multiple ablation studies to understand the impact of the embedding models and prompts. Our unsupervised approach improves upon the zero-shot state of the art on three sound recognition benchmarks by an average of 12%.
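The two-step procedure above (text query to a local group of audio embeddings, centroid as prototype, then nearest-prototype classification) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes text and audio embeddings share one space, as in an audio-text embedding model, and it treats the "nearby community" as simply the k nearest pool embeddings by cosine similarity, which is an assumption. All function names and the 2-D toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def build_prototypes(text_embeddings, audio_pool, k):
    """Hypothetical step 1: for each class prompt, average its k nearest audio embeddings."""
    prototypes = {}
    for label, t in text_embeddings.items():
        # Assumption: the "community" is the k pool embeddings most similar to the text query.
        neighbors = sorted(audio_pool, key=lambda a: cosine(t, a), reverse=True)[:k]
        dim = len(neighbors[0])
        prototypes[label] = [sum(v[i] for v in neighbors) / len(neighbors) for i in range(dim)]
    return prototypes


def classify(audio_embedding, prototypes):
    """Hypothetical step 2: assign the label of the most similar prototype."""
    return max(prototypes, key=lambda label: cosine(audio_embedding, prototypes[label]))


# Toy 2-D embeddings standing in for real audio-text model outputs.
text_embeddings = {"dog": [1.0, 0.0], "siren": [0.0, 1.0]}
audio_pool = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7], [0.5, 0.5]]
prototypes = build_prototypes(text_embeddings, audio_pool, k=2)
print(classify([0.7, 0.2], prototypes))  # "dog"
print(classify([0.1, 0.6], prototypes))  # "siren"
```

The design point is that the prototype lives in the audio region of the embedding space rather than at the text embedding itself, so unseen audio is compared with other audio instead of directly with a text prompt.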
Submitted 17 August, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Compressing the Data Densely by New Geflochtener to Accelerate Web
Authors:
Hemant Kumar Saini,
Satpal Singh Kushwaha,
C. Rama Krishna
Abstract:
On today's internet, many optimization techniques exist to improve Web speed, but most are expensive in terms of bandwidth. After a long investigation of different lossless compression techniques, a new algorithm in the LZ77 family is proposed that selectively models backward references and encodes the longest matches through greedy parsing with a shortest-path technique, compressing the data with high density. The idea is useful because a single Web page contains many repetitive words that waste space; the algorithm removes such unnecessary redundancies with 70% efficiency and compresses pages with a 23.75–35% compression ratio.
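For readers unfamiliar with the LZ77 family, the sketch below shows plain greedy LZ77 parsing: at each position the encoder emits a backward reference (distance, length) to the longest earlier match, or a literal when no match is long enough. It does not reproduce the paper's selective modeling of backward references or its shortest-path encoding; the function names, the 4096-character window, and the 3-character minimum match are illustrative assumptions, and the match search is brute force for clarity rather than speed.

```python
def lz77_compress(data, window=4096, min_match=3):
    """Greedy LZ77 parse of a string into literal and back-reference tokens."""
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        # Search every earlier start position inside the sliding window.
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may run past position i (overlapping copy), as in standard LZ77.
            while i + length < n and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("ref", best_dist, best_len))  # greedy: take the longest match
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decompress(tokens):
    """Rebuild the string; copy one character at a time so overlapping refs work."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return "".join(out)


page = "the web page and the web page again"
tokens = lz77_compress(page)
print(len(page), len(tokens))  # the repeated phrase collapses into one reference
print(lz77_decompress(tokens) == page)
```

Greedy parsing always takes the longest match available at the current position; the abstract's shortest-path technique presumably refines this choice of parse, which the sketch leaves out.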
Submitted 16 May, 2014;
originally announced May 2014.