Search | arXiv e-print repository

Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection

Authors: Jinhua Liang, Ines Nolasco, Burooj Ghani, Huy Phan, Emmanouil Benetos, Dan Stowell

Abstract: Detecting the presence of animal vocalisations in nature is essential to study animal populations and their behaviors. A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims to train a versatile animal sound detector using only a small set of audio samples. Previous efforts in this area have utilized different architectures and data augmentation techniques to enhance model performance. However, these approaches have not fully bridged the domain gap between source and target distributions, limiting their applicability in real-world scenarios. In this work, we introduce a new dataset designed to augment the diversity and breadth of classes available for few-shot bioacoustic event detection, building on the foundations of our previous datasets. To establish a robust baseline system tailored for the DCASE 2024 Task 5 challenge, we delve into an array of acoustic features and adopt negative hard sampling as our primary domain adaptation strategy. This approach, chosen in alignment with the challenge's guidelines that necessitate the independent treatment of each audio file, sidesteps the use of transductive learning to ensure compliance while aiming to enhance the system's adaptability to domain shifts. Our experiments show that the proposed baseline system achieves better performance than the vanilla prototypical network. The findings also confirm the effectiveness of each domain adaptation method by ablating different components within the networks. This highlights the potential to improve few-shot bioacoustic sound event detection by further reducing the impact of domain shift.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2306.09223 [pdf, other]

Few-shot bioacoustic event detection at the DCASE 2023 challenge

Authors: Ines Nolasco, Burooj Ghani, Shubhr Singh, Ester Vidaña-Vila, Helen Whitehead, Emily Grout, Michael Emmerson, Frants Jensen, Ivan Kiskin, Joe Morford, Ariana Strandburg-Peshkin, Lisa Gill, Hanna Pamuła, Vincent Lostanlen, Dan Stowell

Abstract: Few-shot bioacoustic event detection consists in detecting sound events of specified types, in varying soundscapes, while having access to only a few examples of the class of interest. This task ran as part of the DCASE challenge for the third time this year with an evaluation set expanded to include new animal species, and a new rule: ensemble models were no longer allowed. The 2023 few-shot task received submissions from 6 different teams with F-scores reaching as high as 63% on the evaluation set. Here we describe the task, focusing on describing the elements that differed from previous years. We also take a look back at past editions to describe how the task has evolved. Not only have the F-score results steadily improved (40% to 60% to 63%), but the types of systems proposed have also become more complex. Sound event detection systems are no longer simple variations of the baselines provided: multiple few-shot learning methodologies are still strong contenders for the task.

Submitted 15 June, 2023; originally announced June 2023.

Comments: submitted to DCASE 2023 workshop

arXiv:2305.13210 [pdf, other]

doi 10.1016/j.ecoinf.2023.102258

Learning to detect an animal sound from five examples

Authors: Inês Nolasco, Shubhr Singh, Veronica Morfi, Vincent Lostanlen, Ariana Strandburg-Peshkin, Ester Vidaña-Vila, Lisa Gill, Hanna Pamuła, Helen Whitehead, Ivan Kiskin, Frants H. Jensen, Joe Morford, Michael G. Emmerson, Elisabetta Versace, Emily Grout, Haohe Liu, Dan Stowell

Abstract: Automatic detection and classification of animal sounds has many applications in biodiversity monitoring and animal behaviour. In the past twenty years, the volume of digitised wildlife sound available has massively increased, and automatic classification through deep learning now shows strong results. However, bioacoustics is not a single task but a vast range of small-scale tasks (such as individual ID, call type, emotional indication) with wide variety in data characteristics, and most bioacoustic tasks do not come with strongly-labelled training data. The standard paradigm of supervised learning, focussed on a single large-scale dataset and/or a generic pre-trained algorithm, is insufficient. In this work we recast bioacoustic sound event detection within the AI framework of few-shot learning. We adapt this framework to sound event detection, such that a system can be given the annotated start/end times of as few as 5 events, and can then detect events in long-duration audio -- even when the sound category was not known at the time of algorithm training. We introduce a collection of open datasets designed to strongly test a system's ability to perform few-shot sound event detections, and we present the results of a public contest to address the task. We show that prototypical networks are a strong-performing method, when enhanced with adaptations for general characteristics of animal sounds. We demonstrate that widely-varying sound event durations are an important factor in performance, as well as non-stationarity, i.e. gradual changes in conditions throughout the duration of a recording. For fine-grained bioacoustic recognition tasks without massive annotated training data, our results demonstrate that few-shot sound event detection is a powerful new method, strongly outperforming traditional signal-processing detection methods in the fully automated scenario.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2207.07911 [pdf, other]

Few-shot bioacoustic event detection at the DCASE 2022 challenge

Authors: I. Nolasco, S. Singh, E. Vidaña-Vila, E. Grout, J. Morford, M. Emmerson, F. Jensen, H. Whitehead, I. Kiskin, A. Strandburg-Peshkin, L. Gill, H. Pamuła, V. Lostanlen, V. Morfi, D. Stowell

Abstract: Few-shot sound event detection is the task of detecting sound events, despite having only a few labelled examples of the class of interest. This framework is particularly useful in bioacoustics, where often there is a need to annotate very long recordings but the expert annotator time is limited. This paper presents an overview of the second edition of the few-shot bioacoustic sound event detection task included in the DCASE 2022 challenge. A detailed description of the task objectives, dataset, and baselines is presented, together with the main results obtained and characteristics of the submitted systems. This task received submissions from 15 different teams, of which 13 scored higher than the baselines. The highest F-score was 60% on the evaluation set, a huge improvement over last year's edition. Highly-performing methods made use of prototypical networks, transductive learning, and addressed the variable length of events from all target classes. Furthermore, by analysing results on each of the subsets we can identify the main difficulties that the systems face, and conclude that few-shot bioacoustic sound event detection remains an open challenge.

Submitted 14 July, 2022; originally announced July 2022.

Comments: submitted to DCASE2022 workshop

arXiv:2110.05941 [pdf, ps, other]

doi 10.1109/ICASSP43922.2022.9746907

Rank-based loss for learning hierarchical representations

Authors: Ines Nolasco, Dan Stowell

Abstract: Hierarchical taxonomies are common in many contexts, and they are a very natural structure humans use to organise information. In machine learning, the family of methods that use the 'extra' information is called hierarchical classification. However, applied to audio classification, this remains relatively unexplored. Here we focus on how to integrate the hierarchical information of a problem to learn embeddings representative of the hierarchical relationships. Previously, triplet loss has been proposed to address this problem; however, it presents some issues, such as requiring the careful construction of the triplets and being limited in the extent of hierarchical information it uses at each iteration. In this work we propose a rank based loss function that uses hierarchical information and translates this into a rank ordering of target distances between the examples. We show that rank based loss is suitable to learn hierarchical representations of the data. By testing on unseen fine level classes we show that this method is also capable of learning hierarchically correct representations of the new classes. Rank based loss has two promising aspects: it is generalisable to hierarchies with any number of levels, and is capable of dealing with data with incomplete hierarchical labels.

Submitted 11 February, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: This version corrects a bug in the baseline results

arXiv:1911.03682 [pdf, other]

Optimized geometrical metrics satisfying free-stream preservation

Authors: Irving Reyna Nolasco, Lisandro Dalcin, David C. Del Rey Fernandez, Stefano Zampini, Matteo Parsani

Abstract: Computational fluid dynamics and aerodynamics, which complement more expensive empirical approaches, are critical for developing aerospace vehicles. During the past three decades, computational aerodynamics capability has improved remarkably, following advances in computer hardware and algorithm development. However, for complex applications, the demands on computational fluid dynamics continue to increase in a quest to gain a few percent improvements in accuracy. Herein, we numerically demonstrate that optimizing the metric terms which arise from smoothly mapping each cell to a reference element leads to a solution whose accuracy is practically never worse and often noticeably better than the one obtained using the widely adopted Thomas and Lombard metric terms computation (Geometric conservation law and its application to flow computations on moving grids, AIAA Journal, 1979). Low and high-order accurate entropy stable schemes on distorted, high-order tensor product elements are used to simulate three-dimensional inviscid and viscous compressible test cases for which an analytical solution is known.

Submitted 27 November, 2019; v1 submitted 9 November, 2019; originally announced November 2019.

Comments: 22 pages and one appendix section

MSC Class: G.1; G.4; G.1.8

ACM Class: G.1; G.4; G.1.8

arXiv:1904.10408 [pdf, other]

Towards joint sound scene and polyphonic sound event recognition

Authors: Helen L. Bear, Ines Nolasco, Emmanouil Benetos

Abstract: Acoustic Scene Classification (ASC) and Sound Event Detection (SED) are two separate tasks in the field of computational sound scene analysis. In this work, we present a new dataset with both sound scene and sound event labels and use this to demonstrate a novel method for jointly classifying sound scenes and recognizing sound events. We show that by taking a joint approach, learning is more efficient and whilst improvements are still needed for sound event detection, SED results are robust in a dataset where the sample distribution is skewed towards sound scenes.

Submitted 1 July, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

Comments: Accepted to Interspeech 2019

arXiv:1811.06330 [pdf, other]

Audio-based identification of beehive states

Authors: Inês Nolasco, Alessandro Terenzi, Stefania Cecchi, Simone Orcioni, Helen L. Bear, Emmanouil Benetos

Abstract: The absence of the queen in a beehive is a very strong indicator of the need for beekeeper intervention. Manually searching for the queen is an arduous recurrent task for beekeepers that disrupts the normal life cycle of the beehive and can be a source of stress for bees. Sound is an indicator for signalling different states of the beehive, including the absence of the queen bee. In this work, we apply machine learning methods to automatically recognise different states in a beehive using audio as input. The system is built on top of a method for beehive sound recognition in order to detect bee sounds from other external sounds. We investigate both support vector machines and convolutional neural networks for beehive state recognition, using audio data of beehives collected from the NU-Hive project. Results indicate the potential of machine learning methods as well as the challenges of generalizing the system to new hives.

Submitted 15 February, 2019; v1 submitted 15 November, 2018; originally announced November 2018.

Comments: Accepted for ICASSP 2019

arXiv:1811.06016 [pdf, other]

To bee or not to bee: Investigating machine learning approaches for beehive sound recognition

Authors: Inês Nolasco, Emmanouil Benetos

Abstract: In this work, we aim to explore the potential of machine learning methods to the problem of beehive sound recognition. A major contribution of this work is the creation and release of annotations for a selection of beehive recordings. By experimenting with both support vector machines and convolutional neural networks, we explore important aspects to be considered in the development of beehive sound recognition systems using machine learning approaches.

Submitted 2 December, 2021; v1 submitted 14 November, 2018; originally announced November 2018.

Comments: Presented at Detection and Classification of Acoustic Scenes and Events (DCASE) workshop 2018

Journal ref: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)

Showing 1–9 of 9 results for author: Nolasco, I