
BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics

Authors: Jenny Hamer, Eleni Triantafillou, Bart van Merriënboer, Stefan Kahl, Holger Klinck, Tom Denton, Vincent Dumoulin

Abstract: The ability of a machine learning model to cope with differences in training and deployment conditions (e.g., in the presence of distribution shift or the generalization to new classes altogether) is crucial for real-world use cases. However, most empirical work in this area has focused on the image domain with artificial benchmarks constructed to measure individual aspects of generalization. We present BIRB, a complex benchmark centered on the retrieval of bird vocalizations from passively recorded datasets given focal recordings from a large citizen science corpus available for training. We propose a baseline system for this collection of tasks using representation learning and a nearest-centroid search. Our thorough empirical evaluation and analysis surface open research directions, suggesting that BIRB fills the need for a more realistic and complex benchmark to drive progress on robustness to distribution shifts and generalization of ML models.

Submitted 13 December, 2023; v1 submitted 12 December, 2023; originally announced December 2023.
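The nearest-centroid search named in the BIRB abstract can be illustrated with a minimal sketch. It assumes embeddings have already been computed by some model (not shown) and uses cosine similarity as the score. The helper names (`centroid`, `cosine`, `rank_by_centroid`) are hypothetical, and nothing here is taken from the BIRB code base.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank_by_centroid(query_embeddings, corpus_embeddings):
    """Rank corpus windows (e.g. passive-recording segments) by similarity
    to the centroid of one class's focal-recording embeddings.

    Returns corpus indices ordered from most to least similar.
    """
    c = centroid(query_embeddings)
    scores = [cosine(c, e) for e in corpus_embeddings]
    return sorted(range(len(corpus_embeddings)),
                  key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.8, 0.2]]             # hypothetical focal embeddings
    corpus = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]  # hypothetical search windows
    print(rank_by_centroid(queries, corpus))
```

Collapsing each class's examples into a single centroid means every corpus window is compared once per class, so search cost grows linearly with corpus size; Euclidean distance would be an equally plausible scoring choice.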

arXiv:2307.06292 [pdf, other]

doi 10.1038/s41598-023-49989-z

Global birdsong embeddings enable superior transfer learning for bioacoustic classification

Authors: Burooj Ghani, Tom Denton, Stefan Kahl, Holger Klinck

Abstract: Automated bioacoustic analysis aids understanding and protection of both marine and terrestrial animals and their habitats across extensive spatiotemporal scales, and typically involves analyzing vast collections of acoustic data. With the advent of deep learning models, classification of important signals from these datasets has markedly improved. These models power critical data analyses for research and decision-making in biodiversity monitoring, animal behaviour studies, and natural resource management. However, deep learning models are often data-hungry and require a significant amount of labeled training data to perform well. While sufficient training data is available for certain taxonomic groups (e.g., common bird species), many classes (such as rare and endangered species, many non-bird taxa, and call types) lack enough data to train a robust model from scratch. This study investigates the utility of feature embeddings extracted from audio classification models to identify bioacoustic classes other than the ones these models were originally trained on. We evaluate models on diverse datasets, including different bird calls and dialect types, bat calls, marine mammal calls, and amphibian calls. The embeddings extracted from the models trained on bird vocalization data consistently allowed higher quality classification than the embeddings extracted from models trained on general audio datasets.
The results of this study indicate that high-quality feature embeddings from large-scale acoustic bird classifiers can be harnessed for few-shot transfer learning, enabling the learning of new classes from a limited quantity of training data. Our findings reveal the potential for efficient analyses of novel bioacoustic tasks, even in scenarios where available training data is limited to a few samples.

Submitted 17 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.
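Few-shot transfer on top of frozen embeddings, as described in this abstract, is often done by fitting a small linear classifier to the embedding vectors. The sketch below assumes a binary logistic-regression probe trained with full-batch gradient descent in plain Python; the function names, learning rate and epoch count are illustrative assumptions, not the study's actual setup.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit weights and bias of a binary logistic-regression probe.

    The embeddings stay frozen; only the linear layer (w, b) is learned,
    which is what makes a handful of labeled examples workable.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that embedding x belongs to the new (positive) class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings": two positive and two negative shots.
    X = [[1.0, 0.9], [0.9, 1.0], [-1.0, -0.8], [-0.9, -1.0]]
    y = [1, 1, 0, 0]
    w, b = train_logistic_probe(X, y)
    print(predict_proba(w, b, [1.0, 1.0]), predict_proba(w, b, [-1.0, -1.0]))
```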

arXiv:2304.02714 [pdf, other]

doi 10.1109/TMM.2023.3251109

Learning Stage-wise GANs for Whistle Extraction in Time-Frequency Spectrograms

Authors: Pu Li, Marie Roch, Holger Klinck, Erica Fleishman, Douglas Gillespie, Eva-Marie Nosal, Yu Shiu, Xiaobai Liu

Abstract: Whistle contour extraction aims to derive animal whistles from time-frequency spectrograms as polylines. For toothed whales, whistle extraction results can serve as the basis for analyzing animal abundance, species identity, and social activities. During the last few decades, as long-term recording systems have become affordable, automated whistle extraction algorithms have been proposed to process large volumes of recording data. Recently, a deep learning-based method demonstrated superior performance in extracting whistles under varying noise conditions. However, training such networks requires a large amount of labor-intensive annotation, which is not available for many species. To overcome this limitation, we present a framework of stage-wise generative adversarial networks (GANs), which compile new whistle data suitable for deep model training via three stages: generation of background noise in the spectrogram, generation of whistle contours, and generation of whistle signals. By separating the generation of different components in the samples, our framework composes visually promising whistle data and labels even when few expert-annotated data are available. Regardless of the amount of human-annotated data, the proposed data augmentation framework leads to a consistent improvement in performance of the whistle extraction model, with a maximum increase of 1.69 in the whistle extraction mean F1-score. Our stage-wise GAN also surpasses a single GAN in improving whistle extraction models with augmented data.
The data and code will be available at https://github.com/Paul-LiPu/CompositeGAN_WhistleAugment.

Submitted 5 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE Transactions on Multimedia (2023)

arXiv:2110.12951 [pdf, other]

doi 10.1038/s41467-022-27980-y

Seeing biodiversity: perspectives in machine learning for wildlife conservation

Authors: Devis Tuia, Benjamin Kellenberger, Sara Beery, Blair R. Costelloe, Silvia Zuffi, Benjamin Risse, Alexander Mathis, Mackenzie W. Mathis, Frank van Langevelde, Tilo Burghardt, Roland Kays, Holger Klinck, Martin Wikelski, Iain D. Couzin, Grant van Horn, Margaret C. Crofoot, Charles V. Stewart, Tanya Berger-Wolf

Abstract: Data acquisition in animal ecology is rapidly accelerating due to inexpensive and accessible sensors such as smartphones, drones, satellites, audio recorders and bio-logging devices. These new technologies and the data they generate hold great potential for large-scale environmental monitoring and understanding, but are limited by current data processing approaches which are inefficient in how they ingest, digest, and distill data into relevant information. We argue that machine learning, and especially deep learning approaches, can meet this analytic challenge to enhance our understanding, monitoring capacity, and conservation of wildlife species. Incorporating machine learning into ecological workflows could improve inputs for population and behavior models and eventually lead to integrated hybrid modeling tools, with ecological models acting as constraints for machine learning models and the latter providing data-supported insights. In essence, by combining new machine learning approaches with ecological domain knowledge, animal ecologists can capitalize on the abundance of data generated by modern sensor technologies in order to reliably estimate population abundances, study animal behavior and mitigate human/wildlife conflicts. To succeed, this approach will require close collaboration and cross-disciplinary education between the computer science and animal ecology communities in order to ensure the quality of machine learning approaches and train a new generation of data scientists in ecology and conservation.

Submitted 25 October, 2021; originally announced October 2021.

arXiv:2108.09203 [pdf, other]

Parsing Birdsong with Deep Audio Embeddings

Authors: Irina Tolkova, Brian Chu, Marcel Hedman, Stefan Kahl, Holger Klinck

Abstract: Monitoring of bird populations has played a vital role in conservation efforts and in understanding biodiversity loss. The automation of this process has been facilitated by both sensing technologies, such as passive acoustic monitoring, and accompanying analytical tools, such as deep learning. However, machine learning models frequently have difficulty generalizing to examples not encountered in the training data. In our work, we present a semi-supervised approach to identify characteristic calls and environmental noise. We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks, and group the resulting embeddings for a domain expert to identify cluster labels. We show that our approach can improve classification precision and provide insight into the latent structure of environmental acoustic datasets.

Submitted 20 August, 2021; originally announced August 2021.

Comments: IJCAI 2021 Artificial Intelligence for Social Good (AI4SG) Workshop
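The abstract describes grouping embeddings so that a domain expert can assign a label to each cluster, but this listing does not name the clustering algorithm. As a minimal sketch, assuming plain k-means (Lloyd's algorithm) on precomputed embedding vectors, the grouping step could look like this; `kmeans` and its parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each embedding joins its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical embeddings: two well-separated groups (e.g. call vs. noise).
    pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    _, labels = kmeans(pts, 2)
    print(labels)
```

An expert would then listen to a few members of each cluster and name it, so labeling effort scales with the number of clusters rather than the number of clips.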

arXiv:2005.08894 [pdf]

Learning Deep Models from Synthetic Data for Extracting Dolphin Whistle Contours

Authors: Pu Li, Xiaobai Liu, K. J. Palmer, Erica Fleishman, Douglas Gillespie, Eva-Marie Nosal, Yu Shiu, Holger Klinck, Danielle Cholewiak, Tyler Helble, Marie A. Roch

Abstract: We present a learning-based method for extracting whistles of toothed whales (Odontoceti) in hydrophone recordings. Our method represents audio signals as time-frequency spectrograms and decomposes each spectrogram into a set of time-frequency patches. A deep neural network learns archetypical patterns (e.g., crossings, frequency modulated sweeps) from the spectrogram patches and predicts time-frequency peaks that are associated with whistles. We also developed a comprehensive method to synthesize training samples from background environments and train the network with minimal human annotation effort. We applied the proposed learn-from-synthesis method to a subset of the public Detection, Classification, Localization, and Density Estimation (DCLDE) 2011 workshop data to extract whistle confidence maps, which we then processed with an existing contour extractor to produce whistle annotations. The F1-score of our best synthesis method was 0.158 greater than our baseline whistle extraction algorithm (~25% improvement) when applied to common dolphin (Delphinus spp.) and bottlenose dolphin (Tursiops truncatus) whistles.

Submitted 18 May, 2020; originally announced May 2020.

Comments: Invited paper for International Joint Conference on Neural Networks

Report number: IJCNN paper 6435539

Journal ref: in Intl. Joint Conf. Neural Net. (Glasgow, Scotland, July 19-24), pp. 10 (2020)
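The two figures quoted for the best synthesis method (an absolute F1 gain of 0.158, described as roughly a 25% improvement) together imply an approximate baseline F1-score. The abstract does not state that baseline, so the values below are only back-of-the-envelope estimates derived from those rounded numbers.

```latex
\Delta F_1 = F_1^{\mathrm{synth}} - F_1^{\mathrm{base}} = 0.158, \qquad
\frac{\Delta F_1}{F_1^{\mathrm{base}}} \approx 0.25
\;\Longrightarrow\;
F_1^{\mathrm{base}} \approx \frac{0.158}{0.25} \approx 0.63, \qquad
F_1^{\mathrm{synth}} \approx 0.63 + 0.158 \approx 0.79
```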

arXiv:1911.00417 [pdf, other]

doi 10.33682/ts6e-sn53

Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

Authors: Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello

Abstract: This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux, yet with a tunable time scale for background noise estimation. In comparison with pointwise logarithm, PCEN reduces false alarm rate by 50x in the near field and 5x in the far field, both on avian and marine bioacoustic datasets. Such improvements come at moderate computational cost and require no human intervention, thus heralding a promising future for PCEN in bioacoustics.

Submitted 1 November, 2019; originally announced November 2019.

Comments: 5 pages, 3 figures. Presented at the 3rd International Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE). 25–26 October 2019, New York, NY, USA
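The detection scheme in this abstract (normalize with PCEN, then pool each frame into one score) can be sketched using the standard PCEN definition: a per-channel smoother M(t, f) = (1 − s)·M(t − 1, f) + s·E(t, f) estimates the background, and the output is (E / (ε + M)^α + δ)^r − δ^r. The parameter values, seeding the smoother with the first frame, and pooling with a max over channels are all assumptions made for illustration, since the abstract only says frames are pooled. For real use, librosa ships an implementation as `librosa.pcen`.

```python
def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a spectrogram.

    E is a list of frames (time steps); each frame is a list of
    non-negative per-channel (frequency-band) energies.
    """
    # Assumption: seed the smoother with the first frame so the output
    # starts near its steady-state value instead of from zero.
    M = list(E[0])
    out = []
    for frame in E:
        # First-order IIR low-pass filter per channel: background estimate.
        M = [(1.0 - s) * m + s * e for m, e in zip(M, frame)]
        # Adaptive gain control (divide by background) then root compression.
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, M)
        ])
    return out


def detection_curve(E, **pcen_kwargs):
    """Pool each PCEN frame to one score (max over channels, an assumption)."""
    return [max(frame) for frame in pcen(E, **pcen_kwargs)]


if __name__ == "__main__":
    # Hypothetical input: stationary background with one loud event at t = 40.
    E = [[1.0] * 4 for _ in range(50)]
    E[40] = [1.0, 1.0, 50.0, 1.0]
    curve = detection_curve(E)
    print(curve.index(max(curve)))
```

Because the smoother tracks slowly varying background energy, a stationary noise floor maps to a constant low score while a sudden event stands out, which is why thresholding the pooled curve can work without supervision.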

arXiv:1906.02572 [pdf]

GIBBONFINDR: An R package for the detection and classification of acoustic signals

Authors: Dena J. Clink, Holger Klinck

Abstract: The recent improvements in recording technology, data storage and battery life have led to an increased interest in the use of passive acoustic monitoring for a variety of research questions. One of the main obstacles in implementing wide-scale acoustic monitoring programs in terrestrial environments is the lack of user-friendly, open-source programs for processing large sound archives. Here we describe the new, open-source R package GIBBONFINDR, which has functions for detection, classification and visualization of acoustic signals using a variety of readily available machine learning algorithms in the R programming environment. We provide a case study showing how GIBBONFINDR functions can be used in a workflow to detect and classify Bornean gibbon (Hylobates muelleri) calls in long-term acoustic data sets recorded in Danum Valley Conservation Area, Sabah, Malaysia. Machine learning is currently one of the most rapidly growing fields, with applications across many disciplines, and our goal is to make commonly used signal processing techniques and machine learning algorithms readily available for ecologists who are interested in incorporating bioacoustics techniques into their research.

Submitted 15 November, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: R package

arXiv:1804.07177 [pdf, other]

Recognizing Birds from Sound - The 2018 BirdCLEF Baseline System

Authors: Stefan Kahl, Thomas Wilhelm-Stein, Holger Klinck, Danny Kowerko, Maximilian Eibl

Abstract: Reliable identification of bird species in recorded audio files would be a transformative tool for researchers, conservation biologists, and birders. In recent years, artificial neural networks have greatly improved the detection quality of machine learning systems for bird species recognition. We present a baseline system using convolutional neural networks. We publish our code base as reference for participants in the 2018 LifeCLEF bird identification task and discuss our experiments and potential improvements.

Submitted 19 April, 2018; originally announced April 2018.

Comments: The repository and an accompanying tutorial can be found here: https://github.com/kahst/BirdCLEF-Baseline

arXiv:1610.03772 [pdf]

RAVEN X High Performance Data Mining Toolbox for Bioacoustic Data Analysis

Authors: Peter J. Dugan, Holger Klinck, Marie A. Roch, Tyler A. Helble

Abstract: The objective of this work is to integrate high performance computing (HPC) technologies and bioacoustics data-mining capabilities by offering a MATLAB-based toolbox called Raven-X. Raven-X will provide a hardware-independent solution for processing large acoustic datasets, and the toolkit will be available to the community at no cost. This goal will be achieved by leveraging prior work that successfully deployed MATLAB-based HPC tools within Cornell University's Bioacoustics Research Program (BRP). These tools enabled commonly available multi-core computers to process data at accelerated rates to detect and classify whale sounds in large multi-channel sound archives. Through this collaboration, we will expand on this effort, which was featured in MathWorks research and industry forums, incorporate new cutting-edge detectors and classifiers, and disseminate Raven-X to the broader bioacoustics community.

Submitted 12 October, 2016; originally announced October 2016.

Report number: N00014-16-1-3156

arXiv:1607.08482 [pdf]

Early and Late Time Acoustic Measures for Underwater Seismic Airgun Signals In Long-Term Acoustic Data Sets

Authors: Peter Dugan, Melania Guerra, Dimitri Ponirakis, Holger Klinck, Christopher W. Clark

Abstract: This work presents a new toolkit for describing the acoustic properties of the ocean environment before, during and after a sound event caused by an underwater seismic air-gun. The toolkit uses existing sound measures, but uniquely applies these to capture the early time period (actual pulse) and late time period (reverberation and multiple arrivals). In total, 183 features are produced for each air-gun sound. This toolkit was used on data retrieved from a field deployment of five marine autonomous recording units during a 46-day seismic air-gun survey in Baffin Bay, Greenland. Using this toolkit, a total of 147 million data points were identified from the Greenland deployment recordings. The feasibility of extracting a large number of features was then evaluated using two separate systems: a serial computer and a high performance computer. Results indicate that data extraction took an estimated 216 hours on the serial system and 18 hours on the high performance computer. This paper provides an analytical description of the new toolkit along with details for using it to identify relevant data.

Submitted 5 May, 2016; originally announced July 2016.

Comments: Camera-ready version of the paper for publication in IEEE Xplore. Paper was withdrawn by the co-authors for submission to JASA Express Letters
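The two run times reported in the airgun abstract correspond to the following speedup of the high performance computer over the serial system; this is simply the ratio of the stated estimates.

```latex
\text{speedup} = \frac{t_{\mathrm{serial}}}{t_{\mathrm{HPC}}}
             = \frac{216\ \mathrm{h}}{18\ \mathrm{h}} = 12
```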

Showing 1–11 of 11 results for author: Klinck, H