Search | arXiv e-print repository

doi 10.1038/s41598-023-49989-z

Global birdsong embeddings enable superior transfer learning for bioacoustic classification

Authors: Burooj Ghani, Tom Denton, Stefan Kahl, Holger Klinck

Abstract: Automated bioacoustic analysis aids understanding and protection of both marine and terrestrial animals and their habitats across extensive spatiotemporal scales, and typically involves analyzing vast collections of acoustic data. With the advent of deep learning models, classification of important signals from these datasets has markedly improved. These models power critical data analyses for res… ▽ More Automated bioacoustic analysis aids understanding and protection of both marine and terrestrial animals and their habitats across extensive spatiotemporal scales, and typically involves analyzing vast collections of acoustic data. With the advent of deep learning models, classification of important signals from these datasets has markedly improved. These models power critical data analyses for research and decision-making in biodiversity monitoring, animal behaviour studies, and natural resource management. However, deep learning models are often data-hungry and require a significant amount of labeled training data to perform well. While sufficient training data is available for certain taxonomic groups (e.g., common bird species), many classes (such as rare and endangered species, many non-bird taxa, and call-type) lack enough data to train a robust model from scratch. This study investigates the utility of feature embeddings extracted from audio classification models to identify bioacoustic classes other than the ones these models were originally trained on. We evaluate models on diverse datasets, including different bird calls and dialect types, bat calls, marine mammals calls, and amphibians calls. The embeddings extracted from the models trained on bird vocalization data consistently allowed higher quality classification than the embeddings trained on general audio datasets. The results of this study indicate that high-quality feature embeddings from large-scale acoustic bird classifiers can be harnessed for few-shot transfer learning, enabling the learning of new classes from a limited quantity of training data. Our findings reveal the potential for efficient analyses of novel bioacoustic tasks, even in scenarios where available training data is limited to a few samples. △ Less

Submitted 17 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

arXiv:2304.02714 [pdf, other]

doi 10.1109/TMM.2023.3251109

Learning Stage-wise GANs for Whistle Extraction in Time-Frequency Spectrograms

Authors: Pu Li, Marie Roch, Holger Klinck, Erica Fleishman, Douglas Gillespie, Eva-Marie Nosal, Yu Shiu, Xiaobai Liu

Abstract: Whistle contour extraction aims to derive animal whistles from time-frequency spectrograms as polylines. For toothed whales, whistle extraction results can serve as the basis for analyzing animal abundance, species identity, and social activities. During the last few decades, as long-term recording systems have become affordable, automated whistle extraction algorithms were proposed to process lar… ▽ More Whistle contour extraction aims to derive animal whistles from time-frequency spectrograms as polylines. For toothed whales, whistle extraction results can serve as the basis for analyzing animal abundance, species identity, and social activities. During the last few decades, as long-term recording systems have become affordable, automated whistle extraction algorithms were proposed to process large volumes of recording data. Recently, a deep learning-based method demonstrated superior performance in extracting whistles under varying noise conditions. However, training such networks requires a large amount of labor-intensive annotation, which is not available for many species. To overcome this limitation, we present a framework of stage-wise generative adversarial networks (GANs), which compile new whistle data suitable for deep model training via three stages: generation of background noise in the spectrogram, generation of whistle contours, and generation of whistle signals. By separating the generation of different components in the samples, our framework composes visually promising whistle data and labels even when few expert annotated data are available. Regardless of the amount of human-annotated data, the proposed data augmentation framework leads to a consistent improvement in performance of the whistle extraction model, with a maximum increase of 1.69 in the whistle extraction mean F1-score. Our stage-wise GAN also surpasses one single GAN in improving whistle extraction models with augmented data. The data and code will be available at https://github.com/Paul-LiPu/CompositeGAN\_WhistleAugment. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: Accepted by IEEE Transactions of Multimedia (2023)

arXiv:2108.09203 [pdf, other]

Parsing Birdsong with Deep Audio Embeddings

Authors: Irina Tolkova, Brian Chu, Marcel Hedman, Stefan Kahl, Holger Klinck

Abstract: Monitoring of bird populations has played a vital role in conservation efforts and in understanding biodiversity loss. The automation of this process has been facilitated by both sensing technologies, such as passive acoustic monitoring, and accompanying analytical tools, such as deep learning. However, machine learning models frequently have difficulty generalizing to examples not encountered in… ▽ More Monitoring of bird populations has played a vital role in conservation efforts and in understanding biodiversity loss. The automation of this process has been facilitated by both sensing technologies, such as passive acoustic monitoring, and accompanying analytical tools, such as deep learning. However, machine learning models frequently have difficulty generalizing to examples not encountered in the training data. In our work, we present a semi-supervised approach to identify characteristic calls and environmental noise. We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks, and group the resulting embeddings for a domain expert to identify cluster labels. We show that our approach can improve classification precision and provide insight into the latent structure of environmental acoustic datasets. △ Less

Submitted 20 August, 2021; originally announced August 2021.

Comments: IJCAI 2021 Artificial Intelligence for Social Good (AI4SG) Workshop

arXiv:2005.08894 [pdf]

Learning Deep Models from Synthetic Data for Extracting Dolphin Whistle Contours

Authors: Pu Li, Xiaobai Liua, K. J. Palmer, Erica Fleishman, Douglas Gillespie, Eva-Marie Nosal, Yu Shiu, Holger Klinck, Danielle Cholewiak, Tyler Helble, Marie A. Roch

Abstract: We present a learning-based method for extracting whistles of toothed whales (Odontoceti) in hydrophone recordings. Our method represents audio signals as time-frequency spectrograms and decomposes each spectrogram into a set of time-frequency patches. A deep neural network learns archetypical patterns (e.g., crossings, frequency modulated sweeps) from the spectrogram patches and predicts time-fre… ▽ More We present a learning-based method for extracting whistles of toothed whales (Odontoceti) in hydrophone recordings. Our method represents audio signals as time-frequency spectrograms and decomposes each spectrogram into a set of time-frequency patches. A deep neural network learns archetypical patterns (e.g., crossings, frequency modulated sweeps) from the spectrogram patches and predicts time-frequency peaks that are associated with whistles. We also developed a comprehensive method to synthesize training samples from background environments and train the network with minimal human annotation effort. We applied the proposed learn-from-synthesis method to a subset of the public Detection, Classification, Localization, and Density Estimation (DCLDE) 2011 workshop data to extract whistle confidence maps, which we then processed with an existing contour extractor to produce whistle annotations. The F1-score of our best synthesis method was 0.158 greater than our baseline whistle extraction algorithm (~25% improvement) when applied to common dolphin (Delphinus spp.) and bottlenose dolphin (Tursiops truncatus) whistles. △ Less

Submitted 18 May, 2020; originally announced May 2020.

Comments: Invited paper for International Joint Conference on Neural Networks

Report number: IJCNN paper 6435539

Journal ref: in Intl. Joint Conf. Neural Net. (Glasgow, Scotland, July 19-24), pp. 10 (2020)

arXiv:1911.00417 [pdf, other]

doi 10.33682/ts6e-sn53

Long-distance Detection of Bioacoustic Events with Per-channel Energy Normalization

Authors: Vincent Lostanlen, Kaitlin Palmer, Elly Knight, Christopher Clark, Holger Klinck, Andrew Farnsworth, Tina Wong, Jason Cramer, Juan Pablo Bello

Abstract: This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalize… ▽ More This paper proposes to perform unsupervised detection of bioacoustic events by pooling the magnitudes of spectrogram frames after per-channel energy normalization (PCEN). Although PCEN was originally developed for speech recognition, it also has beneficial effects in enhancing animal vocalizations, despite the presence of atmospheric absorption and intermittent noise. We prove that PCEN generalizes logarithm-based spectral flux, yet with a tunable time scale for background noise estimation. In comparison with pointwise logarithm, PCEN reduces false alarm rate by 50x in the near field and 5x in the far field, both on avian and marine bioacoustic datasets. Such improvements come at moderate computational cost and require no human intervention, thus heralding a promising future for PCEN in bioacoustics. △ Less

Submitted 1 November, 2019; originally announced November 2019.

Comments: 5 pages, 3 figures. Presented at the 3rd International Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE). 25--26 October 2019, New York, NY, USA

arXiv:1906.02572 [pdf]

GIBBONFINDR: An R package for the detection and classification of acoustic signals

Authors: Dena J. Clink, Holger Klinck

Abstract: The recent improvements in recording technology, data storage and battery life have led to an increased interest in the use of passive acoustic monitoring for a variety of research questions. One of the main obstacles in implementing wide scale acoustic monitoring programs in terrestrial environments is the lack of user-friendly, open source programs for processing large sound archives. Here we de… ▽ More The recent improvements in recording technology, data storage and battery life have led to an increased interest in the use of passive acoustic monitoring for a variety of research questions. One of the main obstacles in implementing wide scale acoustic monitoring programs in terrestrial environments is the lack of user-friendly, open source programs for processing large sound archives. Here we describe the new, open-source R package GIBBONFINDR which has functions for detection, classification and visualization of acoustic signals using a variety of readily available machine learning algorithms in the R programming environment. We provide a case study showing how GIBBONFINDR functions can be used in a workflow to detect and classify Bornean gibbon (Hylobates muelleri) calls in long-term acoustic data sets recorded in Danum Valley Conservation Area, Sabah, Malaysia. Machine learning is currently one of the most rapidly growing fields-- with applications across many disciplines-- and our goal is to make commonly used signal processing techniques and machine learning algorithms readily available for ecologists who are interested in incorporating bioacoustics techniques into their research. △ Less

Submitted 15 November, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: R package

Showing 1–6 of 6 results for author: Klinck, H