Search | arXiv e-print repository

BEANS: The Benchmark of Animal Sounds

Authors: Masato Hagiwara, Benjamin Hoffman, Jen-Yu Liu, Maddie Cusimano, Felix Effenberger, Katie Zacarian

Abstract: The use of machine learning (ML) based techniques has become increasingly popular in the field of bioacoustics over the last years. Fundamental requirements for the successful application of ML based techniques are curated, agreed upon, high-quality datasets and benchmark tasks to be learned on a given dataset. However, the field of bioacoustics so far lacks such public benchmarks which cover multiple tasks and species to measure the performance of ML techniques in a controlled and standardized way and that allows for benchmarking newly proposed techniques to existing ones. Here, we propose BEANS (the BEnchmark of ANimal Sounds), a collection of bioacoustics tasks and public datasets, specifically designed to measure the performance of machine learning algorithms in the field of bioacoustics. The benchmark proposed here consists of two common tasks in bioacoustics: classification and detection. It includes 12 datasets covering various species, including birds, land and marine mammals, anurans, and insects. In addition to the datasets, we also present the performance of a set of standard ML methods as the baseline for task performance. The benchmark and baseline code is made publicly available at https://github.com/earthspecies/beans in the hope of establishing a new standard dataset for ML-based bioacoustic research.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.10857 [pdf, other]

Modeling Animal Vocalizations through Synthesizers

Authors: Masato Hagiwara, Maddie Cusimano, Jen-Yu Liu

Abstract: Modeling real-world sound is a fundamental problem in the creative use of machine learning and many other fields, including human speech processing and bioacoustics. Transformer-based generative models and some prior work (e.g., DDSP) are known to produce realistic sound, although they have limited control and are hard to interpret. As an alternative, we aim to use modular synthesizers, i.e., compositional, parametric electronic musical instruments, for modeling non-music sounds. However, inferring synthesizer parameters given a target sound, i.e., the parameter inference task, is not trivial for general sounds, and past research has typically focused on musical sound. In this work, we optimize a differentiable synthesizer from TorchSynth in order to model, emulate, and creatively generate animal vocalizations. We compare an array of optimization methods, from gradient-based search to genetic algorithms, for inferring its parameters, and then demonstrate how one can control and interpret the parameters for modeling non-music sounds.

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2112.08984 [pdf, other]

Object-based synthesis of scraping and rolling sounds based on non-linear physical constraints

Authors: Vinayak Agarwal, Maddie Cusimano, James Traer, Josh McDermott

Abstract: Sustained contact interactions like scraping and rolling produce a wide variety of sounds. Previous studies have explored ways to synthesize these sounds efficiently and intuitively but could not fully mimic the rich structure of real instances of these sounds. We present a novel source-filter model for realistic synthesis of scraping and rolling sounds with physically and perceptually relevant controllable parameters constrained by principles of mechanics. Key features of our model include non-linearities to constrain the contact force, naturalistic normal force variation for different motions, and a method for morphing impulse responses within a material to achieve location-dependence. Perceptual experiments show that the presented model is able to synthesize realistic scraping and rolling sounds while conveying physical information similar to that in recorded sounds.

Submitted 16 December, 2021; originally announced December 2021.

Journal ref: Proceedings of the 24th International Conference on Digital Audio Effects (DAFx-20in21), 2021

Showing 1–3 of 3 results for author: Cusimano, M