-
Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics
Authors:
Ben Williams,
Bart van Merriënboer,
Vincent Dumoulin,
Jenny Hamer,
Eleni Triantafillou,
Abram B. Fleishman,
Matthew McKown,
Jill E. Munger,
Aaron N. Rice,
Ashlee Lillis,
Clemency E. White,
Catherine A. D. Hobbs,
Tries B. Razak,
Kate E. Jones,
Tom Denton
Abstract:
Machine learning has the potential to revolutionize passive acoustic monitoring (PAM) for ecological assessments. However, high annotation and compute costs limit the field's efficacy. Generalizable pretrained networks can overcome these costs, but high-quality pretraining requires vast annotated libraries, limiting its current applicability primarily to bird taxa. Here, we identify the optimum pretraining strategy for a data-deficient domain using coral reef bioacoustics. We assemble ReefSet, a large annotated library of reef sounds, though modest compared to bird libraries at 2% of the sample count. Through testing few-shot transfer learning performance, we observe that pretraining on bird audio provides notably superior generalizability compared to pretraining on ReefSet or unrelated audio alone. However, our key findings show that cross-domain mixing which leverages bird, reef and unrelated audio during pretraining maximizes reef generalizability. SurfPerch, our pretrained network, provides a strong foundation for automated analysis of marine PAM data with minimal annotation and compute costs.
Submitted 7 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Simultaneous linear connectivity of neural networks modulo permutation
Authors:
Ekansh Sharma,
Devin Kwok,
Tom Denton,
Daniel M. Roy,
David Rolnick,
Gintare Karolina Dziugaite
Abstract:
Neural networks typically exhibit permutation symmetries which contribute to the non-convexity of the networks' loss landscapes, since linearly interpolating between two permuted versions of a trained network tends to encounter a high loss barrier. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning there are essentially no such barriers between trained networks if they are permuted appropriately. In this work, we refine these arguments into three distinct claims of increasing strength. We show that existing evidence only supports "weak linear connectivity": that for each pair of networks belonging to a set of SGD solutions, there exist (multiple) permutations that linearly connect it with the other networks. In contrast, the claim "strong linear connectivity" (that for each network, there exists one permutation that simultaneously connects it with the other networks) is both intuitively and practically more desirable. This stronger claim would imply that the loss landscape is convex after accounting for permutation, and enable linear interpolation between three or more independently trained models without increased loss. We then introduce an intermediate claim: that for certain sequences of networks, there exists one permutation that simultaneously aligns matching pairs of networks from these sequences. Specifically, we discover that a single permutation aligns sequences of iteratively trained as well as iteratively pruned networks, meaning that two networks exhibit low loss barriers at each step of their optimization and sparsification trajectories respectively. Finally, we provide the first evidence that strong linear connectivity may be possible under certain conditions, by showing that barriers decrease with increasing network width when interpolating among three networks.
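To make the "loss barrier" notion concrete, here is a minimal Python/NumPy sketch, assuming the common definition: the largest gap between the loss along the straight line joining two parameter vectors and the linear interpolation of the two endpoint losses. The toy MLP and the helper names `mlp`, `permute_hidden` and `loss_barrier` are hypothetical illustrations, not code from the paper; `permute_hidden` shows why reordering hidden units leaves the network's function unchanged.

```python
import numpy as np

def mlp(w1, b1, w2, x):
    """Toy one-hidden-layer ReLU network (hypothetical, for illustration)."""
    return w2 @ np.maximum(w1 @ x + b1, 0.0)

def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden units; the function computed by `mlp` is unchanged."""
    return w1[perm], b1[perm], w2[:, perm]

def loss_barrier(loss_fn, theta_a, theta_b, num_points=21):
    """Max excess of the loss on the segment [theta_a, theta_b] over the
    linear interpolation of the two endpoint losses (assumed definition)."""
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    excess = []
    for alpha in np.linspace(0.0, 1.0, num_points):
        theta = (1.0 - alpha) * theta_a + alpha * theta_b
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        excess.append(loss_fn(theta) - baseline)
    return max(excess)

# A double-well loss has minima at +1 and -1 with a barrier between them.
double_well = lambda theta: float(np.sum((theta ** 2 - 1.0) ** 2))
print(loss_barrier(double_well, np.array([1.0]), np.array([-1.0])))  # close to 1.0
```

In a real network the two endpoints would be full parameter vectors, and one would first apply a permutation to one of them before measuring the barrier.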
Submitted 9 April, 2024;
originally announced April 2024.
-
All Thresholds Barred: Direct Estimation of Call Density in Bioacoustic Data
Authors:
Amanda K. Navine,
Tom Denton,
Matthew J. Weldy,
Patrick J. Hart
Abstract:
Passive acoustic monitoring (PAM) studies generate thousands of hours of audio, which may be used to monitor specific animal populations, conduct broad biodiversity surveys, detect threats such as poachers, and more. Machine learning classifiers for species identification are increasingly being used to process the vast amount of audio generated by bioacoustic surveys, expediting analysis and increasing the utility of PAM as a management tool. In common practice, a threshold is applied to classifier output scores, and scores above the threshold are aggregated into a detection count. The choice of threshold produces biased counts of vocalizations, which are subject to false positive/negative rates that may vary across subsets of the dataset. In this work, we advocate for directly estimating call density: the proportion of detection windows containing the target vocalization, regardless of classifier score. Our approach targets a desirable ecological estimator and provides a more rigorous grounding for identifying the core problems caused by distribution shifts (when the defining characteristics of the data distribution change) and designing strategies to mitigate them. We propose a validation scheme for estimating call density in a body of data and obtain, through Bayesian reasoning, probability distributions of confidence scores for both the positive and negative classes. We use these distributions to predict site-level densities, which may be subject to distribution shifts. We test our proposed methods on a real-world study of Hawaiian birds and provide simulation results leveraging existing fully annotated datasets, demonstrating robustness to variations in call density and classifier model quality.
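As a rough illustration of estimating call density without a threshold, the sketch below stratifies windows by classifier score, assumes a reviewer has validated a few windows per score bin, and combines per-bin Beta posterior means weighted by the fraction of windows falling in each bin. The bin edges, the uniform Beta(1, 1) prior and the function name are assumptions; this is a simplified stand-in, not the paper's full Bayesian procedure for modelling the positive- and negative-class score distributions.

```python
import numpy as np

def estimate_call_density(scores, bin_edges, validated):
    """Estimate the fraction of windows containing the target call.

    scores: classifier scores for every window in the dataset.
    bin_edges: increasing edges covering the score range, e.g. [0, 0.5, 1].
    validated: {bin_index: (num_positive, num_reviewed)} from manual review.
    """
    scores = np.asarray(scores)
    num_bins = len(bin_edges) - 1
    # Assign each window to a score bin using the interior edges.
    bin_idx = np.digitize(scores, bin_edges[1:-1])
    bin_weight = np.bincount(bin_idx, minlength=num_bins) / len(scores)
    density = 0.0
    for b in range(num_bins):
        positives, reviewed = validated.get(b, (0, 0))
        # Posterior mean of the positive rate under a Beta(1, 1) prior.
        density += bin_weight[b] * (positives + 1) / (reviewed + 2)
    return density
```

Every window contributes through its bin weight, whether or not its score would clear a detection threshold; a bin with no validated windows falls back to the prior mean of 0.5.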
Submitted 23 February, 2024;
originally announced February 2024.
-
BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics
Authors:
Jenny Hamer,
Eleni Triantafillou,
Bart van Merriënboer,
Stefan Kahl,
Holger Klinck,
Tom Denton,
Vincent Dumoulin
Abstract:
The ability of a machine learning model to cope with differences in training and deployment conditions (e.g., under distribution shift or when generalizing to new classes altogether) is crucial for real-world use cases. However, most empirical work in this area has focused on the image domain with artificial benchmarks constructed to measure individual aspects of generalization. We present BIRB, a complex benchmark centered on the retrieval of bird vocalizations from passively recorded datasets given focal recordings from a large citizen science corpus available for training. We propose a baseline system for this collection of tasks using representation learning and a nearest-centroid search. Our thorough empirical evaluation and analysis surfaces open research directions, suggesting that BIRB fills the need for a more realistic and complex benchmark to drive progress on robustness to distribution shifts and generalization of ML models.
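A minimal sketch of the nearest-centroid idea, assuming embeddings have already been computed by some pretrained model: average the embeddings of the focal recordings for a class into a centroid, then rank unlabeled passive-recording windows by cosine similarity to it. The choice of cosine similarity and the function names are assumptions for illustration, not the benchmark's reference implementation.

```python
import numpy as np

def class_centroids(embeddings, labels):
    """Mean embedding per class; returns (sorted class names, centroid matrix)."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def rank_by_centroid(query_embeddings, centroid):
    """Rank query windows from most to least similar to one class centroid."""
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    c = centroid / np.linalg.norm(centroid)
    scores = q @ c  # cosine similarity of each window to the centroid
    return np.argsort(-scores), scores
```

Retrieval quality can then be measured with any ranking metric over the returned order.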
Submitted 13 December, 2023; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Updating Clinical Risk Stratification Models Using Rank-Based Compatibility: Approaches for Evaluating and Optimizing Clinician-Model Team Performance
Authors:
Erkin Ötleş,
Brian T. Denton,
Jenna Wiens
Abstract:
As data shift or new data become available, updating clinical machine learning models may be necessary to maintain or improve performance over time. However, updating a model can introduce compatibility issues when the behavior of the updated model does not align with user expectations, resulting in poor user-model team performance. Existing compatibility measures depend on model decision thresholds, limiting their applicability in settings where models are used to generate rankings based on estimated risk. To address this limitation, we propose a novel rank-based compatibility measure, $C^R$, and a new loss function that aims to optimize discriminative performance while encouraging good compatibility. Applied to a case study in mortality risk stratification leveraging data from MIMIC, our approach yields more compatible models while maintaining discriminative performance compared to existing model selection techniques, with an increase in $C^R$ of $0.019$ ($95\%$ confidence interval: $0.005$, $0.035$). This work provides new tools to analyze and update risk stratification models used in clinical care.
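The abstract does not spell out $C^R$, so the following is only a sketch under an explicit assumption: that $C^R$ is the fraction of (positive, negative) patient pairs ranked correctly by the original model that the updated model also ranks correctly. Under that assumption the measure is threshold-free, because it depends only on the ordering of risk scores. The function name and toy data are hypothetical.

```python
import numpy as np
from itertools import product

def rank_compatibility(y, old_scores, new_scores):
    """Hypothetical C^R sketch: among (positive, negative) pairs that the old
    model orders correctly, the fraction the new model also orders correctly."""
    y = np.asarray(y)
    old, new = np.asarray(old_scores), np.asarray(new_scores)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    correct_old = kept = 0
    for i, j in product(pos, neg):
        if old[i] > old[j]:
            correct_old += 1
            if new[i] > new[j]:
                kept += 1
    return kept / correct_old if correct_old else float("nan")
```

Because only score comparisons enter the computation, rescaling either model's outputs by any increasing function leaves the value unchanged.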
Submitted 10 August, 2023;
originally announced August 2023.
-
Global birdsong embeddings enable superior transfer learning for bioacoustic classification
Authors:
Burooj Ghani,
Tom Denton,
Stefan Kahl,
Holger Klinck
Abstract:
Automated bioacoustic analysis aids understanding and protection of both marine and terrestrial animals and their habitats across extensive spatiotemporal scales, and typically involves analyzing vast collections of acoustic data. With the advent of deep learning models, classification of important signals from these datasets has markedly improved. These models power critical data analyses for research and decision-making in biodiversity monitoring, animal behaviour studies, and natural resource management. However, deep learning models are often data-hungry and require a significant amount of labeled training data to perform well. While sufficient training data is available for certain taxonomic groups (e.g., common bird species), many classes (such as rare and endangered species, many non-bird taxa, and call types) lack enough data to train a robust model from scratch. This study investigates the utility of feature embeddings extracted from audio classification models to identify bioacoustic classes other than the ones these models were originally trained on. We evaluate models on diverse datasets, including different bird calls and dialect types, bat calls, marine mammal calls, and amphibian calls. The embeddings extracted from models trained on bird vocalization data consistently allowed higher-quality classification than the embeddings from models trained on general audio datasets. The results of this study indicate that high-quality feature embeddings from large-scale acoustic bird classifiers can be harnessed for few-shot transfer learning, enabling the learning of new classes from a limited quantity of training data. Our findings reveal the potential for efficient analyses of novel bioacoustic tasks, even in scenarios where available training data is limited to a few samples.
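Few-shot transfer on frozen embeddings is often done with a small linear classifier trained on top of them. The sketch below assumes embeddings are already extracted and fits a softmax (multinomial logistic regression) probe by full-batch gradient descent; the learning rate, step count, L2 weight and function names are arbitrary illustrative choices, not settings from the study.

```python
import numpy as np

def train_linear_probe(x, y, num_classes, lr=0.1, steps=500, l2=1e-3):
    """Multinomial logistic regression on frozen embeddings x of shape (n, d)."""
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad + l2 * w)
        b -= lr * grad.sum(axis=0)
    return w, b

def predict(x, w, b):
    return np.argmax(x @ w + b, axis=1)
```

Only the probe's weights are trained, so adding a new class costs a few labeled examples and seconds of compute rather than retraining the embedding model.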
Submitted 17 November, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
In Search for a Generalizable Method for Source Free Domain Adaptation
Authors:
Malik Boudiaf,
Tom Denton,
Bart van Merriënboer,
Vincent Dumoulin,
Eleni Triantafillou
Abstract:
Source-free domain adaptation (SFDA) is compelling because it allows adapting an off-the-shelf model to a new domain using only unlabelled data. In this work, we apply existing SFDA techniques to a challenging set of naturally-occurring distribution shifts in bioacoustics, which are very different from the ones commonly studied in computer vision. We find existing methods perform differently relative to each other than observed in vision benchmarks, and sometimes perform worse than no adaptation at all. We propose a new simple method which outperforms the existing methods on our new shifts while exhibiting strong performance on a range of vision datasets. Our findings suggest that existing SFDA methods are not as generalizable as previously thought and that considering diverse modalities can be a useful avenue for designing more robust models.
Submitted 24 June, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Ultra-Low-Bitrate Speech Coding with Pretrained Transformers
Authors:
Ali Siahkoohi,
Michael Chinen,
Tom Denton,
W. Bastiaan Kleijn,
Jan Skoglund
Abstract:
Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective receptive fields, which prevents them from compressing speech efficiently. We propose to further reduce the bitrate of neural speech codecs through the use of pretrained Transformers, capable of exploiting long-range dependencies in the input signal due to their inductive bias. As such, we use a pretrained Transformer in tandem with a convolutional encoder, which is trained end-to-end with a quantizer and a generative adversarial net decoder. Our numerical experiments show that supplementing the convolutional encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of $600\,\mathrm{bps}$ that outperforms the original neural speech codec in synthesized speech quality when trained at the same bitrate. Subjective human evaluations suggest that the quality of the resulting codec is comparable to or better than that of conventional codecs operating at three to four times the rate.
Submitted 5 July, 2022;
originally announced July 2022.
-
Improving Bird Classification with Unsupervised Sound Separation
Authors:
Tom Denton,
Scott Wisdom,
John R. Hershey
Abstract:
This paper addresses the problem of species classification in bird song recordings. The massive amount of available field recordings of birds presents an opportunity to use machine learning to automatically track bird populations. However, it also poses a problem: such field recordings typically contain significant environmental noise and overlapping vocalizations that interfere with classification. The widely available training datasets for species identification also typically leave background species unlabeled. This leads classifiers to ignore vocalizations with a low signal-to-noise ratio. However, recent advances in unsupervised sound separation, such as mixture invariant training (MixIT), enable high quality separation of bird songs to be learned from such noisy recordings. In this paper, we demonstrate improved separation quality when training a MixIT model specifically for birdsong data, outperforming a general audio separation model by over 5 dB in SI-SNR improvement of reconstructed mixtures. We also demonstrate precision improvements with a downstream multi-species bird classifier across three independent datasets. The best classifier performance is achieved by taking the maximum model activations over the separated channels and original audio. Finally, we document additional classifier improvements, including taxonomic classification, augmentation by random low-pass filters, and additional channel normalization.
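Two quantities from this abstract are easy to state in code: scale-invariant SNR (SI-SNR), with SI-SNR improvement measured against the unprocessed mixture, and the "maximum over separated channels plus original audio" combination of classifier outputs. The sketch below uses the standard SI-SNR definition; the function names and the `(num_channels, num_classes)` array layout are assumptions, not the paper's code.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Optimal scaling of the reference to match the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) /
                           (np.dot(residual, residual) + eps))

def si_snr_improvement(estimate, mixture, reference):
    """SI-SNR gain of a separated estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)

def combine_channel_logits(logits):
    """logits: (num_channels, num_classes), with the original audio included
    as one of the channels. Keep the maximum activation per class."""
    return np.max(logits, axis=0)
```

Including the original mixture as one of the channels means the combined score for any class is never lower than the score the classifier gives the unseparated audio.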
Submitted 7 October, 2021;
originally announced October 2021.
-
Handling Background Noise in Neural Speech Generation
Authors:
Tom Denton,
Alejandro Luebs,
Felicia S. C. Lim,
Andrew Storus,
Hengchin Yeh,
W. Bastiaan Kleijn,
Jan Skoglund
Abstract:
Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper we examine the reason and discuss methods to overcome this issue. Applying a denoising preprocessing stage when extracting features, and targeting clean speech during training, is shown to be the best-performing strategy.
Submitted 23 February, 2021;
originally announced February 2021.
-
Generative Speech Coding with Predictive Variance Regularization
Authors:
W. Bastiaan Kleijn,
Andrew Storus,
Michael Chinen,
Tom Denton,
Felicia S. C. Lim,
Alejandro Luebs,
Jan Skoglund,
Hengchin Yeh
Abstract:
The recent emergence of machine-learning based generative models for speech suggests a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the distortions present in real-world input signals. We argue that this deterioration is due to the sensitivity of the maximum likelihood criterion to outliers and the ineffectiveness of modeling a sum of independent signals with a single autoregressive model. We introduce predictive-variance regularization to reduce the sensitivity to outliers, resulting in a significant increase in performance. We show that noise reduction to remove unwanted signals can significantly increase performance. We provide extensive subjective performance evaluations that show that our system based on generative modeling provides state-of-the-art coding performance at 3 kb/s for real-world speech signals at reasonable computational complexity.
Submitted 18 February, 2021;
originally announced February 2021.
-
Combinatorics of the zeta map on rational Dyck paths
Authors:
Cesar Ceballos,
Tom Denton,
Christopher R. H. Hanusa
Abstract:
An $(a,b)$-Dyck path $P$ is a lattice path from $(0,0)$ to $(b,a)$ that stays above the line $y=\frac{a}{b}x$. The zeta map is a curious rule that maps the set of $(a,b)$-Dyck paths into itself; it is conjecturally bijective, and we provide progress towards a proof of bijectivity in this paper by showing that knowing the zeta map of $P$ and of its conjugate is enough to recover $P$.
Our method begets an area-preserving involution $\chi$ on the set of $(a,b)$-Dyck paths when $\zeta$ is a bijection, as well as a new method for calculating $\zeta^{-1}$ on classical Dyck paths. For certain nice $(a,b)$-Dyck paths we give an explicit formula for $\zeta^{-1}$ and $\chi$, and for additional $(a,b)$-Dyck paths we discuss how to compute $\zeta^{-1}$ and $\chi$ inductively.
We also explore Armstrong's skew length statistic and present two new combinatorial methods for calculating the zeta map involving lasers and interval intersections. We provide a combinatorial statistic $\delta$ that can be used to recursively compute $\zeta^{-1}$ and show that $\delta$ is computable from $\zeta(P)$ in the Fuss-Catalan case.
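The definition of an $(a,b)$-Dyck path can be checked by brute force. The sketch below enumerates paths as strings of north (N) and east (E) steps and keeps those that never go strictly below $y = \frac{a}{b}x$ (i.e. $by \ge ax$ at every lattice point); for coprime $a, b$ the count should equal the rational Catalan number $\frac{1}{a+b}\binom{a+b}{a}$. It illustrates the objects only and does not implement the zeta map; function names are illustrative.

```python
from itertools import combinations
from math import comb, gcd

def rational_dyck_paths(a, b):
    """All (a,b)-Dyck paths from (0,0) to (b,a) as strings of 'N' and 'E'."""
    n = a + b
    paths = []
    for north_positions in combinations(range(n), a):
        north = set(north_positions)
        x = y = 0
        valid = True
        for i in range(n):
            if i in north:
                y += 1
            else:
                x += 1
            if b * y < a * x:  # strictly below y = (a/b) x: reject
                valid = False
                break
        if valid:
            paths.append("".join("N" if i in north else "E" for i in range(n)))
    return paths

def rational_catalan(a, b):
    """Number of (a,b)-Dyck paths for coprime a and b."""
    assert gcd(a, b) == 1
    return comb(a + b, a) // (a + b)

print(len(rational_dyck_paths(3, 5)), rational_catalan(3, 5))  # 7 7
```

With $b = a + 1$ this recovers the classical Dyck paths counted by the Catalan numbers.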
Submitted 18 February, 2016; v1 submitted 23 April, 2015;
originally announced April 2015.
-
Algebraic and Affine Pattern Avoidance
Authors:
Tom Denton
Abstract:
We investigate various connections between the 0-Hecke monoid, Catalan monoid, and pattern avoidance in permutations, providing new tools for approaching pattern avoidance in an algebraic framework. In particular, we characterize containment of a class of "long" patterns as equivalent to the existence of a corresponding factorization. We then generalize some of our constructions to the affine setting.
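For readers less familiar with pattern avoidance, here is a brute-force containment test (exponential in the pattern length, so only for small examples). It checks classical containment directly from the definition and does not use the algebraic characterization via factorizations developed in the paper; function names are illustrative.

```python
from itertools import combinations, permutations

def standardize(seq):
    """Replace entries by their relative ranks, e.g. (5, 2, 9) -> (2, 1, 3)."""
    ranked = sorted(seq)
    return tuple(ranked.index(v) + 1 for v in seq)

def contains_pattern(perm, pattern):
    """True if some subsequence of perm is order-isomorphic to pattern."""
    target = tuple(pattern)
    return any(standardize(sub) == target
               for sub in combinations(perm, len(target)))

avoiders = [p for p in permutations(range(1, 5)) if not contains_pattern(p, (1, 2, 3))]
print(len(avoiders))  # 14, the Catalan number C_4
```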
Submitted 15 March, 2013;
originally announced March 2013.
-
Canonical Decompositions of Affine Permutations, Affine Codes, and Split $k$-Schur Functions
Authors:
Tom Denton
Abstract:
We study the unique maximal decomposition of an arbitrary affine permutation into a product of cyclically decreasing elements, providing a new perspective on work of Thomas Lam. This decomposition is closely related to the affine code, which generalizes the $k$-bounded partition associated to Grassmannian elements. We also show that the affine code readily encodes a number of basic combinatorial properties of an affine permutation. As an application, we prove a new special case of the Littlewood-Richardson Rule for $k$-Schur functions, using the canonical decomposition to control for which permutations appear in the expansion of the $k$-Schur function in noncommuting variables over the affine nil-Coxeter algebra.
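To make "affine permutation" concrete: in window notation an affine permutation of period $n$ is recorded by $[w(1), \dots, w(n)]$, where the entries have distinct residues mod $n$ and sum to $n(n+1)/2$. The sketch checks these conditions and computes the Coxeter length with Shi's formula $\ell(w) = \sum_{1 \le i < j \le n} \lvert \lfloor (w(j) - w(i))/n \rfloor \rvert$. The function names are illustrative; the canonical decomposition and the affine code from the paper are not implemented here.

```python
def is_affine_permutation(window):
    """Check the window [w(1), ..., w(n)] of an affine permutation of period n."""
    n = len(window)
    distinct_residues = len({v % n for v in window}) == n
    return distinct_residues and sum(window) == n * (n + 1) // 2

def affine_length(window):
    """Coxeter length via Shi's formula (// is floor division in Python)."""
    n = len(window)
    return sum(abs((window[j] - window[i]) // n)
               for i in range(n) for j in range(i + 1, n))

print(affine_length([0, 2, 4]))  # 1 (the generator s_0 when n = 3)
```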
Submitted 27 October, 2012; v1 submitted 11 April, 2012;
originally announced April 2012.
-
Excursions into Algebra and Combinatorics at $q=0$
Authors:
Tom Denton
Abstract:
We explore combinatorics associated with the degenerate Hecke algebra at $q=0$, obtaining a formula for a system of orthogonal idempotents, and also exploring various pattern avoidance results. Generalizing constructions for the 0-Hecke algebra, we explore the representation theory of $\mathcal{J}$-trivial monoids.
We then discuss two-tensors of crystal bases for $U_q(\tilde{\mathfrak{sl}_2})$, establishing a complementary result to one of Bandlow, Schilling, and Thiéry on affine crystals arising from promotion operators. Finally, we give a computer implementation of Stembridge's local axioms for simply-laced crystal bases.
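The 0-Hecke monoid can be realized concretely by bubble-sort operators on permutations: $\pi_i$ swaps the entries in positions $i$ and $i+1$ if they are in increasing order and otherwise does nothing. The sketch checks the defining relations $\pi_i^2 = \pi_i$, $\pi_i\pi_{i+1}\pi_i = \pi_{i+1}\pi_i\pi_{i+1}$ and $\pi_i\pi_j = \pi_j\pi_i$ for $|i-j| \ge 2$ on all of $S_n$; the function names are illustrative.

```python
from itertools import permutations

def pi(i, w):
    """Bubble-sort operator pi_i (1-indexed) on a permutation in one-line
    notation: swap positions i and i+1 if they are in increasing order."""
    w = list(w)
    if w[i - 1] < w[i]:
        w[i - 1], w[i] = w[i], w[i - 1]
    return tuple(w)

def apply_word(word, w):
    """Apply pi_{word[0]}, then pi_{word[1]}, and so on."""
    for i in word:
        w = pi(i, w)
    return w

def check_relations(n):
    for w in permutations(range(1, n + 1)):
        for i in range(1, n):
            assert pi(i, pi(i, w)) == pi(i, w)  # idempotence
        for i in range(1, n - 1):  # braid relation
            assert apply_word([i, i + 1, i], w) == apply_word([i + 1, i, i + 1], w)
        for i in range(1, n):  # distant generators commute
            for j in range(i + 2, n):
                assert apply_word([i, j], w) == apply_word([j, i], w)
    return True

print(check_relations(4))  # True
```

Unlike the symmetric group, where $s_i^2 = 1$, these generators are idempotent, which is the $q = 0$ degeneration of the Hecke relation.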
Submitted 22 August, 2011;
originally announced August 2011.
-
On the representation theory of finite J-trivial monoids
Authors:
Tom Denton,
Florent Hivert,
Anne Schilling,
Nicolas M. Thiéry
Abstract:
In 1979, Norton showed that the representation theory of the 0-Hecke algebra admits a rich combinatorial description. Her constructions rely heavily on some triangularity property of the product, but do not use explicitly that the 0-Hecke algebra is a monoid algebra.
The thesis of this paper is that considering the general setting of monoids admitting such a triangularity, namely J-trivial monoids, sheds further light on the topic. This is a step toward using representation theory to automatically extract combinatorial structures from (monoid) algebras, often in the form of posets and lattices, both from a theoretical and computational point of view, and with an implementation in Sage.
Motivated by ongoing work on related monoids associated to Coxeter systems, and building on well-known results in the semigroup community (such as the description of the simple modules or the radical), we describe how most of the data associated to the representation theory (Cartan matrix, quiver) of the algebra of any J-trivial monoid $M$ can be expressed combinatorially by counting appropriate elements in $M$ itself. As a consequence, this data does not depend on the ground field and can be calculated in $O(n^2)$, if not $O(nm)$, where $n=|M|$ and $m$ is the number of generators. Along the way, we construct a triangular decomposition of the identity into orthogonal idempotents, using the usual Möbius inversion formula in the semi-simple quotient (a lattice), followed by an algorithmic lifting step.
Applying our results to the 0-Hecke algebra (in all finite types), we recover previously known results and additionally provide an explicit labeling of the edges of the quiver. We further explore special classes of J-trivial monoids, and in particular monoids of order preserving regressive functions on a poset, generalizing known results on the monoids of nondecreasing parking functions.
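As a small computational check of these statements, the sketch below builds the monoid of nondecreasing parking functions, i.e. order-preserving maps $f$ on the chain $\{1, \dots, n\}$ with $f(i) \le i$, under composition. It tests J-triviality directly from the definition (distinct elements generate distinct two-sided ideals $MxM$) and lists the idempotents. This brute-force approach is only meant for tiny $n$ and is not the combinatorial algorithm from the paper; function names are illustrative.

```python
from itertools import product

def ndpf(n):
    """Nondecreasing maps f on {1..n} with f(i) <= i, as tuples (f(1), ..., f(n))."""
    return [f for f in product(range(1, n + 1), repeat=n)
            if all(f[i] <= i + 1 for i in range(n))
            and all(f[i] <= f[i + 1] for i in range(n - 1))]

def compose(f, g):
    """(f o g)(i) = f(g(i))."""
    return tuple(f[g[i] - 1] for i in range(len(g)))

def is_j_trivial(monoid, mul):
    """J-trivial iff distinct elements x generate distinct ideals MxM."""
    ideals = {x: frozenset(mul(mul(a, x), b) for a in monoid for b in monoid)
              for x in monoid}
    return len(set(ideals.values())) == len(monoid)

def idempotents(monoid, mul):
    return [x for x in monoid if mul(x, x) == x]

m3 = ndpf(3)
print(len(m3), len(idempotents(m3, compose)), is_j_trivial(m3, compose))  # 5 4 True
```

Regressiveness is what drives J-triviality here: every element of $MxM$ lies pointwise below $x$, so $MxM = MyM$ forces $x = y$.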
Submitted 4 March, 2011; v1 submitted 17 October, 2010;
originally announced October 2010.
-
A Combinatorial Formula for Orthogonal Idempotents in the $0$-Hecke Algebra of the Symmetric Group
Authors:
Tom Denton
Abstract:
Building on the work of P.N. Norton, we give combinatorial formulae for two maximal decompositions of the identity into orthogonal idempotents in the $0$-Hecke algebra of the symmetric group, $\mathbb{C}H_0(S_N)$. This construction is compatible with the branching from $S_{N-1}$ to $S_{N}$.
Submitted 13 August, 2010;
originally announced August 2010.