
Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search

Authors: Matthew C. McCallum, Florian Henkel, Jaehun Kim, Samuel E. Sandberg, Matthew E. P. Davies

Abstract: Audio embeddings enable large scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only whether audio is similar, but similar in what way (e.g., wrt. tempo, mood or genre). Previous works have proposed disentangled embedding spaces where subspaces representing specific, yet possibly correlated, attributes can be weighted to emphasize those attributes in downstream tasks. However, no research has been conducted into the independence of these subspaces, nor their manipulation, in order to retrieve tracks that are similar but different in a specific way. Here, we explore the manipulation of tempo in embedding spaces as a case-study towards this goal. We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre. As this translation is specific to tempo it enables retrieval of tracks that are similar but have specifically different tempi. We show that such a function can be used as an efficient data augmentation strategy for both training of downstream tempo predictors, and improved nearest neighbor retrieval of properties largely independent of tempo.

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

arXiv:2401.08891 [pdf, other]

Tempo estimation as fully self-supervised binary classification

Authors: Florian Henkel, Jaehun Kim, Matthew C. McCallum, Samuel E. Sandberg, Matthew E. P. Davies

Abstract: This paper addresses the problem of global tempo estimation in musical audio. Given that annotating tempo is time-consuming and requires certain musical expertise, few publicly available data sources exist to train machine learning models for this task. Towards alleviating this issue, we propose a fully self-supervised approach that does not rely on any human labeled data. Our method builds on the fact that generic (music) audio embeddings already encode a variety of properties, including information about tempo, making them easily adaptable for downstream tasks. While recent work in self-supervised tempo estimation aimed to learn a tempo specific representation that was subsequently used to train a supervised classifier, we reformulate the task into the binary classification problem of predicting whether a target track has the same or a different tempo compared to a reference. While the former still requires labeled training data for the final classification model, our approach uses arbitrary unlabeled music data in combination with time-stretching for model training as well as a small set of synthetically created reference samples for predicting the final tempo. Evaluation of our approach in comparison with the state-of-the-art reveals highly competitive performance when the constraint of finding the precise tempo octave is relaxed.

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

arXiv:2401.08889 [pdf, other]

On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations

Authors: Matthew C. McCallum, Matthew E. P. Davies, Florian Henkel, Jaehun Kim, Samuel E. Sandberg

Abstract: Audio embeddings are crucial tools in understanding large catalogs of music. Typically embeddings are evaluated on the basis of the performance they provide in a wide range of downstream tasks, however few studies have investigated the local properties of the embedding spaces themselves which are important in nearest neighbor algorithms, commonly used in music search and recommendation. In this work we show that when learning audio representations on music datasets via contrastive learning, musical properties that are typically homogeneous within a track (e.g., key and tempo) are reflected in the locality of neighborhoods in the resulting embedding space. By applying appropriate data augmentation strategies, localisation of such properties can not only be reduced but the localisation of other attributes is increased. For example, locality of features such as pitch and tempo that are less relevant to non-expert listeners, may be mitigated while improving the locality of more salient features such as genre and mood, achieving state-of-the-art performance in nearest neighbor retrieval accuracy. Similarly, we show that the optimal selection of data augmentation strategies for contrastive learning of music audio embeddings is dependent on the downstream task, highlighting this as an important embedding design decision.

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

arXiv:2308.10355 [pdf, other]

Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music

Authors: Ching-Yu Chiu, Meinard Müller, Matthew E. P. Davies, Alvin Wen-Yu Su, Yi-Hsuan Yang

Abstract: To model the periodicity of beats, state-of-the-art beat tracking systems use "post-processing trackers" (PPTs) that rely on several empirically determined global assumptions for tempo transition, which work well for music with a steady tempo. For expressive classical music, however, these assumptions can be too rigid. With two large datasets of Western classical piano music, namely the Aligned Scores and Performances (ASAP) dataset and a dataset of Chopin's Mazurkas (Maz-5), we report on experiments showing the failure of existing PPTs to cope with local tempo changes, thus calling for new methods. In this paper, we propose a new local periodicity-based PPT, called predominant local pulse-based dynamic programming (PLPDP) tracking, that allows for more flexible tempo transitions. Specifically, the new PPT incorporates a method called "predominant local pulses" (PLP) in combination with a dynamic programming (DP) component to jointly consider the locally detected periodicity and beat activation strength at each time instant. Accordingly, PLPDP accounts for the local periodicity, rather than relying on a global tempo assumption. Compared to existing PPTs, PLPDP particularly enhances the recall values at the cost of a lower precision, resulting in an overall improvement of F1-score for beat tracking in ASAP (from 0.473 to 0.493) and Maz-5 (from 0.595 to 0.838).

Submitted 20 August, 2023; originally announced August 2023.

Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (July 2023)

arXiv:2304.06868 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095292

Tempo vs. Pitch: understanding self-supervised tempo estimation

Authors: Giovana Morais, Matthew E. P. Davies, Marcelo Queiroz, Magdalena Fuentes

Abstract: Self-supervision methods learn representations by solving pretext tasks that do not require human-generated labels, alleviating the need for time-consuming annotations. These methods have been applied in computer vision, natural language processing, environmental sound analysis, and recently in music information retrieval, e.g. for pitch estimation. Particularly in the context of music, there are few insights about the fragility of these models regarding different distributions of data, and how they could be mitigated. In this paper, we explore these questions by dissecting a self-supervised model for pitch estimation adapted for tempo estimation via rigorous experimentation with synthetic data. Specifically, we study the relationship between the input representation and data distribution for self-supervised tempo estimation.

Submitted 13 April, 2023; originally announced April 2023.

Comments: 5 pages, 3 figures, published on 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing

arXiv:2211.14162 [pdf, other]

A Gaussian Process Regression based Dynamical Models Learning Algorithm for Target Tracking

Authors: Mengwei Sun, Mike E. Davies, Ian K. Proudler, James R. Hopgood

Abstract: Maneuvering target tracking is a challenging problem for sensor systems because of the unpredictability of the targets' motions. This paper proposes a novel data-driven method for learning the dynamical motion model of a target. Non-parametric Gaussian process regression (GPR) is used to learn a target's naturally shift invariant motion (NSIM) behavior, which is translationally invariant and does not need to be constantly updated as the target moves. The learned Gaussian processes (GPs) can be applied to track targets within different surveillance regions from the surveillance region of the training data by being incorporated into the particle filter (PF) implementation. The performance of our proposed approach is evaluated over different maneuvering scenarios by being compared with commonly used interacting multiple model (IMM)-PF methods and provides around $90\%$ performance improvement for a multi-target tracking (MTT) highly maneuvering scenario.

Submitted 25 November, 2022; originally announced November 2022.

Comments: 11 pages, 10 figures

arXiv:2210.07314 [pdf, ps, other]

Spline Sketches: An Efficient Approach for Photon Counting Lidar

Authors: Michael Patrick Sheehan, Julian Tachella, Mike E. Davies

Abstract: Photon counting lidar has become an invaluable tool for 3D depth imaging due to the fine-precision it can achieve over long ranges. However, high frame rate, high resolution lidar devices produce an enormous amount of time-of-flight (ToF) data which can cause a severe data processing bottleneck hindering the deployment of real-time systems. In this paper, an efficient photon counting approach is proposed that exploits the simplicity of piecewise polynomial splines to form a hardware-friendly compressed statistic, or a so-called spline sketch, of the ToF data without sacrificing the quality of the recovered image. As each piecewise polynomial spline is a simple function with limited support over the timing depth window, the spline sketch can be computed efficiently on-chip with minimal computational overhead. We show that a piecewise linear or quadratic spline sketch, requiring minimal on-chip arithmetic computation per photon detection, can reconstruct real-world depth images with negligible loss of resolution whilst achieving $95\%$ compression compared to the full ToF data, as well as offering multi-peak detection performance. These contrast with previously proposed coarse binning histograms that suffer from a highly nonuniform accuracy across depth and can fail catastrophically when associated with bright reflectors. Further, by building range-walk correction into the proposed estimation algorithms, it is demonstrated that the spline sketches can be made robust to photon pile-up effects. The computational complexity of both the reconstruction and range walk correction algorithms scale only with the size of the spline sketch which is independent to both the photon count and temporal resolution of the lidar device.

Submitted 29 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: 13 pages, 13 figures

arXiv:2210.06817 [pdf, other]

doi 10.1109/LSP.2022.3215106

An Analysis Method for Metric-Level Switching in Beat Tracking

Authors: Ching-Yu Chiu, Meinard Müller, Matthew E. P. Davies, Alvin Wen-Yu Su, Yi-Hsuan Yang

Abstract: For expressive music, the tempo may change over time, posing challenges to tracking the beats by an automatic model. The model may first tap to the correct tempo, but then may fail to adapt to a tempo change, or switch between several incorrect but perceptually plausible ones (e.g., half- or double-tempo). Existing evaluation metrics for beat tracking do not reflect such behaviors, as they typically assume a fixed relationship between the reference beats and estimated beats. In this paper, we propose a new performance analysis method, called annotation coverage ratio (ACR), that accounts for a variety of possible metric-level switching behaviors of beat trackers. The idea is to derive sequences of modified reference beats of all metrical levels for every two consecutive reference beats, and compare every sequence of modified reference beats to the subsequences of estimated beats. We show via experiments on three datasets of different genres the usefulness of ACR when utilized alongside existing metrics, and discuss the new insights to be gained.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted to IEEE Signal Processing Letters (Oct. 2022)

arXiv:2203.16165 [pdf, other]

doi 10.1109/ACCESS.2022.3169744

Symbolic music generation conditioned on continuous-valued emotions

Authors: Serkan Sulun, Matthew E. P. Davies, Paula Viana

Abstract: In this paper we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach centres on conditioning a state-of-the-art transformer based on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal. We evaluate our approach in a quantitative manner in two ways, first by measuring its note prediction accuracy, and second via a regression task in the valence-arousal plane. Our results demonstrate that our proposed approaches outperform conditioning using control tokens which is representative of the current state of the art.

Submitted 4 May, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: Published in IEEE Access

Journal ref: volume:10, year:2022, pages:44617-44626

arXiv:2203.08300 [pdf, other]

doi 10.1109/TSP.2023.3250829

Adaptive Kernel Kalman Filter

Authors: Mengwei Sun, Mike E. Davies, Ian K. Proudler, James R. Hopgood

Abstract: Sequential Bayesian filters in non-linear dynamic systems require the recursive estimation of the predictive and posterior distributions. This paper introduces a Bayesian filter called the adaptive kernel Kalman filter (AKKF). With this filter, the arbitrary predictive and posterior distributions of hidden states are approximated using the empirical kernel mean embeddings (KMEs) in reproducing kernel Hilbert spaces (RKHSs). In parallel with the KMEs, some particles, in the data space, are used to capture the properties of the dynamical system model. Specifically, particles are generated and updated in the data space, while the corresponding kernel weight mean vector and covariance matrix associated with the feature mappings of the particles are predicted and updated in the RKHSs based on the kernel Kalman rule (KKR). Simulation results are presented to confirm the improved performance of our approach with significantly reduced particle numbers, by comparing with the unscented Kalman filter (UKF), particle filter (PF) and Gaussian particle filter (GPF). For example, compared with the GPF, the proposed approach provides around 5% logarithmic mean square error (LMSE) tracking performance improvement in the bearing-only tracking (BOT) system when using 50 particles.

Submitted 27 February, 2023; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: The manuscript has been accepted for publication as a regular paper in the IEEE Transactions on Signal Processing

arXiv:2203.00952 [pdf, other]

Sketched RT3D: How to reconstruct billions of photons per second

Authors: Julián Tachella, Michael P. Sheehan, Mike E. Davies

Abstract: Single-photon light detection and ranging (lidar) captures depth and intensity information of a 3D scene. Reconstructing a scene from observed photons is a challenging task due to spurious detections associated with background illumination sources. To tackle this problem, there is a plethora of 3D reconstruction algorithms which exploit spatial regularity of natural scenes to provide stable reconstructions. However, most existing algorithms have computational and memory complexity proportional to the number of recorded photons. This complexity hinders their real-time deployment on modern lidar arrays which acquire billions of photons per second. Leveraging a recent lidar sketching framework, we show that it is possible to modify existing reconstruction algorithms such that they only require a small sketch of the photon information. In particular, we propose a sketched version of a recent state-of-the-art algorithm which uses point cloud denoisers to provide spatially regularized reconstructions. A series of experiments performed on real lidar datasets demonstrates a significant reduction of execution time and memory requirements, while achieving the same reconstruction performance than in the full data case.

Submitted 2 March, 2022; originally announced March 2022.

Comments: Accepted at ICASSP 2022

arXiv:2201.09375 [pdf, other]

Deep Unrolling for Magnetic Resonance Fingerprinting

Authors: Dongdong Chen, Mike E. Davies, Mohammad Golbabaee

Abstract: Magnetic Resonance Fingerprinting (MRF) has emerged as a promising quantitative MR imaging approach. Deep learning methods have been proposed for MRF and demonstrated improved performance over classical compressed sensing algorithms. However many of these end-to-end models are physics-free, while consistency of the predictions with respect to the physical forward model is crucial for reliably solving inverse problems. To address this, recently [1] proposed a proximal gradient descent framework that directly incorporates the forward acquisition and Bloch dynamic models within an unrolled learning mechanism. However, [1] only evaluated the unrolled model on synthetic data using Cartesian sampling trajectories. In this paper, as a complementary to [1], we investigate other choices of encoders to build the proximal neural network, and evaluate the deep unrolling algorithm on real accelerated MRF scans with non-Cartesian k-space sampling trajectories.

Submitted 25 January, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

Comments: Tech report. arXiv admin note: substantial text overlap with arXiv:2006.15271

arXiv:2111.12855 [pdf, other]

Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements

Authors: Dongdong Chen, Julián Tachella, Mike E. Davies

Abstract: Deep networks provide state-of-the-art performance in multiple imaging inverse problems ranging from medical imaging to computational photography. However, most existing networks are trained with clean signals which are often hard or impossible to obtain. Equivariant imaging (EI) is a recent self-supervised learning framework that exploits the group invariance present in signal distributions to learn a reconstruction function from partial measurement data alone. While EI results are impressive, its performance degrades with increasing noise. In this paper, we propose a Robust Equivariant Imaging (REI) framework which can learn to image from noisy partial measurements alone. The proposed method uses Stein's Unbiased Risk Estimator (SURE) to obtain a fully unsupervised training loss that is robust to noise. We show that REI leads to considerable performance gains on linear and nonlinear inverse problems, thereby paving the way for robust unsupervised imaging with deep networks. Code is available at: https://github.com/edongdongchen/REI.

Submitted 15 March, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: CVPR 2022. Code: https://github.com/edongdongchen/REI

arXiv:2110.08045 [pdf, ps, other]

Compressive Independent Component Analysis: Theory and Algorithms

Authors: Michael P. Sheehan, Mike E. Davies

Abstract: Compressive learning forms the exciting intersection between compressed sensing and statistical learning where one exploits forms of sparsity and structure to reduce the memory and/or computational complexity of the learning task. In this paper, we look at the independent component analysis (ICA) model through the compressive learning lens. In particular, we show that solutions to the cumulant based ICA model have particular structure that induces a low dimensional model set that resides in the cumulant tensor space. By showing a restricted isometry property holds for random cumulants e.g. Gaussian ensembles, we prove the existence of a compressive ICA scheme. Thereafter, we propose two algorithms of the form of an iterative projection gradient (IPG) and an alternating steepest descent (ASD) algorithm for compressive ICA, where the order of compression asserted from the restricted isometry property is realised through empirical results. We provide analysis of the CICA algorithms including the effects of finite samples. The effects of compression are characterised by a trade-off between the sketch size and the statistical efficiency of the ICA estimates. By considering synthetic and real datasets, we show the substantial memory gains achieved over well-known ICA algorithms by using one of the proposed CICA algorithms. Finally, we conclude the paper with open problems including interesting challenges from the emerging field of compressive learning.

Submitted 15 October, 2021; originally announced October 2021.

Comments: 27 pages, 8 figures, under review

arXiv:2105.06920 [pdf, ps, other]

Surface Detection for Sketched Single Photon Lidar

Authors: Michael P. Sheehan, Julián Tachella, Mike E. Davies

Abstract: Single-photon lidar devices are able to collect an ever-increasing amount of time-stamped photons in small time periods due to increasingly larger arrays, generating a memory and computational bottleneck on the data processing side. Recently, a sketching technique was introduced to overcome this bottleneck which compresses the amount of information to be stored and processed. The size of the sketch scales with the number of underlying parameters of the time delay distribution and not, fundamentally, with either the number of detected photons or the time-stamp resolution. In this paper, we propose a detection algorithm based solely on a small sketch that determines if there are surfaces or objects in the scene or not. If a surface is detected, the depth and intensity of a single object can be computed in closed-form directly from the sketch. The computational load of the proposed detection algorithm depends solely on the size of the sketch, in contrast to previous algorithms that depend at least linearly in the number of collected photons or histogram bins, paving the way for fast, accurate and memory efficient lidar estimation. Our experiments demonstrate the memory and statistical efficiency of the proposed algorithm both on synthetic and real lidar datasets.

Submitted 14 May, 2021; originally announced May 2021.

Comments: 5 pages, Accepted at EUSIPCO 2021

arXiv:2103.14756 [pdf, other]

Equivariant Imaging: Learning Beyond the Range Space

Authors: Dongdong Chen, Julián Tachella, Mike E. Davies

Abstract: In various imaging problems, we only have access to compressed measurements of the underlying signals, hindering most learning-based strategies which usually require pairs of signals and associated measurements for training. Learning only from compressed measurements is impossible in general, as the compressed observations do not contain information outside the range of the forward sensing operator. We propose a new end-to-end self-supervised framework that overcomes this limitation by exploiting the equivariances present in natural signals. Our proposed learning strategy performs as well as fully supervised methods. Experiments demonstrate the potential of this framework on inverse problems including sparse-view X-ray computed tomography on real clinical data and image inpainting on natural images. Code has been made available at: https://github.com/edongdongchen/EI.

Submitted 23 August, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: ICCV 2021. Code: https://github.com/edongdongchen/EI

arXiv:2102.08732 [pdf, ps, other]

doi 10.1109/TCI.2021.3113495

A Sketching Framework for Reduced Data Transfer in Photon Counting Lidar

Authors: Michael P. Sheehan, Julián Tachella, Mike E. Davies

Abstract: Single-photon lidar has become a prominent tool for depth imaging in recent years. At the core of the technique, the depth of a target is measured by constructing a histogram of time delays between emitted light pulses and detected photon arrivals. A major data processing bottleneck arises on the device when either the number of photons per pixel is large or the resolution of the time stamp is fine, as both the space requirement and the complexity of the image reconstruction algorithms scale with these parameters. We solve this limiting bottleneck of existing lidar techniques by sampling the characteristic function of the time of flight (ToF) model to build a compressive statistic, a so-called sketch of the time delay distribution, which is sufficient to infer the spatial distance and intensity of the object. The size of the sketch scales with the degrees of freedom of the ToF model (number of objects) and not, fundamentally, with the number of photons or the time stamp resolution. Moreover, the sketch is highly amenable for on-chip online processing. We show theoretically that the loss of information for compression is controlled and the mean squared error of the inference quickly converges towards the optimal Cramér-Rao bound (i.e. no loss of information) for modest sketch sizes. The proposed compressed single-photon lidar framework is tested and evaluated on real life datasets of complex scenes where it is shown that a compression rate of up-to 150 is achievable in practice without sacrificing the overall resolution of the reconstructed image.

Submitted 5 January, 2022; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: 16 pages, 20 figures. Figure 8 Corrected. Accepted at IEEE TCI

Journal ref: IEEE Transactions on Computational Imaging, Volume 7, 2021, Pages 989 - 1004

arXiv:2011.07274 [pdf, other]

doi 10.1109/JSTSP.2020.3037485

On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

Authors: Serkan Sulun, Matthew E. P. Davies

Abstract: In this paper, we address a sub-topic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks, where a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low pass filter when training and subsequently testing the network. For two different state-of-the-art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7 dB can be obtained. However, when these filters differ, the improvement falls considerably and under some training conditions results in a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy which utilizes multiple low pass filters during training and leads to improved generalization to unseen filtering conditions at test time.

Submitted 6 January, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

Comments: Qualitative examples on https://serkansulun.com/bwe. Source code on https://github.com/serkansulun/deep-music-enhancer

arXiv:2008.11529 [pdf, other]

TIV.lib: an open-source library for the tonal description of musical audio

Authors: António Ramires, Gilberto Bernardes, Matthew E. P. Davies, Xavier Serra

Abstract: In this paper, we present TIV.lib, an open-source library for the content-based tonal description of musical audio signals. Its main novelty relies on the perceptually-inspired Tonal Interval Vector space based on the Discrete Fourier transform, from which multiple instantaneous and global representations, descriptors and metrics are computed - e.g., harmonic change, dissonance, diatonicity, and musical key. The library is cross-platform, implemented in Python and the graphical programming language Pure Data, and can be used in both online and offline scenarios. Of note is its potential for enhanced Music Information Retrieval, where tonal descriptors sit at the core of numerous methods and applications.

Submitted 26 August, 2020; originally announced August 2020.

arXiv:2008.02957 [pdf, ps, other]

Dual Convolutional Neural Networks for Breast Mass Segmentation and Diagnosis in Mammography

Authors: Heyi Li, Dongdong Chen, William H. Nailon, Mike E. Davies, David Laurenson

Abstract: Deep convolutional neural networks (CNNs) have emerged as a new paradigm for mammogram diagnosis. Contemporary CNN-based computer-aided diagnosis (CAD) for breast cancer directly extracts latent features from the input mammogram image and ignores the importance of morphological features. In this paper, we introduce a novel deep learning framework for mammogram image processing, which computes mass segmentation and simultaneously predicts diagnosis results. Specifically, our method is constructed in a dual-path architecture that solves the mapping in a dual-problem manner, with an additional consideration of important shape and boundary knowledge. One path, called the Locality Preserving Learner (LPL), is devoted to hierarchically extracting and exploiting intrinsic features of the input, whereas the other path, called the Conditional Graph Learner (CGL), focuses on generating geometrical features via modeling pixel-wise image to mask correlations. By integrating the two learners, both the semantics and structure are well preserved and the component learning paths in return complement each other, contributing an improvement to the mass segmentation and cancer classification problem at the same time. We evaluated our method on the two most used public mammography datasets, DDSM and INbreast. Experimental results show that DualCoreNet achieves the best mammography segmentation and classification simultaneously, outperforming recent state-of-the-art models.

Submitted 11 August, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

arXiv:2007.14281 [pdf, ps, other]

DeepMP for Non-Negative Sparse Decomposition

Authors: Konstantinos A. Voulgaris, Mike E. Davies, Mehrdad Yaghoobi

Abstract: Non-negative signals form an important class of sparse signals. Many algorithms have already been proposed to recover such non-negative representations, where greedy and convex relaxed algorithms are among the most popular methods. The greedy techniques are low computational cost algorithms, which have also been modified to incorporate the non-negativity of the representations. One such modification has been proposed for Matching Pursuit (MP) based algorithms, which first chooses positive coefficients and uses a non-negative optimisation technique that guarantees the non-negativity of the coefficients. The performance of greedy algorithms, like all non-exhaustive search methods, suffers from high coherence with the linear generative model, called the dictionary. We here first reformulate the non-negative matching pursuit algorithm in the form of a deep neural network. We then show that the proposed model after training yields a significant improvement in terms of exact recovery performance, compared to other non-trained greedy algorithms, while keeping the complexity low.

Submitted 28 July, 2020; originally announced July 2020.

arXiv:2006.15271 [pdf, other]

Compressive MR Fingerprinting reconstruction with Neural Proximal Gradient iterations

Authors: Dongdong Chen, Mike E. Davies, Mohammad Golbabaee

Abstract: Consistency of the predictions with respect to the physical forward model is pivotal for reliably solving inverse problems. This consistency is mostly uncontrolled in the current end-to-end deep learning methodologies proposed for the Magnetic Resonance Fingerprinting (MRF) problem. To address this, we propose ProxNet, a learned proximal gradient descent framework that directly incorporates the forward acquisition and Bloch dynamic models within a recurrent learning mechanism. The ProxNet adopts a compact neural proximal model for de-aliasing and quantitative inference, that can be flexibly trained on scarce MRF training datasets. Our numerical experiments show that the ProxNet can achieve a superior quantitative inference accuracy, much smaller storage requirement, and a comparable runtime to the recent deep learning MRF baselines, while being much faster than the dictionary matching schemes. Code has been released at https://github.com/edongdongchen/PGD-Net.

Submitted 6 July, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: To appear in MICCAI 2020

arXiv:2006.04472 [pdf, ps, other]

Accelerated Search for Non-Negative Greedy Sparse Decomposition via Dimensionality Reduction

Authors: Konstantinos Voulgaris, Mike E. Davies, Mehrdad Yaghoobi

Abstract: Non-negative signals form an important class of sparse signals. Many algorithms have already been proposed to recover such non-negative representations, where greedy and convex relaxed algorithms are among the most popular methods. One fast implementation is the FNNOMP algorithm that updates the non-negative coefficients in an iterative manner. Even though FNNOMP is a good approach when working on libraries of small size, the operational time of the algorithm grows significantly when the size of the library is large. This is mainly due to the selection step of the algorithm that relies on matrix vector multiplications. We here introduce the Embedded Nearest Neighbor (E-NN) algorithm which accelerates the search over large datasets while it is guaranteed to find the most correlated atoms. We then replace the selection step of FNNOMP by E-NN. Furthermore, we introduce the Update Nearest Neighbor (U-NN) at the look up table of FNNOMP in order to assure the non-negativity criteria of FNNOMP. The results indicate that the proposed methodology can accelerate FNNOMP by a factor of 4 on a real dataset of Raman Spectra and by a factor of 22 on a synthetic dataset.

Submitted 8 June, 2020; originally announced June 2020.

arXiv:2004.09211 [pdf, other]

doi 10.1109/TIP.2020.3046882

Robust 3D reconstruction of dynamic scenes from single-photon lidar using Beta-divergences

Authors: Quentin Legros, Julian Tachella, Rachael Tobin, Aongus McCarthy, Sylvain Meignen, Gerald S. Buller, Yoann Altmann, Stephen McLaughlin, Michael E. Davies

Abstract: In this paper, we present a new algorithm for fast, online 3D reconstruction of dynamic scenes using times of arrival of photons recorded by single-photon detector arrays. One of the main challenges in 3D imaging using single-photon lidar in practical applications is the presence of strong ambient illumination which corrupts the data and can jeopardize the detection of peaks/surface in the signals. This background noise not only complicates the observation model classically used for 3D reconstruction but also the estimation procedure which requires iterative methods. In this work, we consider a new similarity measure for robust depth estimation, which allows us to use a simple observation model and a non-iterative estimation procedure while being robust to mis-specification of the background illumination model. This choice leads to a computationally attractive depth estimation procedure without significant degradation of the reconstruction performance. This new depth estimation procedure is coupled with a spatio-temporal model to capture the natural correlation between neighboring pixels and successive frames for dynamic scene analysis. The resulting online inference process is scalable and well suited for parallel implementation. The benefits of the proposed method are demonstrated through a series of experiments conducted with simulated and real single-photon lidar videos, allowing the analysis of dynamic scenes at 325 m observed under extreme ambient illumination conditions.

Submitted 18 December, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

Comments: 12 pages

arXiv:1912.02880 [pdf, ps, other]

doi 10.1109/LSP.2020.2973506

(l1,l2)-RIP and Projected Back-Projection Reconstruction for Phase-Only Measurements

Authors: Thomas Feuillen, Mike E. Davies, Luc Vandendorpe, Laurent Jacques

Abstract: This letter analyzes the performance of a simple reconstruction method, namely the Projected Back-Projection (PBP), for estimating the direction of a sparse signal from its phase-only (or amplitude-less) complex Gaussian random measurements, i.e., an extension of one-bit compressive sensing to the complex field. To study the performance of this algorithm, we show that complex Gaussian random matrices respect, with high probability, a variant of the Restricted Isometry Property (RIP) relating the l1-norm of the sparse signal measurements to their l2-norm. This property allows us to upper-bound the reconstruction error of PBP in the presence of phase noise. Monte Carlo simulations are performed to highlight the performance of our approach in this phase-only acquisition model when compared to the error achieved by PBP in classical compressive sensing.

Submitted 5 December, 2019; originally announced December 2019.

Comments: 4 pages, 2 figures

arXiv:1911.11028 [pdf, other]

Deep Decomposition Learning for Inverse Imaging Problems

Authors: Dongdong Chen, Mike E. Davies

Abstract: Deep learning is emerging as a new paradigm for solving inverse imaging problems. However, the deep learning methods often lack the assurance of traditional physics-based methods due to the lack of physical information considerations in neural network training and deployment. The appropriate supervision and explicit calibration by the information of the physics model can enhance the neural network learning and its practical performance. In this paper, inspired by the geometry that data can be decomposed into two components from the null-space of the forward operator and the range space of its pseudo-inverse, we train neural networks to learn the two components and therefore learn the decomposition, i.e. we explicitly reformulate the neural network layers as learning range-nullspace decomposition functions with reference to the layer inputs, instead of learning unreferenced functions. We empirically show that the proposed framework demonstrates superior performance over recent deep residual learning, unrolled learning and nullspace learning on tasks including compressive sensing medical imaging and natural image super-resolution. Our code is available at https://github.com/edongdongchen/DDN.

Submitted 16 July, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: To appear in ECCV 2020

arXiv:1911.09846 [pdf, other]

A Fully Convolutional Network for MR Fingerprinting

Authors: Dongdong Chen, Mohammad Golbabaee, Pedro A. Gomez, Marion I. Menzel, Mike E. Davies

Abstract: Magnetic Resonance Fingerprinting (MRF) methods typically rely on dictionary matching to map the temporal MRF signals to quantitative tissue parameters. These methods suffer from heavy storage and computation requirements as the dictionary size grows. To address these issues, we propose an end-to-end fully convolutional neural network for MRF reconstruction (MRF-FCNN), which first employs linear dimensionality reduction and then uses a neural network to project the data into the tissue parameters manifold space. Experiments on the MAGIC data demonstrate the effectiveness of the method.

Submitted 21 November, 2019; originally announced November 2019.

Comments: The Signal Processing with Adaptive Sparse Structured Representations (SPARS'2019) workshop

arXiv:1907.00300 [pdf, other]

Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis

Authors: Heyi Li, Dongdong Chen, William H. Nailon, Mike E. Davies, David I. Laurenson

Abstract: Computer-aided breast cancer diagnosis in mammography is limited by inadequate data and the similarity between benign and cancerous masses. To address this, we propose a signed graph regularized deep neural network with adversarial augmentation, named DiagNet. Firstly, we use adversarial learning to generate positive and negative mass-contained mammograms for each mass class. After that, a signed similarity graph is built upon the expanded data to further highlight the discrimination. Finally, a deep convolutional neural network is trained by jointly optimizing the signed graph regularization and classification loss. Experiments show that the DiagNet framework outperforms the state-of-the-art in breast mass diagnosis in mammography.

Submitted 14 September, 2019; v1 submitted 29 June, 2019; originally announced July 2019.

Comments: To appear in MICCAI October 2019

arXiv:1905.06058 [pdf, other]

doi 10.1364/OE.379216

Blur resolved OCT: full-range interferometric synthetic aperture microscopy through dispersion encoding

Authors: Jonathan H. Mason, Mike E. Davies, Pierre O. Bagnaninchi

Abstract: We present a computational method for full-range interferometric synthetic aperture microscopy (ISAM) under dispersion encoding. With this, one can effectively double the depth range of optical coherence tomography (OCT), whilst dramatically enhancing the spatial resolution away from the focal plane. To this end, we propose a model-based iterative reconstruction (MBIR) method, where ISAM is directly considered in an optimization approach, and we make the discovery that sparsity promoting regularization effectively recovers the full-range signal. Within this work, we adopt an optimal nonuniform discrete fast Fourier transform (NUFFT) implementation of ISAM, which is both fast and numerically stable throughout iterations. We validate our method with several complex samples, scanned with a commercial SD-OCT system with no hardware modification. With this, we both demonstrate full-range ISAM imaging, and significantly outperform combinations of existing methods.

Submitted 31 January, 2020; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: 17 pages, 7 figures. The images have been compressed for arxiv - please follow DOI for full resolution

Journal ref: Opt. Express 28, 3879-3894 (2020)

arXiv:1905.02944 [pdf, other]

doi 10.1109/TIP.2019.2952008

Fast online 3D reconstruction of dynamic scenes from individual single-photon detection events

Authors: Yoann Altmann, Stephen McLaughlin, Michael E. Davies

Abstract: In this paper, we present an algorithm for online 3D reconstruction of dynamic scenes using individual times of arrival (ToA) of photons recorded by single-photon detector arrays. One of the main challenges in 3D imaging using single-photon Lidar is the integration time required to build ToA histograms and reconstruct reliable 3D profiles in the presence of non-negligible ambient illumination. This long integration time also prevents the analysis of rapid dynamic scenes using existing techniques. We propose a new method which does not rely on the construction of ToA histograms but allows, for the first time, individual detection events to be processed online, in a parallel manner in different pixels, while accounting for the intrinsic spatiotemporal structure of dynamic scenes. Adopting a Bayesian approach, a Bayesian model is constructed to capture the dynamics of the 3D profile and an approximate inference scheme based on assumed density filtering is proposed, yielding a fast and robust reconstruction algorithm able to process efficiently thousands to millions of frames, as usually recorded using single-photon detectors. The performance of the proposed method, able to process hundreds of frames per second, is assessed using a series of experiments conducted with static and dynamic 3D scenes and the results obtained pave the way to a new family of real-time 3D reconstruction solutions.

Submitted 8 May, 2019; originally announced May 2019.

arXiv:1811.04460 [pdf, other]

Analysis vs Synthesis - An Investigation of (Co)sparse Signal Models on Graphs

Authors: Madeleine S. Kotzagiannidis, Mike E. Davies

Abstract: In this work, we present a theoretical study of signals with sparse representations in the vertex domain of a graph, which is primarily motivated by the discrepancy arising from respectively adopting a synthesis and analysis view of the graph Laplacian matrix. Sparsity on graphs and, in particular, the characterization of the subspaces of signals which are sparse with respect to the connectivity of the graph, as induced by analysis with a suitable graph operator, remains in general an opaque concept which we aim to elucidate. By leveraging the theory of cosparsity, we present a novel (co)sparse graph Laplacian-based signal model and characterize the underlying (structured) (co)sparsity, smoothness and localization of its solution subspaces on undirected graphs, while providing more refined statements for special cases such as circulant graphs. Ultimately, we substantiate fundamental discrepancies between the cosparse analysis and sparse synthesis models in this structured setting, by demonstrating that the former constitutes a special, constrained instance of the latter.

Submitted 11 November, 2018; originally announced November 2018.

Comments: IEEE GlobalSIP 2018. An extended version of this work can be found at arXiv:1811.04493

arXiv:1811.02411 [pdf, other]

An audio-only method for advertisement detection in broadcast television content

Authors: António Ramires, Diogo Cocharro, Matthew E. P. Davies

Abstract: We address the task of advertisement detection in broadcast television content. While typically approached from a video-only or audio-visual perspective, we present an audio-only method. Our approach centres on the detection of short silences which exist at the boundaries between programming and advertising, as well as between the advertisements themselves. To identify advertising regions we first locate all points within the broadcast content with very low signal energy. Next, we use a multiple linear regression model to reject non-boundary silences based on features extracted from the local context immediately surrounding the silence. Finally, we determine the advertising regions based on the long-term grouping of detected boundary silences. When evaluated over a 26-hour annotated database covering national and commercial Portuguese television channels we obtain a Matthews correlation coefficient in excess of 0.87 and outperform a freely available audio-visual approach.

Submitted 6 November, 2018; originally announced November 2018.

Journal ref: Proc. of RecPad-2017, Amadora, Portugal, pp. 21-22, October, 2017

arXiv:1811.02406 [pdf, other]

User Specific Adaptation in Automatic Transcription of Vocalised Percussion

Authors: António Ramires, Rui Penha, Matthew E. P. Davies

Abstract: The goal of this work is to develop an application that enables music producers to use their voice to create drum patterns when composing in Digital Audio Workstations (DAWs). An easy-to-use and user-oriented system capable of automatically transcribing vocalisations of percussion sounds, called LVT - Live Vocalised Transcription, is presented. LVT is developed as a Max for Live device which follows the 'segment-and-classify' methodology for drum transcription, and includes three modules: i) an onset detector to segment events in time; ii) a module that extracts relevant features from the audio content; and iii) a machine-learning component that implements the k-Nearest Neighbours (kNN) algorithm for the classification of vocalised drum timbres. Due to the wide differences in vocalisations from distinct users for the same drum sound, a user-specific approach to vocalised transcription is proposed. In this perspective, a given end-user trains the algorithm with their own vocalisations for each drum sound before inputting their desired pattern into the DAW. The user adaptation is achieved via a new Max external which implements Sequential Forward Selection (SFS) for choosing the most relevant features for a given set of input drum sounds.

Submitted 6 November, 2018; originally announced November 2018.

Journal ref: Proc. of RecPad-2017, Amadora, Portugal, pp. 19-20, October, 2017

Showing 1–33 of 33 results for author: Davies, M E