Showing 1–35 of 35 results for author: Lerch, A

  1. arXiv:2406.18747  [pdf, other]

    cs.SD cs.AI cs.IR cs.LG eess.AS

    A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

    Authors: Karn N. Watcharasupat, Alexander Lerch

    Abstract: Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems.…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Submitted to the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)

  2. arXiv:2406.09998  [pdf, other]

    eess.AS cs.AI cs.LG cs.MM cs.SD

    Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors

    Authors: Chaeyeon Han, Pavan Seshadri, Yiwei Ding, Noah Posner, Bon Woo Koo, Animesh Agrawal, Alexander Lerch, Subhrajit Guhathakurta

    Abstract: While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study dis…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted to Urban Informatics

  3. arXiv:2402.06761  [pdf, other]

    cs.LG

    Embedding Compression for Teacher-to-Student Knowledge Transfer

    Authors: Yiwei Ding, Alexander Lerch

    Abstract: Common knowledge distillation methods require the teacher model and the student model to be trained on the same task. However, the usage of embeddings as teachers has also been proposed for different source tasks and target tasks. Prior work that uses embeddings as teachers ignores the fact that the teacher embeddings are likely to contain irrelevant knowledge for the target task. To address this…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 5+1 pages. In ICASSP 2024 Satellite Workshop Deep Neural Network Model Compression

  4. arXiv:2311.10113  [pdf, other]

    cs.SD eess.AS

    AQUATK: An Audio Quality Assessment Toolkit

    Authors: Ashvala Vinay, Alexander Lerch

    Abstract: Recent advancements in Neural Audio Synthesis (NAS) have outpaced the development of standardized evaluation methodologies and tools. To bridge this gap, we introduce AquaTk, an open-source Python library specifically designed to simplify and standardize the evaluation of NAS systems. AquaTk offers a range of audio quality metrics, including a unique Python implementation of the basic PEAQ algorit…

    Submitted 15 November, 2023; originally announced November 2023.

  5. arXiv:2309.06531  [pdf, other]

    eess.AS cs.SD

    ASPED: An Audio Dataset for Detecting Pedestrians

    Authors: Pavan Seshadri, Chaeyeon Han, Bon-Woo Koo, Noah Posner, Subhrajit Guhathakurta, Alexander Lerch

    Abstract: We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. While the preliminary results prove the viability of using audio approaches for pedestrian detection, they also show that this challenging task cannot be easily solved with standard approaches.

    Submitted 16 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 4+1 pages, ICASSP 2024

  6. arXiv:2309.02539  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

    Authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott

    Abstract: Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions whic…

    Submitted 1 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to the IEEE Open Journal of Signal Processing (ICASSP 2024 Track)

  7. arXiv:2306.17424  [pdf, other]

    cs.SD cs.IR eess.AS

    Audio Embeddings as Teachers for Music Classification

    Authors: Yiwei Ding, Alexander Lerch

    Abstract: Music classification has been one of the most popular tasks in the field of music information retrieval. With the development of deep learning models, the last decade has seen impressive improvements in a wide range of classification tasks. However, the increasing model complexity makes both training and inference computationally expensive. In this paper, we integrate the ideas of transfer learnin…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted at the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), 9 pages, 2 figures

  8. arXiv:2306.08053  [pdf, other]

    eess.AS cs.SD

    Quantifying Spatial Audio Quality Impairment

    Authors: Karn N. Watcharasupat, Alexander Lerch

    Abstract: Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical components of spatial audio quality, however, remain scarce, despite being perhaps the least subjective aspect of spatial audio quality to quantify. By considering i…

    Submitted 14 December, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted to the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

  9. arXiv:2211.08379  [pdf, other]

    cs.SD cs.LG eess.AS

    Music Instrument Classification Reprogrammed

    Authors: Hsin-Hung Chen, Alexander Lerch

    Abstract: The performance of approaches to Music Instrument Classification, a popular task in Music Information Retrieval, is often impacted and limited by the lack of availability of annotated data for training. We propose to address this issue with "reprogramming," a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted at 29th International Conference on Multimedia Modeling (MMM23)

  10. arXiv:2211.01317  [pdf, other]

    cs.SD cs.AI cs.LG cs.NE eess.AS

    Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming

    Authors: Yun-Ning Hung, Chao-Han Huck Yang, Pin-Yu Chen, Alexander Lerch

    Abstract: Transfer learning (TL) approaches have shown promising results when handling tasks with limited training data. However, considerable memory and computational resources are often required for fine-tuning pre-trained neural networks with target domain data. In this work, we introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neur…

    Submitted 3 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE ICASSP 2023. The implementation is available at https://github.com/biboamy/music-repro

  11. arXiv:2209.00130  [pdf, other]

    cs.SD cs.LG eess.AS

    Evaluating generative audio systems and their metrics

    Authors: Ashvala Vinay, Alexander Lerch

    Abstract: Recent years have seen considerable advances in audio synthesis with deep generative models. However, the state-of-the-art is very difficult to quantify; different studies often use different evaluation methodologies and different metrics when reporting results, making a direct comparison to other systems difficult if not impossible. Furthermore, the perceptual relevance and meaning of the reporte…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: Accepted at ISMIR 2022

  12. arXiv:2208.09096  [pdf, other]

    cs.SD cs.LG eess.AS

    Representation Learning for the Automatic Indexing of Sound Effects Libraries

    Authors: Alison B. Ma, Alexander Lerch

    Abstract: Labeling and maintaining a commercial sound effects library is a time-consuming task exacerbated by databases that continually grow in size and undergo taxonomy updates. Moreover, sound search and taxonomy creation are complicated by non-uniform metadata, an unrelenting problem even with the introduction of a new industry standard, the Universal Category System. To address these problems and overc…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted at the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), 10 pages, 7 figures

  13. arXiv:2206.15219  [pdf, ps, other]

    cs.SD eess.AS

    libACA, pyACA, and ACA-Code: Audio Content Analysis in 3 Languages

    Authors: Alexander Lerch

    Abstract: The three packages libACA, pyACA, and ACA-Code provide reference implementations for basic approaches and algorithms for the analysis of musical audio signals in three different languages: C++, Python, and Matlab. All three packages cover the same algorithms, such as extraction of low level audio features, fundamental frequency estimation, as well as simple approaches to chord recognition, musical…

    Submitted 30 June, 2022; originally announced June 2022.

    Comments: Preprint submitted to "Software Impacts"

  14. arXiv:2206.04850  [pdf, other]

    eess.AS cs.SD

    Feature-informed Embedding Space Regularization For Audio Classification

    Authors: Yun-Ning Hung, Alexander Lerch

    Abstract: Feature representations derived from models pre-trained on large-scale datasets have shown their generalizability on a variety of audio analysis tasks. Despite this generalizability, however, task-specific features can outperform if sufficient training data is available, as specific task-relevant properties can be learned. Furthermore, the complex pre-trained models bring considerable computationa…

    Submitted 9 June, 2022; originally announced June 2022.

  15. arXiv:2205.05580  [pdf, other]

    cs.SD cs.LG eess.AS

    Scream Detection in Heavy Metal Music

    Authors: Vedant Kalbag, Alexander Lerch

    Abstract: Harsh vocal effects such as screams or growls are far more common in heavy metal vocals than the traditionally sung vocal. This paper explores the problem of detection and classification of extreme vocal techniques in heavy metal music, specifically the identification of different scream techniques. We investigate the suitability of various feature representations, including cepstral, spectral, an…

    Submitted 11 May, 2022; originally announced May 2022.

  16. arXiv:2112.10638  [pdf, ps, other]

    cs.LG cs.AI cs.IR cs.MS

    Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models

    Authors: Karn N. Watcharasupat, Junyoung Lee, Alexander Lerch

    Abstract: Latte (for LATent Tensor Evaluation) is a Python library for evaluation of latent-based generative models in the fields of disentanglement learning and controllable generation. Latte is compatible with both PyTorch and TensorFlow/Keras, and provides both functional and modular APIs that can be easily extended to support other deep learning frameworks. Using NumPy-based and framework-agnostic imple…

    Submitted 22 January, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: To appear in Software Impacts

    Journal ref: Software Impacts, Volume 11, 2022, 100222, ISSN 2665-9638

  17. arXiv:2111.12761  [pdf, other]

    cs.SD eess.AS

    Semi-Supervised Audio Classification with Partially Labeled Data

    Authors: Siddharth Gururani, Alexander Lerch

    Abstract: Audio classification has seen great progress with the increasing availability of large-scale datasets. These large datasets, however, are often only partially labeled as collecting full annotations is a tedious and expensive process. This paper presents two semi-supervised methods capable of learning with missing labels and evaluates them on two publicly available, partially labeled datasets. The…

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: To be presented at IEEE ISM 2021

  18. arXiv:2110.05587  [pdf, other]

    cs.SD cs.IR cs.IT cs.LG eess.AS

    Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes

    Authors: Karn N. Watcharasupat, Alexander Lerch

    Abstract: Controllable music generation with deep generative models has become increasingly reliant on disentanglement learning techniques. However, current disentanglement metrics, such as mutual information gap (MIG), are often inadequate and misleading when used for evaluating latent representations in the presence of interdependent semantic attributes often encountered in real-world music datasets. In t…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Submitted to the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference

  19. arXiv:2108.01711  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Improving Music Performance Assessment with Contrastive Learning

    Authors: Pavan Seshadri, Alexander Lerch

    Abstract: Several automatic approaches for objective music performance assessment (MPA) have been proposed in the past, however, existing systems are not yet capable of reliably predicting ratings with the same accuracy as professional judges. This study investigates contrastive learning as a potential method to improve existing MPA systems. Contrastive learning is a widely used technique in representation…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: To appear at 22nd International Society for Music Information Retrieval Conference, Online, 2021

  20. arXiv:2108.01450  [pdf, other]

    cs.SD cs.LG eess.AS

    Is Disentanglement enough? On Latent Representations for Controllable Music Generation

    Authors: Ashis Pati, Alexander Lerch

    Abstract: Improving controllability or the ability to manipulate one or more attributes of the generated data has become a topic of interest in the context of deep generative models of music. Recent attempts in this direction have relied on learning disentangled representations from data such that the underlying factors of variation are well separated. In this paper, we focus on the relationship between dis…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: To be published in: Proceedings of 22nd International Society for Music Information Retrieval Conference (ISMIR), Online, 2021

  21. arXiv:2104.09018  [pdf, other]

    cs.SD cs.DL eess.AS

    An Interdisciplinary Review of Music Performance Analysis

    Authors: Alexander Lerch, Claire Arthur, Ashis Pati, Siddharth Gururani

    Abstract: A musical performance renders an acoustic realization of a musical score or other representation of a composition. Different performances of the same composition may vary in terms of performance parameters such as timing or dynamics, and these variations may have a major impact on how a listener perceives the music. The analysis of music performance has traditionally been a peripheral topic for th…

    Submitted 18 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:1907.00178

    ACM Class: A.1

    Journal ref: Transactions of the International Society for Music Information Retrieval, 3(1), pp.221-245, 2020

  22. arXiv:2102.06393  [pdf, other]

    eess.SP cs.SD eess.AS

    Mind the beat: detecting audio onsets from EEG recordings of music listening

    Authors: Ashvala Vinay, Alexander Lerch, Grace Leslie

    Abstract: We propose a deep learning approach to predicting audio event onsets in electroencephalogram (EEG) recorded from users as they listen to music. We use a publicly available dataset containing ten contemporary songs and concurrently recorded EEG. We generate a sequence of onset labels for the songs in our dataset and trained neural networks (a fully connected network (FCN) and a recurrent neural net…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: To be published in ICASSP 2021; 4 figures, 5 pages (4 pages of content + 1 page of references)

  23. arXiv:2101.00132  [pdf, other]

    eess.AS cs.SD eess.SP

    Audio Content Analysis

    Authors: Alexander Lerch

    Abstract: Preprint for a book chapter introducing Audio Content Analysis. With a focus on Music Information Retrieval systems, this chapter defines musical audio content, introduces the general process of audio content analysis, and surveys basic approaches to audio content analysis. The various tasks in Audio Content Analysis are categorized into three classes: music transcription, music performance analys…

    Submitted 31 December, 2020; originally announced January 2021.

    Comments: Preprint for a book chapter introducing Audio Content Analysis

  24. arXiv:2010.14709  [pdf, other]

    cs.SD cs.IR cs.LG cs.MM eess.AS

    Melody-Conditioned Lyrics Generation with SeqGANs

    Authors: Yihao Chen, Alexander Lerch

    Abstract: Automatic lyrics generation has received attention from both music and AI communities for years. Early rule-based approaches have, due to increases in computational power and evolution in data-driven models, mostly been replaced with deep-learning-based systems. Many existing approaches, however, either rely heavily on prior knowledge in music and lyrics writing or oversimplify the task by lar…

    Submitted 27 October, 2020; originally announced October 2020.

  25. arXiv:2010.14565  [pdf, other]

    cs.SD cs.MM eess.AS

    Remixing Music with Visual Conditioning

    Authors: Li-Chia Yang, Alexander Lerch

    Abstract: We propose a visually conditioned music remixing system by incorporating deep visual and audio models. The method is based on a state of the art audio-visual source separation model which performs music instrument source separation with video information. We modified the model to work with user-selected images instead of videos as visual input during inference to enable separation of audio-only co…

    Submitted 27 October, 2020; originally announced October 2020.

    Journal ref: 2020 IEEE International Symposium on Multimedia

  26. arXiv:2008.00616  [pdf, other]

    eess.AS cs.LG

    Multitask learning for instrument activation aware music source separation

    Authors: Yun-Ning Hung, Alexander Lerch

    Abstract: Music source separation is a core task in music information retrieval which has seen a dramatic improvement in the past years. Nevertheless, most of the existing systems focus exclusively on the problem of source separation itself and ignore the utilization of other (possibly related) MIR tasks which could lead to additional quality gains. In this work, we propose a novel multitask structure t…

    Submitted 2 August, 2020; originally announced August 2020.

  27. arXiv:2008.00203  [pdf, other]

    eess.AS cs.IR cs.LG

    Score-informed Networks for Music Performance Assessment

    Authors: Jiawen Huang, Yun-Ning Hung, Ashis Pati, Siddharth Kumar Gururani, Alexander Lerch

    Abstract: The assessment of music performances in most cases takes into account the underlying musical score being performed. While there have been several automatic approaches for objective music performance assessment (MPA) based on extracted features from both the performance audio and the score, deep neural network-based methods incorporating score information into MPA models have not yet been investiga…

    Submitted 1 August, 2020; originally announced August 2020.

    Comments: To appear at 21st International Society for Music Information Retrieval Conference, Montréal, Canada, 2020

  28. arXiv:2007.15067  [pdf, other]

    cs.LG cs.SD eess.AS

    dMelodies: A Music Dataset for Disentanglement Learning

    Authors: Ashis Pati, Siddharth Gururani, Alexander Lerch

    Abstract: Representation learning focused on disentangling the underlying factors of variation in given data has become an important area of research in machine learning. However, most of the studies in this area have relied on datasets from the computer vision domain and thus, have not been readily extended to music. In this paper, we present a new symbolic music dataset that will help researchers working…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: To be published in: Proceedings of 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020

  29. arXiv:2006.09640  [pdf, other]

    eess.AS cs.IR cs.LG cs.SD

    Visual Attention for Musical Instrument Recognition

    Authors: Karn Watcharasupat, Siddharth Gururani, Alexander Lerch

    Abstract: In the field of music information retrieval, the task of simultaneously identifying the presence or absence of multiple musical instruments in a polyphonic recording remains a hard problem. Previous works have seen some success in improving instrument classification by applying temporal attention in a multi-instance multi-label setting, while another series of work has also suggested the role of p…

    Submitted 21 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 6 pages, 7 figures. Karn Watcharasupat is currently with the School of Electrical and Electronic Engineering, Nanyang Technological University. This work was done while she was with the Center for Music Technology, Georgia Institute of Technology on an exchange semester

  30. arXiv:2004.05485  [pdf, other]

    cs.LG stat.ML

    Attribute-based Regularization of Latent Spaces for Variational Auto-Encoders

    Authors: Ashis Pati, Alexander Lerch

    Abstract: Selective manipulation of data attributes using deep generative models is an active area of research. In this paper, we present a novel method to structure the latent space of a Variational Auto-Encoder (VAE) to encode different continuous-valued attributes explicitly. This is accomplished by using an attribute regularization loss which enforces a monotonic relationship between the attribute value…

    Submitted 28 July, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

  31. arXiv:1907.05208  [pdf, other]

    cs.SD cs.AI eess.AS

    Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs

    Authors: Benjamin Genchel, Ashis Pati, Alexander Lerch

    Abstract: Deep generative models for symbolic music are typically designed to model temporal dependencies in music so as to predict the next musical event given previous events. In many cases, such models are expected to learn abstract concepts such as harmony, meter, and rhythm from raw musical data without any additional information. In this study, we investigate the effects of explicitly conditioning dee…

    Submitted 9 July, 2019; originally announced July 2019.

    Comments: In Proceedings of the 7th International Workshop on Musical Meta-creation (MUME). Charlotte, North Carolina 2019

  32. arXiv:1907.04294  [pdf, other]

    cs.IR cs.SD eess.AS

    An Attention Mechanism for Musical Instrument Recognition

    Authors: Siddharth Gururani, Mohit Sharma, Alexander Lerch

    Abstract: While the automatic recognition of musical instruments has seen significant progress, the task is still considered hard for music featuring multiple instruments as opposed to single instrument recordings. Datasets for polyphonic instrument recognition can be categorized into roughly two categories. Some, such as MedleyDB, have strong per-frame instrument activity annotations but are usually small…

    Submitted 9 July, 2019; originally announced July 2019.

    Comments: To appear in: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019

  33. arXiv:1907.01164  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Learning to Traverse Latent Spaces for Musical Score Inpainting

    Authors: Ashis Pati, Alexander Lerch, Gaëtan Hadjeres

    Abstract: Music Inpainting is the task of filling in missing or lost information in a piece of music. We investigate this task from an interactive music creation perspective. To this end, a novel deep learning-based approach for musical score inpainting is proposed. The designed model takes both past and future musical context into account and is capable of suggesting ways to connect them in a musically mea…

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: 20th International Society for Music Information Retrieval Conference (ISMIR), 2019, Delft, The Netherlands; 6 pages, 8 figures

    Journal ref: 20th International Society for Music Information Retrieval Conference (ISMIR), 2019, Delft, The Netherlands

  34. arXiv:1907.00178  [pdf, other]

    cs.IR cs.MM

    Music Performance Analysis: A Survey

    Authors: Alexander Lerch, Claire Arthur, Ashis Pati, Siddharth Gururani

    Abstract: Music Information Retrieval (MIR) tends to focus on the analysis of audio signals. Often, a single music recording is used as representative of a "song" even though different performances of the same song may reveal different properties. A performance is distinct in many ways from a (arguably more abstract) representation of a "song," "piece," or musical score. The characteristics of the (recorded…

    Submitted 29 June, 2019; originally announced July 2019.

    Comments: To be published in: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019

  35. arXiv:1712.01456  [pdf, other]

    cs.LG cs.AI cs.MM cs.SD eess.AS

    Learning to Fuse Music Genres with Generative Adversarial Dual Learning

    Authors: Zhiqian Chen, Chih-Wei Wu, Yen-Cheng Lu, Alexander Lerch, Chang-Tien Lu

    Abstract: FusionGAN is a novel genre fusion framework for music generation that integrates the strengths of generative adversarial networks and dual learning. In particular, the proposed method offers a dual learning extension that can effectively integrate the styles of the given domains. To efficiently quantify the difference among diverse domains and avoid the vanishing gradient issue, FusionGAN provides…

    Submitted 4 December, 2017; originally announced December 2017.

    Comments: International Conference on Data Mining - New Orleans, 2017