Showing 1–26 of 26 results for author: Dubnov, S

  1. arXiv:2404.11116 [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge

    Authors: Keren Shao, Ke Chen, Shlomo Dubnov

    Abstract: In this challenge, we disentangle the deep filters from the original DeepfilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs) based remixing pipeline. The motivation behind the use of the deep filter component lies in its potential to better handle temporal fine structures. We demonstrate an incremental improvement in both the Signal-to-Dis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 2 pages, 2 figures, 1 table, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024

  2. arXiv:2402.06810 [pdf, other]

    cs.SD cs.AI cs.HC cs.IT cs.LG eess.AS

    Evaluating Co-Creativity using Total Information Flow

    Authors: Vignesh Gokul, Chris Francis, Shlomo Dubnov

    Abstract: Co-creativity in music refers to two or more musicians or musical agents interacting with one another by composing or improvising music. However, this is a very subjective process and each musician has their own preference as to which improvisation is better for some context. In this paper, we aim to create a measure based on total information flow to quantitatively evaluate the co-creativity proc…

    Submitted 9 February, 2024; originally announced February 2024.

  3. arXiv:2402.03867 [pdf, other]

    cs.SD eess.AS

    Binaural sound source localization using a hybrid time and frequency domain model

    Authors: Gil Geva, Olivier Warusfel, Shlomo Dubnov, Tammuz Dubnov, Amir Amedi, Yacov Hel-Or

    Abstract: This paper introduces a new approach to sound source localization using head-related transfer function (HRTF) characteristics, which enable precise full-sphere localization from raw data. While previous research focused primarily on using extensive microphone arrays in the frontal plane, this arrangement often encountered limitations in accuracy and robustness when dealing with smaller microphone…

    Submitted 6 February, 2024; originally announced February 2024.

  4. arXiv:2401.02135 [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    PosCUDA: Position based Convolution for Unlearnable Audio Datasets

    Authors: Vignesh Gokul, Shlomo Dubnov

    Abstract: Deep learning models require large amounts of clean data to achieve good performance. To avoid the cost of expensive data acquisition, researchers use the abundant data available on the internet. This raises significant privacy concerns on the potential misuse of personal data for model training without authorisation. Recent works such as CUDA propose solutions to this problem by adding class-wise…

    Submitted 4 January, 2024; originally announced January 2024.

  5. arXiv:2311.12257 [pdf, other]

    cs.SD cs.IR cs.MM eess.AS

    Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Controls

    Authors: Weihan Xu, Julian McAuley, Shlomo Dubnov, Hao-Wen Dong

    Abstract: The "pretraining-and-finetuning" paradigm has become a norm for training domain-specific models in natural language processing and computer vision. In this work, we aim to examine this paradigm for symbolic music generation through leveraging the largest ever symbolic music dataset sourced from the MuseScore forum. We first pretrain a large unconditional transformer model using 1.5 million songs…

    Submitted 20 November, 2023; originally announced November 2023.

  6. arXiv:2310.09653 [pdf, other]

    cs.SD cs.AI eess.AS

    SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

    Authors: Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

    Abstract: We propose SelfVC, a training strategy to iteratively improve a voice conversion model with self-synthesized examples. Previous efforts on voice conversion focus on factorizing speech into explicitly disentangled representations that separately encode speaker characteristics and linguistic content. However, disentangling speech representations to capture such attributes using task-specific loss te…

    Submitted 3 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted at ICML 2024

  7. arXiv:2308.02723 [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

    Authors: Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity on the trailing harmonic…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 7 pages, 4 figures, 2 tables, Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023

  8. arXiv:2308.01546 [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

    Authors: Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 16 pages, 3 figures, 2 tables, demo page: https://musicldm.github.io/

  9. arXiv:2305.07447 [pdf, other]

    cs.SD eess.AS

    Universal Source Separation with Weakly Labelled Data

    Authors: Qiuqiang Kong, Ke Chen, Haohe Liu, Xingjian Du, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Mark D. Plumbley

    Abstract: Universal source separation (USS) is a fundamental research task for computational auditory scene analysis, which aims to separate mono recordings into individual source tracks. There are three potential challenges awaiting the solution to the audio source separation task. First, previous audio source separation systems mainly focus on separating one or a limited number of specific sources. There…

    Submitted 11 May, 2023; originally announced May 2023.

  10. arXiv:2211.06687 [pdf, other]

    cs.SD eess.AS

    Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

    Authors: Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. To accomplish this target, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different…

    Submitted 21 March, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

  11. arXiv:2209.02871 [pdf, other]

    cs.SD cs.MM eess.AS

    Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

    Authors: Ke Chen, Hao-Wen Dong, Yi Luo, Julian McAuley, Taylor Berg-Kirkpatrick, Miller Puckette, Shlomo Dubnov

    Abstract: Choral music separation refers to the task of extracting tracks of voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. The lack of datasets has impeded research on this topic as previous work has only been able to train and evaluate models on a few minutes of choral music data due to copyright issues and dataset collection difficulties. In this paper, we investigate the use of syn…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: Camera Ready for Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022

    Journal ref: The 23rd International Society for Music Information Retrieval Conference, 2022

  12. arXiv:2207.06983 [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Multitrack Music Transformer

    Authors: Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, Taylor Berg-Kirkpatrick

    Abstract: Existing approaches for generating multitrack music with transformer models have been limited in terms of the number of instruments, the length of the music segments and slow inference. This is partly due to the memory requirements of the lengthy input sequences necessitated by existing representations. In this work, we propose a new multitrack music representation that allows a diverse set of ins…

    Submitted 24 May, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted by ICASSP 2023. Demo: https://salu133445.github.io/mmt/ . Code: https://github.com/salu133445/mmt

  13. arXiv:2202.00951 [pdf, other]

    eess.AS cs.AI cs.LG cs.MM cs.SD

    TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

    Authors: Ke Chen, Shuai Yu, Cheng-i Wang, Wei Li, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design does not lead to human-level performance in the perception of melody information for both tone (pitch-class) and octave. In this paper, we propose TONet, a plug-and-play model that i…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: Preprint Version for ICASSP 2022, Singapore

  14. arXiv:2202.00874 [pdf, other]

    cs.SD cs.AI cs.IR cs.LG eess.AS

    HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

    Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Audio classification is an important task of mapping audio samples into their corresponding labels. Recently, the transformer model with self-attention mechanisms has been adopted in this field. However, existing audio transformers require large GPU memories and long training time, meanwhile relying on pretrained vision models to achieve high performance, which limits the model's scalability in au…

    Submitted 1 February, 2022; originally announced February 2022.

    Comments: Preprint version for ICASSP 2022, Singapore

  15. arXiv:2112.07891 [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

    Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a univ…

    Submitted 12 February, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Preprint version for Association for the Advancement of Artificial Intelligence Conference, AAAI 2022

  16. arXiv:2111.12588 [pdf, other]

    cs.SD cs.LG eess.AS

    Towards Cross-Cultural Analysis using Music Information Dynamics

    Authors: Shlomo Dubnov, Kevin Huang, Cheng-i Wang

    Abstract: A music piece is both comprehended hierarchically, from sonic events to melodies, and sequentially, in the form of repetition and variation. Music from different cultures establishes different aesthetics by having different style conventions on these two aspects. We propose a framework that could be used to quantitatively compare music from different cultures by looking at these two aspects. The f…

    Submitted 24 November, 2021; originally announced November 2021.

  17. arXiv:2104.06517 [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition

    Authors: Eunjeong Koh, Shlomo Dubnov

    Abstract: Emotion is a complicated notion present in music that is hard to capture even with fine-tuned feature engineering. In this paper, we investigate the utility of state-of-the-art pre-trained deep audio embedding methods to be used in the Music Emotion Recognition (MER) task. Deep audio embedding methods allow us to efficiently capture the high dimensional features into a compact representation. We i…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: AAAI Workshop on Affective Content Analysis 2021 Camera Ready Version

    Journal ref: AAAI 2021

  18. arXiv:2103.03344 [pdf, other]

    cs.CR cs.LG cs.SD eess.AS

    WaveGuard: Understanding and Mitigating Audio Adversarial Examples

    Authors: Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar

    Abstract: There has been a recent surge in adversarial attacks on deep learning based automatic speech recognition (ASR) systems. These attacks pose new challenges to deep learning security and have raised significant concerns in deploying ASR systems in safety-critical applications. In this work, we introduce WaveGuard: a framework for detecting adversarial inputs that are crafted to attack ASR systems. Ou…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: Published as a conference paper at Usenix Security 2021

  19. arXiv:2102.01133 [pdf, other]

    cs.SD cs.LG eess.AS

    Deep Music Information Dynamics

    Authors: Shlomo Dubnov

    Abstract: Music comprises a set of complex simultaneous events organized in time. In this paper we introduce a novel framework that we call Deep Musical Information Dynamics, which combines two parallel streams - a low rate latent representation stream that is assumed to capture the dynamics of a thought process contrasted with a higher rate information dynamics derived from the musical data itself. Moti…

    Submitted 1 February, 2021; originally announced February 2021.

    Journal ref: The 2020 Joint Conference on AI Music Creativity, October 19-23, 2020, Royal Institute of Technology (KTH), Stockholm, Sweden

  20. arXiv:2102.00151 [pdf, other]

    cs.SD cs.LG eess.AS

    Expressive Neural Voice Cloning

    Authors: Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

    Abstract: Voice cloning is the task of learning to synthesize the voice of an unseen speaker from a few samples. While current voice cloning methods achieve promising results in Text-to-Speech (TTS) synthesis for a new voice, these approaches lack the ability to control the expressiveness of synthesized audio. In this work, we propose a controllable voice cloning method that allows fine-grained control over…

    Submitted 30 January, 2021; originally announced February 2021.

    Comments: 12 pages, 2 figures, 2 tables

  21. arXiv:2008.01291 [pdf, other]

    cs.LG cs.MM cs.SD eess.AS stat.ML

    Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm

    Authors: Ke Chen, Cheng-i Wang, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Drawing an analogy with automatic image completion systems, we propose Music SketchNet, a neural network framework that allows users to specify partial musical ideas guiding automatic music generation. We focus on generating the missing measures in incomplete monophonic musical pieces, conditioned on surrounding context, and optionally guided by user-specified pitch and rhythm snippets. First, we…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: 8 pages, 8 figures, Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020

    Journal ref: 21st International Society for Music Information Retrieval Conference, ISMIR 2020

  22. arXiv:1906.09155 [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Query-based Deep Improvisation

    Authors: Shlomo Dubnov

    Abstract: In this paper we explore techniques for generating new music using a Variational Autoencoder (VAE) neural network that was trained on a corpus of specific style. Instead of randomly sampling the latent states of the network to produce free improvisation, we generate new music by querying the network with musical input in a style different from the training corpus. This allows us to produce new mus…

    Submitted 21 June, 2019; originally announced June 2019.

    Journal ref: 7th International Workshop on Musical Metacreation, International Conference on Computational Creativity 2019

  23. arXiv:1905.03828 [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Universal Adversarial Perturbations for Speech Recognition Systems

    Authors: Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian McAuley, Farinaz Koushanfar

    Abstract: In this work, we demonstrate the existence of universal adversarial audio perturbations that cause mis-transcription of audio signals by automatic speech recognition (ASR) systems. We propose an algorithm to find a single quasi-imperceptible perturbation, which when added to any arbitrary speech signal, will most likely fool the victim speech recognition model. Our experiments demonstrate the appl…

    Submitted 15 August, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

    Comments: Published as a conference paper at INTERSPEECH 2019

  24. arXiv:1904.07944 [pdf, other]

    cs.SD cs.LG eess.AS

    Expediting TTS Synthesis with Adversarial Vocoding

    Authors: Paarth Neekhara, Chris Donahue, Miller Puckette, Shlomo Dubnov, Julian McAuley

    Abstract: Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed s…

    Submitted 25 July, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Published as a conference paper at INTERSPEECH 2019

  25. arXiv:1811.08380 [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

    Authors: Ke Chen, Weilin Zhang, Shlomo Dubnov, Gus Xia, Wei Li

    Abstract: With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short sequence generation, symbolic music generation remains a challenging problem since the structure of compositions is usually complicated. In this study, we attempt to solve the melody generation pr…

    Submitted 24 January, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: 8 pages, 13 figures

    Journal ref: 2019 International Workshop on Multilayer Music Representation and Processing (MMRP)

  26. arXiv:1810.03226 [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Rethinking Recurrent Latent Variable Model for Music Composition

    Authors: Eunjeong Stella Koh, Shlomo Dubnov, Dustin Wright

    Abstract: We present a model for capturing musical features and creating novel sequences of music, called the Convolutional Variational Recurrent Neural Network. To generate sequential data, the model uses an encoder-decoder architecture with latent probabilistic connections to capture the hidden structure of music. Using the sequence-to-sequence model, our generative model can exploit samples from a prior…

    Submitted 7 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at IEEE MMSP 2018