Showing 1–2 of 2 results for author: Nezhurina, M

  1. arXiv:2308.01546  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

    Authors: Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 16 pages, 3 figures, 2 tables, demo page: https://musicldm.github.io/

  2. arXiv:2211.06687  [pdf, other]

    cs.SD eess.AS

    Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

    Authors: Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. To accomplish this target, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different…

    Submitted 21 March, 2024; v1 submitted 12 November, 2022; originally announced November 2022.