
Showing 1–4 of 4 results for author: Melechovsky, J

Searching in archive eess.
  1. arXiv:2406.02255 [pdf, other]

    eess.AS cs.LG cs.MM cs.SD

    MidiCaps -- A large-scale MIDI dataset with text captions

    Authors: Jan Melechovsky, Abhinaba Roy, Dorien Herremans

    Abstract: Generative models guided by text prompts are becoming increasingly popular. However, no text-to-MIDI models currently exist, mostly due to the lack of a captioned MIDI dataset. This work aims to enable research that combines LLMs with symbolic music by presenting the first large-scale, openly available MIDI dataset with text captions: MidiCaps. MIDI (Musical Instrument Digital Interfac…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Under review

  2. arXiv:2406.01018 [pdf, other]

    eess.AS cs.LG cs.SD

    Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training

    Authors: Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans

    Abstract: With rapid globalization, the need to build inclusive and representative speech technology cannot be overstated. Accent is an important aspect of speech that must be taken into consideration when building inclusive speech synthesizers. Inclusive speech technology aims to erase biases against specific groups, such as people with certain accents. We note that state-of-the-art Text-to-Speech (T…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Under review

  3. arXiv:2311.08355 [pdf, other]

    eess.AS

    Mustango: Toward Controllable Text-to-Music Generation

    Authors: Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria

    Abstract: The quality of text-to-music models has reached new heights due to recent advancements in diffusion models. The controllability of various musical aspects, however, has barely been explored. In this paper, we propose Mustango: a music-domain-knowledge-inspired text-to-music system based on diffusion. Mustango aims to control the generated music, not only with general text captions, but with mo…

    Submitted 3 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  4. arXiv:2211.03316 [pdf, other]

    eess.AS cs.LG cs.SD

    Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

    Authors: Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, Dorien Herremans

    Abstract: Accent plays a significant role in speech communication, influencing both one's ability to understand speech and the way a person's identity is conveyed. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. It can synthesize a selected speaker's voice converted to any desired target accent. Ou…

    Submitted 3 June, 2024; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Preprint submitted to a conference; under review