Showing 1–7 of 7 results for author: Golge, E

  1. arXiv:2406.04904  [pdf, other]

    eess.AS cs.CL cs.SD

    XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

    Authors: Edresson Casanova, Kelly Davis, Eren Gölge, Görkem Göknar, Iulian Gulea, Logan Hart, Aya Aljafari, Joshua Meyer, Reuben Morais, Samuel Olayemi, Julian Weber

    Abstract: Most Zero-shot Multi-speaker TTS (ZS-TTS) systems support only a single language. Although models like YourTTS, VALL-E X, Mega-TTS 2, and Voicebox explored Multilingual ZS-TTS they are limited to just a few high/medium resource languages, limiting the applications of these models in most of the low/medium resource languages. In this paper, we aim to alleviate this issue by proposing and making pub…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  2. arXiv:2401.09512  [pdf, other]

    cs.SD eess.AS

    MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

    Authors: Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger

    Abstract: Text-to-Speech (TTS) technology brings significant advantages, such as giving a voice to those with speech impairments, but also enables audio deepfakes and spoofs. The former mislead individuals and may propagate misinformation, while the latter undermine voice biometric security systems. AI-based detection can help to address these challenges by automatically differentiating between genuine and…

    Submitted 16 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: IJCNN 2024

  3. arXiv:2112.02418  [pdf, other]

    cs.SD cs.CL eess.AS

    YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

    Authors: Edresson Casanova, Julian Weber, Christopher Shulby, Arnaldo Candido Junior, Eren Gölge, Moacir Antonelli Ponti

    Abstract: YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our…

    Submitted 30 April, 2023; v1 submitted 4 December, 2021; originally announced December 2021.

    Comments: An Erratum was added on the last page of this paper

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:2709-2720, 2022

  4. arXiv:2104.05557  [pdf, other]

    eess.AS cs.SD

    SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model

    Authors: Edresson Casanova, Christopher Shulby, Eren Gölge, Nicolas Michael Müller, Frederico Santos de Oliveira, Arnaldo Candido Junior, Anderson da Silva Soares, Sandra Maria Aluisio, Moacir Antonelli Ponti

    Abstract: In this paper, we propose SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech model that improves similarity for speakers unseen during training. We propose a speaker-conditional architecture that explores a flow-based decoder that works in a zero-shot scenario. As text encoders, we explore a dilated residual convolutional-based encoder, gated convolutional-based encoder, and transform…

    Submitted 15 June, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: Accepted on Interspeech 2021

  5. arXiv:1407.2987  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG

    FAME: Face Association through Model Evolution

    Authors: Eren Golge, Pinar Duygulu

    Abstract: We attack the problem of learning face models for public faces from weakly-labelled images collected from web through querying a name. The data is very noisy even after face detection, with several irrelevant faces corresponding to other people. We propose a novel method, Face Association through Model Evolution (FAME), that is able to prune the data in an iterative way, for the face models associ…

    Submitted 10 July, 2014; originally announced July 2014.

    Comments: Draft version of the study

  6. arXiv:1401.0733  [pdf, other]

    cs.CV

    ConceptVision: A Flexible Scene Classification Framework

    Authors: Ahmet Iscen, Eren Golge, Ilker Sarac, Pinar Duygulu

    Abstract: We introduce ConceptVision, a method that aims for high accuracy in categorizing large number of scenes, while keeping the model relatively simpler and efficient for scalability. The proposed method combines the advantages of both low-level representations and high-level semantic categories, and eliminates the distinctions between different levels through the definition of concepts. The proposed f…

    Submitted 29 October, 2014; v1 submitted 3 January, 2014; originally announced January 2014.

  7. arXiv:1312.4384  [pdf, other]

    cs.CV cs.LG cs.NE

    Rectifying Self Organizing Maps for Automatic Concept Learning from Web Images

    Authors: Eren Golge, Pinar Duygulu

    Abstract: We attack the problem of learning concepts automatically from noisy web image search results. Going beyond low level attributes, such as colour and texture, we explore weakly-labelled datasets for the learning of higher level concepts, such as scene categories. The idea is based on discovering common characteristics shared among subsets of images by posing a method that is able to organise the dat…

    Submitted 16 December, 2013; originally announced December 2013.

    Comments: present CVPR2014 submission