-
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Authors:
Mateusz Łajszczak,
Guillermo Cámbara,
Yang Li,
Fatih Beyhan,
Arent van Korlaar,
Fan Yang,
Arnaud Joly,
Álvaro Martín-Cortinas,
Ammar Abbas,
Adam Michalski,
Alexis Moinet,
Sri Karlapati,
Ewa Muszyńska,
Haohan Guo,
Bartosz Putrycz,
Soledad López Gambino,
Kayeon Yoo,
Elena Sokolova,
Thomas Drugman
Abstract:
We introduce a text-to-speech (TTS) model called BASE TTS, which stands for $\textbf{B}$ig $\textbf{A}$daptive $\textbf{S}$treamable TTS with $\textbf{E}$mergent abilities. BASE TTS is the largest TTS model to date, trained on 100K hours of public domain speech data, achieving a new state of the art in speech naturalness. It deploys a 1-billion-parameter autoregressive Transformer that converts raw texts into discrete codes ("speechcodes"), followed by a convolution-based decoder which converts these speechcodes into waveforms in an incremental, streamable manner. Further, our speechcodes are built using a novel speech tokenization technique that features speaker ID disentanglement and compression with byte-pair encoding. Echoing the widely reported "emergent abilities" of large language models when trained on increasing volumes of data, we show that BASE TTS variants built with 10K+ hours and 500M+ parameters begin to demonstrate natural prosody on textually complex sentences. We design and share a specialized dataset to measure these emergent abilities for text-to-speech. We showcase state-of-the-art naturalness of BASE TTS by evaluating against baselines that include publicly available large-scale text-to-speech systems: YourTTS, Bark and TortoiseTTS. Audio samples generated by the model can be heard at https://amazon-ltts-paper.com/.
Submitted 15 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Ultrafast flow of interacting organic polaritons
Authors:
Giovanni Lerario,
Dario Ballarini,
Antonio Fieramosca,
Alessandro Cannavale,
Armando Genco,
Federica Mangione,
Salvatore Gambino,
Lorenzo Dominici,
Milena De Giorgi,
Giuseppe Gigli,
Daniele Sanvitto
Abstract:
The strong coupling of an excitonic transition with an electromagnetic mode results in composite quasi-particles called exciton-polaritons, which have been shown to combine the best properties of their bare components in semiconductor microcavities. However, the physics and applications of polariton flows in organic materials and at room temperature are still unexplored because of the poor photon confinement in such structures. Here we demonstrate that polaritons formed by the hybridization of organic excitons with a Bloch Surface Wave are able to propagate for hundreds of microns, showing remarkable third-order nonlinear interactions at high injection density. These findings pave the way for studies of organic nonlinear light-matter fluxes and for a technologically promising route to dissipation-less on-chip polariton devices working at room temperature.
Submitted 27 January, 2016; v1 submitted 2 February, 2015;
originally announced February 2015.
-
Polariton Induced Enhanced Emission from an Organic Dye under Strong Coupling Regime
Authors:
Dario Ballarini,
Milena De Giorgi,
Salvatore Gambino,
Giovanni Lerario,
Marco Mazzeo,
Armando Genco,
Gianluca Accorsi,
Carlo Giansante,
Silvia Colella,
Stefania D'Agostino,
Paolo Cazzato,
Daniele Sanvitto,
Giuseppe Gigli
Abstract:
Exciton-polaritons in semiconductors are quasi-particles which have recently shown the capability to undergo a phase transition into a coherent hybrid state of light and matter. The observation of such quasi-particles in organic microcavities has attracted increasing attention because they can reach condensation at room temperature. In this work we demonstrate that the emission properties of organic polaritons do not depend on the overlap between the absorption and emission states of the molecule and that the emission dynamics are modified in the strong coupling regime, showing a significant enhancement of the photoluminescence intensity as compared to the bare dye. This paves the way to the investigation of molecules with large absorption coefficients but poor emission efficiencies for the realization of polariton condensates and organic electrically injected lasers by exploiting strong exciton-photon coupling regimes.
Submitted 20 October, 2014;
originally announced October 2014.