Showing 1–10 of 10 results for author: Sokolova, E

  1. arXiv:2402.08093

    cs.LG cs.CL eess.AS

    BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

    Authors: Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

    Abstract: We introduce a text-to-speech (TTS) model called BASE TTS, which stands for $\textbf{B}$ig $\textbf{A}$daptive $\textbf{S}$treamable TTS with $\textbf{E}$mergent abilities. BASE TTS is the largest TTS model to-date, trained on 100K hours of public domain speech data, achieving a new state-of-the-art in speech naturalness. It deploys a 1-billion-parameter autoregressive Transformer that converts ra…

    Submitted 15 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: v1.1 (fixed typos)

  2. arXiv:2307.07062

    eess.AS cs.LG cs.SD

    Controllable Emphasis with zero data for text-to-speech

    Authors: Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova

    Abstract: We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists in increasing the predicted duration of the emphasised word. We show that this is significantly better than spectrogram modification techniques im…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: In proceeding of 12th Speech Synthesis Workshop (SSW) 2023

  3. arXiv:2303.16085

    eess.IV cs.CV

    Whole-body PET image denoising for reduced acquisition time

    Authors: Ivan Kruzhilov, Stepan Kudin, Luka Vetoshkin, Elena Sokolova, Vladimir Kokh

    Abstract: This paper evaluates the performance of supervised and unsupervised deep learning models for denoising positron emission tomography (PET) images in the presence of reduced acquisition times. Our experiments consider 212 studies (56908 images), and evaluate the models using 2D (RMSE, SSIM) and 3D (SUVpeak and SUVmax error for the regions of interest) metrics. It was shown that, in contrast to previ…

    Submitted 28 March, 2023; originally announced March 2023.

  4. arXiv:2202.06409

    eess.AS cs.CL cs.LG

    Distribution augmentation for low-resource expressive text-to-speech

    Authors: Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova

    Abstract: This paper presents a novel data augmentation technique for text-to-speech (TTS), that allows to generate new (text, audio) training examples without requiring any additional data. Our goal is to increase diversity of text conditionings available during training. This helps to reduce overfitting, especially in low-resource settings. Our method relies on substituting text and audio fragments in a w…

    Submitted 19 February, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: ICASSP 2022: camera-ready

  5. arXiv:2105.11863

    eess.IV cs.CV cs.LG

    CoRSAI: A System for Robust Interpretation of CT Scans of COVID-19 Patients Using Deep Learning

    Authors: Manvel Avetisian, Ilya Burenko, Konstantin Egorov, Vladimir Kokh, Aleksandr Nesterov, Aleksandr Nikolaev, Alexander Ponomarchuk, Elena Sokolova, Alex Tuzhilin, Dmitry Umerenkov

    Abstract: Analysis of chest CT scans can be used in detecting parts of lungs that are affected by infectious diseases such as COVID-19. Determining the volume of lungs affected by lesions is essential for formulating treatment recommendations and prioritizing patients by severity of the disease. In this paper we adopted an approach based on using an ensemble of deep convolutional neural networks for segmentati…

    Submitted 25 May, 2021; originally announced May 2021.

  6. arXiv:2011.09303

    eess.SP cs.CV cs.LG

    Noise-Resilient Automatic Interpretation of Holter ECG Recordings

    Authors: Konstantin Egorov, Elena Sokolova, Manvel Avetisian, Alexander Tuzhilin

    Abstract: Holter monitoring, a long-term ECG recording (24-hours and more), contains a large amount of valuable diagnostic information about the patient. Its interpretation becomes a difficult and time-consuming task for the doctor who analyzes them because every heartbeat needs to be classified, thus requiring highly accurate methods for automatic interpretation. In this paper, we present a three-stage pro…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted for publication on BIOSIGNALS 2021

  7. arXiv:2006.15956

    astro-ph.IM astro-ph.HE

    Search for glitches of gamma-ray pulsars with deep learning

    Authors: E. V. Sokolova, A. G. Panin

    Abstract: The pulsar glitches are generally assumed to be an apparent manifestation of the superfluid interior of the neutron stars. Most of them were discovered and extensively studied by continuous monitoring in the radio wavelengths. The Fermi-LAT space telescope has made a revolution uncovering a large population of gamma-ray pulsars. In this paper we suggest to employ these observations for the searche…

    Submitted 26 May, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: 5 pages, 5 figures

    Report number: INR-TH-2020-020

    Journal ref: A&A 660, A43 (2022)

  8. Search for differences between radio-loud and radio-quiet gamma-ray pulsar populations with Fermi-LAT data

    Authors: E. V. Sokolova, G. I. Rubtsov

    Abstract: Observations by Fermi LAT enabled us to explore the population of non-recycled gamma-ray pulsars with the set of 89 objects. It was recently noted that there are apparent differences in properties of radio-quiet and radio-loud subsets. In particular, average observed radio-loud pulsar is younger than radio-quiet one and is located at smaller galactic latitude. Even so, the analysis based on the fu…

    Submitted 14 November, 2016; v1 submitted 3 January, 2016; originally announced January 2016.

    Comments: 6 pages, 7 figures

    Report number: INR-TH/2016-001

    Journal ref: Astrophys.J. 833 (2016) no.2, 271

  9. Blind search for radio-quiet and radio-loud gamma-ray pulsars with Fermi-LAT data

    Authors: G. I. Rubtsov, E. V. Sokolova

    Abstract: The Fermi Large Area Telescope (LAT) has observed more than a hundred of gamma-ray pulsars, about one third of which are radio-quiet, i.e. not detected at radio frequencies. The most of radio-loud pulsars are detected by Fermi LAT by using the radio timing models, while the radio-quiet ones are discovered in a blind search. The difference in the techniques introduces an observational selection bia…

    Submitted 30 October, 2014; v1 submitted 3 June, 2014; originally announced June 2014.

    Comments: 5 pages, 3 figures; accepted for publication in JETP Letters

    Report number: INR-TH/2014-013

    Journal ref: Pis'ma v ZhETF 100:787-792, 2014

  10. arXiv:0901.0191

    cond-mat.stat-mech

    Field theoretical representation of classical statistical mechanics. I. Wave-vector space

    Authors: A. Yu. Zakharov, E. V. Sokolova

    Abstract: Thermodynamic equivalence between classical many-body system and some auxiliary nonlinear auxiliary field is proved. Connection between Hamiltonians of the many-body system and the auxiliary field is derived.

    Submitted 1 January, 2009; originally announced January 2009.

    Comments: 7 pages. PACS 05.20.-y, 05.70.Ce, 03.50.Kk