Showing 1–6 of 6 results for author: Markó, A

  1. arXiv:2107.12051

    eess.AS cs.AI cs.SD

    Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

    Authors: Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó

    Abstract: For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacotron2 text-to-speech model to improve the final synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a limited database. We use…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.03152

  2. arXiv:2107.02003

    eess.AS cs.SD

    Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input

    Authors: Tamás Gábor Csapó, László Tóth, Gábor Gosztolya, Alexandra Markó

    Abstract: Articulatory information has been shown to be effective in improving the performance of HMM-based and DNN-based text-to-speech synthesis. Speech synthesis research focuses traditionally on text-to-speech conversion, when the input is text or an estimated linguistic representation, and the target is synthesized speech. However, a research field that has risen in the last decade is articulation-to-s…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: accepted at SSW11 (11th Speech Synthesis Workshop)

  3. Improving Neural Silent Speech Interface Models by Adversarial Training

    Authors: Amin Honarmandi Shandiz, László Tóth, Gábor Gosztolya, Alexandra Markó, Tamás Gábor Csapó

    Abstract: Besides the well-known classification task, these days neural networks are frequently being applied to generate or transform data, such as images and audio signals. In such tasks, the conventional loss functions like the mean squared error (MSE) may not give satisfactory results. To improve the perceptual quality of the generated signals, one possibility is to increase their similarity to real sig…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: 11 pages, 3 tables, 2 figures

  4. arXiv:2008.03152

    eess.AS cs.SD

    Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

    Authors: Tamás Gábor Csapó, Csaba Zainkó, László Tóth, Gábor Gosztolya, Alexandra Markó

    Abstract: For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in this paper on ultrasound-based articulatory-to-acoustic conversion, we use a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of En…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: 5 pages, accepted for publication at Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:1906.09885

  5. arXiv:1906.09885

    cs.SD eess.AS eess.IV

    Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

    Authors: Tamás Gábor Csapó, Mohammed Salah Al-Radhi, Géza Németh, Gábor Gosztolya, Tamás Grósz, László Tóth, Alexandra Markó

    Abstract: Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text-to-speech synthesizers were shown to produce higher quality speech when using a continuous pitch estimate, which takes non-zero pitch values even whe…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: 5 pages, 3 figures, accepted for publication at Interspeech 2019

  6. arXiv:1904.05259

    cs.SD eess.AS

    Autoencoder-Based Articulatory-to-Acoustic Mapping for Ultrasound Silent Speech Interfaces

    Authors: Gábor Gosztolya, Ádám Pintér, László Tóth, Tamás Grósz, Alexandra Markó, Tamás Gábor Csapó

    Abstract: When using ultrasound video as input, Deep Neural Network-based Silent Speech Interfaces usually rely on the whole image to estimate the spectral parameters required for the speech synthesis step. Although this approach is quite straightforward, and it permits the synthesis of understandable speech, it has several disadvantages as well. Besides the inability to capture the relations between close…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: 8 pages, 6 figures, Accepted to IJCNN 2019