Showing 1–9 of 9 results for author: Shandiz, A H

  1. arXiv:2308.10636  [pdf, other]

    eess.IV cs.CV

    Automated Identification of Failure Cases in Organ at Risk Segmentation Using Distance Metrics: A Study on CT Data

    Authors: Amin Honarmandi Shandiz, Attila Rádics, Rajesh Tamada, Makk Árpád, Karolina Glowacka, Lehel Ferenczi, Sandeep Dutta, Michael Fanariotis

    Abstract: Automated organ at risk (OAR) segmentation is crucial for radiation therapy planning in CT scans, but the contours generated by automated models can be inaccurate, potentially leading to treatment planning issues. The reasons for these inaccuracies could be varied, such as unclear organ boundaries or inaccurate ground truth due to annotation errors. To improve the model's performance, it is necess…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, 2 tables

  2. Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

    Authors: László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Tamás Gábor Csapó

    Abstract: Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a quick switch between users troublesome. Even for the same speaker, these models perform poorly cross-session, i.e. after dismounting and re-mounting…

    Submitted 17 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, 3 tables

    Journal ref: the Proceedings of Interspeech 2023

  3. arXiv:2206.12947  [pdf, other]

    cs.HC eess.IV

    Improved Processing of Ultrasound Tongue Videos by Combining ConvLSTM and 3D Convolutional Networks

    Authors: Amin Honarmandi Shandiz, Laszlo Toth

    Abstract: Silent Speech Interfaces aim to reconstruct the acoustic signal from a sequence of ultrasound tongue images that records the articulatory movement. The extraction of information about the tongue movement requires us to efficiently process the whole sequence of images, not just a single image. Several approaches have been suggested to process such sequential image data. The classic neural netw…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: 10 pages, 4 tables, 2 figures, conference

  4. arXiv:2107.12051  [pdf, other]

    eess.AS cs.AI cs.SD

    Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

    Authors: Csaba Zainkó, László Tóth, Amin Honarmandi Shandiz, Gábor Gosztolya, Alexandra Markó, Géza Németh, Tamás Gábor Csapó

    Abstract: For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacotron2 text-to-speech model to improve the final synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a limited database. We use…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.03152

  5. arXiv:2106.04552  [pdf, ps, other]

    cs.SD

    Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces

    Authors: Amin Honarmandi Shandiz, László Tóth, Gábor Gosztolya, Alexandra Markó, Tamás Gábor Csapó

    Abstract: Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the articulatory movements, for example, an ultrasound video. Just like speech signals, these recordings represent not only the linguistic content, but are also highly specific to the actual speaker. Hence, due to the lack of multi-speaker data sets, researchers have so far concentrated on speaker-dependent modeling.…

    Submitted 11 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: 5 pages, 3 figures, 3 tables

  6. Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks

    Authors: Amin Honarmandi Shandiz, László Tóth

    Abstract: Voice Activity Detection (VAD) is not an easy task when the input audio signal is noisy, and it is even more complicated when the input is not even an audio recording. This is the case with Silent Speech Interfaces (SSI) where we record the movement of the articulatory organs during speech, and we aim to reconstruct the speech signal from this recording. Our SSI system synthesizes speech from ultraso…

    Submitted 18 September, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: 12 pages, 7 tables, 4 figures

  7. Improving Neural Silent Speech Interface Models by Adversarial Training

    Authors: Amin Honarmandi Shandiz, László Tóth, Gábor Gosztolya, Alexandra Markó, Tamás Gábor Csapó

    Abstract: Besides the well-known classification task, these days neural networks are frequently being applied to generate or transform data, such as images and audio signals. In such tasks, the conventional loss functions like the mean squared error (MSE) may not give satisfactory results. To improve the perceptual quality of the generated signals, one possibility is to increase their similarity to real sig…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: 11 pages, 3 tables, 2 figures

  8. arXiv:2104.11598  [pdf, ps, other]

    cs.SD eess.AS

    Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

    Authors: Yide Yu, Amin Honarmandi Shandiz, László Tóth

    Abstract: Several approaches exist for the recording of articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than the above approaches, the recent developments in this area now allow the recording of real-time MRI videos of the articulators with an acceptable reso…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: 6 pages, 4 tables, 3 figures

  9. arXiv:2104.11532  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces

    Authors: László Tóth, Amin Honarmandi Shandiz

    Abstract: Silent speech interfaces (SSI) aim to reconstruct the speech signal from a recording of the articulatory movement, such as an ultrasound video of the tongue. Currently, deep neural networks are the most successful technology for this task. The efficient solution requires methods that do not simply process single images, but are able to extract the tongue movement information from a sequence of vid…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: 10 pages, 2 tables, 3 figures