Skip to main content

Showing 1–9 of 9 results for author: Yang, S H

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.14576  [pdf, other

    eess.AS

    Towards Intelligent Speech Assistants in Operating Rooms: A Multimodal Model for Surgical Workflow Analysis

    Authors: Kubilay Can Demir, Belen Lojo Rodriguez, Tobias Weise, Andreas Maier, Seung Hee Yang

    Abstract: To develop intelligent speech assistants and integrate them seamlessly with intra-operative decision-support frameworks, accurate and efficient surgical phase recognition is a prerequisite. In this study, we propose a multimodal framework based on Gated Multimodal Units (GMU) and Multi-Stage Temporal Convolutional Networks (MS-TCN) to recognize surgical phases of port-catheter placement operations… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 5 Pages, Interspeech 2024

    MSC Class: 00b20

  2. arXiv:2404.08064  [pdf

    eess.AS cs.AI cs.CR cs.LG

    The Impact of Speech Anonymization on Pathology and Its Limits

    Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

    Abstract: Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where priva… ▽ More

    Submitted 22 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  3. Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages

    Authors: Soroosh Tayebi Arasteh, Cristian David Rios-Urrego, Elmar Noeth, Andreas Maier, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave

    Abstract: Parkinson's disease (PD) is a neurological disorder impacting a person's speech. Among automatic PD assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing pati… ▽ More

    Submitted 21 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: INTERSPEECH 2023, pp. 5003--5007, Dublin, Ireland

    Journal ref: INTERSPEECH 2023

  4. arXiv:2206.12320  [pdf, other

    cs.SD cs.AI eess.AS

    PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis

    Authors: Kubilay Can Demir, Matthias May, Axel Schmid, Michael Uder, Katharina Breininger, Tobias Weise, Andreas Maier, Seung Hee Yang

    Abstract: This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus. This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons with average duration of 81.4 $\pm$ 41.0 minutes. The corpus aims to provide a resource for develo** a smart speech assistant in ope… ▽ More

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: 8 pages, 4 figures, Text, Speech and Dialogue 2022 Conference

    MSC Class: 00b20

  5. arXiv:2204.06450  [pdf, other

    cs.SD cs.LG eess.AS

    The effect of speech pathology on automatic speaker verification -- a large-scale study

    Authors: Soroosh Tayebi Arasteh, Tobias Weise, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

    Abstract: Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3,800 test subjects… ▽ More

    Submitted 22 November, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Published in Scientific Reports

    Journal ref: Sci Rep 13, 20476 (2023)

  6. arXiv:2204.04016  [pdf, other

    eess.AS cs.CL cs.LG cs.SD q-bio.QM

    Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

    Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

    Abstract: Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech represen… ▽ More

    Submitted 27 June, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted and Accepted at INTERSPEECH2022

  7. arXiv:2204.01670  [pdf, other

    cs.CL cs.SD eess.AS

    Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

    Authors: Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

    Abstract: State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech. However, the performance on impaired speech still remains an issue. The current study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult as several aspects of sp… ▽ More

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted for review at Interspeech 2022

  8. arXiv:2112.03678  [pdf, other

    cs.CR cs.CV cs.CY cs.LG eess.IV physics.med-ph

    Does Proprietary Software Still Offer Protection of Intellectual Property in the Age of Machine Learning? -- A Case Study using Dual Energy CT Data

    Authors: Andreas Maier, Seung Hee Yang, Farhad Maleki, Nikesh Muthukrishnan, Reza Forghani

    Abstract: In the domain of medical image processing, medical device manufacturers protect their intellectual property in many cases by ship** only compiled software, i.e. binary code which can be executed but is difficult to be understood by a potential attacker. In this paper, we investigate how well this procedure is able to protect image processing algorithms. In particular, we investigate whether the… ▽ More

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 6 pages, 2 figures, 1 table, accepted on BVM 2022

  9. arXiv:2001.04260  [pdf

    eess.AS cs.CL cs.SD

    Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training

    Authors: Seung Hee Yang, Minhwa Chung

    Abstract: Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than those of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean utterances that were rec… ▽ More

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: To be Published on the 24th February in BIOSIGNALS 2020. arXiv admin note: text overlap with arXiv:1904.09407