Showing 1–7 of 7 results for author: Park, H J

Searching in archive eess.
  1. arXiv:2406.19135  [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2312.03013  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Breast Ultrasound Report Generation using LangChain

    Authors: Jaeyoung Huh, Hyun Jeong Park, Jong Chul Ye

    Abstract: Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multipl…

    Submitted 4 December, 2023; originally announced December 2023.

  3. arXiv:2303.15703  [pdf, other]

    eess.AS

    AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection

    Authors: Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han

    Abstract: Sound event localization and detection (SELD) combines the identification of sound events with the corresponding directions of arrival (DOA). Recently, event-oriented track output formats have been adopted to solve this problem; however, they still have limited generalization toward real-world problems in an unknown polyphony environment. To address the issue, we proposed an angular-distance-based…

    Submitted 10 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 5 pages, 3 figures, accepted for publication in IEEE ICASSP 2023

  4. arXiv:2303.09057  [pdf, other]

    eess.AS cs.SD

    TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

    Authors: Hyun Joon Park, Seok Woo Yang, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Voice Conversion (VC) must be achieved while maintaining the content of the source speech and representing the characteristics of the target speaker. The existing methods do not simultaneously satisfy the above two aspects of VC, and their conversion outputs suffer from a trade-off problem between maintaining source contents and target characteristics. In this study, we propose Triple Adaptive Att…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: To appear in ICASSP 2023

  5. Multi-View Attention Transfer for Efficient Speech Enhancement

    Authors: Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Byung Hoon Lee, Sung Won Han

    Abstract: Recent deep learning models have achieved high performance in speech enhancement; however, it is still challenging to obtain a fast and low-complexity model without significant performance degradation. Previous knowledge distillation studies on speech enhancement could not solve this problem because their output distillation methods do not fit the speech enhancement task in some aspects. In this s…

    Submitted 30 October, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: Proceedings of Interspeech 2022

  6. arXiv:2204.06322  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

    Authors: Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

    Abstract: We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering str…

    Submitted 29 June, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  7. arXiv:2203.02181  [pdf, other]

    eess.AS cs.SD eess.SP

    MANNER: Multi-view Attention Network for Noise Erasure

    Authors: Hyun Joon Park, Byung Ha Kang, Wooseok Shin, Jin Sob Kim, Sung Won Han

    Abstract: In the field of speech enhancement, time domain methods have difficulties in achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose Multi-view Attention Network for Noise ERasure (MANNER) consisting of a convolutional encoder…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: To appear in ICASSP 2022