Showing 1–8 of 8 results for author: Kim, H Y

Searching in archive eess.
  1. arXiv:2309.00372  [pdf, other]

    eess.IV cs.CV

    On the Localization of Ultrasound Image Slices within Point Distribution Models

    Authors: Lennart Bastian, Vincent Bürgin, Ha Young Kim, Alexander Baumann, Benjamin Busam, Mahdi Saleh, Nassir Navab

    Abstract: Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US). Longitudinal nodule tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology. This task, however, imposes a substantial cognitive load on clinicians due to the inherent challenge of maintaining a mental 3D reconstruction of the organ. We thus present a framework for autom…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: ShapeMI Workshop @ MICCAI 2023; 12 pages, 2 figures

  2. arXiv:2308.15791  [pdf, other]

    cs.CV eess.IV

    Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

    Authors: Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui Yong Kim

    Abstract: Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard, Versatile Video Coding (VVC). In conventional video coding standards, hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, has been well studied and exploited. In N…

    Submitted 5 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  3. End-to-End Learnable Multi-Scale Feature Compression for VCM

    Authors: Yeongwoong Kim, Hyewon Jeong, Janghyun Yu, Younhee Kim, Jooyoung Lee, Se Yoon Jeong, Hui Yong Kim

    Abstract: The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so-called video coding for machine (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance instead of human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are subject to compressio…

    Submitted 8 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 13 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technology

  4. arXiv:2303.16511  [pdf, other]

    eess.AS

    Joint unsupervised and supervised learning for context-aware language identification

    Authors: Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim

    Abstract: Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However, we need additional text labels to train the model to recognize speech, and acquiring the text labels is costly. In order to overcome this probl…

    Submitted 14 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  5. arXiv:2111.03664  [pdf, other]

    cs.LG eess.AS eess.IV

    Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

    Authors: Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

    Abstract: Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ the teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teach…

    Submitted 11 August, 2023; v1 submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

  6. Breast Cancer Diagnosis in Two-View Mammography Using End-to-End Trained EfficientNet-Based Convolutional Network

    Authors: Daniel G. P. Petrini, Carlos Shimizu, Rosimeire A. Roela, Gabriel V. Valente, Maria A. A. K. Folgueira, Hae Yong Kim

    Abstract: Some recent studies have described deep convolutional neural networks to diagnose breast cancer in mammograms with similar or even superior performance to that of human experts. One of the best techniques does two transfer learnings: the first uses a model trained on natural images to create a "patch classifier" that categorizes small subimages; the second uses the patch classifier to scan the who…

    Submitted 3 August, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Updated to published version in IEEE Access

    Journal ref: IEEE Access, vol. 10, pp. 77723-77731, 2022

  7. TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition

    Authors: Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, Nam Soo Kim

    Abstract: In recent years, there has been a great deal of research in developing end-to-end speech recognition models, which enable simplifying the traditional pipeline and achieving promising results. Despite their remarkable performance improvements, end-to-end models typically require expensive computational cost to show successful performance. To reduce this computational burden, knowledge distillation…

    Submitted 16 September, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

  8. Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

    Abstract: For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end with…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: 7 pages, 3 figures

    Journal ref: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020