Showing 1–50 of 100 results for author: Park, H

Searching in archive eess.
  1. arXiv:2406.19135 [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, ** Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2406.16896 [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  3. arXiv:2406.05983 [pdf, other]

    eess.AS

    Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation

    Authors: Ui-Hyeop Shin, Sangyoun Lee, Taehan Kim, Hyung-Min Park

    Abstract: Since the success of a time-domain speech separation, further improvements have been made by expanding the length and channel of a feature sequence to increase the amount of computation. When temporally expanded to a long sequence, the feature is segmented into chunks as a dual-path model in most studies of speech separation. In particular, it is common for the process of separating features corre…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Project Page https://fordemopage.github.io/SepReformer

  4. arXiv:2406.02936 [pdf]

    eess.IV cs.CV

    Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI

    Authors: Jonghun Kim, Hyun** Park

    Abstract: Breast cancer is the most prevalent cancer among women and predicting pathologic complete response (pCR) after anti-cancer treatment is crucial for patient prognosis and treatment customization. Deep learning has shown promise in medical imaging diagnosis, particularly when utilizing multiple imaging modalities to enhance accuracy. This study presents a model that predicts pCR in breast cancer pat…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 5 figures, IEEE ISBI 2024 proceedings

  5. arXiv:2405.19346 [pdf, other]

    eess.SP cs.AI cs.LG

    Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

    Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

    Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted at MICCAI 2024

  6. arXiv:2404.16223 [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhi**g Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  7. arXiv:2404.07021 [pdf, other]

    eess.SP

    A 4x32Gb/s 1.8pJ/bit Collaborative Baud-Rate CDR with Background Eye-Climbing Algorithm and Low-Power Global Clock Distribution

    Authors: Jihee Kim, Jia Park, Jiwon Shin, Hanseok Kim, Kahyun Kim, Haengbeom Shin, Ha-Jung Park, Woo-Seok Choi

    Abstract: This paper presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across multi-lane RXs but is capable of compensating for the freq…

    Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  8. arXiv:2404.05119 [pdf, other]

    eess.SP

    A 0.65-pJ/bit 3.6-TB/s/mm I/O Interface with XTalk Minimizing Affine Signaling for Next-Generation HBM with High Interconnect Density

    Authors: Hyunjun Park, Jiwon Shin, Hanseok Kim, Jihee Kim, Haengbeom Shin, Taehoon Kim, Jung-Hun Park, Woo-Seok Choi

    Abstract: This paper presents an I/O interface with Xtalk Minimizing Affine Signaling (XMAS), which is designed to support high-speed data transmission in die-to-die communication over silicon interposers or similar high-density interconnects susceptible to crosstalk. The operating principles of XMAS are elucidated through rigorous analyses, and its advantages over existing signaling are validated through n…

    Submitted 7 April, 2024; originally announced April 2024.

  9. arXiv:2403.05906 [pdf, other]

    eess.IV cs.CV

    Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration

    Authors: **gyun Xue, Tao Wang, Jun Wang, Kaihao Zhang, Wenhan Luo, Wenqi Ren, Zikun Liu, Hyunhee Park, Xiaochun Cao

    Abstract: Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 13 pages, 10 figures, conference or other essential info

  10. arXiv:2403.03526 [pdf, other]

    eess.SP cs.LG q-bio.NC

    FingerNet: EEG Decoding of A Fine Motor Imagery with Finger-tapping Task Based on A Deep Neural Network

    Authors: Young-Min Go, Seong-Hyun Yu, Hyeong-Yeong Park, Minji Lee, Ji-Hoon Jeong

    Abstract: Brain-computer interface (BCI) technology facilitates communication between the human brain and computers, primarily utilizing electroencephalography (EEG) signals to discern human intentions. Although EEG-based BCI systems have been developed for paralysis individuals, ongoing studies explore systems for speech imagery and motor imagery (MI). This study introduces FingerNet, a specialized network…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures, and 2 tables

  11. arXiv:2402.09452 [pdf, other]

    eess.SP cs.LG eess.SY

    Data Distribution Dynamics in Real-World WiFi-Based Patient Activity Monitoring for Home Healthcare

    Authors: Mahathir Monjur, Jia Liu, **gye Xu, Yuntong Zhang, Xiaomeng Wang, Chengdong Li, Hye** Park, Wei Wang, Karl Shieh, Sirajum Munir, **g Wang, Lixin Song, Shahriar Nirjon

    Abstract: This paper examines the application of WiFi signals for real-world monitoring of daily activities in home healthcare scenarios. While the state-of-the-art of WiFi-based activity recognition is promising in lab environments, challenges arise in real-world settings due to environmental, subject, and system configuration variables, affecting accuracy and adaptability. The research involved deploying…

    Submitted 3 February, 2024; originally announced February 2024.

  12. arXiv:2401.14421 [pdf, other]

    cs.LG cs.MA eess.SY stat.ML

    Multi-Agent Based Transfer Learning for Data-Driven Air Traffic Applications

    Authors: Chuhao Deng, Hong-Cheol Choi, Hyunsang Park, Inseok Hwang

    Abstract: Research in developing data-driven models for Air Traffic Management (ATM) has gained a tremendous interest in recent years. However, data-driven models are known to have long training time and require large datasets to achieve good performance. To address the two issues, this paper proposes a Multi-Agent Bidirectional Encoder Representations from Transformers (MA-BERT) model that fully considers…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures, submitted for IEEE Transactions on Intelligent Transportation System

  13. arXiv:2401.06913 [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    Microphone Conversion: Mitigating Device Variability in Sound Event Classification

    Authors: Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park

    Abstract: In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method. As SEC systems become increasingly common, it is crucial that they work well with audio from diverse recording devices. Our method addresses limited device div…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  14. arXiv:2312.08603 [pdf, other]

    eess.AS cs.SD

    NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification

    Authors: Hyun-Jun Heo, Ui-Hyeop Shin, Ran Lee, YoungJu Cheon, Hyung-Min Park

    Abstract: In speaker verification, ECAPA-TDNN has shown remarkable improvement by utilizing one-dimensional (1D) Res2Net block and squeeze-and-excitation (SE) module, along with multi-layer feature aggregation (MFA). Meanwhile, in vision tasks, ConvNet structures have been modernized by referring to Transformer, resulting in improved performance. In this paper, we present an improved block design for TDNN in…

    Submitted 14 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  15. arXiv:2312.03013 [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Breast Ultrasound Report Generation using LangChain

    Authors: Jaeyoung Huh, Hyun Jeong Park, Jong Chul Ye

    Abstract: Breast ultrasound (BUS) is a critical diagnostic tool in the field of breast imaging, aiding in the early detection and characterization of breast abnormalities. Interpreting breast ultrasound images commonly involves creating comprehensive medical reports, containing vital information to promptly assess the patient's condition. However, the ultrasound imaging system necessitates capturing multipl…

    Submitted 4 December, 2023; originally announced December 2023.

  16. arXiv:2311.13906 [pdf, other]

    eess.SP eess.SY

    Threat-Based Resource Allocation Strategy for Target Tracking in a Cognitive Radar Network

    Authors: JiYe Lee, J. H Park

    Abstract: Cognitive radar is developed to utilize the feedback of its operating environment obtained from a beam to make resource allocation decisions by solving optimization problems. Previous works focused on target tracking accuracy by designing an evaluation metric for an optimization problem. However, in a real combat situation, not only the tracking performance of the target but also its operational p…

    Submitted 23 November, 2023; originally announced November 2023.

  17. arXiv:2311.02586 [pdf, other]

    eess.IV cs.CV q-bio.QM

    Synthetic Tumor Manipulation: With Radiomics Features

    Authors: Inye Na, Jonghun Kim, Hyun** Park

    Abstract: We introduce RadiomicsFill, a synthetic tumor generator conditioned on radiomics features, enabling detailed control and individual manipulation of tumor subregions. This conditioning leverages conventional high-dimensional features of the tumor (i.e., radiomics features) and thus is biologically well-grounded. Our model combines generative adversarial networks, radiomics-feature conditioning, and…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: Paper accepted at NeurIPS 2023 Workshop: Medical Imaging meets NeurIPS

  18. arXiv:2311.00265 [pdf, other]

    eess.IV cs.CV

    Adaptive Latent Diffusion Model for 3D Medical Image to Image Translation: Multi-modal Magnetic Resonance Imaging Study

    Authors: Jonghun Kim, Hyun** Park

    Abstract: Multi-modal images play a crucial role in comprehensive evaluations in medical image analysis providing complementary information for identifying clinically important biomarkers. However, in clinical practice, acquiring multiple modalities can be challenging due to reasons such as scan cost, limited scan time, and safety considerations. In this paper, we propose a model based on the latent diffusi…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: 8 pages, 7 figures, WACV 2024 Accepted

  19. arXiv:2310.01633 [pdf, other]

    eess.SY

    Distributionally Robust Path Integral Control

    Authors: Hyuk Park, Duo Zhou, Grani A. Hanasusanto, Takashi Tanaka

    Abstract: We consider a continuous-time continuous-space stochastic optimal control problem, where the controller lacks exact knowledge of the underlying diffusion process, relying instead on a finite set of historical disturbance trajectories. In situations where data collection is limited, the controller synthesized from empirical data may exhibit poor performance. To address this issue, we introduce a no…

    Submitted 2 October, 2023; originally announced October 2023.

  20. arXiv:2308.14595 [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Neural Network Training Strategy to Enhance Anomaly Detection Performance: A Perspective on Reconstruction Loss Amplification

    Authors: YeongHyeon Park, Sungho Kang, Myung ** Kim, Hyeonho Jeong, Hyunkyu Park, Hyeong Seok Kim, Juneho Yi

    Abstract: Unsupervised anomaly detection (UAD) is a widely adopted approach in industry due to rare anomaly occurrences and data imbalance. A desirable characteristic of an UAD model is contained generalization ability which excels in the reconstruction of seen normal patterns but struggles with unseen anomalies. Recent studies have pursued to contain the generalization capability of their UAD models in rec…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 5 pages, 4 figures, 2 tables

  21. arXiv:2308.01939 [pdf, ps, other]

    eess.IV physics.med-ph

    Numerical Uncertainty of Convolutional Neural Networks Inference for Structural Brain MRI Analysis

    Authors: Inés Gonzalez Pepe, Vinuyan Sivakolunthu, Hae Lang Park, Yohan Chatelain, Tristan Glatard

    Abstract: This paper investigates the numerical uncertainty of Convolutional Neural Networks (CNNs) inference for structural brain MRI analysis. It applies Random Rounding -- a stochastic arithmetic technique -- to CNN models employed in non-linear registration (SynthMorph) and whole-brain segmentation (FastSurfer), and compares the resulting numerical uncertainty to the one measured in a reference image-pr…

    Submitted 2 August, 2023; originally announced August 2023.

  22. arXiv:2306.07562 [pdf, other]

    eess.AS cs.SD

    Statistical Beamformer Exploiting Non-stationarity and Sparsity with Spatially Constrained ICA for Robust Speech Recognition

    Authors: Ui-Hyeop Shin, Hyung-Min Park

    Abstract: In this paper, we present a statistical beamforming algorithm as a pre-processing step for robust automatic speech recognition (ASR). By modeling the target speech as a non-stationary Laplacian distribution, a mask-based statistical beamforming algorithm is proposed to exploit both its output and masked input variance for robust estimation of the beamformer. In addition, we also present a method f…

    Submitted 5 January, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted by TASLP

  23. arXiv:2305.10823 [pdf, other]

    eess.AS cs.LG cs.SD

    FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

    Authors: Won Jang, Dan Lim, Heayoung Park

    Abstract: This paper presents FastFit, a novel neural vocoder architecture that replaces the U-Net encoder with multiple short-time Fourier transforms (STFTs) to achieve faster generation rates without sacrificing sample quality. We replaced each encoder block with an STFT, with parameters equal to the temporal resolution of each decoder block, leading to the skip connection. FastFit reduces the number of p…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023

  24. arXiv:2305.09986 [pdf, other]

    eess.IV cs.CV cs.LG

    A robust multi-domain network for short-scanning amyloid PET reconstruction

    Authors: Hyoung Suk Park, Young ** Jeong, Kiwan Jeon

    Abstract: This paper presents a robust multi-domain network designed to restore low-quality amyloid PET images acquired in a short period of time. The proposed method is trained on pairs of PET images from short (2 minutes) and standard (20 minutes) scanning times, sourced from multiple domains. Learning relevant image features between these domains with a single network is challenging. Our key contribution…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: 21 pages, 7 figures, 3 tables

    MSC Class: 92C55; 68T05; 15A29; 65F22

  25. arXiv:2304.03940 [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Unsupervised Speech Representation Pooling Using Vector Quantization

    Authors: Jeongkyun Park, Kwanghee Choi, Hyunjun Heo, Hyung-Min Park

    Abstract: With the advent of general-purpose speech representations from large-scale self-supervised models, applying a single model to multiple downstream tasks is becoming a de-facto approach. However, the pooling problem remains; the length of speech representations is inherently variable. The naive average pooling is often used, even though it ignores the characteristics of speech, such as differently l…

    Submitted 8 April, 2023; originally announced April 2023.

  26. arXiv:2304.00471 [pdf, other]

    cs.SD cs.CV cs.GR cs.LG eess.AS

    A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

    Authors: Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim, Sungsu Lim

    Abstract: Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite remarkable results of modern talking-face generation models, they often entail high computational burdens, which limi…

    Submitted 28 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: MLSys Workshop on On-Device Intelligence, 2023; Demo: https://huggingface.co/spaces/nota-ai/compressed_wav2lip

  27. arXiv:2303.15703 [pdf, other]

    eess.AS

    AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection

    Authors: ** Sob Kim, Hyun Joon Park, Wooseok Shin, Sung Won Han

    Abstract: Sound event localization and detection (SELD) combines the identification of sound events with the corresponding directions of arrival (DOA). Recently, event-oriented track output formats have been adopted to solve this problem; however, they still have limited generalization toward real-world problems in an unknown polyphony environment. To address the issue, we proposed an angular-distance-based…

    Submitted 10 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 5 pages, 3 figures, accepted for publication in IEEE ICASSP 2023

  28. arXiv:2303.09057 [pdf, other]

    eess.AS cs.SD

    TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

    Authors: Hyun Joon Park, Seok Woo Yang, ** Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Voice Conversion (VC) must be achieved while maintaining the content of the source speech and representing the characteristics of the target speaker. The existing methods do not simultaneously satisfy the above two aspects of VC, and their conversion outputs suffer from a trade-off problem between maintaining source contents and target characteristics. In this study, we propose Triple Adaptive Att…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: To appear in ICASSP 2023

  29. arXiv:2303.01678 [pdf, other]

    eess.IV cs.CV physics.med-ph

    Nonlinear ill-posed problem in low-dose dental cone-beam computed tomography

    Authors: Hyoung Suk Park, Chang Min Hyun, ** Keun Seo

    Abstract: This paper describes the mathematical structure of the ill-posed nonlinear inverse problem of low-dose dental cone-beam computed tomography (CBCT) and explains the advantages of a deep learning-based approach to the reconstruction of computed tomography images over conventional regularization methods. This paper explains the underlying reasons why dental CBCT is more ill-posed than standard comput…

    Submitted 2 March, 2023; originally announced March 2023.

  30. arXiv:2212.08034 [pdf, other]

    eess.IV

    Generating Realistic Brain MRIs via a Conditional Diffusion Probabilistic Model

    Authors: Wei Peng, Ehsan Adeli, Tomas Bosschieter, Sang Hyun Park, Qingyu Zhao, Kilian M. Pohl

    Abstract: As acquiring MRIs is expensive, neuroscience studies struggle to attain a sufficient number of them for properly training deep learning models. This challenge could be reduced by MRI synthesis, for which Generative Adversarial Networks (GANs) are popular. GANs, however, are commonly unstable and struggle with creating diverse and high-quality data. A more stable alternative is Diffusion Probabilis…

    Submitted 7 September, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Journal ref: MICCAI 2023

  31. arXiv:2212.02233 [pdf, other]

    cs.NE cs.LG eess.SP

    Wearable-based Human Activity Recognition with Spatio-Temporal Spiking Neural Networks

    Authors: Yuhang Li, Ruokai Yin, Hyoungseob Park, Youngeun Kim, Priyadarshini Panda

    Abstract: We study the Human Activity Recognition (HAR) task, which predicts user daily activity based on time series data from wearable sensors. Recently, researchers use end-to-end Artificial Neural Networks (ANNs) to extract the features and perform classification in HAR. However, ANNs pose a huge computation burden on wearable devices and lack temporal feature extraction. In this work, we leverage Spiki…

    Submitted 14 November, 2022; originally announced December 2022.

    Comments: Workshop on Learning from Time Series for Health

  32. arXiv:2211.13676 [pdf, other]

    cs.CV eess.IV

    Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation

    Authors: Seung Ho Park, Young Su Moon, Nam Ik Cho

    Abstract: Single-image super-resolution (SISR) networks trained with perceptual and adversarial losses provide high-contrast outputs compared to those of networks trained with distortion-oriented losses, such as L1 or L2. However, it has been shown that using a single perceptual loss is insufficient for accurately restoring locally varying diverse shapes in images, often generating undesirable artifacts or…

    Submitted 11 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: CVPR 2023 accepted. Code and trained models will be available at https://github.com/seungho-snu/SROOE

  33. Multi-View Attention Transfer for Efficient Speech Enhancement

    Authors: Wooseok Shin, Hyun Joon Park, ** Sob Kim, Byung Hoon Lee, Sung Won Han

    Abstract: Recent deep learning models have achieved high performance in speech enhancement; however, it is still challenging to obtain a fast and low-complexity model without significant performance degradation. Previous knowledge distillation studies on speech enhancement could not solve this problem because their output distillation methods do not fit the speech enhancement task in some aspects. In this s…

    Submitted 30 October, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: Proceedings of Interspeech 2022

  34. arXiv:2207.11728 [pdf, other]

    eess.SP cs.AR

    A Custom IC Layout Generation Engine Based on Dynamic Templates and Grids

    Authors: Taeho Shin, Dongjun Lee, Dongwhee Kim, Gaeryun Sung, Wook** Shin, Yunseong Jo, Hyungjoo Park, Jaeduk Han

    Abstract: This paper presents an automatic layout generation framework in advanced CMOS technologies. The framework extends the template-and-grid-based layout generation methodology with the following additional techniques applied to produce optimal layouts more effectively. First, layout templates and grids are dynamically created and adjusted during runtime to serve various structural, functional, and des…

    Submitted 24 July, 2022; originally announced July 2022.

    Comments: 10 pages, 6 figures

  35. arXiv:2207.11132 [pdf, other]

    eess.SY cs.AI cs.MA

    Proactive Distributed Constraint Optimization of Heterogeneous Incident Vehicle Teams

    Authors: Justice Darko, Hyoshin Park

    Abstract: Traditionally, traffic incident management (TIM) programs coordinate the deployment of emergency resources to immediate incident requests without accommodating the interdependencies on incident evolutions in the environment. However, ignoring inherent interdependencies on the evolution of incidents in the environment while making current deployment decisions is shortsighted, and the resulting naiv…

    Submitted 4 August, 2022; v1 submitted 16 July, 2022; originally announced July 2022.

    Comments: 14 pages, 13 figures, 2 tables, journal

  36. arXiv:2207.04156 [pdf, other]

    cs.SD cs.CL cs.IR eess.AS

    Automated Audio Captioning and Language-Based Audio Retrieval

    Authors: Clive Gomes, Hye** Park, Patrick Kollman, Yi Song, Iffanice Houndayi, Ankit Shah

    Abstract: This project involved participation in the DCASE 2022 Competition (Task 6) which had two subtasks: (1) Automated Audio Captioning and (2) Language-Based Audio Retrieval. The first subtask involved the generation of a textual description for audio samples, while the goal of the second was to find audio samples within a fixed dataset that match a given description. For both subtasks, the Clotho data…

    Submitted 15 May, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: DCASE 2022 Competition (Task 6)

  37. arXiv:2206.13700 [pdf, other]

    cs.SD cs.LG eess.AS

    Domain Agnostic Few-shot Learning for Speaker Verification

    Authors: Seunghan Yang, Debasmit Das, Janghoon Cho, Hyoungwoo Park, Sungrack Yun

    Abstract: Deep learning models for verification systems often fail to generalize to new users and new environments, even though they learn highly discriminative features. To address this problem, we propose a few-shot domain generalization framework that learns to tackle distribution shift for new users and new domains. Our framework consists of domain-specific and domain-aggregation networks, which are the…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: Proceedings of INTERSPEECH 2022

  38. arXiv:2206.12638 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Distilling a Pretrained Language Model to a Multilingual ASR Model

    Authors: Kwanghee Choi, Hyung-Min Park

    Abstract: Multilingual speech data often suffer from long-tailed language distribution, resulting in performance degradation. However, multilingual text data is much easier to obtain, yielding a more useful general language model. Hence, we are motivated to distill the rich knowledge embedded inside a well-trained teacher text model to the student speech model. We propose a novel method called the Distillin…

    Submitted 25 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022. Official implementation provided in https://github.com/juice500ml/xlm_to_xlsr

  39. arXiv:2206.12513 [pdf, other]

    cs.SD cs.LG eess.AS

    Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification

    Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Hyunsin Park, Juntae Lee, Simyung Chang

    Abstract: While using two-dimensional convolutional neural networks (2D-CNNs) in image processing, it is possible to manipulate domain information using channel statistics, and instance normalization has been a promising way to get domain-invariant features. However, unlike image processing, we analyze that domain-relevant information in an audio feature is dominant in frequency statistics rather than chann…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: Proceedings of INTERSPEECH 2022

  40. arXiv:2206.07651 [pdf]

    eess.SP

    Fault Diagnosis of Inter-turn Short Circuit in Permanent Magnet Synchronous Motors with Current Signal Imaging and Unsupervised Learning

    Authors: W. Jung, S. H. Yun, Y. S. Lim, S. Cheong, J. Bae, Y. H. Park

    Abstract: This paper proposes machine-independent feature engineering for winding inter-turn short circuit fault that uses electrical current signals. Electrical current signal collected from permanent magnet synchronous motor (PMSM) is subjected to different environmental and operational conditions. To solve these problems, robust current signal imaging method and deep learning-based feature extraction met…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: submitted to IECON 2022

  41. Explainable AI for Suicide Risk Assessment Using Eye Activities and Head Gestures

    Authors: Siyu Liu, Catherine Lu, Sharifa Alghowinem, Lea Gotoh, Cynthia Breazeal, Hae Won Park

    Abstract: The prevalence of suicide has been on the rise since the 20th century, causing severe emotional damage to individuals, families, and communities alike. Despite the severity of this suicide epidemic, there is so far no reliable and systematic way to assess suicide intent of a given individual. Through efforts to automate and systematize diagnosis of mental illnesses over the past few years, verbal…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: Artificial Intelligence in HCI. HCII 2022

  42. arXiv:2206.04383

    eess.IV physics.med-ph

    Only-Train-Once MR Fingerprinting for Magnetization Transfer Contrast Quantification

    Authors: Beomgu Kang, Hye-Young Heo, HyunWook Park

    Abstract: Magnetization transfer contrast magnetic resonance fingerprinting (MTC-MRF) is a novel quantitative imaging technique that simultaneously measures several tissue parameters of semisolid macromolecule and free bulk water. In this study, we propose an Only-Train-Once MR fingerprinting (OTOM) framework that estimates the free bulk water and MTC tissue parameters from MR fingerprints regardless of MRF…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted at 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI'22)

  43. arXiv:2206.03796

    cs.RO eess.SP

    Adaptive Neural Network-based Unscented Kalman Filter for Robust Pose Tracking of Noncooperative Spacecraft

    Authors: Tae Ha Park, Simone D'Amico

    Abstract: This paper presents a neural network-based Unscented Kalman Filter (UKF) to estimate and track the pose (i.e., position and orientation) of a known, noncooperative, tumbling target spacecraft in a close-proximity rendezvous scenario. The UKF estimates the target's orbit and attitude relative to the servicer based on the pose information provided by a multi-task Convolutional Neural Network (CNN) f…

    Submitted 8 May, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: Accepted to AIAA Journal of Guidance, Control, and Dynamics. Updated derivation of Section IV.B and experiments

  44. arXiv:2204.06322

    eess.AS cs.CL cs.LG cs.SD

    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

    Authors: Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun ** Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

    Abstract: We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering str…

    Submitted 29 June, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  45. arXiv:2203.02181

    eess.AS cs.SD eess.SP

    MANNER: Multi-view Attention Network for Noise Erasure

    Authors: Hyun Joon Park, Byung Ha Kang, Wooseok Shin, ** Sob Kim, Sung Won Han

    Abstract: In the field of speech enhancement, time domain methods have difficulties in achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose Multi-view Attention Network for Noise ERasure (MANNER) consisting of a convolutional encoder…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: To appear in ICASSP 2022

  46. DXM-TransFuse U-net: Dual Cross-Modal Transformer Fusion U-net for Automated Nerve Identification

    Authors: Baijun Xie, Gary Milam, Bo Ning, Jaepyeong Cha, Chung Hyuk Park

    Abstract: Accurate nerve identification is critical during surgical procedures for preventing any damages to nerve tissues. Nerve injuries can lead to long-term detrimental effects for patients as well as financial overburdens. In this study, we develop a deep-learning network framework using the U-Net architecture with a Transformer block based fusion module at the bottleneck to identify nerve tissues from…

    Submitted 27 February, 2022; originally announced February 2022.

    Journal ref: Computerized Medical Imaging and Graphics, 2022-07-01, Volume 99, Article 102090

  47. arXiv:2202.03571

    eess.IV cs.CV

    Metal Artifact Reduction with Intra-Oral Scan Data for 3D Low Dose Maxillofacial CBCT Modeling

    Authors: Chang Min Hyun, Taigyntuya Bayaraa, Hye Sun Yun, Tae Jun Jang, Hyoung Suk Park, ** Keun Seo

    Abstract: Low-dose dental cone beam computed tomography (CBCT) has been increasingly used for maxillofacial modeling. However, the presence of metallic inserts, such as implants, crowns, and dental filling, causes severe streaking and shading artifacts in a CBCT image and loss of the morphological structures of the teeth, which consequently prevents accurate segmentation of bones. A two-stage metal artifact…

    Submitted 7 February, 2022; originally announced February 2022.

  48. arXiv:2201.10355

    cs.NE cs.AI cs.LG eess.SP

    Neural Architecture Search for Spiking Neural Networks

    Authors: Youngeun Kim, Yuhang Li, Hyoungseob Park, Yeshwanth Venkatesha, Priyadarshini Panda

    Abstract: Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their inherent high-sparsity activation. However, most prior SNN methods use ANN-like architectures (e.g., VGG-Net or ResNet), which could provide sub-optimal performance for temporal sequence processing of binary information in SNNs. To add…

    Submitted 20 July, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV) 2022

  49. arXiv:2201.04898

    cs.CV eess.IV

    Flexible Style Image Super-Resolution using Conditional Objective

    Authors: Seung Ho Park, Young Su Moon, Nam Ik Cho

    Abstract: Recent studies have significantly enhanced the performance of single-image super-resolution (SR) using convolutional neural networks (CNNs). While there can be many high-resolution (HR) solutions for a given input, most existing CNN-based methods do not explore alternative solutions during the inference. A typical approach to obtaining alternative SR results is to train multiple SR models with dif…

    Submitted 8 March, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: Will be presented in IEEE ACCESS. Code and trained models will be available at https://github.com/seungho-snu/FxSR

  50. arXiv:2112.00216

    cs.CV cs.SD eess.AS

    PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound

    Authors: Zhijian Yang, Xiaoran Fan, Volkan Isler, Hyun Soo Park

    Abstract: Reconstructing the 3D pose of a person in metric scale from a single view image is a geometrically ill-posed problem. For example, we can not measure the exact distance of a person to the camera from a single view image without additional scene assumptions (e.g., known height). Existing learning based approaches circumvent this issue by reconstructing the 3D pose up to scale. However, there are ma…

    Submitted 2 December, 2021; v1 submitted 30 November, 2021; originally announced December 2021.