Showing 1–50 of 72 results for author: Kang, H

  1. arXiv:2406.17329  [pdf, other]

    eess.SP cs.SD eess.AS physics.bio-ph

    Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

    Authors: Woo-Jin Chung, Hong-Goo Kang

    Abstract: We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern in…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  2. arXiv:2406.12688  [pdf, other]

    eess.AS eess.SP

    Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

    Authors: Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi

    Abstract: This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  3. arXiv:2406.09819  [pdf, other]

    eess.AS

    Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments

    Authors: Jihyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang

    Abstract: Ad-hoc distributed microphone environments, where microphone locations and numbers are unpredictable, present a challenge to traditional deep learning models, which typically require fixed architectures. To tailor deep learning models to accommodate arbitrary array configurations, the Transform-Average-Concatenate (TAC) layer was previously introduced. In this work, we integrate TAC layers with du…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  4. arXiv:2404.01330  [pdf, other]

    cs.CV cs.GR cs.LG eess.IV

    Holo-VQVAE: VQ-VAE for phase-only holograms

    Authors: Joohyun Park, Hyeongyeop Kang

    Abstract: Holography stands at the forefront of visual technology innovation, offering immersive, three-dimensional visualizations through the manipulation of light wave amplitude and phase. Contemporary research in hologram generation has predominantly focused on image-to-hologram conversion, producing holograms from existing images. These approaches, while effective, inherently limit the scope of innovati…

    Submitted 29 March, 2024; originally announced April 2024.

  5. arXiv:2402.09341  [pdf, other]

    eess.IV cs.CV

    Registration of Longitudinal Spine CTs for Monitoring Lesion Growth

    Authors: Malika Sanhinova, Nazim Haouchine, Steve D. Pieper, William M. Wells III, Tracy A. Balboni, Alexander Spektor, Mai Anh Huynh, Jeffrey P. Guenette, Bryan Czajkowski, Sarah Caplan, Patrick Doyle, Heejoo Kang, David B. Hackney, Ron N. Alkalay

    Abstract: Accurate and reliable registration of longitudinal spine images is essential for assessment of disease progression and surgical outcome. Implementing a fully automatic and robust registration is crucial for clinical use, however, it is challenging due to substantial change in shape and appearance due to lesions. In this paper we present a novel method to automatically align longitudinal spine CTs…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Paper accepted for publication at SPIE Medical Imaging 2024

  6. arXiv:2312.13752  [pdf]

    eess.IV cs.AI cs.CV

    Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

    Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Wei** Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, **yu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis, et al. (16 additional authors not shown)

    Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current public-available datasets concentrate on lung diseases with moderate morphological variations. The intric…

    Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 19 pages

  7. arXiv:2312.13615  [pdf, other]

    eess.AS cs.SD eess.SP

    Self-supervised Complex Network for Machine Sound Anomaly Detection

    Authors: Miseul Kim, Minh Tri Ho, Hong-Goo Kang

    Abstract: In this paper, we propose an anomaly detection algorithm for machine sounds with a deep complex network trained by self-supervision. Using the fact that phase continuity information is crucial for detecting abnormalities in time-series signals, our proposed algorithm utilizes the complex spectrum as an input and performs complex number arithmetic throughout the entire process. Since the usefulness…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Published in EUSIPCO 2021

  8. arXiv:2312.13603  [pdf, other]

    eess.AS cs.SD

    Style Modeling for Multi-Speaker Articulation-to-Speech

    Authors: Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

    Abstract: In this paper, we propose a neural articulation-to-speech (ATS) framework that synthesizes high-quality speech from articulatory signal in a multi-speaker situation. Most conventional ATS approaches only focus on modeling contextual information of speech from a single speaker's articulatory features. To explicitly represent each speaker's speaking style as well as the contextual information, our p…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 5 pages, Accepted to ICASSP 2023

  9. arXiv:2312.13600  [pdf, other]

    eess.AS cs.SD

    BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0

    Authors: Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

    Abstract: Decoding spoken speech from neural activity in the brain is a fast-emerging research topic, as it could enable communication for people who have difficulties with producing audible speech. For this task, electrocorticography (ECoG) is a common method for recording brain activity with high temporal resolution and high spatial precision. However, due to the risky surgical procedure required for obta…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 5 pages. Accepted to BHI 2023

  10. arXiv:2311.00364  [pdf, other]

    eess.AS cs.SD physics.bio-ph

    C2C: Cough to COVID-19 Detection in BHI 2023 Data Challenge

    Authors: Woo-Jin Chung, Miseul Kim, Hong-Goo Kang

    Abstract: This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through exper…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 1st place winning paper from the BHI 2023 Data Challenge Competition: Sensor Informatics

  11. Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

    Authors: Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

    Abstract: For personalized speech generation, a neural text-to-speech (TTS) model must be successfully implemented with limited data from a target speaker. To this end, the baseline TTS model needs to be amply generalized to out-of-domain data (i.e., target speaker's speech). However, approaches to address this out-of-domain generalization problem in TTS have yet to be thoroughly studied. In this work, we p…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023

    Journal ref: Proc. INTERSPEECH 2023, 4299-4303

  12. arXiv:2307.06961  [pdf, other]

    eess.SY

    Coordinated Path Following of UAVs using Event-Triggered Communication over Time-Varying Networks with Digraph Topologies

    Authors: Hyungsoo Kang, Isaac Kaminer, Venanzio Cichella, Naira Hovakimyan

    Abstract: In this article, a novel time-coordination algorithm based on event-triggered communications is proposed to achieve coordinated path-following of UAVs. To be specific, in the approach adopted a UAV transmits its progression information over a time-varying network to its neighbors only when a decentralized trigger condition is satisfied, thereby significantly reducing the volume of inter-vehicle co…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2307.06553

  13. arXiv:2307.06553  [pdf, other]

    eess.SY

    Coordinated Path Following of UAVs over Time-Varying Digraphs Connected in an Integral Sense

    Authors: Hyungsoo Kang, Isaac Kaminer, Venanzio Cichella, Naira Hovakimyan

    Abstract: This paper presents a new connectivity condition on the information flow between UAVs to achieve coordinated path following. The information flow is directional, so that the underlying communication network topology is represented by a time-varying digraph. We assume that this digraph is connected in an integral sense. This is a much more general assumption than the one currently used in the liter…

    Submitted 15 March, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

  14. arXiv:2306.09640  [pdf, other]

    eess.AS

    MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion

    Authors: Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

    Abstract: We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments. Our model leverages the periodic characteristics of audio signals and involves two key steps: extracting pitch periodicity using periodic non-periodic convolution (PNP-Conv) b…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: accepted at INTERSPEECH 2023

  15. arXiv:2306.08406  [pdf, other]

    eess.AS cs.LG cs.SD

    Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

    Authors: Hejung Yang, Hong-Goo Kang

    Abstract: Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have been frequently used as base networks for various pattern classification tasks such as speech recognition. However, not much research has been conducted on appl…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: INTERSPEECH 2023 accepted

  16. arXiv:2306.01411  [pdf, other]

    eess.AS cs.SD

    HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

    Authors: Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang

    Abstract: This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unlike conventional approaches that employ cascading frameworks to remove undesirable noise first and then restore missing signal components, our model performs these tasks in parallel using two heterogeneous decoder networks. Based on the U-Net style enco…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  17. arXiv:2305.06806  [pdf, other]

    cs.SD eess.AS

    HappyQuokka System for ICASSP 2023 Auditory EEG Challenge

    Authors: Zhenyu Piao, Miseul Kim, Hyungchan Yoon, Hong-Goo Kang

    Abstract: This report describes our submission to Task 2 of the Auditory EEG Decoding Challenge at ICASSP 2023 Signal Processing Grand Challenge (SPGC). Task 2 is a regression problem that focuses on reconstructing a speech envelope from an EEG signal. For the task, we propose a pre-layer normalized feed-forward transformer (FFT) architecture. For within-subjects generation, we additionally utilize an auxil…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: First Place in Task 2 of Auditory EEG decoding Challenge, which is part of ICASSP Signal Processing Grand Challenge (SPGC) 2023

  18. arXiv:2305.06511  [pdf, other]

    eess.IV cs.CV

    ParamNet: A Parameter-variable Network for Fast Stain Normalization

    Authors: Hongtao Kang, Die Luo, Li Chen, Junbo Hu, Shenghua Cheng, Tingwei Quan, Shaoqun Zeng, Xiuli Liu

    Abstract: In practice, digital pathology images are often affected by various factors, resulting in very large differences in color and brightness. Stain normalization can effectively reduce the differences in color and brightness of digital pathology images, thus improving the performance of computer-aided diagnostic systems. Conventional stain normalization methods rely on one or several reference images,…

    Submitted 10 May, 2023; originally announced May 2023.

  19. arXiv:2211.09385  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    ComMU: Dataset for Combinatorial Music Generation

    Authors: Lee Hyun, Taehyun Kim, Hyolim Kang, Minjoo Ki, Hyeonchan Hwang, Kwanho Park, Sharang Han, Seon Joo Kim

    Abstract: Commercial adoption of automatic music composition requires the capability of generating diverse and high-quality music suitable for the desired context (e.g., music for romantic movies, action games, restaurants, etc.). In this paper, we introduce combinatorial music generation, a new task to create varying background music based on given conditions. Combinatorial music generation creates short s…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: 19 pages, 12 figures

  20. arXiv:2210.03786  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    Evaluating the Performance of StyleGAN2-ADA on Medical Images

    Authors: McKell Woodland, John Wood, Brian M. Anderson, Suprateek Kundu, Ethan Lin, Eugene Koay, Bruno Odisio, Caroline Chung, Hyunseon Christine Kang, Aradhana M. Venkatesan, Sireesha Yedururi, Brian De, Yuan-Mao Lin, Ankit B. Patel, Kristy K. Brock

    Abstract: Although generative adversarial networks (GANs) have shown promise in medical imaging, they have four main limitations that impeded their utility: computational cost, data requirements, reliable evaluation measures, and training complexity. Our work investigates each of these obstacles in a novel application of StyleGAN2-ADA to high-resolution medical imaging datasets. Our dataset is comprised of…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: This preprint has not undergone post-submission improvements or corrections. The Version of Record of this contribution is published in LNCS, volume 13570, and is available online at https://doi.org/10.1007/978-3-031-16980-9_14

    Journal ref: Lecture Notes in Computer Science 13570 (2022)

  21. arXiv:2206.15400  [pdf, other]

    eess.AS cs.AI cs.LG

    Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting

    Authors: Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang

    Abstract: In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences. Unlike previous approaches requiring speech keyword enrollment, our method compares input queries with an enrolled text keyword sequence. To place the audio and text representations within a common latent space, we adopt an attenti…

    Submitted 1 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Accepted to Interspeech 2022

  22. MMMNA-Net for Overall Survival Time Prediction of Brain Tumor Patients

    Authors: Wen Tang, Haoyue Zhang, Pengxin Yu, Han Kang, Rongguo Zhang

    Abstract: Overall survival (OS) time is one of the most important evaluation indices for gliomas situations. Multimodal Magnetic Resonance Imaging (MRI) scans play an important role in the study of glioma prognosis OS time. Several deep learning-based methods are proposed for the OS time prediction on multi-modal MRI problems. However, these methods usually fuse multi-modal information at the beginning or a…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted EMBC 2022

  23. arXiv:2206.06253  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    RPLHR-CT Dataset and Transformer Baseline for Volumetric Super-Resolution from CT Scans

    Authors: Pengxin Yu, Haoyue Zhang, Han Kang, Wen Tang, Corey W. Arnold, Rongguo Zhang

    Abstract: In clinical practice, anisotropic volumetric medical images with low through-plane resolution are commonly used due to short acquisition time and lower storage cost. Nevertheless, the coarse resolution may lead to difficulties in medical diagnosis by either physicians or computer-aided diagnosis algorithms. Deep learning-based volumetric super-resolution (SR) methods are feasible ways to improve r…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted MICCAI 2022

  24. arXiv:2205.15720  [pdf, other]

    eess.IV cs.CV

    Progressive Multi-scale Consistent Network for Multi-class Fundus Lesion Segmentation

    Authors: Along He, Kai Wang, Tao Li, Wang Bo, Hong Kang, Huazhu Fu

    Abstract: Effectively integrating multi-scale information is of considerable significance for the challenging multi-class segmentation of fundus lesions because different lesions vary significantly in scales and shapes. Several methods have been proposed to successfully handle the multi-scale object segmentation. However, two issues are not considered in previous studies. The first is the lack of interactio…

    Submitted 31 May, 2022; originally announced May 2022.

  25. arXiv:2205.04104  [pdf, other]

    eess.AS cs.AI

    ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence

    Authors: Sangshin Oh, Seyun Um, Hong-Goo Kang

    Abstract: The Gumbel-softmax distribution, or Concrete distribution, is often used to relax the discrete characteristics of a categorical distribution and enable back-propagation through differentiable reparameterization. Although it reliably yields low variance gradients, it still relies on a stochastic sampling process for optimization. In this work, we present a relaxed categorical analytic bound (ReCAB)…

    Submitted 9 May, 2022; originally announced May 2022.

  26. arXiv:2205.01528  [pdf, other]

    eess.AS cs.CR cs.SD

    Attentive activation function for improving end-to-end spoofing countermeasure systems

    Authors: Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

    Abstract: The main objective of the spoofing countermeasure system is to detect the artifacts within the input speech caused by the speech synthesis or voice conversion process. In order to achieve this, we propose to adopt an attentive activation function, more specifically attention rectified linear unit (AReLU) to the end-to-end spoofing countermeasure system. Since the AReLU employs the attention mechan…

    Submitted 3 May, 2022; originally announced May 2022.

  27. arXiv:2204.03315  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model

    Authors: Nick J. C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang

    Abstract: In spoken language understanding (SLU), what the user says is converted to his/her intent. Recent work on end-to-end SLU has shown that accuracy can be improved via pre-training approaches. We revisit ideas presented by Lugosch et al. using speech pre-training and three-module modeling; however, to ease construction of the end-to-end SLU model, we use as our phoneme module an open-source acoustic-…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: Published in INTERSPEECH 2021

  28. Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

    Authors: Hyungchan Yoon, Seyun Um, Changwhan Kim, Hong-Goo Kang

    Abstract: To simplify the generation process, several text-to-speech (TTS) systems implicitly learn intermediate latent representations instead of relying on predefined features (e.g., mel-spectrogram). However, their generation quality is unsatisfactory as these representations lack speech variances. In this paper, we improve TTS performance by adding \emph{prosody embeddings} to the latent representations…

    Submitted 28 August, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: INTERSPEECH 2023

    MSC Class: 68T07 (Primary) 68T50; 68T99 (Secondary) ACM Class: I.2.7; I.2.6

  29. arXiv:2203.05125  [pdf, ps, other]

    eess.SP math.OC

    A Lifted $\ell_1$ Framework for Sparse Recovery

    Authors: Yaghoub Rahimi, Sung Ha Kang, Yifei Lou

    Abstract: Motivated by re-weighted $\ell_1$ approaches for sparse recovery, we propose a lifted $\ell_1$ (LL1) regularization which is a generalized form of several popular regularizations in the literature. By exploring such connections, we discover there are two types of lifting functions which can guarantee that the proposed approach is equivalent to the $\ell_0$ minimization. Computationally, we design…

    Submitted 12 May, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 24 pages

    MSC Class: 65K10; 49N45; 65F50; 90C90; 49M20

  30. arXiv:2203.02181  [pdf, other]

    eess.AS cs.SD eess.SP

    MANNER: Multi-view Attention Network for Noise Erasure

    Authors: Hyun Joon Park, Byung Ha Kang, Wooseok Shin, Jin Sob Kim, Sung Won Han

    Abstract: In the field of speech enhancement, time domain methods have difficulties in achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose Multi-view Attention Network for Noise ERasure (MANNER) consisting of a convolutional encoder…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: To appear in ICASSP 2022

  31. arXiv:2202.11918  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement

    Authors: Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang

    Abstract: Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly. However, these loss terms are typically designed to reduce the distortion of phase spectrum values at specific frequencies, which ensures they do not significantly affect the quality of the enhanced speech. In this paper, we propose an effective…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  32. arXiv:2201.10283  [pdf, ps, other]

    cs.SD cs.CR eess.AS

    SASV Challenge 2022: A Spoofing Aware Speaker Verification Challenge Evaluation Plan

    Authors: Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Hong-Goo Kang, Ha-Jin Yu, Nicholas Evans, Tomi Kinnunen

    Abstract: ASV (automatic speaker verification) systems are intrinsically required to reject both non-target (e.g., voice uttered by different speaker) and spoofed (e.g., synthesised or converted) inputs. However, there is little consideration for how ASV systems themselves should be adapted when they are expected to encounter spoofing attacks, nor when they operate in tandem with CMs (spoofing countermeasur…

    Submitted 2 March, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: Evaluation plan of the SASV Challenge 2022. See this webpage for more information: https://sasv-challenge.github.io

  33. arXiv:2112.08222  [pdf, other]

    eess.SY cs.LG

    Guaranteed Nonlinear Tracking in the Presence of DNN-Learned Dynamics With Contraction Metrics and Disturbance Estimation

    Authors: Pan Zhao, Ziyao Guo, Aditya Gahlawat, Hyungsoo Kang, Naira Hovakimyan

    Abstract: This paper presents an approach to trajectory-centric learning control based on contraction metrics and disturbance estimation for nonlinear systems subject to matched uncertainties. The approach uses deep neural networks to learn uncertain dynamics while still providing guarantees of transient tracking performance throughout the learning phase. Within the proposed approach, a disturbance estimati…

    Submitted 12 October, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Shorter version submitted to ACC 2023

  34. arXiv:2112.03454  [pdf, other]

    eess.AS cs.SD

    Robust Speech Representation Learning via Flow-based Embedding Regularization

    Authors: Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

    Abstract: Over the recent years, various deep learning-based methods were proposed for extracting a fixed-dimensional embedding vector from speech signals. Although the deep learning-based embedding extraction methods have shown good performance in numerous tasks including speaker verification, language identification and anti-spoofing, their performance is limited when it comes to mismatched conditions due…

    Submitted 6 December, 2021; originally announced December 2021.

  35. arXiv:2109.14779  [pdf, other]

    eess.SY

    Time Coordination of Multiple UAVs over Switching Communication Networks with Digraph Topologies

    Authors: Hyungsoo Kang, Hyung-Jin Yoon, Venanzio Cichella, Naira Hovakimyan, Petros Voulgaris

    Abstract: This paper presents a time-coordination algorithm for multiple UAVs executing cooperative missions. Unlike previous algorithms, it does not rely on the assumption that the communication between UAVs is bidirectional. Thus, the topology of the inter-UAV information flow can be characterized by digraphs. To achieve coordination with weak connectivity, we design a switching law that orchestrates swit…

    Submitted 12 April, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

  36. arXiv:2108.08346  [pdf]

    eess.SY

    Permanent Magnet Linear Generator Design for Surface Riding Wave Energy Converters

    Authors: Farid Naghavi, Shrikesh Sheshaprasad, Matthew Gardner, Aghamarshana Meduri, HeonYong Kang, Hamid Toliyat

    Abstract: This paper describes the detailed analysis for the design of a linear generator developed for a Surface Riding Wave Energy Converter (SR-WEC), which was designed to improve energy capture over a wider range of sea states. The study starts with an analysis of the power take-off (PTO) control strategy to harness the maximum output power from given sea states. Passive, reactive, and discrete PTO cont…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: To be published in Energy Conversion Congress and Expo 2021

  37. arXiv:2107.12003  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

    Authors: Se-Yun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang

    Abstract: In this paper, we propose a multi-speaker face-to-speech waveform generation model that also works for unseen speaker conditions. Using a generative adversarial network (GAN) with linguistic and speaker characteristic features as auxiliary conditions, our method directly converts face images into speech waveforms under an end-to-end training framework. The linguistic features are extracted from li…

    Submitted 15 March, 2023; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: 5 pages (including references), 1 figure

  38. arXiv:2104.02775  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation

    Authors: Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn

    Abstract: In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing. Most conventional approaches utilize frame-wise matching criteria to extract shared information between co-occurring audio and video. Thus, their performance heavily depends on the accuracy of audio-visual synchronization and the effectiveness of their representations. To…

    Submitted 25 March, 2021; originally announced April 2021.

    Comments: CVPR 2021. The first two authors contributed equally to this work. Project page: https://caffnet.github.io

  39. arXiv:2103.12926  [pdf, other]

    cs.CV eess.IV

    Beyond Visual Attractiveness: Physically Plausible Single Image HDR Reconstruction for Spherical Panoramas

    Authors: Wei Wei, Li Guan, Yue Liu, Hao Kang, Haoxiang Li, Ying Wu, Gang Hua

    Abstract: HDR reconstruction is an important task in computer vision with many industrial needs. The traditional approaches merge multiple exposure shots to generate HDRs that correspond to the physical quantity of illuminance of the scene. However, the tedious capturing process makes such multi-shot approaches inconvenient in practice. In contrast, recent single-shot methods predict a visually appealing HD…

    Submitted 23 March, 2021; originally announced March 2021.

  40. arXiv:2102.03542   

    eess.SP cs.LG

    Continuous Monitoring of Blood Pressure with Evidential Regression

    Authors: Hyeongju Kim, Woo Hyun Kang, Hyeonseung Lee, Nam Soo Kim

    Abstract: Photoplethysmogram (PPG) signal-based blood pressure (BP) estimation is a promising candidate for modern BP measurements, as PPG signals can be easily obtained from wearable devices in a non-invasive manner, allowing quick BP measurement. However, the performance of existing machine learning-based BP measuring methods still fall behind some BP measurement guidelines and most of them provide only p…

    Submitted 25 February, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

    Comments: We found some errors in the experimental configuration. We plan to revise the paper and republish it later

  41. arXiv:2101.09864  [pdf, other]

    eess.IV cs.CV cs.LG

    Applications of Deep Learning in Fundus Images: A Review

    Authors: Tao Li, Wang Bo, Chunyu Hu, Hong Kang, Hanruo Liu, Kai Wang, Huazhu Fu

    Abstract: The use of fundus images for the early screening of eye diseases is of great clinical importance. Due to its powerful performance, deep learning is becoming more and more popular in related applications, such as lesion segmentation, biomarkers segmentation, disease diagnosis and image synthesis. Therefore, it is very necessary to summarize the recent developments in deep learning for fundus images…

    Submitted 24 January, 2021; originally announced January 2021.

    Journal ref: Medical Image Analysis 2021

  42. StainNet: a fast and robust stain normalization network

    Authors: Hongtao Kang, Die Luo, Weihua Feng, Junbo Hu, Shaoqun Zeng, Tingwei Quan, Xiuli Liu

    Abstract: Stain normalization often refers to transferring the color distribution of the source image to that of the target image and has been widely used in biomedical image analysis. The conventional stain normalization is regarded as constructing a pixel-by-pixel color mapping model, which only depends on one reference image, and cannot accurately achieve the style transformation between image datasets.…

    Submitted 23 July, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: 14 pages, 8 figures

    Journal ref: Front. Med. 8:746307 (2021)

  43. arXiv:2011.02168  [pdf, other]

    eess.AS

    Learning in your voice: Non-parallel voice conversion based on speaker consistency loss

    Authors: Yoohwan Kwon, Soo-Whan Chung, Hee-Soo Heo, Hong-Goo Kang

    Abstract: In this paper, we propose a novel voice conversion strategy to resolve the mismatch between the training and conversion scenarios when parallel speech corpus is unavailable for training. Based on auto-encoder and disentanglement frameworks, we design the proposed model to extract identity and content representations while reconstructing the input speech signal itself. Since we use other speaker's…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Submitted to ICASSP 2021

  44. arXiv:2010.11433  [pdf, other]

    eess.AS cs.SD

    Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. Also, to preserve speaker discriminability, a contrastive similarity loss function is used together. Experimental results showed that the pr…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure, 4 tables

  45. arXiv:2010.11408  [pdf, ps, other]

    eess.AS cs.SD

    Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

    Authors: Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim

    Abstract: This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020. Task 1 is a text-dependent speaker verification task, where both the speaker and phrase are required to be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted in INTERSPEECH 2020

  46. arXiv:2008.12912  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-Attention Based Ultra Lightweight Image Super-Resolution

    Authors: Abdul Muqeet, Jiwon Hwang, Subin Yang, Jung Heum Kang, Yongwoo Kim, Sung-Ho Bae

    Abstract: Lightweight image super-resolution (SR) networks have the utmost significance for real-world applications. There are several deep learning based SR methods with remarkable performance, but their memory and computational cost are hindrances in practical usage. To tackle this problem, we propose a Multi-Attentive Feature Fusion Super-Resolution Network (MAFFSRN). MAFFSRN consists of proposed feature…

    Submitted 21 September, 2020; v1 submitted 29 August, 2020; originally announced August 2020.

    Comments: ECCVW AIM2020

  47. Disentangled speaker and nuisance attribute embedding for robust speaker verification

    Authors: Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

    Abstract: Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples with different conditions (e.g., recording devices, emotional states)…

    Submitted 7 August, 2020; originally announced August 2020.

    Comments: Accepted in IEEE Access

  48. arXiv:2008.01698  [pdf, other]

    eess.AS cs.SD

    MIRNet: Learning multiple identities representations in overlapped speech

    Authors: Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

    Abstract: Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are multiple concurrent speakers in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speak…

    Submitted 6 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted in Interspeech 2020

  49. arXiv:2008.01348  [pdf, other]

    eess.AS cs.SD

    Intra-class variation reduction of speaker representation in disentanglement framework

    Authors: Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang

    Abstract: In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing solely speaker characteristic information in order to be robust in terms of intra-speaker variations. By modifying the network architecture to generate both speaker-re…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: Accepted for INTERSPEECH 2020

  50. Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

    Authors: Hyeongju Kim, Hyeonseung Lee, Woo Hyun Kang, Hyung Yong Kim, Nam Soo Kim

    Abstract: For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end with…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: 7 pages, 3 figures

    Journal ref: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020