Showing 1–50 of 359 results for author: Chung

  1. arXiv:2406.17329  [pdf, other]

    eess.SP cs.SD eess.AS physics.bio-ph

    Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator

    Authors: Woo-** Chung, Hong-Goo Kang

    Abstract: We present a novel speaker-independent acoustic-to-articulatory inversion (AAI) model, overcoming the limitations observed in conventional AAI models that rely on acoustic features derived from restricted datasets. To address these challenges, we leverage representations from a pre-trained self-supervised learning (SSL) model to more effectively estimate the global, local, and kinematic pattern in…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  2. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  3. arXiv:2406.16896  [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  4. arXiv:2406.14559  [pdf, other]

    cs.SD eess.AS

    Disentangled Representation Learning for Environment-agnostic Speaker Recognition

    Authors: KiHyun Nam, Hee-Soo Heo, Jee-weon Jung, Joon Son Chung

    Abstract: This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components related to the speaker and other residual information. We employ a group of objective functions to ensure that the auto-encoder's code representation -…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024. The official webpage can be found at https://mm.kaist.ac.kr/projects/voxceleb-disentangler/

  5. arXiv:2406.12688  [pdf, other]

    eess.AS eess.SP

    Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation

    Authors: Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi

    Abstract: This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting the acoustic scene behind speech signals to desired environments. We propose AST-LDM for the AST task, which generates speech signals accompanied by…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  6. arXiv:2406.12632  [pdf, other]

    eess.IV cs.CV

    Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Image Synthesis: T1 MRI to Tau-PET

    Authors: Symac Kim, Junho Moon, Haejun Chung, Ikbeom Jang

    Abstract: Alzheimer's Disease (AD) is the most common form of dementia, characterised by cognitive decline and biomarkers such as tau-proteins. Tau-positron emission tomography (tau-PET), which employs a radiotracer to selectively bind, detect, and visualise tau protein aggregates within the brain, is valuable for early AD diagnosis but is less accessible due to high costs, limited availability, and its inv…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

  7. arXiv:2406.10549  [pdf, other]

    eess.AS cs.CL cs.SD

    Lightweight Audio Segmentation for Long-form Speech Translation

    Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

    Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  8. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  9. arXiv:2406.09286  [pdf, other]

    eess.AS cs.SD

    FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching

    Authors: Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung

    Abstract: This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is limited by slow inference speeds and computational complexity. To address this issue, we present FlowAVSE which enhances the inference speed and reduces the numbe…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  10. arXiv:2406.05339  [pdf, other]

    eess.AS cs.AI

    To what extent can ASV systems naturally defend against spoofing attacks?

    Authors: Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-** Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, Joon Son Chung

    Abstract: The current automatic speaker verification (ASV) task involves making binary decisions on two types of trials: target and non-target. However, emerging advancements in speech generation technology pose significant threats to the reliability of ASV systems. This study investigates whether ASV effortlessly acquires robustness against spoofing attacks (i.e., zero-shot capability) by systematically ex…

    Submitted 14 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 3 tables, Interspeech 2024

  11. arXiv:2406.03344  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

    Authors: Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung

    Abstract: Transformers have rapidly become the preferred choice for audio classification, surpassing methods based on CNNs. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. The removal of this quadratic self-attention cost presents an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in language and vision task…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/mhamzaerol/Audio-Mamba-AuM

  12. arXiv:2405.10272  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS eess.IV

    Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

    Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

    Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  13. arXiv:2405.09787  [pdf, other]

    eess.IV cs.CV cs.LG

    Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

    Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie, et al. (96 additional authors not shown)

    Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 16 pages, 11 tables, 10 figures, MICCAI

  14. arXiv:2405.04225  [pdf, other]

    eess.SY eess.IV eess.SP

    Long-term usage of the off-grid photovoltaic system with lithium-ion battery-based energy storage system on high mountains: A case study in Payiun Lodge on Mt. Jade in Taiwan

    Authors: Hsien-Ching Chung

    Abstract: Energy supply on high mountains remains an open issue since grid connection is unavailable. In the past, diesel generators with lead-acid battery energy storage systems (ESSs) are applied in most cases. Recently, photovoltaic (PV) system with lithium-ion (Li-ion) battery ESS is an appropriate method for solving this problem in a greener way. In 2016, an off-grid PV system with Li-ion battery ESS h…

    Submitted 4 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 32 pages, 14 figures, 4 tables

    Journal ref: Batteries 10 (2024) 202

  15. arXiv:2404.15009  [pdf, other]

    cs.CV eess.IV

    The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar, et al. (51 additional authors not shown)

    Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we pr…

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.17033

  16. arXiv:2404.08212   

    eess.SP

    Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography

    Authors: Hika Barki, Wan-Young Chung

    Abstract: Mental stress is a prevalent condition that can have negative impacts on one's health. Early detection and treatment are crucial for preventing related illnesses and maintaining overall wellness. This study presents a new method for identifying mental stress using a wearable biosensor worn in the ear. Data was gathered from 14 participants in a controlled environment using stress-inducing tasks su…

    Submitted 13 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: The paper is being withdrawn because we have identified substantial issues with the data analysis process. To ensure the integrity and accuracy of our findings, we are re-evaluating the data and will resubmit the paper after thorough revisions

  17. arXiv:2404.02781  [pdf, other]

    eess.AS cs.SD

    CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

    Authors: Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho

    Abstract: With the emergence of neural audio codecs, which encode multiple streams of discrete tokens from audio, large language models have recently gained attention as a promising approach for zero-shot Text-to-Speech (TTS) synthesis. Despite the ongoing rush towards scaling paradigms, audio tokenization ironically amplifies the scalability challenge, stemming from its long sequence length and the complex…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  18. arXiv:2403.19180  [pdf]

    eess.SP cs.ET

    A Robust UWOC-assisted Multi-hop Topology for Underwater Sensor Network Nodes

    Authors: Maaz Salman, Javad Bolboli, Wan-Young Chung

    Abstract: Underwater environment is substantially less explored territory as compared to earth surface due to lack of robust underwater communication infrastructure. For Internet of Underwater things connectivity, underwater wireless optical communication can play a vital role, compared to conventional radio frequency communication, due to longer range, high data rate, low latency, and unregulated bandwidth…

    Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  19. arXiv:2403.18052  [pdf, other]

    astro-ph.IM cs.LG eess.IV eess.SP

    R2D2 image reconstruction with model uncertainty quantification in radio astronomy

    Authors: Amir Aghabiglou, Chung San Chu, Arwa Dabbech, Yves Wiaux

    Abstract: The "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2) approach was recently introduced for Radio-Interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. In this work, we inve…

    Submitted 27 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE EUSIPCO 2024

  20. arXiv:2403.17905  [pdf, other]

    eess.IV cs.CV cs.LG eess.SP

    Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2

    Authors: Yiwei Chen, Chao Tang, Amir Aghabiglou, Chung San Chu, Yves Wiaux

    Abstract: We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in Deep Neural Network (DNN) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limita…

    Submitted 28 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE EUSIPCO 2024

  21. arXiv:2403.16377  [pdf, other]

    cs.LG eess.SY stat.ML

    Real-time Adaptation for Condition Monitoring Signal Prediction using Label-aware Neural Processes

    Authors: Seokhyun Chung, Raed Al Kontar

    Abstract: Building a predictive model that rapidly adapts to real-time condition monitoring (CM) signals is critical for engineering systems/units. Unfortunately, many current methods suffer from a trade-off between representation power and agility in online settings. For instance, parametric methods that assume an underlying functional form for CM signals facilitate efficient online prediction updates. How…

    Submitted 24 March, 2024; originally announced March 2024.

  22. arXiv:2403.13288  [pdf, other]

    eess.SY

    Observer-Based Environment Robust Control Barrier Functions for Safety-critical Control with Dynamic Obstacles

    Authors: Ying Shuai Quan, Jian Zhou, Erik Frisk, Chung Choo Chung

    Abstract: This paper proposes a safety-critical controller for dynamic and uncertain environments, leveraging a robust environment control barrier function (ECBF) to enhance the robustness against the measurement and prediction uncertainties associated with moving obstacles. The approach reduces conservatism, compared with a worst-case uncertainty approach, by incorporating a state observer for obstacles in…

    Submitted 20 March, 2024; originally announced March 2024.

  23. A GNN Approach for Cell-Free Massive MIMO

    Authors: Lou Salaun, Hong Yang, Shashwat Mishra, Chung Shue Chen

    Abstract: Beyond 5G wireless technology Cell-Free Massive MIMO (CFmMIMO) downlink relies on carefully designed precoders and power control to attain uniformly high rate coverage. Many such power control problems can be calculated via second order cone programming (SOCP). In practice, several order of magnitude faster numerical procedure is required because power control has to be rapidly updated to adapt to…

    Submitted 8 February, 2024; originally announced March 2024.

    Journal ref: GLOBECOM 2022 - 2022 IEEE Global Communications Conference, Dec 2022, Rio de Janeiro, Brazil. pp.3053-3058

  24. arXiv:2403.08337  [pdf, other]

    eess.SY cs.AI cs.LG

    LLM-Assisted Light: Leveraging Large Language Model Capabilities for Human-Mimetic Traffic Signal Control in Complex Urban Environments

    Authors: Maonan Wang, Aoyu Pang, Yuheng Kan, Man-On Pun, Chung Shue Chen, Bo Huang

    Abstract: Traffic congestion in metropolitan areas presents a formidable challenge with far-reaching economic, environmental, and societal ramifications. Therefore, effective congestion management is imperative, with traffic signal control (TSC) systems being pivotal in this endeavor. Conventional TSC systems, designed upon rule-based algorithms or reinforcement learning (RL), frequently exhibit deficiencie…

    Submitted 12 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 20 pages, 11 figures

  25. arXiv:2402.17127  [pdf, other]

    cs.SD eess.AS

    Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

    Authors: Taein Kang, Soyul Han, Sunmook Choi, Jae** Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

    Abstract: Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, p…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 5 pages

    MSC Class: 00A71 ACM Class: I.2.6

  26. arXiv:2402.15604  [pdf, other]

    cs.RO eess.SY

    Goal-Reaching Trajectory Design Near Danger with Piecewise Affine Reach-avoid Computation

    Authors: Long Kiu Chung, Wonsuhk Jung, Chuizheng Kong, Shreyas Kousik

    Abstract: Autonomous mobile robots must maintain safety, but should not sacrifice performance, leading to the classical reach-avoid problem: find a trajectory that is guaranteed to reach a goal and avoid obstacles. This paper addresses the near danger case, also known as a narrow gap, where the agent starts near the goal, but must navigate through tight obstacles that block its path. The proposed method bui…

    Submitted 28 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: The first two authors contributed equally to the work. This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. arXiv:2402.15539  [pdf, ps, other]

    eess.AS cs.CL

    Speech Corpus for Korean Children with Autism Spectrum Disorder: Towards Automatic Assessment Systems

    Authors: Seonwoo Lee, Jihyun Mun, Sunhee Kim, Minhwa Chung

    Abstract: Despite the growing demand for digital therapeutics for children with Autism Spectrum Disorder (ASD), there is currently no speech corpus available for Korean children with ASD. This paper introduces a speech corpus specifically designed for Korean children with ASD, aiming to advance speech technologies such as pronunciation and severity evaluation. Speech recordings from speech and language eval…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 11 pages, Accepted for LREC-COLING 2024

  28. arXiv:2402.13236  [pdf, other]

    eess.AS cs.SD

    Towards audio language modeling -- an overview

    Authors: Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-yi Lee

    Abstract: Neural audio codecs are initially introduced to compress audio data into compact codes to reduce transmission latency. Researchers recently discovered the potential of codecs as suitable tokenizers for converting continuous audio into discrete codes, which can be employed to develop audio language models (LMs). Numerous high-performance neural audio codecs and codec-based LMs have been developed.…

    Submitted 20 February, 2024; originally announced February 2024.

  29. arXiv:2402.13071  [pdf, other]

    eess.AS cs.SD

    Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

    Authors: Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee

    Abstract: The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance. Recent years have witnessed significant developments in codec models. The ideal sound codec should preserve content, paralinguistics, speakers, and audio information. However, the question of which codec achieves optimal sound information preservation remains unanswere…

    Submitted 7 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Github: https://github.com/voidful/Codec-SUPERB

  30. arXiv:2402.09679  [pdf, other]

    cs.RO eess.SY

    Design and Visual Servoing Control of a Hybrid Dual-Segment Flexible Neurosurgical Robot for Intraventricular Biopsy

    Authors: Jian Chen, Mingcong Chen, Qingxiang Zhao, Shuai Wang, Yihe Wang, Ying Xiao, Jian Hu, Danny Tat Ming Chan, Kam Tong Leo Yeung, David Yuen Chung Chan, Hongbin Liu

    Abstract: Traditional rigid endoscopes have challenges in flexibly treating tumors located deep in the brain, and low operability and fixed viewing angles limit its development. This study introduces a novel dual-segment flexible robotic endoscope MicroNeuro, designed to perform biopsies with dexterous surgical manipulation deep in the brain. Taking into account the uncertainty of the control model, an imag…

    Submitted 23 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA) 2024, 7 pages, 9 figures

  31. arXiv:2402.03397  [pdf]

    q-bio.QM eess.IV

    A Comprehensive Approach to Diagnosing Temporomandibular Joint Diseases: AI-driven TMD Diagnostic System

    Authors: Y. Gua, C. T. Kong, D. D Zhangc, Y. J Baid, J. K. H. Tsoia, Hua Huangc, Y. Q. Dengc, Y. M Zhue

    Abstract: AI-driven TMD diagnostic system uses AI segmentation method to diagnose Temporomandibular Joint Disorders (TMD). By using segmentation, three important parts: temporal bone, temporomandibular joint (TMJ) disc and the condyle can be identified. The location and the size of each segment are used as the basic information to determine if the patient has a high chance of having Temporomandibular Joint…

    Submitted 4 February, 2024; originally announced February 2024.

  32. arXiv:2401.11429  [pdf, ps, other]

    cs.IT eess.SP

    Joint Downlink and Uplink Optimization for RIS-Aided FDD MIMO Communication Systems

    Authors: Gyoseung Lee, Hyeongtaek Lee, Donghwan Kim, Jaehoon Chung, A. Lee. Swindlehurst, Junil Choi

    Abstract: This paper investigates reconfigurable intelligent surface (RIS)-aided frequency division duplexing (FDD) communication systems. Since the downlink and uplink signals are simultaneously transmitted in FDD, the phase shifts at the RIS should be designed to support both transmissions. Considering a single-user multiple-input multiple-output system, we formulate a weighted sum-rate maximization probl…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Wireless Communications

  33. arXiv:2401.10032  [pdf, other]

    eess.AS cs.AI eess.SP

    FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

    Authors: Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung

    Abstract: The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated co…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  34. arXiv:2401.09294  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis

    Authors: Yoon** Chung, Junwon Lee, Juhan Nam

    Abstract: Foley sound, audio content inserted synchronously with videos, plays a critical role in the user experience of multimedia content. Recently, there has been active research in Foley sound synthesis, leveraging the advancements in deep generative models. However, such works mainly focus on replicating a single sound class or a textual sound description, neglecting temporal information, which is cruc…

    Submitted 17 January, 2024; originally announced January 2024.

  35. arXiv:2401.08415  [pdf, other]

    cs.SD cs.LG eess.AS

    From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers

    Authors: Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

    Abstract: Transformers have become central to recent advances in audio classification. However, training an audio spectrogram transformer, e.g. AST, from scratch can be resource and time-intensive. Furthermore, the complexity of transformers heavily depends on the input audio spectrogram size. In this work, we aim to optimize AST training by linking to the resolution in the time-axis. We introduce multi-pha…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  36. arXiv:2401.07326  [pdf, other]

    eess.IV cs.CV

    Beyond Traditional Approaches: Multi-Task Network for Breast Ultrasound Diagnosis

    Authors: Dat T. Chung, Minh-Anh Dang, Mai-Anh Vu, Minh T. Nguyen, Thanh-Huy Nguyen, Vinh Q. Dinh

    Abstract: Breast Ultrasound plays a vital role in cancer diagnosis as a non-invasive approach with cost-effective. In recent years, with the development of deep learning, many CNN-based approaches have been widely researched in both tumor localization and cancer classification tasks. Even though previous single models achieved great performance in both tasks, these methods have some limitations in inference…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: 7 pages, 3 figures

  37. arXiv:2401.06966  [pdf, other]

    eess.SP

    Near-Field Channel Estimation for XL-RIS Assisted Multi-User XL-MIMO Systems: Hybrid Beamforming Architectures

    Authors: Jeongjae Lee, Hyeong** Chung, Yunseong Cho, Sunwoo Kim, Songnam Hong

    Abstract: Channel estimation is one of the key challenges for the deployment of extremely large-scale reconfigurable intelligent surface (XL-RIS) assisted multiple-input multiple-output (MIMO) systems. In this paper, we study the channel estimation problem for XL-RIS assisted multi-user XL-MIMO systems with hybrid beamforming structures. For this system, we propose an unified channel estimation method…

    Submitted 25 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: submitted to IEEE Transactions on Communications

  38. arXiv:2401.01608  [pdf]

    eess.SP cs.NI math.OC

    Interference Management in 5G and Beyond Networks

    Authors: Nessrine Trabelsi, Lamia Chaari Fourati, Chung Shue Chen

    Abstract: During the last decade, wireless data services have had an incredible impact on people's lives in ways we could never have imagined. The number of mobile devices has increased exponentially and data traffic has almost doubled every year. Undoubtedly, the rate of growth will continue to be rapid with the explosive increase in demands for data rates, latency, massive connectivity, network reliabilit…

    Submitted 3 January, 2024; originally announced January 2024.

  39. arXiv:2401.00740  [pdf, other]

    eess.IV cs.CV

    Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution

    Authors: Zeke Zexi Hu, Xiaoming Chen, Vera Yuk Ying Chung, Yiran Shen

    Abstract: The effective extraction of spatial-angular features plays a crucial role in light field image super-resolution (LFSR) tasks, and the introduction of convolution and Transformers leads to significant improvement in this area. Nevertheless, due to the large 4D data volume of light field images, many existing methods opted to decompose the data into a number of lower-dimensional subspaces and perfor…

    Submitted 1 January, 2024; originally announced January 2024.

  40. arXiv:2312.06498  [pdf, other]

    cs.CE eess.SY

    Sustainability through Optimal Design of Buildings for Natural Ventilation using Updated Comfort and Occupancy Models

    Authors: Jihoon Chung, Nastaran Shahmansouri, Rhys Goldstein, James Stoddart, John Locke

    Abstract: This paper explores the benefits of incorporating natural ventilation (NV) simulation into a generative process of designing residential buildings to improve energy efficiency and indoor thermal comfort. Our proposed workflow uses the Wave Function Collapse algorithm to generate a diverse set of plausible floor plans. It also includes post-COVID occupant presence models while incorporating adaptiv…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 12 pages, 14 figures

  41. arXiv:2312.05187  [pdf, other

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek , et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  42. arXiv:2312.02669  [pdf, other

    physics.optics eess.IV

    Deep-learning-driven end-to-end metalens imaging

    Authors: Joonhyuk Seo, Jaegang Jo, Joohoon Kim, Joonho Kang, Chanik Kang, Seongwon Moon, Eunji Lee, Jehyeong Hong, Junsuk Rho, Haejun Chung

    Abstract: Recent advances in metasurface lenses (metalenses) have shown great potential for opening a new era in compact imaging, photography, light detection and ranging (LiDAR), and virtual reality/augmented reality (VR/AR) applications. However, the fundamental trade-off between broadband focusing efficiency and operating bandwidth limits the performance of broadband metalenses, resulting in chromatic ab…

    Submitted 10 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 17 pages, 7 figures, 1 table

  43. arXiv:2311.10327  [pdf, other

    cs.RO eess.SY

    Dimensionality Reduction of Dynamics on Lie Manifolds via Structure-Aware Canonical Correlation Analysis

    Authors: Wooyoung Chung, Daniel Polani, Stas Tiomkin

    Abstract: Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization outside of the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite dimensions. Structural impositions in constrained systems…

    Submitted 16 November, 2023; originally announced November 2023.

  44. arXiv:2311.05600  [pdf, other

    cs.RO eess.SY

    FogROS2-Config: Optimizing Latency and Cost for Multi-Cloud Robot Applications

    Authors: Kaiyuan Chen, Kush Hari, Rohil Khare, Charlotte Le, Trinity Chung, Jaimyn Drake, Jeffrey Ichnowski, John Kubiatowicz, Ken Goldberg

    Abstract: Cloud service providers provide over 50,000 distinct and dynamically changing set of cloud server options. To help roboticists make cost-effective decisions, we present FogROS2-Config, an open toolkit that takes ROS2 nodes as input and automatically runs relevant benchmarks to quickly return a menu of cloud compute services that tradeoff latency and cost. Because it is infeasible to try every hard…

    Submitted 13 May, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Published 2024 IEEE International Conference on Robotics and Automation (ICRA), Former name: FogROS2-Sky

  45. arXiv:2311.04066  [pdf, other

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Can CLIP Help Sound Source Localization?

    Authors: Sooyoung Park, Arda Senocak, Joon Son Chung

    Abstract: Large-scale pre-trained image-text models demonstrate remarkable versatility across diverse tasks, benefiting from their robust representational capabilities and effective multimodal alignment. We extend the application of these models, specifically CLIP, to the domain of sound source localization. Unlike conventional approaches, we employ the pre-trained CLIP model without explicit text input, re…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  46. arXiv:2311.02838  [pdf, other

    stat.ML cs.LG eess.SP

    Barron Space for Graph Convolution Neural Networks

    Authors: Seok-Young Chung, Qiyu Sun

    Abstract: Graph convolutional neural network (GCNN) operates on graph domain and it has achieved a superior performance to accomplish a wide range of tasks. In this paper, we introduce a Barron space of functions on a compact domain of graph signals. We prove that the proposed Barron space is a reproducing kernel Banach space, it can be decomposed into the union of a family of reproducing kernel Hilbert spa…

    Submitted 5 November, 2023; originally announced November 2023.

  47. arXiv:2311.00364  [pdf, other

    eess.AS cs.SD physics.bio-ph

    C2C: Cough to COVID-19 Detection in BHI 2023 Data Challenge

    Authors: Woo-Jin Chung, Miseul Kim, Hong-Goo Kang

    Abstract: This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through exper…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 1st place winning paper from the BHI 2023 Data Challenge Competition: Sensor Informatics

  48. arXiv:2310.19581  [pdf, other

    eess.AS cs.CV cs.SD

    Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model

    Authors: Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung

    Abstract: The objective of this work is to extract target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Project page with demo: https://mm.kaist.ac.kr/projects/avdiffuss/

  49. arXiv:2310.13218  [pdf, other

    eess.SY

    Deep Reinforcement Learning-Enabled Adaptive Forecasting-Aided State Estimation in Distribution Systems with Multi-Source Multi-Rate Data

    Authors: Ying Zhang, Junbo Zhao, Di Shi, Sungjoo Chung

    Abstract: Distribution system state estimation (DSSE) is paramount for effective state monitoring and control. However, stochastic outputs of renewables and asynchronous streaming of multi-rate measurements in practical systems largely degrade the estimation performance. This paper proposes a deep reinforcement learning (DRL)-enabled adaptive DSSE algorithm in unbalanced distribution systems, which tackles…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted by 2024 IEEE PES Innovative Smart Grid Technologies Conference

  50. arXiv:2310.10214  [pdf, other

    eess.SY

    K-SMPC: Koopman Operator-Based Stochastic Model Predictive Control for Enhanced Lateral Control of Autonomous Vehicles

    Authors: Jin Sung Kim, Ying Shuai Quan, Chung Choo Chung

    Abstract: This paper proposes Koopman operator-based Stochastic Model Predictive Control (K-SMPC) for enhanced lateral control of autonomous vehicles. The Koopman operator is a linear map representing the nonlinear dynamics in an infinite-dimensional space. Thus, we use the Koopman operator to represent the nonlinear dynamics of a vehicle in dynamic lane-keeping situations. The Extended Dynamic Mode Decompo…

    Submitted 9 December, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 13 pages, 12 figures