Skip to main content

Showing 1–50 of 431 results for author: Lee, J

Searching in archive eess. Search in all archives.
.
  1. arXiv:2406.17310  [pdf, other

    eess.AS

    High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

    Authors: Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

    Abstract: We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target v… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech2024

  2. Deep Learning Segmentation of Ascites on Abdominal CT Scans for Automatic Volume Quantification

    Authors: Benjamin Hou, Sung-Won Lee, Jung-Min Lee, Christopher Koh, **g Xiao, Perry J. Pickhardt, Ronald M. Summers

    Abstract: Purpose: To evaluate the performance of an automated deep learning method in detecting ascites and subsequently quantifying its volume in patients with liver cirrhosis and ovarian cancer. Materials and Methods: This retrospective study included contrast-enhanced and non-contrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer from two institutions, N… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  3. arXiv:2406.14372  [pdf, ps, other

    eess.SY

    Ring-LWE based encrypted controller with unlimited number of recursive multiplications and effect of error growth

    Authors: Yeongjun Jang, Joowon Lee, Seonhong Min, Hyesun Kwak, Junsoo Kim, Yongsoo Song

    Abstract: In this paper, we propose a method to encrypt linear dynamic controllers that enables an unlimited number of recursive homomorphic multiplications on a Ring Learning With Errors (Ring-LWE) based cryptosystem without bootstrap**. Unlike LWE based schemes, where a scalar error is injected during encryption for security, Ring-LWE based schemes are based on polynomial rings and inject error as a pol… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 3 figures

  4. arXiv:2406.13935  [pdf, other

    eess.AS cs.AI cs.SD

    CONMOD: Controllable Neural Frame-based Modulation Effects

    Authors: Gyubin Lee, Hounsu Kim, Junwon Lee, Juhan Nam

    Abstract: Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single blac… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.10549  [pdf, other

    eess.AS cs.CL cs.SD

    Lightweight Audio Segmentation for Long-form Speech Translation

    Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

    Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  6. arXiv:2406.08644  [pdf, other

    eess.SP cs.AI cs.SD eess.AS

    Toward Fully-End-to-End Listened Speech Decoding from EEG Signals

    Authors: Jihwan Lee, Aditya Kommineni, Tiantian Feng, Kleanthis Avramidis, Xuan Shi, Sudarsana Kadiri, Shrikanth Narayanan

    Abstract: Speech decoding from EEG signals is a challenging task, where brain activity is modeled to estimate salient characteristics of acoustic stimuli. We propose FESDE, a novel framework for Fully-End-to-end Speech Decoding from EEG signals. Our approach aims to directly reconstruct listened speech waveforms given EEG signals, where no intermediate acoustic feature processing step is required. The propo… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: accepted to Interspeech2024

  7. arXiv:2406.06650  [pdf, other

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H\&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  8. arXiv:2406.06111  [pdf, other

    eess.AS cs.AI cs.SD eess.SP

    JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

    Authors: Hyunjae Cho, Junhyeok Lee, Wonbin Jung

    Abstract: Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent alia… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  9. arXiv:2406.05341  [pdf, other

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  10. arXiv:2406.00669  [pdf

    eess.SY econ.GN

    Multi-technology co-optimization approach for sustainable hydrogen and electricity supply chains considering variability and demand scale

    Authors: Sunwoo Kim, Joungho Park, Jay H. Lee

    Abstract: In the pursuit of a carbon-neutral future, hydrogen emerges as a pivotal element, serving as a carbon-free energy carrier and feedstock. As efforts to decarbonize sectors such as heating and transportation intensify, understanding and navigating through the dynamics of hydrogen demand expansion becomes critical. Transitioning to hydrogen economy is complicated by varying regional scales and types… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  11. arXiv:2406.00665  [pdf

    econ.GN eess.SY

    Integrating solid direct air capture systems with green hydrogen production: Economic synergy of sector coupling

    Authors: Sunwoo Kim, Joungho Park, Jay H. Lee

    Abstract: In the global pursuit of sustainable energy solutions, mitigating carbon dioxide (CO2) emissions stands as a pivotal challenge. With escalating atmospheric CO2 levels, the imperative of direct air capture (DAC) systems becomes evident. Simultaneously, green hydrogen (GH) emerges as a pivotal medium for renewable energy. Nevertheless, the substantial expenses associated with these technologies impe… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  12. arXiv:2405.19685  [pdf

    eess.IV

    Identifying Functional Brain Networks of Spatiotemporal Wide-Field Calcium Imaging Data via a Long Short-Term Memory Autoencoder

    Authors: Xiaohui Zhang, Eric C Landsness, Lindsey M Brier, Wei Chen, Michelle J. Tang, Hanyang Miao, **-Moo Lee, Mark A. Anastasio, Joseph P. Culver

    Abstract: Wide-field calcium imaging (WFCI) that records neural calcium dynamics allows for identification of functional brain networks (FBNs) in mice that express genetically encoded calcium indicators. Estimating FBNs from WFCI data is commonly achieved by use of seed-based correlation (SBC) analysis and independent component analysis (ICA). These two methods are conceptually distinct and each possesses l… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  13. arXiv:2405.01792  [pdf, other

    cs.RO cs.LG eess.SY

    Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots

    Authors: Joonho Lee, Marko Bjelonic, Alexander Reske, Lorenz Wellhausen, Takahiro Miki, Marco Hutter

    Abstract: Autonomous wheeled-legged robots have the potential to transform logistics systems, improving operational efficiency and adaptability in urban environments. Navigating urban environments, however, poses unique challenges for robots, necessitating innovative solutions for locomotion and navigation. These challenges include the need for adaptive locomotion across varied terrains and the ability to n… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Journal ref: Science Robotics, 2024, Vol 9, Issue 89

  14. arXiv:2405.01591  [pdf, other

    cs.CL cs.AI eess.IV

    Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

    Authors: Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Su** Choi, Joo Heung Yoon

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises… ▽ More

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: Under review

  15. arXiv:2404.19167  [pdf

    eess.IV physics.med-ph

    Advancing low-field MRI with a universal denoising imaging transformer: Towards fast and high-quality imaging

    Authors: Zheren Zhu, Azaan Rehman, Xiaozhi Cao, Congyu Liao, Yoo ** Lee, Michael Ohliger, Hui Xue, Yang Yang

    Abstract: Recent developments in low-field (LF) magnetic resonance imaging (MRI) systems present remarkable opportunities for affordable and widespread MRI access. A robust denoising method to overcome the intrinsic low signal-noise-ratio (SNR) barrier is critical to the success of LF MRI. However, current data-driven MRI denoising methods predominantly handle magnitude images and rely on customized models… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  16. arXiv:2404.17667  [pdf, other

    eess.SP cs.LG

    SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals

    Authors: Cheng Ding, Zhicheng Guo, Zhaoliang Chen, Randall J Lee, Cynthia Rudin, Xiao Hu

    Abstract: Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for develo** foundation models for phys… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  17. arXiv:2404.16484  [pdf, other

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, **shan Pan, Jiangxin Dong, **hui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi **, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu , et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  18. arXiv:2404.15533  [pdf, other

    eess.SY

    Designing, simulating, and performing the 100-AV field test for the CIRCLES consortium: Methodology and Implementation of the Largest mobile traffic control experiment to date

    Authors: Mostafa Ameli, Sean Mcquade, Jonathan W. Lee, Matthew Bunting, Matthew Nice, Han Wang, William Barbour, Ryan Weightman, Chris Denaro, Ryan Delorenzo, Sharon Hornstein, Jon F. Davis, Dan Timsit, Riley Wagner, Rita Xu, Malaika Mahmood, Mikail Mahmood, Maria Laura Delle Monache, Benjamin Seibold, Daniel B. Work, Jonathan Sprinkle, Benedetto Piccoli, Alexandre M. Bayen

    Abstract: Previous controlled experiments on single-lane ring roads have shown that a single partially autonomous vehicle (AV) can effectively mitigate traffic waves. This naturally prompts the question of how these findings can be generalized to field operational, high-density traffic conditions. To address this question, the Congestion Impacts Reduction via CAV-in-the-loop Lagrangian Energy Smoothing (CIR… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  19. arXiv:2404.15353  [pdf, other

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  20. arXiv:2404.13569  [pdf, other

    cs.SD eess.AS

    Musical Word Embedding for Music Tagging and Retrieval

    Authors: SeungHeon Doh, Jongpil Lee, Dasaem Jeong, Juhan Nam

    Abstract: Word embedding has become an essential means for text-based information retrieval. Typically, word embeddings are learned from large quantities of general and unstructured text data. However, in the domain of music, the word embedding may have difficulty understanding musical contexts or recognizing music-related entities like artists and tracks. To address this issue, we propose a new approach ca… ▽ More

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  21. arXiv:2404.10965  [pdf, other

    eess.IV

    IMIL: Interactive Medical Image Learning Framework

    Authors: Adrit Rao, Andrea Fisher, Ken Chang, John Christopher Panagides, Katherine McNamara, Joon-Young Lee, Oliver Aalami

    Abstract: Data augmentations are widely used in training medical image deep learning models to increase the diversity and size of sparse datasets. However, commonly used augmentation techniques can result in loss of clinically relevant information from medical images, leading to incorrect predictions at inference time. We propose the Interactive Medical Image Learning (IMIL) framework, a novel approach for… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop on Domain adaptation, Explainability and Fairness in AI for Medical Image Analysis (DEF-AI-MIA)

  22. arXiv:2404.09621  [pdf, other

    eess.SY cs.ET cs.HC cs.RO

    AAM-VDT: Vehicle Digital Twin for Tele-Operations in Advanced Air Mobility

    Authors: Tuan Anh Nguyen, Taeho Kwag, Vinh Pham, Viet Nghia Nguyen, Jeongseok Hyun, Minseok Jang, Jae-Woo Lee

    Abstract: This study advanced tele-operations in Advanced Air Mobility (AAM) through the creation of a Vehicle Digital Twin (VDT) system for eVTOL aircraft, tailored to enhance remote control safety and efficiency, especially for Beyond Visual Line of Sight (BVLOS) operations. By synergizing digital twin technology with immersive Virtual Reality (VR) interfaces, we notably elevate situational awareness and… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  23. arXiv:2404.07217  [pdf, other

    eess.SP cs.AI cs.CV cs.LG

    Attention-aware Semantic Communications for Collaborative Inference

    Authors: Jiwoong Im, Nayoung Kwon, Taewoo Park, Jiheon Woo, Jaeho Lee, Yongjune Kim

    Abstract: We propose a communication-efficient collaborative inference framework in the domain of edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because of the inherent architecture of ViTs maintaining consistent layer dimensions across the entire transformer encoder. There… ▽ More

    Submitted 31 May, 2024; v1 submitted 23 February, 2024; originally announced April 2024.

  24. arXiv:2404.05832  [pdf, other

    cs.HC eess.SY

    Human-Machine Interaction in Automated Vehicles: Reducing Voluntary Driver Intervention

    Authors: Xinzhi Zhong, Yang Zhou, Varshini Kamaraj, Zhenhao Zhou, Wissam Kontar, Dan Negrut, John D. Lee, Soyoung Ahn

    Abstract: This paper develops a novel car-following control method to reduce voluntary driver interventions and improve traffic stability in Automated Vehicles (AVs). Through a combination of experimental and empirical analysis, we show how voluntary driver interventions can instigate substantial traffic disturbances that are amplified along the traffic upstream. Motivated by these findings, we present a fr… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  25. arXiv:2404.03188  [pdf

    eess.IV cs.CV cs.LG

    Classification of Nasopharyngeal Cases using DenseNet Deep Learning Architecture

    Authors: W. S. H. M. W. Ahmad, M. F. A. Fauzi, M. K. Abdullahi, Jenny T. H. Lee, N. S. A. Basry, A Yahaya, A. M. Ismail, A. Adam, Elaine W. L. Chan, F. S. Abas

    Abstract: Nasopharyngeal carcinoma (NPC) is one of the understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the ethnic of Bidayuh. NPC is often late-diagnosed because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This article has been accepted in the Journal of Engineering Science and Technology (JESTEC) and awaiting publication

  26. arXiv:2404.02574  [pdf, ps, other

    eess.SY

    Learning with errors based dynamic encryption that discloses residue signal for anomaly detection

    Authors: Yeongjun Jang, Joowon Lee, Junsoo Kim, Hyungbo Shim

    Abstract: Anomaly detection is a protocol that detects integrity attacks on control systems by comparing the residue signal with a threshold. Implementing anomaly detection on encrypted control systems has been a challenge because it is hard to detect an anomaly from the encrypted residue signal without the secret key. In this paper, we propose a dynamic encryption scheme for a linear system that automatica… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 7 pages, 1 figure

  27. arXiv:2403.18707  [pdf, other

    math.OC eess.SY

    Connections between Reachability and Time Optimality

    Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee, Chang-Hun Lee

    Abstract: This paper presents the concept of an equivalence relation between the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed by the solutions of time optimal problems. Alongside, a more generalized equivalence theorem is presented together. The findings facilitate the use of solution structures from a certain class of opti… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Submitted to Automatica

  28. arXiv:2403.17508  [pdf, other

    cs.SD eess.AS

    Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant

    Authors: Modan Tailleur, Junwon Lee, Mathieu Lagrange, Keunwoo Choi, Laurie M. Heller, Keisuke Imoto, Yuki Okamoto

    Abstract: This paper explores whether considering alternative domain-specific embeddings to calculate the Fréchet Audio Distance (FAD) metric can help the FAD to correlate better with perceptual ratings of environmental sounds. We used embeddings from VGGish, PANNs, MS-CLAP, L-CLAP, and MERT, which are tailored for either music or environmental sound evaluation. The FAD scores were calculated for sounds fro… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  29. arXiv:2403.11417  [pdf, ps, other

    eess.SP

    Positioning Using Wireless Networks: Applications, Recent Progress and Future Challenges

    Authors: Yang Yang, Mingzhe Chen, Yufei Blankenship, Jemin Lee, Zabih Ghassemlooy, Julian Cheng, Shiwen Mao

    Abstract: Positioning has recently received considerable attention as a key enabler in emerging applications such as extended reality, unmanned aerial vehicles and smart environments. These applications require both data communication and high-precision positioning, and thus they are particularly well-suited to be offered in wireless networks (WNs). The purpose of this paper is to provide a comprehensive ov… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  30. arXiv:2403.10786  [pdf, other

    eess.IV cs.CV cs.LG

    ContourDiff: Unpaired Image Translation with Contour-Guided Diffusion Models

    Authors: Yuwen Chen, Nicholas Konz, Hanxue Gu, Haoyu Dong, Yaqian Chen, Lin Li, Jisoo Lee, Maciej A. Mazurowski

    Abstract: Accurately translating medical images across different modalities (e.g., CT to MRI) has numerous downstream clinical and machine learning applications. While several methods have been proposed to achieve this, they often prioritize perceptual quality with respect to output domain features over preserving anatomical fidelity. However, maintaining anatomy during translation is essential for many tas… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Code will be released on GitHub

  31. arXiv:2403.09083  [pdf, other

    eess.SP

    Asymptotically Near-Optimal Hybrid Beamforming for mmWave IRS-Aided MIMO Systems

    Authors: Jeongjae Lee, Songnam Hong

    Abstract: Hybrid beamforming is an emerging technology for massive multiple-input multiple-output (MIMO) systems due to the advantages of lower complexity, cost, and power consumption. Recently, intelligent reflection surface (IRS) has been proposed as the cost-effective technique for robust millimeter-wave (mmWave) MIMO systems. Thus, it is required to jointly optimize a reflection vector and hybrid beamfo… ▽ More

    Submitted 24 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE Transactions on Vehicular Technology

  32. arXiv:2403.05136  [pdf, other

    cs.RO eess.SP

    DeRO: Dead Reckoning Based on Radar Odometry With Accelerometers Aided for Robot Localization

    Authors: Hoang Viet Do, Yong Hun Kim, Joo Han Lee, Min Ho Lee, ** Woo Song

    Abstract: In this paper, we propose a radar odometry structure that directly utilizes radar velocity measurements for dead reckoning while maintaining its ability to update estimations within the Kalman filter framework. Specifically, we employ the Doppler velocity obtained by a 4D Frequency Modulated Continuous Wave (FMCW) radar in conjunction with gyroscope data to calculate poses. This approach helps mit… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 9 pages, 5 figures, 1 table, conference

    ACM Class: I.2.9

  33. arXiv:2402.18923  [pdf, other

    cs.CL cs.SD eess.AS

    Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition

    Authors: Jeehyun Lee, Yerin Choi, Tae-** Song, Myoung-Wan Koo

    Abstract: Dysarthria, a common issue among stroke patients, severely impacts speech intelligibility. Inappropriate pauses are crucial indicators in severity assessment and speech-language therapy. We propose to extend a large-scale speech recognition model for inappropriate pause detection in dysarthric speech. To this end, we propose task design, labeling strategy, and a speech recognition model with an in… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024

  34. arXiv:2402.17778  [pdf, other

    eess.SP cs.AI eess.SY

    Dynamic Anchor Selection and Real-Time Pose Prediction for Ultra-wideband Tagless Gate

    Authors: Junyoung Choi, Sagnik Bhattacharya, Joohyun Lee

    Abstract: Ultra-wideband (UWB) is emerging as a promising solution that can realize proximity services, such as UWB tagless gate (UTG), thanks to centimeter-level localization accuracy based on two different ranging methods such as downlink time-difference of arrival (DL-TDoA) and double-sided two-way ranging (DS-TWR). The UTG is a UWB-based proximity service that provides a seamless gate pass system withou… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.08399

  35. arXiv:2402.17050  [pdf, other

    eess.SY cs.RO

    Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL algorithms to a 100 AV highway field operational test

    Authors: Kathy Jang, Nathan Lichtlé, Eugene Vinitsky, Adit Shah, Matthew Bunting, Matthew Nice, Benedetto Piccoli, Benjamin Seibold, Daniel B. Work, Maria Laura Delle Monache, Jonathan Sprinkle, Jonathan W. Lee, Alexandre M. Bayen

    Abstract: In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with develo** RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their app… ▽ More

    Submitted 14 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  36. arXiv:2402.17043  [pdf, other

    eess.SY

    Traffic Control via Connected and Automated Vehicles: An Open-Road Field Experiment with 100 CAVs

    Authors: Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng , et al. (39 additional authors not shown)

    Abstract: The CIRCLES project aims to reduce instabilities in traffic flow, which are naturally occurring phenomena due to human driving behavior. These "phantom jams" or "stop-and-go waves,"are a significant source of wasted energy. Toward this goal, the CIRCLES project designed a control system referred to as the MegaController by the CIRCLES team, that could be deployed in real traffic. Our field experim… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  37. arXiv:2402.16993  [pdf, other

    eess.SY

    Hierarchical Speed Planner for Automated Vehicles: A Framework for Lagrangian Variable Speed Limit in Mixed Autonomy Traffic

    Authors: Han Wang, Zhe Fu, Jonathan Lee, Hossein Nick Zinat Matin, Arwa Alanqary, Daniel Urieli, Sharon Hornstein, Abdul Rahman Kreidieh, Raphael Chekroun, William Barbour, William A. Richardson, Dan Work, Benedetto Piccoli, Benjamin Seibold, Jonathan Sprinkle, Alexandre M. Bayen, Maria Laura Delle Monache

    Abstract: This paper introduces a novel control framework for Lagrangian variable speed limits in hybrid traffic flow environments utilizing automated vehicles (AVs). The framework was validated using a fleet of 100 connected automated vehicles as part of the largest coordinated open-road test designed to smooth traffic flow. The framework includes two main components: a high-level controller deployed on th… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  38. arXiv:2402.11656  [pdf, ps, other

    cs.IT cs.CL cs.LG eess.SP

    Integrating Pre-Trained Language Model with Physical Layer Communications

    Authors: Ju-Hyung Lee, Dong-Ho Lee, Joohan Lee, Jay Pujara

    Abstract: The burgeoning field of on-device AI communication, where devices exchange information directly through embedded foundation models, such as language models (LMs), requires robust, efficient, and generalizable communication frameworks. However, integrating these frameworks with existing wireless systems and effectively managing noise and bit errors pose significant challenges. In this work, we intr… ▽ More

    Submitted 18 February, 2024; originally announced February 2024.

  39. arXiv:2402.10515  [pdf, other

    eess.SP cs.AI

    Power-Efficient Indoor Localization Using Adaptive Channel-aware Ultra-wideband DL-TDOA

    Authors: Sagnik Bhattacharya, Junyoung Choi, Joohyun Lee

    Abstract: Among the various Ultra-wideband (UWB) ranging methods, the absence of uplink communication or centralized computation makes downlink time-difference-of-arrival (DL-TDOA) localization the most suitable for large-scale industrial deployments. However, temporary or permanent obstacles in the deployment region often lead to non-line-of-sight (NLOS) channel path and signal outage effects, which result… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

    Journal ref: IEEE GLOBECOM 2023

  40. arXiv:2402.05965  [pdf, other

    cs.LG eess.SP

    Hybrid Neural Representations for Spherical Data

    Authors: Hyomin Kim, Yunhui Jang, Jaeho Lee, Sungsoo Ahn

    Abstract: In this paper, we study hybrid neural representations for spherical data, a domain of increasing relevance in scientific research. In particular, our work focuses on weather and climate data as well as comic microwave background (CMB) data. Although previous studies have delved into coordinate-based neural representations for spherical signals, they often fail to capture the intricate details of h… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 13 pages, 8 figures

  41. arXiv:2402.05706  [pdf, other

    cs.CL cs.SD eess.AS

    Unified Speech-Text Pretraining for Spoken Dialog Modeling

    Authors: Heeseung Kim, Soonshin Seo, Kyeongseok Jeong, Ohsung Kwon, Jungwhan Kim, Jaehong Lee, Eunwoo Song, Myungwoo Oh, Sungroh Yoon, Kang Min Yoo

    Abstract: While recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), to generate coherent spoken responses with… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

  42. arXiv:2402.03399  [pdf, other

    eess.IV cs.CV

    Rethinking RGB Color Representation for Image Restoration Models

    Authors: Jaerin Lee, JoonKyu Park, Sungyong Baik, Kyoung Mu Lee

    Abstract: Image restoration models are typically trained with a pixel-wise distance loss defined over the RGB color representation space, which is well known to be a source of blurry and unrealistic textures in the restored images. The reason, we believe, is that the three-channel RGB space is insufficient for supervising the restoration models. To this end, we augment the representation to hold structural… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 31 pages (11 pages main manuscript + 20 pages appendices), 22 figures

  43. arXiv:2401.17690  [pdf, other

    eess.AS cs.AI cs.SD

    EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

    Authors: Jaeyeon Kim, Jaeyoon Jung, **joo Lee, Sang Hoon Woo

    Abstract: We propose EnCLAP, a novel framework for automated audio captioning. EnCLAP employs two acoustic representation models, EnCodec and CLAP, along with a pretrained language model, BART. We also introduce a new training objective called masked codec modeling that improves acoustic awareness of the pretrained language model. Experimental results on AudioCaps and Clotho demonstrate that our model surpa… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  44. arXiv:2401.15938  [pdf, other

    cs.CV eess.SY

    Motion-induced error reduction for high-speed dynamic digital fringe projection system

    Authors: Sanghoon Jeon, Hyo-Geon Lee, Jae-Sung Lee, Bo-Min Kang, Byung-Wook Jeon, Jun Young Yoon, Jae-Sang Hyun

    Abstract: In phase-shifting profilometry (PSP), any motion during the acquisition of fringe patterns can introduce errors because it assumes both the object and measurement system are stationary. Therefore, we propose a method to pixel-wise reduce the errors when the measurement system is in motion due to a motorized linear stage. The proposed method introduces motion-induced error reduction algorithm, whic… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 9 pages, 7 figures

  45. arXiv:2401.14304  [pdf, other

    eess.SY

    Constraint-Aware Mesh Refinement Method by Reachability Set Envelope of Curvature Bounded Paths

    Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee

    Abstract: This paper presents an enhanced direct-method-based approach for the real-time solution of optimal control problems to handle path constraints, such as obstacles. The principal contributions of this work are twofold: first, the existing methods for constructing reachability sets in the literature are extended to derive the envelope of these sets, which determines the region swept by all feasible t… ▽ More

    Submitted 4 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Preprint submitted to Automatica

  46. arXiv:2401.13936  [pdf, ps, other

    eess.SY

    Learning-based sensing and computing decision for data freshness in edge computing-enabled networks

    Authors: Sinwoong Yun, Dongsun Kim, Chanwon Park, Jemin Lee

    Abstract: As the demand on artificial intelligence (AI)-based applications increases, the freshness of sensed data becomes crucial in the wireless sensor networks. Since those applications require a large amount of computation for processing the sensed data, it is essential to offload the computation load to the edge computing (EC) server. In this paper, we propose the sensing and computing decision (SCD) a… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 15 pages

  47. Machine learning for industrial sensing and control: A survey and practical perspective

    Authors: Nathan P. Lawrence, Seshu Kumar Damarla, Jong Woo Kim, Aditya Tulsyan, Faraz Amjad, Kai Wang, Benoit Chachuat, Jong Min Lee, Biao Huang, R. Bhushan Gopaluni

    Abstract: With the rise of deep learning, there has been renewed interest within the process industries to utilize data on large-scale nonlinear sensing and control problems. We identify key statistical and machine learning techniques that have seen practical success in the process industries. To do so, we start with hybrid modeling to provide a methodological framework underlying core application areas: so… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 48 pages

    Journal ref: Control Engineering Practice 2024

  48. arXiv:2401.13146  [pdf, other

    eess.AS cs.CL cs.SD

    Locality enhanced dynamic biasing and sampling strategies for contextual ASR

    Authors: Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) still face challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases are selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the t… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  49. arXiv:2401.12987  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation

    Authors: Taeyang Yun, Hyunkuk Lim, Jeonghwan Lee, Min Song

    Abstract: Emotion Recognition in Conversation (ERC) plays a crucial role in enabling dialogue systems to effectively respond to user requests. The emotions in a conversation can be identified by the representations from various modalities, such as audio, visual, and text. However, due to the weak contribution of non-verbal modalities to recognize emotions, multimodal ERC has always been considered a challen… ▽ More

    Submitted 31 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: NAACL 2024 main conference

  50. arXiv:2401.12974  [pdf, other

    eess.IV cs.CV q-bio.QM

    SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI

    Authors: Hanxue Gu, Roy Colglazier, Haoyu Dong, Jikai Zhang, Yaqian Chen, Zafer Yildiz, Yuwen Chen, Lin Li, Jichen Yang, Jay Willhite, Alex M. Meyer, Brian Guo, Yashvi Atul Shah, Emily Luo, Shipra Rajput, Sally Kuehn, Clark Bulleit, Kevin A. Wu, Jisoo Lee, Brandon Ramirez, Darui Lu, Jay M. Levin, Maciej A. Mazurowski

    Abstract: Magnetic Resonance Imaging (MRI) is pivotal in radiology, offering non-invasive and high-quality insights into the human body. Precise segmentation of MRIs into different organs and tissues would be highly beneficial since it would allow for a higher level of understanding of the image content and enable important measurements, which are essential for accurate diagnosis and effective treatment pla… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 15 figures