Showing 1–50 of 298 results for author: Lee, S

Searching in archive eess.
  1. Deep Learning Segmentation of Ascites on Abdominal CT Scans for Automatic Volume Quantification

    Authors: Benjamin Hou, Sung-Won Lee, Jung-Min Lee, Christopher Koh, **g Xiao, Perry J. Pickhardt, Ronald M. Summers

    Abstract: Purpose: To evaluate the performance of an automated deep learning method in detecting ascites and subsequently quantifying its volume in patients with liver cirrhosis and ovarian cancer. Materials and Methods: This retrospective study included contrast-enhanced and non-contrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer from two institutions, N…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.15487

    cs.CL cs.LG cs.SD eess.AS

    Improving Text-To-Audio Models with Synthetic Captions

    Authors: Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

    Abstract: It is an open challenge to obtain high-quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.13502

    cs.CL cs.SD eess.AS

    ManWav: The First Manchu ASR Model

    Authors: Jean Seo, Minha Kang, Sungjoo Byun, Sangah Lee

    Abstract: This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high-resource and extremely low-resource languages, with a particular focus on Manchu, a critically endangered language. Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce the first-ever Manchu ASR…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ACL2024/Field Matters

  4. arXiv:2406.09317

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and a limited knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2406.09286

    eess.AS cs.SD

    FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching

    Authors: Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung

    Abstract: This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is limited by slow inference speeds and computational complexity. To address this issue, we present FlowAVSE, which enhances the inference speed and reduces the numbe…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  6. arXiv:2406.07923

    cs.SD cs.AI eess.AS

    CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting

    Authors: Sichen **, Youngmoon Jung, Seung** Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho

    Abstract: This paper introduces a novel approach for streaming open-vocabulary keyword spotting (KWS) with text-based keyword enrollment. For every input frame, the proposed method finds the optimal alignment ending at the frame using connectionist temporal classification (CTC) and aggregates the frame-level acoustic embedding (AE) to obtain higher-level (i.e., character, word, or phrase) AE that aligns with…

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.07803

    cs.SD cs.AI eess.AS

    EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech

    Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  8. arXiv:2406.05983

    eess.AS

    Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation

    Authors: Ui-Hyeop Shin, Sangyoun Lee, Taehan Kim, Hyung-Min Park

    Abstract: Since the success of time-domain speech separation, further improvements have been made by expanding the length and channel of a feature sequence to increase the amount of computation. When temporally expanded to a long sequence, the feature is segmented into chunks as a dual-path model in most studies of speech separation. In particular, it is common for the process of separating features corre…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Project Page https://fordemopage.github.io/SepReformer

  9. arXiv:2406.05314

    eess.AS cs.AI eess.SP

    Relational Proxy Loss for Audio-Text based Keyword Spotting

    Authors: Youngmoon Jung, Seung** Lee, Joon-Young Yang, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho

    Abstract: In recent years, there has been an increasing focus on user convenience, leading to increased interest in text-based keyword enrollment systems for keyword spotting (KWS). Since the system utilizes text input during the enrollment phase and audio input during actual usage, we call this task audio-text based KWS. To enable this task, both acoustic and text encoders are typically trained using deep…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, Accepted by Interspeech 2024

  10. arXiv:2405.18012

    cs.CV eess.IV

    Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

    Authors: Muhammad Adi Nugroho, Sangmin Woo, Sumin Lee, **young Park, Yooseung Wang, Donguk Kim, Changick Kim

    Abstract: Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net) for WSGAR, which consists of the motion-aware actor encoder to extract actor features and the two-pathways relation module to infer the interaction…

    Submitted 28 May, 2024; originally announced May 2024.

  11. arXiv:2405.10216

    cs.LG cs.AI eess.SP

    Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting

    Authors: Divij Gupta, Anubhav Bhatti, Suraj Parmar, Chen Dan, Yuwei Liu, Bingjie Shen, San Lee

    Abstract: Low-Rank Adaptation (LoRA) is a widely used technique for fine-tuning large pre-trained or foundational models across different modalities and tasks. However, its application to time series data, particularly within foundational models, remains underexplored. This paper examines the impact of LoRA on contemporary time series foundational models: Lag-Llama, MOIRAI, and Chronos. We demonstrate LoRA'…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures. This work has been submitted to the ACM for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  12. arXiv:2405.06284

    eess.IV cs.CV cs.LG

    Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention

    Authors: Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim, Sang-Chul Lee

    Abstract: Generalizability in deep neural networks plays a pivotal role in medical image segmentation. However, deep learning-based medical image analyses tend to overlook the importance of frequency variance, which is a critical element for achieving a model that is both modality-agnostic and domain-generalizable. Additionally, various models fail to account for the potential information loss that can arise…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted in Computer Vision and Pattern Recognition (CVPR) 2024

  13. arXiv:2405.01264

    eess.SY

    Model Predictive Guidance for Fuel-Optimal Landing of Reusable Launch Vehicles

    Authors: Ki-Wook Jung, Sang-Don Lee, Cheol-Goo Jung, Chang-Hun Lee

    Abstract: This paper introduces a landing guidance strategy for reusable launch vehicles (RLVs) using a model predictive approach based on sequential convex programming (SCP). The proposed approach devises two distinct optimal control problems (OCPs): planning a fuel-optimal landing trajectory that accommodates practical path constraints specific to RLVs, and determining real-time optimal tracking commands.…

    Submitted 2 May, 2024; originally announced May 2024.

  14. arXiv:2405.01113

    cs.CV cs.AI eess.IV

    Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

    Authors: Seungyeop Lee, Knut Peterson, Solmaz Arezoomandan, Bill Cai, Peihan Li, Lifeng Zhou, David Han

    Abstract: A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data g…

    Submitted 2 May, 2024; originally announced May 2024.

  15. arXiv:2404.15305

    eess.SP cs.LG

    ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay

    Authors: Hyungjun Yoon, Jaehyun Kwak, Biniyam Aschalew Tolera, Gaole Dai, Mo Li, Taesik Gong, Kimin Lee, Sung-Ju Lee

    Abstract: Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine…

    Submitted 29 March, 2024; originally announced April 2024.

  16. arXiv:2404.13286

    cs.SD cs.IR eess.AS

    Track Role Prediction of Single-Instrumental Sequences

    Authors: Changheon Han, Suhyun Lee, Minsam Ko

    Abstract: In the composition process, selecting appropriate single-instrumental music sequences and assigning their track-role is an indispensable task. However, manually determining the track-role for a myriad of music samples can be time-consuming and labor-intensive. This study introduces a deep learning model designed to automatically predict the track-role of single-instrumental music sequences. Our ev…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ISMIR LBD 2023

  17. arXiv:2404.04096

    cs.IT eess.SP

    Machine Learning-Aided Cooperative Localization under Dense Urban Environment

    Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

    Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee robustness in performing core functions includin…

    Submitted 5 April, 2024; originally announced April 2024.

  18. arXiv:2404.02135

    cs.CV eess.IV

    Enhancing Ship Classification in Optical Satellite Imagery: Integrating Convolutional Block Attention Module with ResNet for Improved Performance

    Authors: Ryan Donghan Kwon, Gangjoo Robin Nam, Jisoo Tak, Junseob Shin, Hyerin Cha, Yeom Hyeok, Seung Won Lee

    Abstract: This study presents an advanced Convolutional Neural Network (CNN) architecture for ship classification from optical satellite imagery, significantly enhancing performance through the integration of the Convolutional Block Attention Module (CBAM) and additional architectural innovations. Building upon the foundational ResNet50 model, we first incorporated a standard CBAM to direct the model's focu…

    Submitted 8 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  19. arXiv:2403.17420

    cs.CV cs.MM cs.SD eess.AS

    Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

    Authors: Dong** Kim, Sung ** Um, Sangmin Lee, Jung Uk Kim

    Abstract: The goal of the multi-sound source localization task is to localize sound sources from the mixture individually. While recent multi-sound source localization methods have shown improved performance, they face challenges due to their reliance on prior information about the number of objects to be separated. In this paper, to overcome this limitation, we present a novel multi-sound source localizati…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  20. arXiv:2403.17327

    cs.SD cs.CV eess.AS

    Accuracy enhancement method for speech emotion recognition from spectrogram using temporal frequency correlation and positional information learning through knowledge transfer

    Authors: Jeong-Yoon Kim, Seung-Ho Lee

    Abstract: In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using a vision transformer (ViT) to attend to the correlation of frequency (y-axis) with time (x-axis) in the spectrogram and transferring positional information between ViTs through knowledge transfer. The proposed method is novel in the following respects: i) we use vertically segmented patches of log-Mel spectr…

    Submitted 25 March, 2024; originally announced March 2024.

  21. arXiv:2403.14154

    eess.SY

    LR-FHSS Transceiver for Direct-to-Satellite IoT Communications: Design, Implementation, and Verification

    Authors: Sooyeob Jung, Seongah Jeong, **kyu Kang, Gyeongrae Im, Sangjae Lee, Mi-Kyung Oh, Joon Gyu Ryu, Joonhyuk Kang

    Abstract: This paper proposes a long-range frequency-hopping spread spectrum (LR-FHSS) transceiver design for the Direct-to-Satellite Internet of Things (DtS-IoT) communication system. The DtS-IoT system has recently attracted attention as a promising non-terrestrial network (NTN) solution to provide high-traffic and low-latency data transfer services to IoT devices with global coverage. In particular, this st…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 17 pages, 23 figures

  22. arXiv:2403.09179

    eess.SY cs.RO math.OC

    Synchronisation-Oriented Design Approach for Adaptive Control

    Authors: Namhoon Cho, Seokwon Lee, Hyo-Sang Shin

    Abstract: This study presents a synchronisation-oriented perspective towards adaptive control which views model-referenced adaptation as synchronisation between actual and virtual dynamic systems. In the context of adaptation, model reference adaptive control methods make the state response of the actual plant follow a reference model. In the context of synchronisation, consensus methods involving diffusive…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 34 pages, 8 figures, extended version for a manuscript submitted to Automatica

  23. arXiv:2403.05093

    cs.CV eess.IV

    Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile

    Authors: Seokjun Lee, Seung-Won Jung, Hyunseok Seo

    Abstract: Image generation and synthesis have recently progressed remarkably with generative models. Despite photo-realistic results, intrinsic discrepancies are still observed in the frequency domain. The spectral discrepancy appears not only in generative adversarial networks but also in diffusion models. In this study, we propose a framework to effectively mitigate the disparity in the frequency domain of the…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to AAAI 2024

  24. arXiv:2403.01130

    eess.AS eess.SP

    Arbitrary Discrete Fourier Analysis and Its Application in Replayed Speech Detection

    Authors: Shih-Kuang Lee

    Abstract: In this paper, a group of finite sequences and their variants was proposed for use in signal analysis; we called the developed signal analysis methods arbitrary discrete Fourier analysis (ADFA), Mel-scale discrete Fourier analysis (MDFA), and constant Q analysis (CQA). The effectiveness of the three signal analysis methods was then validated by testing their performance on a replayed speech d…

    Submitted 23 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: https://github.com/shihkuanglee/ADFA

  25. arXiv:2402.17127

    cs.SD eess.AS

    Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

    Authors: Taein Kang, Soyul Han, Sunmook Choi, Jae** Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

    Abstract: Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, p…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 5 pages

    MSC Class: 00A71; ACM Class: I.2.6

  26. arXiv:2402.15539

    eess.AS cs.CL

    Speech Corpus for Korean Children with Autism Spectrum Disorder: Towards Automatic Assessment Systems

    Authors: Seonwoo Lee, Jihyun Mun, Sunhee Kim, Minhwa Chung

    Abstract: Despite the growing demand for digital therapeutics for children with Autism Spectrum Disorder (ASD), there is currently no speech corpus available for Korean children with ASD. This paper introduces a speech corpus specifically designed for Korean children with ASD, aiming to advance speech technologies such as pronunciation and severity evaluation. Speech recordings from speech and language eval…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 11 pages, Accepted for LREC-COLING 2024

  27. arXiv:2402.08979

    eess.SY cs.AI cs.LG

    Learning-enabled Flexible Job-shop Scheduling for Scalable Smart Manufacturing

    Authors: Sihoon Moon, Sanghoon Lee, Kyung-Joon Park

    Abstract: In smart manufacturing systems (SMSs), flexible job-shop scheduling with transportation constraints (FJSPT) is essential to optimize solutions for maximizing productivity, considering production flexibility based on automated guided vehicles (AGVs). Recent developments in deep reinforcement learning (DRL)-based methods for FJSPT have encountered a scale generalization challenge. These methods unde…

    Submitted 14 February, 2024; originally announced February 2024.

  28. arXiv:2402.00032

    cs.RO eess.SY

    Multi-objective Generative Design Framework and Realization for Quasi-serial Manipulator: Considering Kinematic and Dynamic Performance

    Authors: Sumin Lee, Sunwoong Yang, Namwoo Kang

    Abstract: This paper proposes a framework that optimizes the linkage mechanism of the quasi-serial manipulator for target tasks. This process is explained through a case study of 2-degree-of-freedom linkage mechanisms, which significantly affect the workspace of the quasi-serial manipulator. First, a large number of quasi-serial mechanisms with workspaces satisfying a target task are generated and converted into a…

    Submitted 7 January, 2024; originally announced February 2024.

  29. TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data

    Authors: Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Although there has been significant advancement in the field of speech-to-speech translation, conventional models still require language-parallel speech data between the source and target languages for training. In this paper, we introduce TranSentence, a novel speech-to-speech translation method that requires no language-parallel speech data. To achieve this, we first adopt a language-agnostic sentence-level spe…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  30. arXiv:2401.08095

    cs.SD cs.AI eess.AS

    DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation

    Authors: Hyung-Seok Oh, Sang-Hoon Lee, Deok-Hyeon Cho, Seong-Whan Lee

    Abstract: Emotional voice conversion (EVC) seeks to modify the emotional tone of a speaker's voice while preserving the original linguistic content and the speaker's unique vocal characteristics. Recent advancements in EVC have involved the simultaneous modeling of pitch and duration, utilizing the potential of sequence-to-sequence (seq2seq) models. To enhance reliability and efficiency in conversion, this…

    Submitted 7 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 13 pages, 9 figures, 8 tables

  31. arXiv:2401.07472

    eess.SY

    Fully Decentralized Design of Initialization-free Distributed Network Size Estimation

    Authors: Donggil Lee, Taekyoo Kim, Seungjoon Lee, Hyungbo Shim

    Abstract: In this paper, we propose a distributed scheme for estimating the network size, which refers to the total number of agents in a network. By leveraging a synchronization technique for multi-agent systems, we devise agent dynamics that ensure convergence to an equilibrium point located near the network size regardless of the initial condition. Our approach is based on an assumption that each age…

    Submitted 14 January, 2024; originally announced January 2024.

  32. arXiv:2401.06913

    cs.SD cs.LG cs.MM eess.AS

    Microphone Conversion: Mitigating Device Variability in Sound Event Classification

    Authors: Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park

    Abstract: In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method. As SEC systems become increasingly common, it is crucial that they work well with audio from diverse recording devices. Our method addresses limited device div…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  33. arXiv:2312.13313

    eess.IV cs.CV

    ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

    Authors: Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho

    Abstract: RAW images are rarely shared mainly due to their excessive data size compared to their sRGB counterparts obtained by camera ISPs. Learning the forward and inverse processes of camera ISPs has been recently demonstrated, enabling physically-meaningful RAW-level image processing on input sRGB images. However, existing learning-based ISP methods fail to handle the large variations in the ISP processes…

    Submitted 14 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  34. arXiv:2312.09572

    eess.AS cs.CL cs.SD

    IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels, Consonants, Words, and Phrases

    Authors: Sunghwa Lee, Younghoon Shin, Myungjong Kim, Jiwon Seo

    Abstract: Several sensing techniques have been proposed for silent speech recognition (SSR); however, many of these methods require invasive processes or sensor attachment to the skin using adhesive tape or glue, rendering them unsuitable for frequent use in daily life. By contrast, impulse radio ultra-wideband (IR-UWB) radar can operate without physical contact with users' articulators and related body par…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Access

  35. arXiv:2312.09461

    eess.SP cs.HC cs.LG

    Improving Generalization of Drowsiness State Classification by Domain-Specific Normalization

    Authors: Dong-Young Kim, Dong-Kyun Han, Seo-Hyeon Park, Geun-Deok Jang, Seong-Whan Lee

    Abstract: Abnormal driver states have been a major concern for road safety, emphasizing the importance of accurate drowsiness detection to prevent accidents. Electroencephalogram (EEG) signals are recognized as effective for monitoring a driver's mental state through brain activity. However, a key challenge is the need for prior calibration due to the variation of EE…

    Submitted 14 November, 2023; originally announced December 2023.

    Comments: Submitted to 2024 12th IEEE International Winter Conference on Brain-Computer Interface

  36. arXiv:2312.09456

    eess.SP cs.AI cs.LG

    Pioneering EEG Motor Imagery Classification Through Counterfactual Analysis

    Authors: Kang Yin, Hye-Bin Shin, Hee-Dong Kim, Seong-Whan Lee

    Abstract: The application of counterfactual explanation (CE) techniques in the realm of electroencephalography (EEG) classification has been relatively infrequent in contemporary research. In this study, we attempt to introduce and explore a novel non-generative approach to CE, specifically tailored for the analysis of EEG signals. This innovative approach assesses the model's decision-making process by str…

    Submitted 10 November, 2023; originally announced December 2023.

  37. arXiv:2312.09446

    eess.SP cs.AI cs.CV

    A Distributed Inference System for Detecting Task-wise Single Trial Event-Related Potential in Stream of Satellite Images

    Authors: Sung-** Kim, Heon-Gyu Kwak, Hyeon-Taek Han, Dae-Hyeok Lee, Ji-Hoon Jeong, Seong-Whan Lee

    Abstract: Brain-computer interfaces (BCIs) have garnered significant attention for their potential in various applications, with event-related potentials (ERPs) playing a considerable role in BCI systems. This paper introduces a novel Distributed Inference System tailored for detecting task-wise single-trial ERPs in a stream of satellite images. Unlike traditional methodologies that employ a single model…

    Submitted 10 November, 2023; originally announced December 2023.

  38. arXiv:2312.09423

    eess.SP cs.AI cs.LG

    Decoding EEG-based Workload Levels Using Spatio-temporal Features Under Flight Environment

    Authors: Dae-Hyeok Lee, Sung-** Kim, Si-Hyun Kim, Seong-Whan Lee

    Abstract: Detecting pilots' mental states is important because abnormal mental states can result in catastrophic accidents. This study investigates the feasibility of employing deep learning techniques to classify different workload levels, specifically normal state, low workload, and high workload. To the best of our knowledge, this study is the first attempt to classify workload…

    Submitted 10 November, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, 1 table, 1 algorithm

  39. arXiv:2312.07826

    cs.RO eess.SY

    Integrated Path Tracking with DYC and MPC using LSTM Based Tire Force Estimator for Four-wheel Independent Steering and Driving Vehicle

    Authors: Sung** Lim, Bilal Sadiq, Yongsik **, Sangho Lee, Gyeungho Choi, Kanghyun Nam, Yongseob Lim

    Abstract: An active collision avoidance system plays a crucial role in ensuring the lateral safety of autonomous vehicles, and it is primarily related to path planning and tracking control algorithms. In particular, the direct yaw-moment control (DYC) system can significantly improve the lateral stability of a vehicle in environments with sudden changes in road conditions. In order to apply the DYC algorithm,…

    Submitted 12 December, 2023; originally announced December 2023.

  40. arXiv:2312.05828

    cs.LG eess.SP

    Sparse Multitask Learning for Efficient Neural Representation of Motor Imagery and Execution

    Authors: Hye-Bin Shin, Kang Yin, Seong-Whan Lee

    Abstract: In the quest for efficient neural network models for neural data interpretation and user intent classification in brain-computer interfaces (BCIs), learning meaningful sparse representations of the underlying neural subspaces is crucial. The present study introduces a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks, inspired by the natural partitioning of…

    Submitted 10 December, 2023; originally announced December 2023.

  41. arXiv:2312.05814

    cs.AI cs.SD eess.AS

    Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks

    Authors: Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Jun-Young Kim, Seong-Whan Lee

    Abstract: Brain-to-speech technology represents a fusion of interdisciplinary applications encompassing fields of artificial intelligence, brain-computer interfaces, and speech synthesis. Neural representation learning-based intention decoding and speech synthesis directly connect neural activity to the means of human linguistic communication, which may greatly enhance the naturalness of communication.…

    Submitted 26 February, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: 4 pages

  42. arXiv:2312.03196

    cs.LG eess.SP

    Domain Invariant Representation Learning and Sleep Dynamics Modeling for Automatic Sleep Staging

    Authors: Seungyeon Lee, Thai-Hoang Pham, Zhao Cheng, ** Zhang

    Abstract: Sleep staging has become a critical task in diagnosing and treating sleep disorders to prevent sleep-related diseases. With growing large-scale sleep databases, significant progress has been made toward automatic sleep staging. However, previous studies face critical problems: the heterogeneity of subjects' physiological signals, the inability to extract meaningful information fro…

    Submitted 9 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

  43. arXiv:2312.01638  [pdf, other]

    eess.IV cs.CV

    J-Net: Improved U-Net for Terahertz Image Super-Resolution

    Authors: Woon-Ha Yeo, Seung-Hwan Jung, Seung Jae Oh, Inhee Maeng, Eui Su Lee, Han-Cheol Ryu

    Abstract: Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot…

    Submitted 4 December, 2023; originally announced December 2023.

  44. arXiv:2311.17923  [pdf, other]

    eess.AS cs.HC

    Enhanced Generative Adversarial Networks for Unseen Word Generation from EEG Signals

    Authors: Young-Eun Lee, Seo-Hyun Lee, Soowon Kim, Jung-Sun Lee, Deok-Seon Kim, Seong-Whan Lee

    Abstract: Recent advances in brain-computer interface (BCI) technology, particularly based on generative adversarial networks (GAN), have shown great promise for improving decoding performance for BCI. Within the realm of Brain-Computer Interfaces (BCI), GANs find application in addressing many areas. They serve as a valuable tool for data augmentation, which can solve the challenge of limited data availabi…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures

  45. arXiv:2311.15683  [pdf]

    eess.AS cs.SD eess.SP

    Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency

    Authors: Chenyu Tang, Muzi Xu, Wentian Yi, Zibo Zhang, Edoardo Occhipinti, Chaoqun Dong, Dafydd Ravenscroft, Sung-Min Jung, Sanghyo Lee, Shuo Gao, Jong Min Kim, Luigi G. Occhipinti

    Abstract: Our research presents a wearable Silent Speech Interface (SSI) technology that excels in device comfort, time-energy efficiency, and speech decoding accuracy for real-world use. We developed a biocompatible, durable textile choker with an embedded graphene-based strain sensor, capable of accurately detecting subtle throat movements. This sensor, surpassing other strain sensors in sensitivity by 42…

    Submitted 7 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 5 figures in the article; 11 figures and 4 tables in supplementary information

    Journal ref: npj Flexible Electronics (2024)

  46. arXiv:2311.14208  [pdf, other]

    cs.CV eess.IV

    ECRF: Entropy-Constrained Neural Radiance Fields Compression with Frequency Domain Optimization

    Authors: Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge

    Abstract: Explicit feature-grid based NeRF models have shown promising results in terms of rendering quality and significant speed-up in training. However, these methods often require a significant amount of data to represent a single scene or object. In this work, we present a compression model that aims to minimize the entropy in the frequency domain in order to effectively reduce the data size. First, we…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 10 pages, 6 figures, 4 tables

  47. arXiv:2311.13687  [pdf, other]

    cs.LG cs.MM cs.SD eess.AS

    Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts

    Authors: Jayeon Yi, Sungho Lee, Kyogu Lee

    Abstract: In the heart of "rhythm games" - games where players must perform actions in sync with a piece of music - are "charts", the directives to be given to players. We newly formulate chart generation as a sequence generation task and train a Transformer using a large dataset. We also introduce tempo-informed preprocessing and training procedures, some of which are suggested to be integral for a success…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: ISMIR 2023 LBD. Demo videos and code at stet-stet.github.io/goct

  48. arXiv:2311.12454  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

    Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seung-Bin Kim, Seong-Whan Lee

    Abstract: Large language models (LLM)-based speech synthesis has been widely adopted in zero-shot speech synthesis. However, they require a large-scale data and possess the same limitations as previous autoregressive speech models, including slow inference speed and lack of robustness. This paper proposes HierSpeech++, a fast and strong zero-shot speech synthesizer for text-to-speech (TTS) and voice convers…

    Submitted 27 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 16 pages, 9 figures, 12 tables

  49. arXiv:2311.10430  [pdf, other]

    eess.IV cs.CV cs.LG

    Deep Residual CNN for Multi-Class Chest Infection Diagnosis

    Authors: Ryan Donghan Kwon, Dohyun Lim, Yoonha Lee, Seung Won Lee

    Abstract: The advent of deep learning has significantly propelled the capabilities of automated medical image diagnosis, providing valuable tools and resources in the realm of healthcare and medical diagnostics. This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections, utilizing chest X-ray images. The im…

    Submitted 17 November, 2023; originally announced November 2023.

  50. arXiv:2311.09354  [pdf]

    q-bio.QM cs.LG eess.IV

    Nondestructive, quantitative viability analysis of 3D tissue cultures using machine learning image segmentation

    Authors: Kylie J. Trettner, Jeremy Hsieh, Weikun Xiao, Jerry S. H. Lee, Andrea M. Armani

    Abstract: Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measure…

    Submitted 11 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: 52 total pages, Main text and SI included, 35 figures (5 main text, 30 supplemental), 9 tables, 6 datasets (provided on linked GitHub), linked image files on Zenodo