Showing 1–50 of 73 results for author: Choi, S

Searching in archive eess.

  1. arXiv:2406.15725  [pdf, other]

    eess.AS cs.SD

    Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

    Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

    Abstract: To tackle sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: audio teacher-student transformer (ATST) branch, BEATs branch and CNN branch including either partial…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 4 technical report

  2. arXiv:2406.05472  [pdf, other]

    cs.CR eess.SY

    A Novel Generative AI-Based Framework for Anomaly Detection in Multicast Messages in Smart Grid Communications

    Authors: Aydin Zaboli, Seong Lok Choi, Tai-** Song, Junho Hong

    Abstract: Cybersecurity breaches in digital substations can pose significant challenges to the stability and reliability of power system operations. To address these challenges, defense and mitigation techniques are required. Identifying and detecting anomalies in information and communication technology (ICT) is crucial to ensure secure device interactions within digital substations. This paper proposes a…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures, Submitted to IEEE Transactions on Information Forensics and Security

  3. arXiv:2405.01591  [pdf, other]

    cs.CL cs.AI eess.IV

    Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

    Authors: Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Su** Choi, Joo Heung Yoon

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: Under review

  4. arXiv:2402.17127  [pdf, other]

    cs.SD eess.AS

    Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

    Authors: Taein Kang, Soyul Han, Sunmook Choi, Jae** Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

    Abstract: Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, p…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 5 pages

    MSC Class: 00A71 ACM Class: I.2.6

  5. arXiv:2401.13498  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting

    Authors: Hounsu Kim, Soonbeom Choi, Juhan Nam

    Abstract: Synthesizing performing guitar sound is a highly challenging task due to the polyphony and high variability in expression. Recently, deep generative models have shown promising results in synthesizing expressive polyphonic instrument sounds from music scores, often using a generic MIDI input. In this work, we propose an expressive acoustic guitar sound synthesis model with a customized input repre…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  6. arXiv:2401.12473  [pdf, other]

    eess.AS cs.SD

    Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

    Authors: Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

    Abstract: We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor (TDA) calculation module that can deal with an unknown number of speakers, and 3) triple-path processing blocks that can model inter-speaker relations…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, accepted by ICASSP 2024

  7. arXiv:2311.18287  [pdf, other]

    eess.IV cs.CV cs.GR

    Dispersed Structured Light for Hyperspectral 3D Imaging

    Authors: Suhyun Shin, Seokjun Choi, Felix Heide, Seung-Hwan Baek

    Abstract: Hyperspectral 3D imaging aims to acquire both depth and spectral information of a scene. However, existing methods are either prohibitively expensive and bulky or compromise on spectral and depth accuracy. In this work, we present Dispersed Structured Light (DSL), a cost-effective and compact method for accurate hyperspectral 3D imaging. DSL modifies a traditional projector-camera system by placin…

    Submitted 25 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  8. arXiv:2311.05462  [pdf, other]

    cs.CR eess.SY

    ChatGPT and Other Large Language Models for Cybersecurity of Smart Grid Applications

    Authors: Aydin Zaboli, Seong Lok Choi, Tai-** Song, Junho Hong

    Abstract: Cybersecurity breaches targeting electrical substations constitute a significant threat to the integrity of the power grid, necessitating comprehensive defense and mitigation strategies. Any anomaly in information and communication technology (ICT) should be detected for secure communications between devices in digital substations. This paper proposes large language models (LLM), e.g., ChatGPT, fo…

    Submitted 25 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, Accepted, 2024 IEEE Power & Energy Society General Meeting (PESGM), Seattle, WA, USA

  9. arXiv:2311.00332  [pdf, other]

    q-bio.TO cs.CV eess.IV

    SDF4CHD: Generative Modeling of Cardiac Anatomies with Congenital Heart Defects

    Authors: Fanwei Kong, Sascha Stocker, Perry S. Choi, Michael Ma, Daniel B. Ennis, Alison Marsden

    Abstract: Congenital heart disease (CHD) encompasses a spectrum of cardiovascular structural abnormalities, often requiring customized treatment plans for individual patients. Computational modeling and analysis of these unique cardiac anatomies can improve diagnosis and treatment planning and may ultimately lead to improved outcomes. Deep learning (DL) methods have demonstrated the potential to enable effi…

    Submitted 8 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  10. arXiv:2310.10633  [pdf, other]

    physics.optics eess.IV

    Telescope imaging beyond the Rayleigh limit in extremely low SNR

    Authors: Hyunsoo Choi, Seungman Choi, Peter Menart, Angshuman Deka, Zubin Jacob

    Abstract: The Rayleigh limit and low Signal-to-Noise Ratio (SNR) scenarios pose significant limitations to optical imaging systems used in remote sensing, infrared thermal imaging, and space domain awareness. In this study, we introduce a Stochastic Sub-Rayleigh Imaging (SSRI) algorithm to localize point objects and estimate their positions, brightnesses, and number in low SNR conditions, even below the Ray…

    Submitted 17 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 18 pages, 5 figures

  11. arXiv:2310.06364  [pdf, other]

    cs.SD cs.AI eess.AS

    Noisy-ArcMix: Additive Noisy Angular Margin Loss Combined With Mixup Anomalous Sound Detection

    Authors: Soonhyeon Choi, Jung-Woo Choi

    Abstract: Unsupervised anomalous sound detection (ASD) aims to identify anomalous sounds by learning the features of normal operational sounds and sensing their deviations. Recent approaches have focused on the self-supervised task utilizing the classification of normal data, and advanced models have shown that securing representation space for anomalous data is important through representation learning yie…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  12. arXiv:2309.07937  [pdf, other]

    eess.AS cs.LG cs.SD

    Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

    Authors: Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe

    Abstract: We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation. VoxtLM integrates text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning. Compared to a single-task model, VoxtLM exhibits a significant improvement in speech syn…

    Submitted 24 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

  13. arXiv:2308.02133  [pdf, other]

    eess.SP

    NeuralEQ: Neural-Network-Based Equalizer for High-Speed Wireline Communication

    Authors: Hanseok Kim, Jae Hyung Ju, Hyun Seok Choi, Hyeri Roh, Woo-Seok Choi

    Abstract: With the growing demand for high-bandwidth applications like video streaming and cloud services, the data transfer rates required for wireline communication keep increasing, making the channel loss a major obstacle in achieving low bit error rate (BER). Equalization techniques such as feed-forward equalizer (FFE) and decision feedback equalizer (DFE) are commonly used to compensate for channel lo…

    Submitted 4 August, 2023; originally announced August 2023.

  14. arXiv:2306.14310  [pdf, other]

    cs.CL cs.SD eess.AS

    Addressing Cold Start Problem for End-to-end Automatic Speech Scoring

    Authors: Jungbae Park, Seungtaek Choi

    Abstract: Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With self-supervised learning advancements, end-to-end speech scoring approaches have exhibited promising results. However, this study highlights the significant decrease in the performance of speech scoring systems in new question contexts, thereby identifying this as a cold…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: Accepted at Interspeech 2023, 4 pages, 1 page for reference

  15. arXiv:2306.06340  [pdf, other]

    eess.SP cs.LG q-bio.QM

    ECGBERT: Understanding Hidden Language of ECGs with Self-Supervised Representation Learning

    Authors: Seokmin Choi, Sajad Mousavi, Phillip Si, Haben G. Yhdego, Fatemeh Khadem, Fatemeh Afghah

    Abstract: In the medical field, current ECG signal analysis approaches rely on supervised deep neural networks trained for specific tasks that require substantial amounts of labeled data. However, our paper introduces ECGBERT, a self-supervised representation learning approach that unlocks the underlying language of ECGs. By unsupervised pre-training of the model, we mitigate challenges posed by the lack of…

    Submitted 10 June, 2023; originally announced June 2023.

  16. An empirical study on speech restoration guided by self supervised speech representation

    Authors: Jaeuk Byun, Youna Ji, Soo Whan Chung, Soyeon Choe, Min Seok Choi

    Abstract: Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Speech restoration aims to recover speech components from these distortions. This paper focuses on exploring the impact of self-supervised speech represen…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: To be presented at ICASSP 2023

  17. arXiv:2305.08878  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning to Learn Unlearned Feature for Brain Tumor Segmentation

    Authors: Seungyub Han, Yeongmo Kim, Seokhyeon Ha, Jungwoo Lee, Seunghong Choi

    Abstract: We propose a fine-tuning algorithm for brain tumor segmentation that needs only a few data samples and helps networks not to forget the original tasks. Our approach is based on active learning and meta-learning. One of the difficulties in medical image segmentation is the lack of datasets with proper annotations, because it requires doctors to tag reliable annotation and there are many variants of…

    Submitted 13 May, 2023; originally announced May 2023.

    Comments: Medical Imaging Meets NeurIPS 2018

  18. arXiv:2304.08707  [pdf, other]

    eess.AS cs.SD

    Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains an information highway to flow an over-complete input representation through multiple FSB-LSTM modules. Each FSB-LSTM module consists of a full-band bl…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: in ICASSP 2023

  19. arXiv:2304.02389  [pdf, other]

    eess.IV cs.CV cs.LG

    DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images

    Authors: Bo Qian, Hao Chen, Xiangning Wang, Haoxuan Che, Gitaek Kwon, Jaeyoung Kim, Sung** Choi, Seoyoung Shin, Felix Krause, Markus Unterdechler, Junlin Hou, Rui Feng, Yihao Li, Mostafa El Habib Daho, Qiang Wu, ** Zhang, Xiaokang Yang, Yiyu Cai, Wei** Jia, Huating Li, Bin Sheng

    Abstract: Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and s…

    Submitted 5 April, 2023; originally announced April 2023.

  20. arXiv:2304.00471  [pdf, other]

    cs.SD cs.CV cs.GR cs.LG eess.AS

    A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

    Authors: Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim, Sungsu Lim

    Abstract: Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite remarkable results of modern talking-face generation models, they often entail high computational burdens, which limi…

    Submitted 28 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: MLSys Workshop on On-Device Intelligence, 2023; Demo: https://huggingface.co/spaces/nota-ai/compressed_wav2lip

  21. arXiv:2303.16511  [pdf, other]

    eess.AS

    Joint unsupervised and supervised learning for context-aware language identification

    Authors: **seok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim

    Abstract: Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However, we need additional text labels to train the model to recognize speech, and acquiring the text labels is costly. In order to overcome this probl…

    Submitted 14 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  22. arXiv:2303.07592  [pdf, other]

    eess.AS cs.SD

    Lightweight feature encoder for wake-up word detection based on self-supervised speech representation

    Authors: Hyungjun Lim, Younggwan Kim, Kiho Yeom, Eunjoo Seo, Hoodong Lee, Stanley Jungkyu Choi, Honglak Lee

    Abstract: Self-supervised learning method that provides generalized speech representations has recently received increasing attention. Wav2vec 2.0 is the most famous example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use it directly for wake-up word detection on mobile devices due to its expensive computational cost. In this work…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  23. arXiv:2303.01105  [pdf, other]

    eess.IV cs.CV cs.LG

    Evidence-empowered Transfer Learning for Alzheimer's Disease

    Authors: Kai Tzu-iunn Ong, Hana Kim, Min** Kim, **seong Jang, Beomseok Sohn, Yoon Seong Choi, Dosik Hwang, Seong Jae Hwang, **young Yeo

    Abstract: Transfer learning has been widely utilized to mitigate the data scarcity problem in the field of Alzheimer's disease (AD). Conventional transfer learning relies on re-using models trained on AD-irrelevant tasks such as natural image classification. However, it often leads to negative transfer due to the discrepancy between the non-medical source and target medical domains. To address this, we pres…

    Submitted 17 April, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to IEEE International Symposium on Biomedical Imaging (ISBI) 2023. The authorship was changed from co-first authors to a single first author, which was authorized by the adviser/corresponding author **young Yeo (Apr 18th, 2023)

  24. arXiv:2211.15948  [pdf, other]

    cs.SD eess.AS

    Neural Vocoder Feature Estimation for Dry Singing Voice Separation

    Authors: Jaekwon Im, Soonbeom Choi, Sangeon Yong, Juhan Nam

    Abstract: Singing voice separation (SVS) is a task that separates singing voice audio from its mixture with instrumental audio. Previous SVS studies have mainly employed the spectrogram masking method which requires a large dimensionality in predicting the binary masks. In addition, they focused on extracting a vocal stem that retains the wet sound with the reverberation effect. This result may hinder the r…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 6 pages, 4 figures

    Journal ref: 14th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2022

  25. arXiv:2211.12433  [pdf, other]

    cs.SD eess.AS

    TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI)…

    Submitted 4 August, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: In IEEE/ACM Transactions on Audio, Speech, and Language Processing. A sound demo is available at https://zqwang7.github.io/demos/TF-GridNet-demo/index.html, and the code is available at https://github.com/espnet/espnet/pull/5395

  26. arXiv:2210.09135  [pdf, other]

    cs.CV eess.IV

    Gated Recurrent Unit for Video Denoising

    Authors: Kai Guo, Seungwon Choi, Jongseong Choi

    Abstract: Current video denoising methods perform temporal fusion by designing convolutional neural networks (CNN) or combine spatial denoising with temporal fusion into basic recurrent neural networks (RNNs). However, there have not yet been works which adapt gated recurrent unit (GRU) mechanisms for video denoising. In this letter, we propose a new video denoising model based on GRU, namely GRU-VD. First,…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: 5 pages, 5 figures

    MSC Class: 62H35; 68U10 ACM Class: I.4.4

  27. arXiv:2209.03952  [pdf, other]

    cs.SD eess.AS

    TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation

    Authors: Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe

    Abstract: We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-frequency (T-F) domain, for monaural talker-independent speaker separation in anechoic conditions. The model stacks several multi-path blocks, each consisting of an intra-frame spectral module, a sub-band temporal module, and a full-band self-attention module, to leverage local and global spectro-temporal inf…

    Submitted 15 March, 2023; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: in IEEE ICASSP 2023

  28. Optimal Parking Planning for Shared Autonomous Vehicles

    Authors: Seong** Choi, **woo Lee

    Abstract: Parking is a crucial element of the driving experience in urban transportation systems. Especially in the coming era of Shared Autonomous Vehicles (SAVs), parking operations in urban transportation networks will inevitably change. Parking stations will serve as storage places for unused vehicles and depots that control the level-of-service of SAVs. This study presents an Analytical Parking Plannin…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: 27 pages, 9 figures, 9 tables

  29. arXiv:2206.12059  [pdf]

    eess.AS cs.SD

    Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

    Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

    Abstract: Performance of sound event localization and detection (SELD) in real scenes is limited by small size of SELD dataset, due to difficulty in obtaining sufficient amount of realistic multi-channel audio data recordings with accurate label. We used two main strategies to solve problems arising from the small real SELD dataset. First, we applied various data augmentation methods on all data dimensions:…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task3

  30. arXiv:2206.11645  [pdf, ps, other]

    eess.AS

    Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Seung-Deok Choi, Yong-Hwa Park

    Abstract: While many deep learning methods on other domains have been applied to sound event detection (SED), differences between original domains of the methods and SED have not been appropriately considered so far. As SED uses audio data with two dimensions (time and frequency) for input, thorough comprehension on these two dimensions is essential for application of methods from other domains on SED. Prev…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: Technical Report submitted for DCASE2022 Challenge Task4

  31. arXiv:2206.03612  [pdf, other]

    cs.CV cs.AI cs.LG eess.SP

    Predictive Modeling of Charge Levels for Battery Electric Vehicles using CNN EfficientNet and IGTD Algorithm

    Authors: Seongwoo Choi, Chongzhou Fang, David Haddad, Minsung Kim

    Abstract: Convolutional Neural Networks (CNN) have been a good solution for understanding a vast image dataset. As the increased number of battery-equipped electric vehicles is flourishing globally, there has been much research on understanding which charge levels electric vehicle drivers would choose to charge their vehicles to get to their destination without any prevention. We implemented deep learning a…

    Submitted 7 June, 2022; originally announced June 2022.

  32. Federated Learning Enables Big Data for Rare Cancer Boundary Detection

    Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer, et al. (254 additional authors not shown)

    Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc…

    Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

  33. arXiv:2204.03778  [pdf, other]

    eess.SP q-bio.NC

    Mitigating Mismatch Compression in Differential Local Field Potentials

    Authors: Vineet Tiruvadi, Sam James, Bryan Howell, Mosadoluwa Obatusin, Andrea Crowell, Patricio Riva-Posse, Ki Sueng Choi, Allison Waters, Robert E. Gross, Cameron C. McIntyre, Helen S. Mayberg, Robert Butera

    Abstract: Bidirectional deep brain stimulation (bdDBS) devices capable of recording differential local field potentials (dLFP) enable neural recordings alongside clinical therapy. Efforts to identify objective signals of various brain disorders, or disease readouts, are challenging in dLFP, especially during active DBS. In this report we identified, characterized, and mitigated a major source of distortion…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: 9 pages, 9 figures

  34. arXiv:2203.10047  [pdf, ps, other]

    eess.SP

    High-Density Coding Scheme for SWIPT Systems

    Authors: Dongheon Lee, Gyuyeol Kong, Jang-Won Lee, Sooyong Choi

    Abstract: In this study, a novel coding scheme called high-density coding based on high-density codebooks using a genetic local search algorithm is proposed. The high-density codebook maximizes the energy transfer capability by maximizing the ratio of 1 in the codebook while satisfying the conditions of a codeword with length n, a codebook with 2^k codewords, and a minimum Hamming distance of the codebook of…

    Submitted 18 March, 2022; originally announced March 2022.

  35. arXiv:2203.03166  [pdf]

    eess.AS cs.SD eess.SP

    HRTF measurement for accurate sound localization cues

    Authors: Gyeong-Tae Lee, Sang-Min Choi, Byeong-Yun Ko, Yong-Hwa Park

    Abstract: A new database of head-related transfer functions (HRTFs) for accurate sound source localization is presented through precise measurement and post-processing in terms of improved frequency bandwidth and causality of head-related impulse responses (HRIRs) for accurate spectral cue (SC) and interaural time difference (ITD), respectively. The improvement effects of the proposed methods on binaural so…

    Submitted 5 April, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 39 pages, 27 figures, and 1 table

  36. arXiv:2202.04328  [pdf, other]

    cs.SD eess.AS

    CAU_KU team's submission to ADD 2022 Challenge task 1: Low-quality fake audio detection through frequency feature masking

    Authors: Il-Youp Kwak, Sunmook Choi, Jonghoon Yang, Yerin Lee, Seungsang Oh

    Abstract: This technical report describes Chung-Ang University and Korea University (CAU_KU) team's model participating in the Audio Deep Synthesis Detection (ADD) 2022 Challenge, track 1: Low-quality fake audio detection. For track 1, we propose a frequency feature masking (FFM) augmentation technique to deal with a low-quality audio environment. We…

    Submitted 9 February, 2022; originally announced February 2022.

  37. arXiv:2201.06735  [pdf]

    eess.SP

    AI Augmented Digital Metal Component

    Authors: Eunhyeok Seo, Hyokyung Sung, Hayeol Kim, Taekyeong Kim, Sangeun Park, Min Sik Lee, Seung Ki Moon, Jung Gi Kim, Hayoung Chung, Seong-Kyum Choi, Ji-hun Yu, Kyung Tae Kim, Seong ** Park, Namhun Kim, Im Doo Jung

    Abstract: The aim of this work is to propose a new paradigm that imparts intelligence to metal parts with the fusion of metal additive manufacturing and artificial intelligence (AI). Our digital metal part classifies the status with real time data processing with convolutional neural network (CNN). The training data for the CNN is collected from a strain gauge embedded in metal parts by laser powder bed fus…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 46 pages

  38. arXiv:2112.02896  [pdf, other]

    eess.IV cs.CV cs.LG

    Tunable Image Quality Control of 3-D Ultrasound using Switchable CycleGAN

    Authors: Jaeyoung Huh, Shujaat Khan, Sung** Choi, Dongkuk Shin, Eun Sun Lee, Jong Chul Ye

    Abstract: In contrast to 2-D ultrasound (US) for uniaxial plane imaging, a 3-D US imaging system can visualize a volume along three axial planes. This allows for a full view of the anatomy, which is useful for gynecological (GYN) and obstetrical (OB) applications. Unfortunately, the 3-D US has an inherent limitation in resolution compared to the 2-D US. In the case of 3-D US with a 3-D mechanical probe, for…

    Submitted 6 December, 2021; originally announced December 2021.

  39. arXiv:2110.06546  [pdf, other]

    eess.AS cs.LG cs.SD

    A Melody-Unsupervision Model for Singing Voice Synthesis

    Authors: Soonbeom Choi, Juhan Nam

    Abstract: Recent studies in singing voice synthesis have achieved high-quality results leveraging advances in text-to-speech models based on deep neural networks. One of the main issues in training singing voice synthesis models is that they require melody and lyric labels to be temporally aligned with audio data. The temporal alignment is a time-exhausting manual work in preparing for the training data. To…

    Submitted 14 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: ICASSP 2022

  40. arXiv:2109.12253  [pdf]

    eess.SY

    Development of Safety Monitoring System of Connected and Automated Vehicles considering the Trade-off between Communication Efficiency and Data Reliability

    Authors: Sehyun Tak, Seong** Choi

    Abstract: The safety of urban transportation systems is considered a public health issue worldwide, and many researchers have contributed to improving it. Connected automated vehicles (CAVs) and cooperative intelligent transportation systems (C-ITSs) are considered solutions to ensure urban transportation systems' safety using various sensors and communication devices. However, it is found difficult to depl…

    Submitted 30 September, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 13 pages, 16 figures, 1 table

  41. arXiv:2108.10147  [pdf, other]

    cs.LG cs.AI eess.IV

    Spatio-Temporal Split Learning for Privacy-Preserving Medical Platforms: Case Studies with COVID-19 CT, X-Ray, and Cholesterol Data

    Authors: Yoo Jeong Ha, Minjae Yoo, Gusang Lee, Soyi Jung, Sae Won Choi, Joongheon Kim, Seehwan Yoo

    Abstract: Machine learning requires a large volume of sample data, especially when it is used in high-accuracy medical applications. However, patient records are among the most sensitive private information and are not usually shared among institutes. This paper presents spatio-temporal split learning, a distributed deep neural network framework, which is a turning point in allowing collaboration among pri…

    Submitted 20 August, 2021; originally announced August 2021.

  42. arXiv:2108.05543   

    eess.SY

    Development of Simulation-based Lane Change Control System for Autonomous Vehicles

    Authors: Seongjin Choi

    Abstract: Originally, the decision and control of the lane change of the vehicle were on the human driver. In previous studies, the decision-making of lane-changing of the human drivers was mainly used to increase the individual's benefit. However, the lane-changing behavior of these human drivers can sometimes have a bad influence on the overall traffic flow. As technology for autonomous vehicles develops,…

    Submitted 27 August, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: bitmapped submission

  43. arXiv:2108.01812  [pdf, other]

    cs.CL cs.SD eess.AS

    Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation

    Authors: Seongmin Park, Dongchan Shin, Sangyoun Paik, Subong Choi, Alena Kazakova, Jihwa Lee

    Abstract: Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing. While error detection systems often take advantage of statistical language archetypes captured by LMs, at times the pretrained knowledge can hinder error detection performance. For instance, the presence of speech disfluencies might confuse the post-processin…

    Submitted 3 August, 2021; originally announced August 2021.

  44. Deep learning based cough detection camera using enhanced features

    Authors: Gyeong-Tae Lee, Hyeonuk Nam, Seong-Hu Kim, Sang-Min Choi, Youngkey Kim, Yong-Hwa Park

    Abstract: Coughing is a typical symptom of COVID-19. To detect and localize coughing sounds remotely, a convolutional neural network (CNN) based deep learning model was developed in this work and integrated with a sound camera for the visualization of the cough sounds. The cough detection model is a binary classifier of which the input is a two-second acoustic feature and the output is one of two inferences…

    Submitted 24 May, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 28 pages, 20 figures, and 14 tables

    Journal ref: Expert Systems With Applications, Vol. 206, No. 15, pp. 1-20, 2022

  45. arXiv:2107.05004  [pdf, other]

    eess.SP

    Designing a Robust Carrier Frequency Offset Estimation Scheme for Meeting Target Decoding Performance in an OFDM System

    Authors: Minkyeong Jeong, Sang-Won Choi, Juyeop Kim

    Abstract: In a target communication system, a delicately designed frequency offset estimation scheme is required to meet certain decoding performance. In this paper, we propose a two-step estimation scheme, coarse and residual, with different values of a time interval parameter. A result of RF conduction test shows that the proposed method has a 1 dB gain of SNR compared to the coarse-only estimator. A result…

    Submitted 19 July, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

    Comments: 22 pages, 13 figures

  46. arXiv:2107.04526  [pdf, ps, other]

    cs.NI eess.SY

    A Dual-Connection based Handover Scheme for Ultra-Dense Millimeter-Wave Cellular Networks

    Authors: Seongjoon Kang, Siyoung Choi, Goodsol Lee, Saewoong Bahk

    Abstract: Mobile users in an ultra-dense millimeter-wave cellular network experience handover events more frequently than in conventional networks, which results in increased service interruption time and performance degradation due to blockages. Multi-connectivity has been proposed to resolve this, and it also extends the coverage of millimeter-wave communications. In this paper, we propose a dual-connecti…

    Submitted 9 July, 2021; originally announced July 2021.

  47. arXiv:2107.03649  [pdf]

    eess.AS cs.SD

    Heavily Augmented Sound Event Detection utilizing Weak Predictions

    Authors: Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park

    Abstract: The performance of Sound Event Detection (SED) systems is greatly limited by the difficulty in generating large strongly labeled datasets. In this work, we used two main approaches to overcome the lack of strongly labeled data. First, we applied heavy data augmentation on input features. Data augmentation methods used include not only conventional methods used in speech/audio domains but also our…

    Submitted 14 September, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: Won 3rd place on IEEE DCASE 2021 Task 4

  48. Tracking Cells and their Lineages via Labeled Random Finite Sets

    Authors: Tran Thien Dat Nguyen, Ba-Ngu Vo, Ba-Tuong Vo, Du Yong Kim, Yu Suk Choi

    Abstract: Determining the trajectories of cells and their lineages or ancestries in live-cell experiments is fundamental to the understanding of how cells behave and divide. This paper proposes novel online algorithms for jointly tracking and resolving lineages of an unknown and time-varying number of cells from time-lapse video data. Our approach involves modeling the cell ensemble as a labeled random fin…

    Submitted 27 October, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

  49. arXiv:2103.00006  [pdf, other]

    eess.SP cs.LG

    Towards Synthesizing Twelve-Lead Electrocardiograms from Two Asynchronous Leads

    Authors: Yong-Yeon Jo, Young Sang Choi, Jong-Hwan Jang, Joon-Myoung Kwon

    Abstract: The electrocardiogram (ECG) records electrical signals in a non-invasive way to observe the condition of the heart, typically looking at the heart from 12 different directions. Several types of cardiac disease are diagnosed by using 12-lead ECGs. Recently, various wearable devices have enabled immediate access to the ECG without the use of unwieldy equipment. However, they only provide ECGs with…

    Submitted 25 June, 2024; v1 submitted 28 February, 2021; originally announced March 2021.

  50. arXiv:2102.13228  [pdf]

    eess.SP

    A High-Throughput Multi-Mode LDPC Decoder for 5G NR

    Authors: Sina Pourjabar, Gwan S. Choi

    Abstract: This paper presents a partially parallel low-density parity-check (LDPC) decoder designed for the 5G New Radio (NR) standard. The design uses a multi-block parallel architecture with a flooding schedule. The decoder can support any code rates and code lengths up to the lifting size Zmax = 96. To compensate for the dropped throughput associated with the smaller Z values, the design can double an…

    Submitted 10 March, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: More explanation added to section II.B. Fig. 3.(b) Revised. Typos corrected