Showing 1–50 of 1,941 results for author: Kim, S

Searching in archive cs.
  1. arXiv:2406.19328  [pdf, other]

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19135  [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  3. arXiv:2406.18551  [pdf, other]

    cs.CV cs.GR

    GFFE: G-buffer Free Frame Extrapolation for Low-latency Real-time Rendering

    Authors: Songyin Wu, Deepak Vembar, Anton Sochenov, Selvakumar Panneer, Sungye Kim, Anton Kaplanyan, Ling-Qi Yan

    Abstract: Real-time rendering has been embracing ever-demanding effects, such as ray tracing. However, rendering such effects in high resolution and high frame rate remains challenging. Frame extrapolation methods, which don't introduce additional latency as opposed to frame interpolation methods such as DLSS 3 and FSR 3, boost the frame rate by generating future frames based on previous frames. However, it…

    Submitted 23 May, 2024; originally announced June 2024.

  4. arXiv:2406.17869  [pdf, other]

    cs.CV

    Burst Image Super-Resolution with Base Frame Selection

    Authors: Sanghyun Kim, Min Jung Lee, Woohyeok Kim, Deunsol Jung, Jaesung Rim, Sunghyun Cho, Minsu Cho

    Abstract: Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: CVPR2024W NTIRE accepted

  5. arXiv:2406.17254  [pdf, other]

    cs.CV

    Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

    Authors: Youngmin Kim, Saejin Kim, Hoyeon Moon, Youngjae Yu, Junhyug Noh

    Abstract: Scalp diseases and alopecia affect millions of people around the world, underscoring the urgent need for early diagnosis and management of the disease. However, the development of a comprehensive AI-based diagnosis system encompassing these conditions remains an underexplored domain due to the challenges associated with data imbalance and the costly nature of labeling. To address these issues, we…

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: IEEE Transactions on Medical Imaging (Under Review)

  6. arXiv:2406.17145  [pdf, other]

    cs.DC cs.AI cs.LG

    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

    Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

    Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c…

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  8. arXiv:2406.16695  [pdf, other]

    cs.CV

    Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

    Authors: Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-wha Kim, Seungryong Kim

    Abstract: Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into 3D representation, has recently brought significant advancements in text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may…

    Submitted 24 June, 2024; originally announced June 2024.

  9. arXiv:2406.16175  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    The Persistence of Contrarianism on Twitter: Mapping users' sharing habits for the Ukraine war, COVID-19 vaccination, and the 2020 Midterm Elections

    Authors: David Axelrod, Sangyeon Kim, John Paolillo

    Abstract: Empirical studies of online disinformation emphasize matters of public concern such as the COVID-19 pandemic, foreign election interference, and the Russo-Ukraine war, largely in studies that treat the topics separately. Comparatively fewer studies attempt to relate such disparate topics and address the extent to which they share behaviors. In this study, we compare three samples of Twitter data o…

    Submitted 23 June, 2024; originally announced June 2024.

  10. arXiv:2406.16042  [pdf, other]

    cs.CV

    Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

    Authors: Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

    Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data a…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: The project page is available at https://ku-cvlab.github.io/Diff-ID/

  11. arXiv:2406.15664  [pdf, other]

    stat.ML cs.LG

    Flat Posterior Does Matter For Bayesian Transfer Learning

    Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song

    Abstract: The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning…

    Submitted 21 June, 2024; originally announced June 2024.

  12. arXiv:2406.15102  [pdf, other]

    cs.CV cs.LG

    HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization

    Authors: Seonggon Kim, Eunhyeok Park

    Abstract: With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which…

    Submitted 21 June, 2024; originally announced June 2024.

  13. arXiv:2406.13214  [pdf, other]

    cs.LG

    Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck

    Authors: Sangwoo Seo, Sungwon Kim, Jihyeong Jung, Yoonho Lee, Chanyoung Park

    Abstract: Temporal Graph Neural Networks (TGNN) have the ability to capture both the graph topology and dynamic dependencies of interactions within a graph over time. There has been a growing need to explain the predictions of TGNN models due to the difficulty in identifying how past events influence their predictions. Since the explanation model for a static graph cannot be readily applied to temporal grap…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  14. arXiv:2406.12904  [pdf, other]

    cs.LG physics.comp-ph physics.optics

    Meent: Differentiable Electromagnetic Simulator for Machine Learning

    Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

    Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: under review

  15. arXiv:2406.12632  [pdf, other]

    eess.IV cs.CV

    Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Image Synthesis: T1 MRI to Tau-PET

    Authors: Symac Kim, Junho Moon, Haejun Chung, Ikbeom Jang

    Abstract: Alzheimer's Disease (AD) is the most common form of dementia, characterised by cognitive decline and biomarkers such as tau-proteins. Tau-positron emission tomography (tau-PET), which employs a radiotracer to selectively bind, detect, and visualise tau protein aggregates within the brain, is valuable for early AD diagnosis but is less accessible due to high costs, limited availability, and its inv…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

  16. arXiv:2406.12095  [pdf, other]

    cs.CV cs.AI cs.RO

    DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

    Authors: Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

    Abstract: We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB,…

    Submitted 17 June, 2024; originally announced June 2024.

  17. arXiv:2406.11280  [pdf, other]

    cs.CV

    i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment

    Authors: Daechul Ahn, Yura Choi, San Kim, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

    Abstract: Aligning Video Large Multimodal Models (VLMMs) face challenges such as modality misalignment and verbose responses. Although iterative approaches such as self-rewarding or iterative direct preference optimization (DPO) recently showed a significant improvement in language model alignment, particularly on reasoning tasks, self-aligned models applied to large video-language models often result in le…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Technical report

  18. arXiv:2406.10996  [pdf, other]

    cs.CL

    THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

    Authors: Seo Hyun Kim, Kai Tzu-iunn Ong, Taeyoon Kwon, Namyoung Kim, Keummin Ka, SeongHyeon Bae, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo

    Abstract: Large language models (LLMs) are capable of processing lengthy dialogue histories during prolonged interaction with users without additional memory modules; however, their responses tend to overlook or incorrectly recall information from the past. In this paper, we revisit memory-augmented response generation in the era of LLMs. While prior work focuses on getting rid of outdated memories, we argu…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Under Review

  19. arXiv:2406.10549  [pdf, other]

    eess.AS cs.CL cs.SD

    Lightweight Audio Segmentation for Long-form Speech Translation

    Authors: Jaesong Lee, Soyoon Kim, Hanbyul Kim, Joon Son Chung

    Abstract: Speech segmentation is an essential part of speech translation (ST) systems in real-world scenarios. Since most ST models are designed to process speech segments, long-form audio must be partitioned into shorter segments before translation. Recently, data-driven approaches for the speech segmentation task have been developed. Although the approaches improve overall translation quality, a performan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  20. arXiv:2406.10324  [pdf, other]

    cs.CV cs.LG

    L4GM: Large 4D Gaussian Reconstruction Model

    Authors: Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling

    Abstract: We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/l4gm

  21. GPT4Rec: Graph Prompt Tuning for Streaming Recommendation

    Authors: Peiyan Zhang, Yuchen Yan, Xi Zhang, Liying Kang, Chaozhuo Li, Feiran Huang, Senzhang Wang, Sunghun Kim

    Abstract: In the realm of personalized recommender systems, the challenge of adapting to evolving user preferences and the continuous influx of new users and items is paramount. Conventional models, typically reliant on a static training-test approach, struggle to keep pace with these dynamic demands. Streaming recommendation, particularly through continual graph learning, has emerged as a novel solution. H…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by SIGIR 2024. arXiv admin note: text overlap with arXiv:2303.11700 by other authors

    ACM Class: H.3.3

  22. arXiv:2406.08222  [pdf]

    cs.CV cs.AI cs.CY cs.HC

    A Sociotechnical Lens for Evaluating Computer Vision Models: A Case Study on Detecting and Reasoning about Gender and Emotion

    Authors: Sha Luo, Sang Jung Kim, Zening Duan, Kaiping Chen

    Abstract: In the evolving landscape of computer vision (CV) technologies, the automatic detection and interpretation of gender and emotion in images is a critical area of study. This paper investigates social biases in CV models, emphasizing the limitations of traditional evaluation metrics such as precision, recall, and accuracy. These metrics often fall short in capturing the complexities of gender and em…

    Submitted 12 June, 2024; originally announced June 2024.

  23. arXiv:2406.07922  [pdf]

    cs.CL

    Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA

    Authors: Dongsuk Jang, Hyeryun Park, Jiye Son, Hyeonuk Hwang, Sujin Kim, Jinwook Choi

    Abstract: In the rapidly evolving field of healthcare, the integration of artificial intelligence (AI) has become a pivotal component in the automation of clinical workflows, ushering in a new era of efficiency and accuracy. This study focuses on the transformative capabilities of the fine-tuned KoELECTRA model in comparison to the GPT-4 model, aiming to facilitate automated information extraction from thyr…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures, 3 tables

    Journal ref: AMIA Joint Summits on Translational Science Proceedings, 2024, pp. 249-257

  24. arXiv:2406.07826  [pdf, other]

    cs.LG cs.AI

    The Max-Min Formulation of Multi-Objective Reinforcement Learning: From Theory to a Model-Free Algorithm

    Authors: Giseung Park, Woohyeon Byeon, Seongmin Kim, Elad Havakuk, Amir Leshem, Youngchul Sung

    Abstract: In this paper, we consider multi-objective reinforcement learning, which arises in many real-world problems with multiple optimization goals. We approach the problem with a max-min framework focusing on fairness among the multiple goals and develop a relevant theory and a practical model-free algorithm under the max-min framework. The developed theory provides a theoretical advance in multi-object…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  25. arXiv:2406.07823  [pdf, other]

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…

    Submitted 11 June, 2024; originally announced June 2024.

  26. arXiv:2406.07803  [pdf, other]

    cs.SD cs.AI eess.AS

    EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech

    Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

    Abstract: Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  27. arXiv:2406.07103  [pdf, other]

    eess.AS cs.AI

    MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

    Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

    Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  28. arXiv:2406.06650  [pdf, other]

    eess.IV cs.CV

    Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

    Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

    Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  29. arXiv:2406.05965  [pdf, other]

    eess.AS cs.AI

    MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance

    Authors: Semin Kim, Myeonghun Jeong, Hyeonseung Lee, Minchan Kim, Byoung Jin Choi, Nam Soo Kim

    Abstract: In this paper, we propose MakeSinger, a semi-supervised training method for singing voice synthesis (SVS) via classifier-free diffusion guidance. The challenge in SVS lies in the costly process of gathering aligned sets of text, pitch, and audio data. MakeSinger enables the training of the diffusion-based SVS model from any speech and singing voice data regardless of its labeling, thereby enhancin…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  30. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  31. arXiv:2406.05341  [pdf, other]

    eess.AS cs.SD

    Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

    Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

    Abstract: Frequency dynamic convolution (FDY conv) has shown the state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, size of basis kernels is limited while time-frequency patte…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  32. arXiv:2406.05270  [pdf]

    physics.med-ph cs.CV cs.LG eess.IV

    fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI

    Authors: Eddy Solomon, Patricia M. Johnson, Zhengguo Tan, Radhika Tibrewala, Yvonne W. Lui, Florian Knoll, Linda Moy, Sungheon Gene Kim, Laura Heacock

    Abstract: This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will…

    Submitted 7 June, 2024; originally announced June 2024.

  33. arXiv:2406.05250  [pdf, other]

    cs.AI cs.AR cs.LG

    LLM-Enhanced Bayesian Optimization for Efficient Analog Layout Constraint Generation

    Authors: Guojin Chen, Keren Zhu, Seunggeun Kim, Hanqing Zhu, Yao Lai, Bei Yu, David Z. Pan

    Abstract: Analog layout synthesis faces significant challenges due to its dependence on manual processes, considerable time requirements, and performance instability. Current Bayesian Optimization (BO)-based techniques for analog layout synthesis, despite their potential for automation, suffer from slow convergence and extensive data needs, limiting their practical application. This paper presents the \text…

    Submitted 19 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  34. arXiv:2406.03486  [pdf, other]

    cs.CL

    BIPED: Pedagogically Informed Tutoring System for ESL Education

    Authors: Soonwoo Kwon, Sojung Kim, Minju Park, Seunghyun Lee, Kyuseok Kim

    Abstract: Large Language Models (LLMs) have a great potential to serve as readily available and cost-efficient Conversational Intelligent Tutoring Systems (CITS) for teaching L2 learners of English. Existing CITS, however, are designed to teach only simple concepts or lack the pedagogical depth necessary to address diverse learning strategies. To develop a more pedagogically informed CITS capable of teachin…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  35. arXiv:2406.02806  [pdf, other]

    cs.LG math.OC stat.ML

    Randomized Geometric Algebra Methods for Convex Neural Networks

    Authors: Yifei Wang, Sungyoon Kim, Paul Chu, Indu Subramaniam, Mert Pilanci

    Abstract: We introduce randomized algorithms to Clifford's Geometric Algebra, generalizing randomized linear algebra to hypercomplex vector spaces. This novel approach has many implications in machine learning, including training neural networks to global optimality via convex optimization. Additionally, we consider fine-tuning large language model (LLM) embeddings as a key application area, exploring the i…

    Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  36. arXiv:2406.02763  [pdf, ps, other]

    cs.FL cs.DS

    Indexing Finite-State Automata Using Forward-Stable Partitions

    Authors: Ruben Becker, Sung-Hwan Kim, Nicola Prezza, Carlo Tosoni

    Abstract: An index on a finite-state automaton is a data structure able to locate specific patterns on the automaton's paths and consequently on the regular language accepted by the automaton itself. Cotumaccio and Prezza [SODA '21], introduced a data structure able to solve pattern matching queries on automata, generalizing the famous FM-index for strings of Ferragina and Manzini [FOCS '00]. The efficiency…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 20 pages, 3 figures, submitted in SPIRE 2024

  37. arXiv:2406.02355  [pdf, other]

    cs.CV cs.AI cs.DC cs.LG

    FedDr+: Stabilizing Dot-regression with Global Feature Distillation for Federated Learning

    Authors: Seongyoon Kim, Minchan Jeong, Sungnyun Kim, Sungwoo Cho, Sumyeong Ahn, Se-Young Yun

    Abstract: Federated Learning (FL) has emerged as a pivotal framework for the development of effective global models (global FL) or personalized models (personalized FL) across clients with heterogeneous, non-iid data distribution. A key challenge in FL is client drift, where data heterogeneity impedes the aggregation of scattered knowledge. Recent studies have tackled the client drift issue by identifying s…

    Submitted 4 June, 2024; originally announced June 2024.

  38. arXiv:2406.01921  [pdf, other]

    cs.IT cs.ET cs.NI cs.PF math.NA

    A Novel Paradigm Shift for Next-Generation: Symbiotic Backscatter Rate-Splitting Multiple Access Systems

    Authors: Thai-Hoc Vu, Daniel Benevides da Costa, Bao Vo Nguyen Quoc, Sunghwan Kim

    Abstract: Next-generation wireless networks are projected to empower a broad range of Internet-of-things (IoT) applications and services with extreme data rates, posing new challenges in delivering large-scale connectivity at a low cost to current communication paradigms. Rate-splitting multiple access (RSMA) is one of the most spotlight nominees, conceived to address spectrum scarcity while reaching massiv…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by IEEE International Conference on Communications and Electronics 2024

  39. arXiv:2406.01079  [pdf, other]

    cs.CV cs.AI

    Object Aware Egocentric Online Action Detection

    Authors: Joungbin An, Yunsu Park, Hyolim Kang, Seon Joo Kim

    Abstract: Advancements in egocentric video datasets like Ego4D, EPIC-Kitchens, and Ego-Exo4D have enriched the study of first-person human interactions, which is crucial for applications in augmented reality and assisted living. Despite these advancements, current Online Action Detection methods, which efficiently detect actions in streaming videos, are predominantly designed for exocentric views and thus f…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: CVPR First Joint Egocentric Vision Workshop 2024

  40. arXiv:2406.01020  [pdf, other]

    cs.CV

    CLIP-Guided Attribute Aware Pretraining for Generalizable Image Quality Assessment

    Authors: Daekyu Kwon, Dongyoung Kim, Sehwan Ki, Younghyun Jo, Hyong-Euk Lee, Seon Joo Kim

    Abstract: In no-reference image quality assessment (NR-IQA), the challenge of limited dataset sizes hampers the development of robust and generalizable models. Conventional methods address this issue by utilizing large datasets to extract rich representations for IQA. Also, some approaches propose vision language models (VLM) based IQA, but the domain gap between generic VLM and IQA constrains their scalabi…

    Submitted 3 June, 2024; originally announced June 2024.

  41. arXiv:2406.00515  [pdf, other]

    cs.CL cs.AI cs.SE

    A Survey on Large Language Models for Code Generation

    Authors: Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, Sunghun Kim

    Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and industry professionals due to its practical significance in software development, e…

    Submitted 1 June, 2024; originally announced June 2024.

  42. arXiv:2405.20652  [pdf, other]

    cs.LG

    Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs

    Authors: Langzhang Liang, Sunwoo Kim, Kijung Shin, Zenglin Xu, Shirui Pan, Yuan Qi

    Abstract: Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICML 2024

  43. arXiv:2405.20610  [pdf, other]

    cs.CV

    Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation

    Authors: Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Sung Won Han

    Abstract: In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability and compatibility of these methods. In this paper, we propose a PrevMatc…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, submitted to IEEE TPAMI. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  44. arXiv:2405.20574  [pdf, other]

    cs.CL cs.AI

    Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark

    Authors: Chanjun Park, Hyeonwoo Kim, Dahyun Kim, Seonghwan Cho, Sanghoon Kim, Sukyung Lee, Yungi Kim, Hwalsuk Lee

    Abstract: This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated in the Korean LLM community. We perform data leakage analysis that shows the benefit of private test…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted at ACL 2024 Main

  45. arXiv:2405.20024  [pdf, other]

    cs.NI cs.AI

    Applications of Generative AI (GAI) for Mobile and Wireless Networking: A Survey

    Authors: Thai-Hoc Vu, Senthil Kumar Jagatheesaperumal, Minh-Duong Nguyen, Nguyen Van Huynh, Sunghwan Kim, Quoc-Viet Pham

    Abstract: The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and…

    Submitted 30 May, 2024; originally announced May 2024.

  46. arXiv:2405.19961  [pdf, other]

    cs.LG

    Collective Variable Free Transition Path Sampling with Generative Flow Network

    Authors: Kiyoung Seong, Seonghyun Park, Seonghwan Kim, Woo Youn Kim, Sungsoo Ahn

    Abstract: Understanding transition paths between meta-stable states in molecular systems is fundamental for material design and drug discovery. However, sampling these paths via molecular dynamics simulations is computationally prohibitive due to the high-energy barriers between the meta-stable states. Recent machine learning approaches are often restricted to simple systems or rely on collective variables…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures, 2 tables

  47. arXiv:2405.19902  [pdf, other]

    cs.LG stat.ML

    Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection

    Authors: Suyeon Kim, Dongha Lee, SeongKu Kang, Sukang Chae, Sanghwan Jang, Hwanjo Yu

    Abstract: Label noise, commonly found in real-world datasets, has a detrimental impact on a model's generalization. To effectively detect incorrectly labeled instances, previous works have mostly relied on distinguishable training signals, such as training loss, as indicators to differentiate between clean and noisy labels. However, they have limitations in that the training signals incompletely reveal the…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  48. arXiv:2405.19795  [pdf, other]

    cs.CL cs.AI

    SLM as Guardian: Pioneering AI Safety with Small Language Models

    Authors: Ohjoon Kwon, Donghyeon Jeon, Nayoung Choi, Gyu-Hwung Cho, Changbong Kim, Hyunwoo Lee, Inho Kang, Sun Kim, Taiwoo Park

    Abstract: Most prior safety research of large language models (LLMs) has focused on enhancing the alignment of LLMs to better suit the safety requirements of humans. However, internalizing such safeguard features into larger models brought challenges of higher training cost and unintended degradation of helpfulness. To overcome such challenges, a modular approach employing a smaller LLM to detect harmful us…

    Submitted 30 May, 2024; originally announced May 2024.

  49. arXiv:2405.19691  [pdf, other]

    cs.HC

    Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing

    Authors: Minsun Kim, SeonGyeom Kim, Suyoun Lee, Yoosang Yoon, Junho Myung, Haneul Yoo, Hyungseung Lim, Jieun Han, Yoonsu Kim, So-Yeon Ahn, Juho Kim, Alice Oh, Hwajung Hong, Tak Yeon Lee

    Abstract: While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises sur…

    Submitted 30 May, 2024; originally announced May 2024.

  50. arXiv:2405.19519  [pdf, other]

    cs.CL cs.AI

    Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data

    Authors: Sudeshna Das, Yao Ge, Yuting Guo, Swati Rajwal, JaMor Hairston, Jeanne Powell, Drew Walker, Snigdha Peddireddy, Sahithi Lakamana, Selen Bozkurt, Matthew Reyna, Reza Sameni, Yunyu Xiao, Sangmi Kim, Rasheeta Chandler, Natalie Hernandez, Danielle Mowery, Rachel Wightman, Jennifer Love, Anthony Spadaro, Jeanmarie Perrone, Abeed Sarker

    Abstract: Retrieval augmented generation (RAG) provides the capability to constrain generative model outputs, and mitigate the possibility of hallucination, by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, thus limiting the volume of knowledge from which to generate an answer. We propose a two-layer RAG framework for qu…

    Submitted 29 May, 2024; originally announced May 2024.