Showing 1–50 of 386 results for author: Kim, G

Searching in archive cs.

  1. arXiv:2407.01158  [pdf, other]

    cs.CL

    Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation

    Authors: Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee

    Abstract: Interactions with billion-scale large language models typically yield long-form responses due to their extensive parametric capacities, along with retrieval-augmented features. While detailed responses provide insightful viewpoint of a specific subject, they frequently generate redundant and less engaging content that does not meet user interests. In this work, we focus on the role of query outlin…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress. Resources are available at https://github.com/youngerous/qtree

  2. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  3. arXiv:2406.14675  [pdf, other]

    cs.CV cs.AI cs.LG

    This Looks Better than That: Better Interpretable Models with ProtoPNeXt

    Authors: Frank Willard, Luke Moffett, Emmanuel Mokel, Jon Donnelly, Stark Guo, Julia Yang, Giyoung Kim, Alina Jade Barnett, Cynthia Rudin

    Abstract: Prototypical-part models are a popular interpretable alternative to black-box deep learning models for computer vision. However, they are difficult to train, with high sensitivity to hyperparameter tuning, inhibiting their application to new datasets and our understanding of which methods truly improve their performance. To facilitate the careful study of prototypical-part networks (ProtoPNets), w…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.13997  [pdf, other]

    cs.CL cs.CE

    "Global is Good, Local is Bad?": Understanding Brand Bias in LLMs

    Authors: Mahammed Kamruzzaman, Hieu Minh Nguyen, Gene Louis Kim

    Abstract: Many recent studies have investigated social biases in LLMs but brand bias has received little attention. This research examines the biases exhibited by LLMs towards different brands, a significant concern given the widespread use of LLMs in affected use cases such as product recommendation and market analysis. Biased models may perpetuate societal inequalities, unfairly favoring established globa…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.13993  [pdf, other]

    cs.CL cs.LG

    Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs

    Authors: Mahammed Kamruzzaman, Gene Louis Kim

    Abstract: Persona assignment has become a common strategy for customizing LLM use to particular tasks and contexts. In this study, we explore how perceptions of different nations change when LLMs are assigned specific nationality personas. We assign 193 different nationality personas (e.g., an American person) to four LLMs and examine how the LLM perceptions of countries change. We find that all LLM-persona…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.11823  [pdf, other]

    cs.CV cs.CL

    On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning

    Authors: Geewook Kim, Minjoon Seo

    Abstract: Recent advancements in language and vision assistants have showcased impressive capabilities but suffer from a lack of transparency, limiting broader research and reproducibility. While open-source models handle general image tasks effectively, they face challenges with the high computational demands of complex visually-situated text understanding. Such tasks often require increased token inputs a…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures

  7. arXiv:2406.08051  [pdf, other]

    cs.AR cs.PF

    ONNXim: A Fast, Cycle-level Multi-core NPU Simulator

    Authors: Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim

    Abstract: As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is becoming more important. However, existing architectural NPU simulators lack support for high-speed simulation, multi-core modeling, multi-tenant scenarios, detailed DRAM/NoC modeling, and/or different de…

    Submitted 12 June, 2024; originally announced June 2024.

  8. arXiv:2406.06559  [pdf, other]

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya, et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users…

    Submitted 2 June, 2024; originally announced June 2024.

  9. arXiv:2406.06044  [pdf, other]

    cs.CV

    FRAG: Frequency Adapting Group for Diffusion Video Editing

    Authors: Sunjae Yoon, Gwanhyeong Koo, Geonwoo Kim, Chang D. Yoo

    Abstract: In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 16 pages, 16 figures, ICML 2024

  10. arXiv:2406.05606  [pdf, other]

    cs.CL

    GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?

    Authors: Dayoon Ko, Jinyoung Kim, Hahyeon Choi, Gunhee Kim

    Abstract: In the real world, knowledge is constantly evolving, which can render existing knowledge-based datasets outdated. This unreliability highlights the critical need for continuous updates to ensure both accuracy and relevance in knowledge-intensive tasks. To address this, we propose GrowOVER-QA and GrowOVER-Dialogue, dynamic open-domain QA and dialogue benchmarks that undergo a continuous cycle of up…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Main

  11. arXiv:2406.05270  [pdf]

    physics.med-ph cs.CV cs.LG eess.IV

    fastMRI Breast: A publicly available radial k-space dataset of breast dynamic contrast-enhanced MRI

    Authors: Eddy Solomon, Patricia M. Johnson, Zhengguo Tan, Radhika Tibrewala, Yvonne W. Lui, Florian Knoll, Linda Moy, Sungheon Gene Kim, Laura Heacock

    Abstract: This data curation work introduces the first large-scale dataset of radial k-space and DICOM data for breast DCE-MRI acquired in diagnostic breast MRI exams. Our dataset includes case-level labels indicating patient age, menopause status, lesion status (negative, benign, and malignant), and lesion type for each case. The public availability of this dataset and accompanying reconstruction code will…

    Submitted 7 June, 2024; originally announced June 2024.

  12. arXiv:2406.02562  [pdf, other]

    eess.AS cs.AI cs.CL

    Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

    Authors: Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

    Abstract: In recent times, there has been a growing interest in utilizing personalized large models on low-spec devices, such as mobile and CPU-only devices. However, utilizing a personalized large model in the on-device is inefficient, and sometimes limited due to computational cost. To tackle the problem, this paper presents the weights separation method to minimize on-device model weights using parameter…

    Submitted 23 April, 2024; originally announced June 2024.

    Comments: Table 2 is revised

    Journal ref: ICASSP 2024 Workshop (HSCMA 2024) paper

  13. arXiv:2406.01592  [pdf, other]

    cs.CV cs.AI cs.CL cs.GR cs.LG

    Text-guided Controllable Mesh Refinement for Interactive 3D Modeling

    Authors: Yun-Chun Chen, Selena Ling, Zhiqin Chen, Vladimir G. Kim, Matheus Gadelha, Alec Jacobson

    Abstract: We propose a novel technique for adding geometric details to an input coarse 3D mesh guided by a text prompt. Our method is composed of three stages. First, we generate a single-view RGB image conditioned on the input coarse geometry and the input text prompt. This single-view image generation step allows the user to pre-visualize the result and offers stronger conditioning for subsequent multi-vi…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://text-mesh-refinement.github.io

  14. arXiv:2406.00014  [pdf, other]

    cs.DB cs.AI cs.CL cs.IR

    KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR

    Authors: Hajung Kim, Chanhwi Kim, Hoonick Lee, Kyochul Jang, Jiwoo Lee, Kyungjae Lee, Gangwoo Kim, Jaewoo Kang

    Abstract: Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handle…

    Submitted 19 June, 2024; v1 submitted 21 May, 2024; originally announced June 2024.

    Comments: Published at ClinicalNLP workshop @ NAACL 2024

  15. arXiv:2405.20821  [pdf, other]

    cs.LG cs.DC stat.ML

    Pursuing Overall Welfare in Federated Learning through Sequential Decision Making

    Authors: Seok-Ju Hahn, Gi-Soo Kim, Junghye Lee

    Abstract: In traditional federated learning, a single global model cannot perform equally well for all clients. Therefore, the need to achieve the client-level fairness in federated system has been emphasized, which can be realized by modifying the static aggregation scheme for updating the global model to an adaptive one, in response to the local signals of the participating clients. Our work reveals that…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024

  16. arXiv:2405.19380  [pdf, other]

    stat.ML cs.LG eess.SY

    Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

    Authors: Yeoneung Kim, Gihun Kim, Insoon Yang

    Abstract: We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby acc…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 61 pages, 6 figures

  17. arXiv:2405.18758  [pdf, other]

    cs.LG cs.AI

    Learning to Continually Learn with the Bayesian Principle

    Authors: Soochan Lee, Hyeonseong Jeon, Jaehyeon Son, Gunhee Kim

    Abstract: In the present era of deep learning, continual learning research is mainly focused on mitigating forgetting when training a neural network with stochastic gradient descent on a non-stationary stream of data. On the other hand, in the more classical literature of statistical machine learning, many models have sequential Bayesian update rules that yield the same learning outcome as the batch trainin…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  18. arXiv:2405.18027  [pdf, other]

    cs.CL

    TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

    Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

    Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurat…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

  19. arXiv:2405.13762  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

    Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

    Abstract: Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the a…

    Submitted 22 May, 2024; originally announced May 2024.

  20. arXiv:2405.11807  [pdf, other]

    cs.HC cs.RO eess.SY

    Dual-sided Peltier Elements for Rapid Thermal Feedback in Wearables

    Authors: Seongjun Kang, Gwangbin Kim, Seokhyun Hwang, Jeongju Park, Ahmed Elsharkawy, SeungJun Kim

    Abstract: This paper introduces a motor-driven Peltier device designed to deliver immediate thermal sensations within extended reality (XR) environments. The system incorporates eight motor-driven Peltier elements, facilitating swift transitions between warm and cool sensations by rotating preheated or cooled elements to opposite sides. A multi-layer structure, comprising aluminum and silicone layers, ensur…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 3 pages, 4 figures, ICRA Wearable Workshop 2024 - 1st Workshop on Advancing Wearable Devices and Applications through Novel Design, Sensing, Actuation, and AI

  21. arXiv:2405.11802  [pdf, other]

    cs.HC cs.AI cs.LG

    Counterfactual Explanation-Based Badminton Motion Guidance Generation Using Wearable Sensors

    Authors: Minwoo Seong, Gwangbin Kim, Yumin Kang, Junhyuk Jang, Joseph DelPreto, SeungJun Kim

    Abstract: This study proposes a framework for enhancing the stroke quality of badminton players by generating personalized motion guides, utilizing a multimodal wearable dataset. These guides are based on counterfactual algorithms and aim to reduce the performance gap between novice and expert players. Our approach provides joint-level guidance through visualizable data to assist players in improving their…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: ICRA Wearable Workshop 2024 - 1st Workshop on Advancing Wearable Devices and Applications through Novel Design, Sensing, Actuation, and AI

  22. arXiv:2405.05749  [pdf, other]

    cs.CV

    NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior

    Authors: Gihoon Kim, Kwanggyoon Seo, Sihun Cha, Junyong Noh

    Abstract: Audio-driven talking head generation is advancing from 2D to 3D content. Notably, Neural Radiance Field (NeRF) is in the spotlight as a means to synthesize high-quality 3D talking head outputs. Unfortunately, this NeRF-based approach typically requires a large number of paired audio-visual data for each identity, thereby limiting the scalability of the method. Although there have been attempts to…

    Submitted 10 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  23. arXiv:2405.01016  [pdf, other]

    cs.CV cs.AI

    Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction

    Authors: Minsu Kim, Giseop Kim, Sunwook Choi

    Abstract: Recent advancements in Bird's Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architecture incurs substantial amounts of backpropagation memory and computing latency. Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps, as their large-sized features cause signifi…

    Submitted 3 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  24. arXiv:2405.00260  [pdf, other]

    cs.CV

    CREPE: Coordinate-Aware End-to-End Document Parser

    Authors: Yamato Okamoto, Youngmin Baek, Geewook Kim, Ryota Nakao, DongHyun Kim, Moon Bin Yim, Seunghyun Park, Bado Lee

    Abstract: In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text based on the multi-head architecture. Named as Coordinate-aware End-to-end Document Parser (CREPE), our method uniquely integrates these capabilities by introducing a special token for OC…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024) main conference

  25. arXiv:2404.19381  [pdf, other]

    cs.AR

    Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

    Authors: Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim

    Abstract: To overcome the memory capacity wall of large-scale AI and big data applications, Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol stack minimizes interconnect latency, CXL memory accesses can still result in significant slowdowns for memory-bound applications. While near-data processing (NDP) in CXL memory can overc…

    Submitted 30 April, 2024; originally announced April 2024.

  26. arXiv:2404.17218  [pdf, other]

    cs.CL

    Prompting Techniques for Reducing Social Bias in LLMs through System 1 and System 2 Cognitive Processes

    Authors: Mahammed Kamruzzaman, Gene Louis Kim

    Abstract: Dual process theory posits that human cognition arises via two systems. System 1, which is a quick, emotional, and intuitive process, which is subject to cognitive biases, and System 2, a slow, onerous, and deliberate process. NLP researchers often compare zero-shot prompting in LLMs to System 1 reasoning and chain-of-thought (CoT) prompting to System 2. In line with this interpretation, prior res…

    Submitted 28 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  27. arXiv:2404.16804  [pdf, other]

    cs.CV cs.AI cs.LG

    AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

    Authors: Gahyeon Kim, Sohee Kim, Seokju Lee

    Abstract: Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improve…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 Workshop on Prompting in Vision, Project Page: https://github.com/Gahyeonkim09/AAPL

  28. arXiv:2404.16292  [pdf, other]

    cs.GR cs.CV cs.LG

    One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns

    Authors: Arman Maesumi, Dylan Hu, Krishi Saripalli, Vladimir G. Kim, Matthew Fisher, Sören Pirk, Daniel Ritchie

    Abstract: Procedural noise is a fundamental component of computer graphics pipelines, offering a flexible way to generate textures that exhibit "natural" random variation. Many different types of noise exist, each produced by a separate algorithm. In this paper, we present a single generative model which can learn to generate multiple types of noise as well as blend between them. In addition, it is capable…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: In ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2024, 21 pages

  29. arXiv:2404.15707  [pdf, other]

    cs.CV

    ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images

    Authors: Jinseo Jeong, Junseo Koo, Qimeng Zhang, Gunhee Kim

    Abstract: Existing NeRF-based inverse rendering methods suppose that scenes are exclusively illuminated by distant light sources, neglecting the potential influence of emissive sources within a scene. In this work, we confront this limitation using LDR multi-view images captured with emissive sources turned on and off. Two key issues must be addressed: 1) ambiguity arising from the limited dynamic range alo…

    Submitted 6 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  30. arXiv:2404.11925  [pdf, other]

    cs.LG cs.AI cs.CV

    EdgeFusion: On-Device Text-to-Image Generation

    Authors: Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim

    Abstract: The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as Latent Consistency Model (LCM), and on employing architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 4 pages, accepted to CVPR24 First Workshop on Efficient and On-Device Generation (EDGE)

  31. arXiv:2404.04682  [pdf, other]

    cs.LG cs.AI cs.RO

    Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning

    Authors: Yeda Song, Dongwook Lee, Gunhee Kim

    Abstract: Offline reinforcement learning (RL) is a compelling framework for learning optimal policies from past experiences without additional interaction with the environment. Nevertheless, offline RL inevitably faces the problem of distributional shifts, where the states and actions encountered during policy execution may not be in the training dataset distribution. A common solution involves incorporatin…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  32. arXiv:2404.04544  [pdf, other]

    cs.CV cs.AI

    BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

    Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun

    Abstract: Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempted to address training size limit only, they often…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Project page: https://janeyeon.github.io/beyond-scene

  33. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  34. arXiv:2403.15209  [pdf, other]

    cs.CV

    MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection

    Authors: Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro

    Abstract: Multispectral pedestrian detection is attractive for around-the-clock applications due to the complementary information between RGB and thermal modalities. However, current models often fail to detect pedestrians in certain cases (e.g., thermal-obscured pedestrians), particularly due to the modality bias learned from statistically biased datasets. In this paper, we investigate how to mitigate moda…

    Submitted 29 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  35. Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory

    Authors: Jeongmin Hong, Sungjun Cho, Geonwoo Park, Wonhyuk Yang, Young-Ho Gong, Gwangsun Kim

    Abstract: We propose overcoming the memory capacity limitation of GPUs with high-capacity Storage-Class Memory (SCM) and DRAM cache. By significantly increasing the memory capacity with SCM, the GPU can capture a larger fraction of the memory footprint than HBM for workloads that oversubscribe memory, achieving high speedups. However, the DRAM cache needs to be carefully designed to address the latency and…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Published in 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA'24)

  36. arXiv:2403.04207  [pdf, other]

    cs.LG cs.DC

    HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning

    Authors: Gyudong Kim, Mehdi Ghasemi, Soroush Heidari, Seungryong Kim, Young Geun Kim, Sarma Vrudhula, Carole-Jean Wu

    Abstract: Federated Learning (FL) is a practical approach to train deep learning models collaboratively across user-end devices, protecting user privacy by retaining raw data on-device. In FL, participating user-end devices are highly fragmented in terms of hardware and software configurations. Such fragmentation introduces a new type of data heterogeneity in FL, namely system-induced data heterogen…

    Submitted 10 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  37. arXiv:2403.02460  [pdf, other]

    cs.GR

    MagicClay: Sculpting Meshes With Generative Neural Fields

    Authors: Amir Barda, Vladimir G. Kim, Noam Aigerman, Amit H. Bermano, Thibault Groueix

    Abstract: The recent developments in neural fields have brought phenomenal capabilities to the field of shape generation, but they lack crucial properties, such as incremental control - a fundamental requirement for artistic work. Triangular meshes, on the other hand, are the representation of choice for most geometry related tasks, offering efficiency and intuitive control, but do not lend themselves to ne…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: project page: https://amir90.github.io/MagicClay.github.io/

  38. arXiv:2403.01300  [pdf, other]

    cs.CV

    Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

    Authors: Taeheon Kim, Sebin Shin, Youngjoon Yu, Hak Gu Kim, Yong Man Ro

    Abstract: RGBT multispectral pedestrian detection has emerged as a promising solution for safety-critical applications that require day/night operations. However, the modality bias problem remains unsolved as multispectral pedestrian detectors learn the statistical bias in datasets. Specifically, datasets in multispectral pedestrian detection mainly distribute between ROTO (day) and RXTO (night) data; the m…

    Submitted 5 April, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  39. NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

    Authors: Guseul Heo, Sangyeop Lee, Jaehong Cho, Hyunmin Choi, Sanghyeon Lee, Hyungkyu Ham, Gwangsun Kim, Divya Mahajan, Jongse Park

    Abstract: Modern transformer-based Large Language Models (LLMs) are constructed with a series of decoder blocks. Each block comprises three key components: (1) QKV generation, (2) multi-head attention, and (3) feed-forward networks. In batched processing, QKV generation and feed-forward networks involve compute-intensive matrix-matrix multiplications (GEMM), while multi-head attention requires bandwidth-hea…

    Submitted 29 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 16 pages, 15 figures

    Journal ref: ASPLOS 2024

  40. arXiv:2402.16994  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    GEM3D: GEnerative Medial Abstractions for 3D Shape Synthesis

    Authors: Dmitry Petrov, Pradyumn Goyal, Vikas Thamizharasan, Vladimir G. Kim, Matheus Gadelha, Melinos Averkiou, Siddhartha Chaudhuri, Evangelos Kalogerakis

    Abstract: We introduce GEM3D -- a new deep, topology-aware generative model of 3D shapes. The key ingredient of our method is a neural skeleton-based representation encoding information on both shape topology and geometry. Through a denoising diffusion probabilistic model, our method first generates skeleton-based representations following the Medial Axis Transform (MAT), then generates surfaces through a s…

    Submitted 10 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Webpage: https://lodurality.github.io/GEM3D/ -- Cond. accept. to SIGGRAPH 2024 (conf. track) -- Changes (based on reviews): changed style to sigconf; rearranged figures for readability; added missing citations; fixed misaligned centers in Fig. 3; added failure cases (Fig. 10); rewrote discussion; added categories averages to Tab. 8; added Tab. 10 with model capacities

  41. arXiv:2402.12842  [pdf, other]

    cs.CL cs.AI cs.LG

    PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

    Authors: Gyeongman Kim, Doohyuk Jang, Eunho Yang

    Abstract: Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models like LLMs is relatively sparse, and the approach of distilling student-friendly knowledge, which has shown promising performance in KD…

    Submitted 24 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Code: https://github.com/gmkim-ai/PromptKD

  42. arXiv:2402.11827  [pdf, other]

    cs.IR cs.CL

    Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

    Authors: Chanwoong Yoon, Gangwoo Kim, Byeongguk Jeon, Sungdong Kim, Yohan Jo, Jaewoo Kang

    Abstract: Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results.…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 8 pages

  43. arXiv:2402.11201  [pdf, other]

    cs.CV

    A Decoding Scheme with Successive Aggregation of Multi-Level Features for Light-Weight Semantic Segmentation

    Authors: Jiwon Yoo, Jangwon Lee, Gyeonghwan Kim

    Abstract: Multi-scale architecture, including hierarchical vision transformer, has been commonly applied to high-resolution semantic segmentation to deal with computational complexity with minimum performance loss. In this paper, we propose a novel decoding scheme for semantic segmentation in this regard, which takes multi-level features from the encoder with multi-scale architecture. The decoding scheme ba…

    Submitted 14 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: 7 pages, 4 figures, ICIP2024 Accepted paper

  44. arXiv:2402.06440  [pdf, other]

    cs.CR

    A Method for Decrypting Data Infected with Rhysida Ransomware

    Authors: Giyoon Kim, Soojin Kang, Seungjun Baek, Kimoon Kim, Jongsung Kim

    Abstract: Ransomware is malicious software that is a prominent global cybersecurity threat. Typically, ransomware encrypts data on a system, rendering the victim unable to decrypt it without the attacker's private key. Subsequently, victims often pay a substantial ransom to recover their data, yet some may still incur damage or loss. This study examines Rhysida ransomware, which caused significant damage in…

    Submitted 9 February, 2024; originally announced February 2024.

  45. arXiv:2402.02834  [pdf, other]

    cs.LG cs.CL

    Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods

    Authors: Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song

    Abstract: Structured pruning of modern large language models (LLMs) has emerged as a way of decreasing their high computational needs. Width pruning reduces the size of projection weight matrices (e.g., by removing attention heads) while maintaining the number of layers. Depth pruning, in contrast, removes entire layers or blocks, while keeping the size of the remaining weights unchanged. Most current resea…

    Submitted 23 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Update (arXiv-v2): continued pretraining for severe pruning ratios, compatibility with quantization, and enhanced baselines. Preliminary work (arXiv-v1) accepted at ICLR 2024 Workshop on ME-FoMo: https://openreview.net/forum?id=18VGxuOdpu

  46. arXiv:2401.17547  [pdf, other]

    cs.CV

    Task-Oriented Diffusion Model Compression

    Authors: Geonung Kim, Beomsu Kim, Eunhyeok Park, Sunghyun Cho

    Abstract: As recent advancements in large-scale Text-to-Image (T2I) diffusion models have yielded remarkable high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper,…

    Submitted 30 January, 2024; originally announced January 2024.

  47. Quantum-Secure Hybrid Blockchain System for DID-based Verifiable Random Function with NTRU Linkable Ring Signature

    Authors: Bong Gon Kim, Dennis Wong, Yoon Seok Yang

    Abstract: In this study, we present a secure smart contract-based Verifiable Random Function (VRF) model, addressing the shortcomings of existing systems. As quantum computing emerges, conventional public key cryptography faces potential vulnerabilities. To enhance our VRF's robustness, we employ post-quantum Ring-LWE encryption for generating pseudo-random sequences. Given the computational intensity of th…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 25 pages, 5 figures, 2023 International Journal on Cryptography and Information Security (IJCIS). arXiv admin note: text overlap with arXiv:2311.11734

    Journal ref: Volume 13, Number 4, December 2023

  48. arXiv:2401.13191  [pdf, other]

    cs.CV

    Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion model

    Authors: Yuanming Li, Gwantae Kim, Jeong-gi Kwak, Bon-hwa Ku, Hanseok Ko

    Abstract: Recently, deep learning-based facial landmark detection for in-the-wild faces has achieved significant improvement. However, there are still challenges in face landmark detection in other domains (e.g. cartoon, caricature, etc). This is due to the scarcity of extensively annotated training data. To tackle this concern, we design a two-stage training approach that effectively leverages limited data…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 6 pages, ICASSP 2024 accepted

  49. arXiv:2401.09787  [pdf, other]

    cs.LG cs.AI stat.ML

    Querying Easily Flip-flopped Samples for Deep Active Learning

    Authors: Seong Jin Cho, Gwangsu Kim, Junghyun Lee, Jinwoo Shin, Chang D. Yoo

    Abstract: Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. The sample's distance to the decision boundary is a natural measure of predictive uncertainty…

    Submitted 16 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 34 pages, 17 figures, 5 tables. Accepted to the 12th International Conference on Learning Representations (ICLR 2024) (ver2: fixed some typos and improved some parts of the writing)

  50. arXiv:2401.08962  [pdf, other]

    cs.HC cs.LG cs.SD eess.AS

    DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

    Authors: Hyunju Kim, Geon Kim, Taehoon Lee, Kisoo Kim, Dongman Lee

    Abstract: With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrain…

    Submitted 16 January, 2024; originally announced January 2024.