Showing 1–50 of 236 results for author: Yun, S

  1. arXiv:2407.00693  [pdf, other]

    cs.AI cs.CL cs.LG

    BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models

    Authors: Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

    Abstract: While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneit…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: under review

  2. arXiv:2406.20098  [pdf, other]

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  3. arXiv:2406.18815  [pdf, other]

    cs.LG

    MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation

    Authors: Sanggeon Yun, Ryozo Masukawa, Minhyoung Na, Mohsen Imani

    Abstract: In the context of escalating safety concerns across various domains, the tasks of Video Anomaly Detection (VAD) and Video Anomaly Recognition (VAR) have emerged as critically important for applications in intelligent surveillance, evidence investigation, violence alerting, etc. These tasks, aimed at identifying and classifying deviations from normal behavior in video data, face significant challen…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.16758  [pdf, other]

    cs.CL

    Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

    Authors: Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang, Se-Young Yun

    Abstract: Large language models (LLMs) have revolutionized natural language processing and broadened their applicability across diverse commercial applications. However, the deployment of these models is constrained by high inference time in multilingual settings. To mitigate this challenge, this paper explores a training recipe of an assistant model in speculative decoding, which are leveraged to draft and…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.02657  [pdf, other]

    cs.CL cs.AI cs.LG

    Block Transformer: Global-to-Local Language Modeling for Fast Inference

    Authors: Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun

    Abstract: This paper presents the Block Transformer architecture which adopts hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous sequences must be retrieved from memory at every decoding step. Thereby, this KV cache IO becomes a significant bottleneck in batch inferenc…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures, 5 tables

  6. arXiv:2406.02355  [pdf, other]

    cs.CV cs.AI cs.DC cs.LG

    FedDr+: Stabilizing Dot-regression with Global Feature Distillation for Federated Learning

    Authors: Seongyoon Kim, Minchan Jeong, Sungnyun Kim, Sungwoo Cho, Sumyeong Ahn, Se-Young Yun

    Abstract: Federated Learning (FL) has emerged as a pivotal framework for the development of effective global models (global FL) or personalized models (personalized FL) across clients with heterogeneous, non-iid data distribution. A key challenge in FL is client drift, where data heterogeneity impedes the aggregation of scattered knowledge. Recent studies have tackled the client drift issue by identifying s…

    Submitted 4 June, 2024; originally announced June 2024.

  7. arXiv:2406.02021  [pdf, other]

    cs.CV cs.AI cs.LG

    MetaMixer Is All You Need

    Authors: Seokju Yun, Dongheon Lee, Youngmin Ro

    Abstract: Transformer, composed of self-attention and Feed-Forward Network, has revolutionized the landscape of network design across various vision tasks. FFN is a versatile operator seamlessly integrated into nearly all AI models to effectively harness rich representations. Recent works also show that FFN functions like key-value memories. Thus, akin to the query-key-value mechanism within self-attention,…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/ysj9909/FFNet

  8. arXiv:2405.19806  [pdf, other]

    cs.LG

    Preference Alignment with Flow Matching

    Authors: Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Seyoung Yun

    Abstract: We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require fine-tuning pre-trained models, which presents challenges such as scalability, inefficiency, and the need for model modifications, especially with black-box APIs lik…

    Submitted 30 May, 2024; originally announced May 2024.

  9. arXiv:2405.18027  [pdf, other]

    cs.CL

    TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

    Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

    Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurat…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

  10. arXiv:2405.17995  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture

    Authors: Shentong Mo, Sukmin Yun

    Abstract: The joint-embedding predictive architecture (JEPA) recently has shown impressive results in extracting visual representations from unlabeled imagery under a masking strategy. However, we reveal its disadvantages, notably its insufficient understanding of local semantics. This deficiency originates from masked modeling in the embedding space, resulting in a reduction of discriminative power and can…

    Submitted 28 May, 2024; originally announced May 2024.

  11. arXiv:2405.16907  [pdf, other]

    cs.AI cs.LG

    GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

    Authors: Jaewoo Lee, Sujin Yun, Taeyoung Yun, Jinkyoo Park

    Abstract: Offline Reinforcement Learning (Offline RL) presents challenges of learning effective decision-making policies from static datasets without any online interactions. Data augmentation techniques, such as noise injection and data synthesizing, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the qualit…

    Submitted 12 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted (Spotlight) to ICLR 2024 Workshop on Generative Models for Decision Making. Jaewoo Lee and Sujin Yun are equal contribution authors

  12. arXiv:2405.13396  [pdf, other]

    cs.LG stat.ML

    Why In-Context Learning Transformers are Tabular Data Classifiers

    Authors: Felix den Breejen, Sangmin Bae, Stephen Cha, Se-Young Yun

    Abstract: The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. As synthetic data does not share features or labels with real-world data, the underlying mechanism that contributes to the success of this method remains unclear. This study provides an explanation by demonstrating that ICL-transformers acquire the ability to…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 9 pages main body, 22 pages total. Preprint under review

  13. arXiv:2405.07857  [pdf, other]

    cs.CV cs.AI

    Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs

    Authors: Mingyu Kim, Jun-Seong Kim, Se-Young Yun, Jin-Hwa Kim

    Abstract: The multi-plane representation has been highlighted for its fast training and inference across static and dynamic neural radiance fields. This approach constructs relevant features via projection onto learnable grids and interpolating adjacent vertices. However, it has limitations in capturing low-frequency details and tends to overuse parameters for low-frequency features due to its bias toward f…

    Submitted 5 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: ICML2024 ; Project page is accessible at https://mingyukim87.github.io/SynergyNeRF ; Code is available at https://github.com/MingyuKim87/SynergyNeRF

  14. arXiv:2405.04819  [pdf, other]

    cs.CL cs.AI

    DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

    Authors: Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen

    Abstract: Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on…

    Submitted 12 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: Under Review; Incorrect author name revised

  15. arXiv:2405.04497  [pdf, other]

    cs.HC

    Unveiling Disparities in Web Task Handling Between Human and Web Agent

    Authors: Kihoon Son, Jinhyeon Kwon, DaEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, Juho Kim

    Abstract: With the advancement of Large-Language Models (LLMs) and Large Vision-Language Models (LVMs), agents have shown significant capabilities in various tasks, such as data analysis, gaming, or code generation. Recently, there has been a surge in research on web agents, capable of performing tasks within the web environment. However, the web poses unforeseeable scenarios, challenging the generalizabili…

    Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  16. arXiv:2405.01686  [pdf, other]

    cs.CL cs.AI

    Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models

    Authors: Hye Sun Yun, David Pogrebitskiy, Iain J. Marshall, Byron C. Wallace

    Abstract: Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individu…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 24 pages, 7 figures, 6 tables

  17. arXiv:2405.01588  [pdf, other]

    cs.CL cs.AI

    Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL

    Authors: Yongjin Yang, Sihyeon Kim, SangMook Kim, Gyubok Lee, Se-Young Yun, Edward Choi

    Abstract: Incorporating unanswerable questions into EHR QA systems is crucial for testing the trustworthiness of a system, as providing non-existent responses can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a promising benchmark because it is the only dataset that incorporates unanswerable questions in the EHR QA system alongside practical questions. However, in this work, we identi…

    Submitted 28 April, 2024; originally announced May 2024.

    Comments: DPFM Workshop, ICLR 2024

  18. arXiv:2404.17507  [pdf, other]

    cs.CV

    HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

    Authors: Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun

    Abstract: In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our appr…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 28pages, 4.5MB

  19. arXiv:2404.14202  [pdf, other]

    cs.LG stat.ML

    An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints

    Authors: Jung-hun Kim, Milan Vojnovic, Se-Young Yun

    Abstract: In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative…

    Submitted 24 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  20. arXiv:2404.13949  [pdf, other]

    cs.CV cs.RO

    PeLiCal: Targetless Extrinsic Calibration via Penetrating Lines for RGB-D Cameras with Limited Co-visibility

    Authors: Jaeho Shin, Seungsang Yun, Ayoung Kim

    Abstract: RGB-D cameras are crucial in robotic perception, given their ability to produce images augmented with depth data. However, their limited FOV often requires multiple cameras to cover a broader area. In multi-camera RGB-D setups, the goal is typically to reduce camera overlap, optimizing spatial coverage with as few cameras as possible. The extrinsic calibration of these systems introduces additiona…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  21. arXiv:2404.11848  [pdf, other]

    cs.CV

    Partial Large Kernel CNNs for Efficient Super-Resolution

    Authors: Dongheon Lee, Seokju Yun, Youngmin Ro

    Abstract: Recently, in the super-resolution (SR) domain, transformers have outperformed CNNs with fewer FLOPs and fewer parameters since they can deal with long-range dependency and adaptively adjust weights based on instance. In this paper, we demonstrate that CNNs, although less focused on in the current SR domain, surpass Transformers in direct efficiency measures. By incorporating the advantages of Tran…

    Submitted 17 April, 2024; originally announced April 2024.

  22. arXiv:2404.11025  [pdf, other]

    cs.CV

    NeuroHash: A Hyperdimensional Neuro-Symbolic Framework for Spatially-Aware Image Hashing and Retrieval

    Authors: Sanggeon Yun, Ryozo Masukawa, SungHeon Jeong, Mohsen Imani

    Abstract: Customizable image retrieval from large datasets remains a critical challenge, particularly when preserving spatial relationships within images. Traditional hashing methods, primarily based on deep learning, often fail to capture spatial information adequately and lack transparency. In this paper, we introduce NeuroHash, a novel neuro-symbolic framework leveraging Hyperdimensional Computing (HDC)…

    Submitted 22 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  23. arXiv:2404.10308  [pdf, other]

    cs.LG cs.AI

    Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

    Authors: Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin

    Abstract: Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted to ICLR 2024. The first two authors contributed equally

  24. arXiv:2404.09207  [pdf, other]

    cs.LG

    DEGNN: Dual Experts Graph Neural Network Handling Both Edge and Node Feature Noise

    Authors: Tai Hasegawa, Sukwon Yun, Xin Liu, Yin Jun Phua, Tsuyoshi Murata

    Abstract: Graph Neural Networks (GNNs) have achieved notable success in various applications over graph data. However, recent research has revealed that real-world graphs often contain noise, and GNNs are susceptible to noise in the graph. To address this issue, several Graph Structure Learning (GSL) models have been introduced. While GSL models are tailored to enhance robustness against edge noise through…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: PAKDD 2024, the code is available at https://github.com/TaiHasegawa/DEGNN

  25. arXiv:2403.19522  [pdf, other]

    cs.LG cs.CV

    Model Stock: All we need is just a few fine-tuned models

    Authors: Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

    Abstract: This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to achieve final weights yet yield superior accuracy. Drawing from key insights in the we…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code at https://github.com/naver-ai/model-stock

  26. arXiv:2403.18260  [pdf, other]

    cs.CV cs.CL

    Toward Interactive Regional Understanding in Vision-Large Language Models

    Authors: Jungbeom Lee, Sanghyuk Chun, Sangdoo Yun

    Abstract: Recent Vision-Language Pre-training (VLP) models have demonstrated significant advancements. Nevertheless, these models heavily rely on image-text pairs that capture only coarse and global information of an image, leading to a limitation in their regional understanding ability. In this work, we introduce RegionVLM, equipped with explicit regional modeling capabilities, allowing them to un…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 Main Conference

  27. arXiv:2403.14027  [pdf, other]

    cs.CV

    EcoSense: Energy-Efficient Intelligent Sensing for In-Shore Ship Detection through Edge-Cloud Collaboration

    Authors: Wenjun Huang, Hanning Chen, Yang Ni, Arghavan Rezvani, Sanggeon Yun, Sungheon Jeon, Eric Pedley, Mohsen Imani

    Abstract: Detecting marine objects inshore presents challenges owing to algorithmic intricacies and complexities in system deployment. We propose a difficulty-aware edge-cloud collaborative sensing system that splits the task into object localization and fine-grained classification. Objects are classified either at the edge or within the cloud, based on their estimated difficulty. The framework comprises a…

    Submitted 26 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  28. arXiv:2403.13298  [pdf, other]

    cs.CV cs.LG

    Rotary Position Embedding for Vision Transformer

    Authors: Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun

    Abstract: Rotary Position Embedding (RoPE) performs remarkably on language models, especially for length extrapolation of Transformers. However, the impacts of RoPE on computer vision domains have been underexplored, even though RoPE appears capable of enhancing Vision Transformer (ViT) performance in a way similar to the language domain. This study provides a comprehensive analysis of RoPE when applied to…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 20 pages, 5 figures

  29. arXiv:2403.08108  [pdf, other]

    cs.CV

    TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection

    Authors: Hanning Chen, Wenjun Huang, Yang Ni, Sanggeon Yun, Fei Wen, Hugo Latapie, Mohsen Imani

    Abstract: Task-oriented object detection aims to find objects suitable for accomplishing specific tasks. As a challenging task, it requires simultaneous visual data processing and reasoning under ambiguous semantics. Recent solutions are mainly all-in-one models. However, the object detection backbones are pre-trained without text supervision. Thus, to incorporate task requirements, their intricate models u…

    Submitted 12 March, 2024; originally announced March 2024.

  30. arXiv:2403.06342  [pdf, other]

    math.NA cs.LG

    Separable Physics-informed Neural Networks for Solving the BGK Model of the Boltzmann Equation

    Authors: Jaemin Oh, Seung Yeon Cho, Seok-Bae Yun, Eunbyung Park, Youngjoon Hong

    Abstract: In this study, we introduce a method based on Separable Physics-Informed Neural Networks (SPINNs) for effectively solving the BGK model of the Boltzmann equation. While the mesh-free nature of PINNs offers significant advantages in handling high-dimensional partial differential equations (PDEs), challenges arise when applying quadrature rules for accurate integral evaluation in the BGK operator, w…

    Submitted 10 March, 2024; originally announced March 2024.

    MSC Class: 68T20; 35R09

  31. arXiv:2403.05973  [pdf, other]

    cs.CL cs.AI cs.LG

    Calibrating Large Language Models Using Their Generations Only

    Authors: Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

    Abstract: As large language models (LLMs) are increasingly deployed in user-facing applications, building trust and maintaining safety by accurately quantifying a model's confidence in its prediction becomes even more important. However, finding effective ways to calibrate LLMs - especially when the only interface to the models is their generated text - remains a challenge. We propose APRICOT (auxiliary pre…

    Submitted 9 March, 2024; originally announced March 2024.

  32. arXiv:2403.05763  [pdf, other]

    cs.AR cs.AI cs.LG

    HDReason: Algorithm-Hardware Codesign for Hyperdimensional Knowledge Graph Reasoning

    Authors: Hanning Chen, Yang Ni, Ali Zakeri, Zhuowen Zou, Sanggeon Yun, Fei Wen, Behnam Khaleghi, Narayan Srinivasa, Hugo Latapie, Mohsen Imani

    Abstract: In recent times, a plethora of hardware accelerators have been put forth for graph learning applications such as vertex classification and graph classification. However, previous works have paid little attention to Knowledge Graph Completion (KGC), a task that is well-known for its significantly higher algorithm complexity. The state-of-the-art KGC solutions based on graph convolution neural netwo…

    Submitted 8 March, 2024; originally announced March 2024.

  33. arXiv:2402.12991  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification

    Authors: Martin Gubri, Dennis Ulmer, Hwaran Lee, Sangdoo Yun, Seong Joon Oh

    Abstract: Large Language Model (LLM) services and models often come with legal rules on who can use them and how they must use them. Assessing the compliance of the released LLMs is crucial, as these rules protect the interests of the LLM contributor and prevent misuse. In this context, we describe the novel fingerprinting problem of Black-box Identity Verification (BBIV). The goal is to determine whether a…

    Submitted 6 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (findings)

  34. Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning

    Authors: Haeju Lee, Minchan Jeong, Se-Young Yun, Kee-Eung Kim

    Abstract: Prompt tuning, in which prompts are optimized to adapt large-scale pre-trained language models to downstream tasks instead of fine-tuning the full model parameters, has been shown to be particularly effective when the prompts are trained in a multi-task transfer learning setting. These methods generally involve individually training prompts for each source task and then aggregating them to provide…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: The first two authors equally contributed to this work. Findings of EMNLP 2023

  35. arXiv:2402.06974  [pdf, other]

    cs.LG

    Hypernetwork-Driven Model Fusion for Federated Domain Generalization

    Authors: Marc Bartholet, Taehyeon Kim, Ami Beuret, Se-Young Yun, Joachim M. Buhmann

    Abstract: Federated Learning (FL) faces significant challenges with domain shifts in heterogeneous data, degrading performance. Traditional domain generalization aims to learn domain-invariant features, but the federated nature of model averaging often limits this due to its linear aggregation of local learning. To address this, we propose a robust framework, coined as hypernetwork-based Federated Fusion (h…

    Submitted 28 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  36. arXiv:2402.05353  [pdf, other]

    cs.LG cs.DC

    Revisiting Early-Learning Regularization When Federated Learning Meets Noisy Labels

    Authors: Taehyeon Kim, Donggyu Kim, Se-Young Yun

    Abstract: In the evolving landscape of federated learning (FL), addressing label noise presents unique challenges due to the decentralized and diverse nature of data collection across clients. Traditional centralized learning approaches to mitigate label noise are constrained in FL by privacy concerns and the heterogeneity of client data. This paper revisits early-learning regularization, introducing an inn…

    Submitted 7 February, 2024; originally announced February 2024.

  37. arXiv:2402.03898  [pdf, other]

    cs.CL cs.AI cs.LG

    DistiLLM: Towards Streamlined Distillation for Large Language Models

    Authors: Jongwoo Ko, Sungnyun Kim, Tianyi Chen, Se-Young Yun

    Abstract: Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models (e.g., large language models) suffer from missing a standardized objective function. Moreover, the recent use of student-generated outputs to addre…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Code is available at https://github.com/jongwooko/distillm

  38. arXiv:2402.02043  [pdf, other]

    cs.LG cs.AI cs.NI

    A Plug-in Tiny AI Module for Intelligent and Selective Sensor Data Transmission

    Authors: Wenjun Huang, Arghavan Rezvani, Hanning Chen, Yang Ni, Sanggeon Yun, Sungheon Jeong, Mohsen Imani

    Abstract: Applications in the Internet of Things (IoT) utilize machine learning to analyze sensor-generated data. However, a major challenge lies in the lack of targeted intelligence in current sensing systems, leading to vast data generation and increased computational and communication costs. To address this challenge, we propose a novel sensing module to equip sensing frameworks with intelligent data tra…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: 14 pages, 6 figures

  39. arXiv:2401.16456  [pdf, other]

    cs.CV

    SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

    Authors: Seokju Yun, Youngmin Ro

    Abstract: Recently, efficient Vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner…

    Submitted 27 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: CVPR 2024

  40. arXiv:2401.10267  [pdf, other]

    cs.AR cs.AI

    HyperSense: Hyperdimensional Intelligent Sensing for Energy-Efficient Sparse Data Processing

    Authors: Sanggeon Yun, Hanning Chen, Ryozo Masukawa, Hamza Errahmouni Barkam, Andrew Ding, Wenjun Huang, Arghavan Rezvani, Shaahin Angizi, Mohsen Imani

    Abstract: Introducing HyperSense, our co-designed hardware and software system efficiently controls Analog-to-Digital Converter (ADC) modules' data generation rate based on object presence predictions in sensor data. Addressing challenges posed by escalating sensor quantities and data rates, HyperSense reduces redundant digital data using energy-efficient low-precision ADC, diminishing machine learning syst…

    Submitted 6 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  41. arXiv:2312.11890  [pdf, other]

    cs.CL cs.SI

    Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction

    Authors: Unggi Lee, Sungjun Yoon, Joon Seo Yun, Kyoungsoo Park, YoungHoon Jung, Damji Stratton, Hyeoncheol Kim

    Abstract: This paper presents novel techniques for enhancing the performance of knowledge tracing (KT) models by focusing on the crucial factor of question and concept difficulty level. Despite the acknowledged significance of difficulty, previous KT research has yet to exploit its potential for model optimization and has struggled to predict difficulty from unseen data. To address these problems, we propos…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 10 pages, 4 figures, 2 tables

  42. arXiv:2312.11260  [pdf, other]

    cs.CV cs.AI

    Leveraging Normalization Layer in Adapters With Progressive Learning and Adaptive Distillation for Cross-Domain Few-Shot Learning

    Authors: Yongjin Yang, Taehyeon Kim, Se-Young Yun

    Abstract: Cross-domain few-shot learning presents a formidable challenge, as models must be trained on base classes and then tested on novel classes from various domains with only a few samples at hand. While prior approaches have primarily focused on parameter-efficient methods of using adapters, they often overlook two critical issues: shifts in batch statistics and noisy sample statistics arising from do…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 38th AAAI Conference on Artificial Intelligence (AAAI'24)

  43. arXiv:2312.03414  [pdf, other]

    cs.LG cs.CL

    Compressed Context Memory For Online Language Model Interaction

    Authors: Jang-Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song

    Abstract: This paper presents a context key/value compression method for Transformer language models in online scenarios, where the context continually expands. As the context lengthens, the attention process demands increasing memory and computations, which in turn reduces the throughput of the language model. To address this challenge, we propose a compressed context memory system that continually compres…

    Submitted 6 February, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: ICLR 2024. Add streaming setting results and training set analyses

  44. arXiv:2312.01998  [pdf, other]

    cs.CV cs.IR

    Language-only Efficient Training of Zero-shot Composed Image Retrieval

    Authors: Geonmo Gu, Sanghyuk Chun, Wonjae Kim, Yoohoon Kang, Sangdoo Yun

    Abstract: Composed image retrieval (CIR) task takes a composed query of image and text, aiming to search relative images for both conditions. Conventional CIR approaches need a training dataset composed of triplets of query image, query text, and target image, which is very expensive to collect. Several recent works have worked on the zero-shot (ZS) CIR paradigm to tackle the issue without using pre-collect…

    Submitted 31 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 camera-ready; First two authors contributed equally; 17 pages, 3.1MB

  45. arXiv:2311.18540  [pdf, other]

    cs.CV cs.LG

    Match me if you can: Semantic Correspondence Learning with Unpaired Images

    Authors: Jiwon Kim, Byeongho Heo, Sangdoo Yun, Seungryong Kim, Dongyoon Han

    Abstract: Recent approaches for semantic correspondence have focused on obtaining high-quality correspondences using a complicated network, refining the ambiguous or noisy matching points. Despite their performance improvements, they remain constrained by the limited training pairs due to costly point-level annotations. This paper proposes a simple yet effective method that performs training with unlabeled…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 12 pages

  46. arXiv:2311.17801  [pdf, other]

    cs.ET cs.AR cs.LG

    Towards Efficient Hyperdimensional Computing Using Photonics

    Authors: Farbin Fayza, Cansu Demirkiran, Hanning Chen, Che-Kai Liu, Avi Mohan, Hamza Errahmouni, Sanggeon Yun, Mohsen Imani, David Zhang, Darius Bunandar, Ajay Joshi

    Abstract: Over the past few years, silicon photonics-based computing has emerged as a promising alternative to CMOS-based computing for Deep Neural Networks (DNN). Unfortunately, the non-linear operations and the high-precision requirements of DNNs make it extremely challenging to design efficient silicon photonics-based systems for DNN inference and training. Hyperdimensional Computing (HDC) is an emerging…

    Submitted 29 November, 2023; originally announced November 2023.

  47. arXiv:2311.15569  [pdf, other]

    cs.CV cs.AI

    Improving Adaptability and Generalizability of Efficient Transfer Learning for Vision-Language Models

    Authors: Yongjin Yang, Jongwoo Ko, Se-Young Yun

    Abstract: Vision-Language Models (VLMs) like CLIP have demonstrated remarkable applicability across a variety of downstream tasks, including zero-shot image classification. Recently, the use of prompts or adapters for efficient transfer learning has gained significant attention for effectively adapting to downstream tasks. However, the roles of vision and text prompts, as well as adapters in terms of genera…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 11 pages (19 pages including supplementary), 10 figures (12 figures including supplementary), 6 tables (17 tables including supplementary)

  48. arXiv:2311.13267  [pdf, other]

    cs.LG cs.AI cs.CV

    FedFN: Feature Normalization for Alleviating Data Heterogeneity Problem in Federated Learning

    Authors: Seongyoon Kim, Gihun Lee, Jaehoon Oh, Se-Young Yun

    Abstract: Federated Learning (FL) is a collaborative method for training models while preserving data privacy in decentralized settings. However, FL encounters challenges related to data heterogeneity, which can result in performance degradation. In our study, we observe that as data heterogeneity increases, feature representation in the FedAVG model deteriorates more significantly compared to classifier we…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: NeurIPS Workshop: "Federated Learning in the Age of Foundation Models" 2023

  49. arXiv:2311.12707  [pdf, other]

    cs.HC cs.AI cs.CL

    Keeping Users Engaged During Repeated Administration of the Same Questionnaire: Using Large Language Models to Reliably Diversify Questions

    Authors: Hye Sun Yun, Mehdi Arjmand, Phillip Raymond Sherlock, Michael Paasche-Orlow, James W. Griffith, Timothy Bickmore

    Abstract: Standardized, validated questionnaires are vital tools in HCI research and healthcare, offering dependable self-report data. However, their repeated use in longitudinal or pre-post studies can induce respondent fatigue, impacting data quality via response biases and decreased response rates. We propose utilizing large language models (LLMs) to generate diverse questionnaire versions while retainin…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 22 pages, preprint

  50. arXiv:2311.08106  [pdf, other]

    cs.CL

    Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models

    Authors: Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, Se-young Yun

    Abstract: The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; the model in the real world often requires not only acquiring new knowledge but also overwriting outdated information into updated ones. To study the ability of language models for these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a tempora…

    Submitted 20 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 15 pages, 10 figures, 5 tables; accepted to NAACL 2024