Skip to main content

Showing 1–50 of 1,215 results for author: lee, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.04182  [pdf, other

    cs.AR

    Towards Generalized On-Chip Communication for Programmable Accelerators in Heterogeneous Architectures

    Authors: Joseph Zuckerman, John-David Wellman, Ajay Vanamali, Manish Shankar, Gabriele Tombesi, Karthik Swaminathan, Kevin Lee, Mohit Kapur, Robert Philhower, Pradip Bose, Luca P. Carloni

    Abstract: We present several enhancements to the open-source ESP platform to support flexible and efficient on-chip communication for programmable accelerators in heterogeneous SoCs. These enhancements include 1) a flexible point-to-point communication mechanism between accelerators, 2) a multicast NoC that supports data forwarding to multiple accelerators simultaneously, 3) accelerator synchronization leve… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Appeared in the Sixth International Workshop on Domain Specific System Architecture (DOSSA-6)

  2. arXiv:2407.03086  [pdf, other

    cs.LG cs.AI cs.DC

    Effective Heterogeneous Federated Learning via Efficient Hypernetwork-based Weight Generation

    Authors: Yu** Shin, Kichang Lee, Sungmin Lee, You Rim Choi, Hyung-Sin Kim, JeongGil Ko

    Abstract: While federated learning leverages distributed client resources, it faces challenges due to heterogeneous client capabilities. This necessitates allocating models suited to clients' resources and careful parameter aggregation to accommodate this heterogeneity. We propose HypeMeFed, a novel federated learning framework for supporting client heterogeneity by combining a multi-exit network architectu… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.01158  [pdf, other

    cs.CL

    Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation

    Authors: Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee

    Abstract: Interactions with billion-scale large language models typically yield long-form responses due to their extensive parametric capacities, along with retrieval-augmented features. While detailed responses provide insightful viewpoint of a specific subject, they frequently generate redundant and less engaging content that does not meet user interests. In this work, we focus on the role of query outlin… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress. Resources are available at https://github.com/youngerous/qtree

  4. arXiv:2407.00878  [pdf, other

    cs.DC cs.LG

    A Robust Power Model Training Framework for Cloud Native Runtime Energy Metric Exporter

    Authors: Sunyanan Choochotkaew, Chen Wang, Huamin Chen, Tatsuhiro Chiba, Marcelo Amaral, Eun Kyung Lee, Tamar Eilam

    Abstract: Estimating power consumption in modern Cloud environments is essential for carbon quantification toward green computing. Specifically, it is important to properly account for the power consumed by each of the running applications, which are packaged as containers. This paper examines multiple challenges associated with this goal. The first challenge is that multiple customers are sharing the same… ▽ More

    Submitted 9 April, 2024; originally announced July 2024.

    Comments: This is a full-version (8-page) paper of our previous publication in IEEE MASCOTS 2023, which has been accepted as a 4-page short paper (https://ieeexplore.ieee.org/document/10387542)

  5. arXiv:2406.19292  [pdf, other

    cs.LG cs.AI cs.CL

    From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

    Authors: Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach utilizing a carefully designed synthetic dataset comprising numerical key-value retrieval tasks. Our experiments on models like GPT-3.5 Turbo and Mistral 7B dem… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.17746  [pdf, other

    cs.CL cs.AI

    Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

    Authors: USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra

    Abstract: Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model and corpus. To build intuition around these factors, we break memorization down into a taxonomy: recitation of highly duplicated sequences, recons… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.17376  [pdf, other

    cs.SD cs.AI eess.AS

    Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

    Authors: Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng

    Abstract: Recent synthetic speech detectors leveraging the Transformer model have superior performance compared to the convolutional neural network counterparts. This improvement could be due to the powerful modeling ability of the multi-head self-attention (MHSA) in the Transformer model, which learns the temporal relationship of each input token. However, artifacts of synthetic speech can be located in sp… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  8. arXiv:2406.17294  [pdf, other

    cs.CL

    Math-LLaVA: Bootstrap** Mathematical Reasoning for Multimodal Large Language Models

    Authors: Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

    Abstract: Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge th… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages

  9. arXiv:2406.16807  [pdf, other

    cs.LG cs.CL cs.CV

    Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

    Authors: Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham

    Abstract: Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional co… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  10. arXiv:2406.16620  [pdf, other

    cs.CV cs.CL

    OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

    Authors: Lu Zhang, Tiancheng Zhao, Heting Ying, Yibo Ma, Kyusong Lee

    Abstract: Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV footage or full-length films presents significant challenges due to the vast data and processing demands. Traditional methods, like extracting key frames or converting frames to text, ofte… ▽ More

    Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.14176  [pdf, other

    cs.SD cs.AI cs.MM eess.AS

    A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

    Authors: Kyungbok Lee, You Zhang, Zhiyao Duan

    Abstract: This paper addresses the challenge of develo** a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods. This calls for the generalization ability of the method. Additionally, to ensure the credibility of detection methods, it is beneficial for t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  12. arXiv:2406.12223  [pdf, other

    cs.CL cs.CY

    ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations

    Authors: Yunze Xiao, Yujia Hu, Kenny Tsu Wei Choo, Roy Ka-wei Lee

    Abstract: Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce \textsf{ToxiCloakCN}, an enhan… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages,5 Tables, 2 Figures

  13. arXiv:2406.11427  [pdf, other

    eess.AS cs.AI cs.CL cs.LG cs.SD

    DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

    Authors: Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

    Abstract: Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models f… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.11354  [pdf, other

    cs.CL cs.AI cs.CV

    Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

    Authors: Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin

    Abstract: Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks… ▽ More

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  15. Expanding the Design Space of Computer Vision-based Interactive Systems for Group Dance Practice

    Authors: Soohwan Lee, Seoyeong Hwang, Ian Oakley, Kyungho Lee

    Abstract: Group dance, a sub-genre characterized by intricate motions made by a cohort of performers in tight synchronization, has a longstanding and culturally significant history and, in modern forms such as cheerleading, a broad base of current adherents. However, despite its popularity, learning group dance routines remains challenging. Based on the prior success of interactive systems to support indivi… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 20 pages, 10 figures, 1 table, to be published in the proceedings of the ACM Designing Interactive Systems Conference, 2024, (DIS '24)

    Journal ref: ACM Designing Interactive Systems Conference, 2024, (DIS '24)

  16. arXiv:2406.11125  [pdf, other

    cs.HC

    Conversational Agents as Catalysts for Critical Thinking: Challenging Design Fixation in Group Design

    Authors: Soohwan Lee, Seoyeong Hwang, Kyungho Lee

    Abstract: This paper investigates the potential of LLM-based conversational agents (CAs) to enhance critical reflection and mitigate design fixation in group design work. By challenging AI-generated recommendations and prevailing group opinions, these agents address issues such as groupthink and promote a more dynamic and inclusive design process. Key design considerations include optimizing intervention ti… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures, DIS2024 Workshop on 'Death of Design Researcher'

  17. arXiv:2406.10836  [pdf, other

    eess.AS cs.SD

    Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis

    Authors: Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi

    Abstract: Fusing outputs from automatic speaker verification (ASV) and spoofing countermeasure (CM) is expected to make an integrated system robust to zero-effort imposters and synthesized spoofing attacks. Many score-level fusion methods have been proposed, but many remain heuristic. This paper revisits score-level fusion using tools from decision theory and presents three main findings. First, fusion by s… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024 Accepted. https://github.com/nii-yamagishilab/SpeechSPC-mini

  18. arXiv:2406.10815  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning

    Authors: Jeongheon Oh, Kibok Lee

    Abstract: Supervised contrastive representation learning has been shown to be effective in various transfer learning scenarios. However, while asymmetric non-contrastive learning (ANCL) often outperforms its contrastive learning counterpart in self-supervised representation learning, the extension of ANCL to supervised scenarios is less explored. To bridge the gap, we study ANCL for supervised representatio… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  19. arXiv:2406.08702  [pdf, other

    cs.AI cs.CL cs.CV

    VLind-Bench: Measuring Language Priors in Large Vision-Language Models

    Authors: Kang-il Lee, Minbeom Kim, Minsung Kim, Dongryeol Lee, Hyukhun Koh, Kyomin Jung

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated outstanding performance across various multimodal tasks. However, they suffer from a problem known as language prior, where responses are generated based solely on textual patterns while disregarding image information. Addressing the issue of language prior is crucial, as it can lead to undesirable biases or hallucinations when dealing with im… ▽ More

    Submitted 17 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.08200  [pdf, other

    cs.SD cs.AI eess.AS

    Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding

    Authors: Rui Wang, Li** Chen, Kong AiK Lee, Zhen-Hua Ling

    Abstract: Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We referred to this as the… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: accpeted by Interspeech2024

  21. arXiv:2406.07909  [pdf, other

    eess.AS cs.CL cs.SD stat.ML

    Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

    Authors: Eungbeom Kim, Hantae Kim, Kyogu Lee

    Abstract: Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student models in frame-level alignment which ultimately hinders it from improving the student model's performance. In order to resolve this problem, this paper introduce… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  22. arXiv:2406.06822  [pdf, other

    cs.CR cs.AI cs.SE

    An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection

    Authors: Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, Yuan Hong

    Abstract: Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CodeBreaker, a pioneering LLM-assisted backdoo… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: To appear in USENIX Security '24

  23. arXiv:2406.06717  [pdf, ps, other

    cs.SI cs.HC

    Analyzing user archetypes in Singapore's Telegram groups on COVID-19 and climate change

    Authors: Val Alvern Cueco Ligo, Lan Tianxiang, Ying Zeng, Lam Yin Cheung, Pi Zonooz, Roy Ka-Wei Lee, Koustuv Saha, Edson C. Tandoc Jr., Navin Kumar

    Abstract: Social media platforms, particularly Telegram, play a pivotal role in sha** public perceptions and opinions on global and national issues. Unlike traditional news media, Telegram allows for the proliferation of user-generated content with minimal oversight, making it a significant venue for the spread of controversial and misinformative content. During the COVID-19 pandemic, Telegram's popularit… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  24. arXiv:2406.06037  [pdf, other

    cs.LG cs.AI cs.CV

    Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

    Authors: Donghu Kim, Hojoon Lee, Kyungmin Lee, Dongyoon Hwang, Jaegul Choo

    Abstract: Recently, various pre-training methods have been introduced in vision-based Reinforcement Learning (RL). However, their generalization ability remains unclear due to evaluations being limited to in-distribution environments and non-unified experimental setups. To address this, we introduce the Atari Pre-training Benchmark (Atari-PB), which pre-trains a ResNet-50 model on 10 million transitions fro… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  25. arXiv:2406.05761  [pdf, other

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Gui** Son, Ye** Cho, Sheikh Shafayat, **heon Baek, Sue Hyun Park, Hyeonbin Hwang, **kyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang , et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  26. arXiv:2406.04412  [pdf, other

    cs.LG cs.AI cs.CL

    Aligning Large Language Models with Self-generated Preference Data

    Authors: Dongyoung Kim, Kimin Lee, **woo Shin, Jaehyung Kim

    Abstract: Aligning large language models (LLMs) with human preferences becomes a key component to obtaining state-of-the-art performance, but it yields a huge cost to construct a large human-annotated preference dataset. To tackle this problem, we propose a new framework that boosts the alignment of LLMs through Self-generated Preference data (Selfie) using only a very small amount of human-annotated prefer… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 18 pages, under review

  27. arXiv:2406.02847  [pdf, other

    cs.LG stat.ML

    Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

    Authors: Brian K Chen, Tianyang Hu, Hui **, Hwee Kuan Lee, Kenji Kawaguchi

    Abstract: In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias ter… ▽ More

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  28. arXiv:2406.02331  [pdf, other

    cs.CL

    Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering

    Authors: ChaeHun Park, Koanho Lee, Hyesu Lim, Jaeseok Kim, Junmo Park, Yu-Jung Heo, Du-Seong Chang, Jaegul Choo

    Abstract: Building a reliable visual question answering~(VQA) system across different languages is a challenging problem, primarily due to the lack of abundant samples for training. To address this challenge, recent studies have employed machine translation systems for the cross-lingual VQA task. This involves translating the evaluation samples into a source language (usually English) and using monolingual… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings Accepted

  29. arXiv:2406.01049  [pdf, other

    cs.SD

    Searching For Music Mixing Graphs: A Pruning Approach

    Authors: Sungho Lee, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji

    Abstract: Music mixing is compositional -- experts combine multiple audio processors to achieve a cohesive mix from dry source tracks. We propose a method to reverse engineer this process from the input and output audio. First, we create a mixing console that applies all available processors to every chain. Then, after the initial console parameter optimization, we alternate between removing redundant proce… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to DAFx 2024

  30. arXiv:2406.00636  [pdf, other

    cs.CV

    T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences

    Authors: Taeryung Lee, Fabien Baradel, Thomas Lucas, Kyoung Mu Lee, Gregory Rogez

    Abstract: In this paper, we address the challenging problem of long-term 3D human motion generation. Specifically, we aim to generate a long sequence of smoothly connected actions from a stream of multiple sentences (i.e., paragraph). Previous long-term motion generating approaches were mostly based on recurrent methods, using previously generated motion chunks as input for the next step. However, this appr… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 HuMoGen Workshop

  31. arXiv:2406.00014  [pdf, other

    cs.DB cs.AI cs.CL cs.IR

    KU-DMIS at EHRSQL 2024:Generating SQL query via question templatization in EHR

    Authors: Hajung Kim, Chanhwi Kim, Hoonick Lee, Kyochul Jang, Jiwoo Lee, Kyungjae Lee, Gangwoo Kim, Jaewoo Kang

    Abstract: Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handle… ▽ More

    Submitted 19 June, 2024; v1 submitted 21 May, 2024; originally announced June 2024.

    Comments: Published at ClinicalNLP workshop @ NAACL 2024

  32. arXiv:2405.20829  [pdf, other

    cs.CV cs.LG

    Rethinking Open-World Semi-Supervised Learning: Distribution Mismatch and Inductive Inference

    Authors: Seongheon Park, Hyuk Kwon, Kwanghoon Sohn, Kibok Lee

    Abstract: Open-world semi-supervised learning (OWSSL) extends conventional semi-supervised learning to open-world scenarios by taking account of novel categories in unlabeled datasets. Despite the recent advancements in OWSSL, the success often relies on the assumptions that 1) labeled and unlabeled datasets share the same balanced class prior distribution, which does not generally hold in real-world applic… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: CVPR Workshop on Computer Vision in the Wild (CVinW), 2024

  33. arXiv:2405.20672  [pdf, other

    cs.CV

    Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations

    Authors: Davide Coppola, Hwee Kuan Lee

    Abstract: This study explores the impact of adversarial perturbations on Convolutional Neural Networks (CNNs) with the aim of enhancing the understanding of their underlying mechanisms. Despite numerous defense methods proposed in the literature, there is still an incomplete understanding of this phenomenon. Instead of treating the entire model as vulnerable, we propose that specific feature maps learned du… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 22 pages, 15 figures (including appendix)

  34. arXiv:2405.20305  [pdf, other

    cs.CV

    Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models

    Authors: Himangi Mittal, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee

    Abstract: We introduce PlausiVL, a large video-language model for anticipating action sequences that are plausible in the real-world. While significant efforts have been made towards anticipating future actions, prior approaches do not take into account the aspect of plausibility in an action sequence. To address this limitation, we explore the generative capability of a large video-language model in our wo… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  35. arXiv:2405.20233  [pdf, other

    cs.LG cs.AI

    Grokfast: Accelerated Grokking by Amplifying Slow Gradients

    Authors: Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee

    Abstract: One puzzling artifact in machine learning dubbed grokking is where delayed generalization is achieved tenfolds of iterations after near perfect overfitting to the training data. Focusing on the long delay itself on behalf of machine learning practitioners, our goal is to accelerate generalization of a model under grokking phenomenon. By regarding a series of gradients of a parameter over training… ▽ More

    Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 17 pages, 13 figures. Typo fixed. Project page: https://jaerinlee.com/research/grokfast

  36. arXiv:2405.19598  [pdf, other

    cs.CR

    Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models

    Authors: Fujiao Ji, Kiho Lee, Hyungjoon Koo, Wenhao You, Eui** Choo, Hyoungshick Kim, Doowon Kim

    Abstract: Phishing attacks pose a significant threat to Internet users, with cybercriminals elaborately replicating the visual appearance of legitimate websites to deceive victims. Visual similarity-based detection systems have emerged as an effective countermeasure, but their effectiveness and robustness in real-world scenarios have been unexplored. In this paper, we comprehensively scrutinize and evaluate… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 12 pages

  37. arXiv:2405.18698  [pdf, other

    cs.LG cs.AI

    Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees

    Authors: Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, Songhwai Oh

    Abstract: The field of risk-constrained reinforcement learning (RCRL) has been developed to effectively reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints. However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality. To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 26 pages

  38. arXiv:2405.18047  [pdf, other

    cs.LG cs.AI cs.DC

    2BP: 2-Stage Backpropagation

    Authors: Christopher Rae, Joseph K. L. Lee, James Richings

    Abstract: As Deep Neural Networks (DNNs) grow in size and complexity, they often exceed the memory capacity of a single accelerator, necessitating the sharding of model parameters across multiple accelerators. Pipeline parallelism is a commonly used sharding strategy for training large DNNs. However, current implementations of pipeline parallelism are being unintentionally bottlenecked by the automatic diff… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  39. arXiv:2405.17111  [pdf, other

    cs.LG

    Diffusion Bridge AutoEncoders for Unsupervised Representation Learning

    Authors: Yeongmin Kim, Kwanghyeon Lee, Minsang Park, Byeonghu Na, Il-Chul Moon

    Abstract: Diffusion-based representation learning has achieved substantial attention due to its promising capabilities in latent representation and sample generation. Recent studies have employed an auxiliary encoder to identify a corresponding representation from a sample and to adjust the dimensionality of a latent variable z. Meanwhile, this auxiliary structure invokes information split problem because t… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  40. The AI-DEC: A Card-based Design Method for User-centered AI Explanations

    Authors: Christine P Lee, Min Kyung Lee, Bilge Mutlu

    Abstract: Increasing evidence suggests that many deployed AI systems do not sufficiently support end-user interaction and information needs. Engaging end-users in the design of these systems can reveal user needs and expectations, yet effective ways of engaging end-users in the AI explanation design remain under-explored. To address this gap, we developed a design method, called AI-DEC, that defines four di… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Journal ref: Designing Interactive Systems Conference, 2024, (DIS '24)

  41. arXiv:2405.16305  [pdf, other

    cs.LG

    Efficiently Parameterized Neural Metriplectic Systems

    Authors: Anthony Gruber, Kook** Lee, Haksoo Lim, Noseong Park, Nathaniel Trask

    Abstract: Metriplectic systems are learned from data in a way that scales quadratically in both the size of the state and the rank of the metriplectic data. Besides being provably energy conserving and entropy stable, the proposed approach comes with approximation results demonstrating its ability to accurately learn metriplectic dynamics from data as well as an error estimate indicating its potential for g… ▽ More

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  42. arXiv:2405.16301  [pdf, other

    cs.CV cs.LG

    Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples

    Authors: Dae Ung Jo, Kyuewang Lee, JaeHo Chung, ** Young Choi

    Abstract: Securing a sufficient amount of paired data is important to train an image-text retrieval (ITR) model, but collecting paired data is very expensive. To address this issue, in this paper, we propose an active learning algorithm for ITR that can collect paired data cost-efficiently. Previous studies assume that image-text pairs are given and their category labels are asked to the annotator. However,… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  43. arXiv:2405.15961  [pdf, other

    cs.CV

    Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images

    Authors: Yiran Luo, Joshua Feinglass, Tejas Gokhale, Kuan-Cheng Lee, Chitta Baral, Yezhou Yang

    Abstract: Domain Generalization (DG) is a challenging task in machine learning that requires a coherent ability to comprehend shifts across various domains through extraction of domain-invariant features. DG performance is typically evaluated by performing image classification in domains of various image styles. However, current methodology lacks quantitative understanding about shifts in stylistic domain,… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted at the 3rd CVPR Workshop on Vision Datasets Understanding

  44. arXiv:2405.15018  [pdf, other

    cs.LG cs.AI cs.CV

    What Variables Affect Out-Of-Distribution Generalization in Pretrained Models?

    Authors: Md Yousuf Harun, Kyungbok Lee, Jhair Gallardo, Giri Krishnan, Christopher Kanan

    Abstract: Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary widely. We study the factors influencing out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which suggests deeper DNN layers compress representations and hinder OOD performance. Contrary to earlie… ▽ More

    Submitted 11 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Project Website: https://yousuf907.github.io/oodg

  45. arXiv:2405.12656  [pdf, other

    cs.CL cs.AI

    Retrieval-Augmented Language Model for Extreme Multi-Label Knowledge Graph Link Prediction

    Authors: Yu-Hsiang Lin, Huang-Ting Shieh, Chih-Yu Liu, Kuang-Ting Lee, Hsiao-Cheng Chang, **g-Lun Yang, Yu-Sheng Lin

    Abstract: Extrapolation in Large language models (LLMs) for open-ended inquiry encounters two pivotal issues: (1) hallucination and (2) expensive training costs. These issues present challenges for LLMs in specialized domains and personalized data, requiring truthful responses and low fine-tuning costs. Existing works attempt to tackle the problem by augmenting the input of a smaller language model with inf… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  46. arXiv:2405.11563  [pdf, other

    cs.IT

    User-Centric Association and Feedback Bit Allocation for FDD Cell-Free Massive MIMO

    Authors: Kwangjae Lee, Jung Hoon Lee, Wan Choi

    Abstract: In this paper, we introduce a novel approach to user-centric association and feedback bit allocation for the downlink of a cell-free massive MIMO (CF-mMIMO) system, operating under limited feedback constraints. In CF-mMIMO systems employing frequency division duplexing, each access point (AP) relies on channel information provided by its associated user equipments (UEs) for beamforming design. Sin… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  47. arXiv:2405.07414  [pdf, other

    cs.LG cs.AI

    Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

    Authors: Kyungeun Lee, Ye Seul Sim, Hye-Seung Cho, Moonjung Eo, Suhee Yoon, Sanghyu Yoon, Woohyung Lim

    Abstract: The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address the challenges in the self… ▽ More

    Submitted 13 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: ICML 2024, 18 pages (including supplementary materials)

  48. arXiv:2405.06331  [pdf, other

    cs.LG cs.CL

    LMD3: Language Model Data Density Dependence

    Authors: John Kirchenbauer, Garrett Honke, Gowthami Somepalli, Jonas Gei**, Daphne Ippolito, Katherine Lee, Tom Goldstein, David Andre

    Abstract: We develop a methodology for analyzing language model task performance at the individual example level based on training data density estimation. Experiments with paraphrasing as a controlled intervention on finetuning data demonstrate that increasing the support in the training distribution for specific test queries results in a measurable increase in density, which is also a significant predicto… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 10 pages in the main body

  49. arXiv:2405.04842  [pdf, ps, other

    cs.SC math.AG math.NA

    Effective alpha theory certification using interval arithmetic: alpha theory over regions

    Authors: Kisun Lee

    Abstract: We reexamine Smale's alpha theory as a way to certify a numerical solution to an analytic system. For a given point and a system, Smale's alpha theory determines whether Newton's method applied to this point shows the quadratic convergence to an exact solution. We introduce the alpha theory computation using interval arithmetic to avoid costly exact arithmetic. As a straightforward variation of th… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted for the Proceedings of ICMS 2024

    MSC Class: 65H14

  50. arXiv:2405.01842  [pdf, ps, other

    cs.CL

    SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

    Authors: Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

    Abstract: To address the limitations of current hate speech detection models, we introduce \textsf{SGHateCheck}, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native ann… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.