Showing 1–50 of 945 results for author: Lee, Y

Searching in archive cs.

  1. arXiv:2407.00888  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Papez: Resource-Efficient Speech Separation with Auditory Working Memory

    Authors: Hyunseok Oh, Juheon Yi, Youngki Lee

    Abstract: Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; however, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 5 pages. Accepted by ICASSP 2023

  2. arXiv:2407.00762  [pdf, other]

    eess.SY cs.MA

    Guarding a Target Area from a Heterogeneous Group of Cooperative Attackers

    Authors: Yoonjae Lee, Goutam Das, Daigo Shishika, Efstathios Bakolas

    Abstract: In this paper, we investigate a multi-agent target guarding problem in which a single defender seeks to capture multiple attackers aiming to reach a high-value target area. In contrast to previous studies, the attackers herein are assumed to be heterogeneous in the sense that they have not only different speeds but also different weights representing their respective degrees of importance (e.g., t…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: This is the revised version of the paper, with the same title, to be presented at American Control Conference (ACC) 2024

  3. arXiv:2407.00699  [pdf, other]

    cs.LG cs.AI

    Tackling Long-Horizon Tasks with Model-based Offline Reinforcement Learning

    Authors: Kwanyoung Park, Youngwoon Lee

    Abstract: Model-based offline reinforcement learning (RL) is a compelling approach that addresses the challenge of learning from limited, static data by generating imaginary trajectories using learned models. However, it falls short in solving long-horizon tasks due to high bias in value estimation from model rollouts. In this paper, we introduce a novel model-based offline RL method, Lower Expectile Q-lear…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: https://kwanyoungpark.github.io/LEQ/

  4. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au…

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.18561  [pdf, other]

    cs.CV cs.LG

    SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching

    Authors: Yongmin Lee, Hye Won Chung

    Abstract: Dataset distillation aims to synthesize a small number of images per class (IPC) from a large dataset to approximate full dataset training with minimal performance loss. While effective in very small IPC ranges, many distillation methods become less effective, even underperforming random sample selection, as IPC increases. Our examination of state-of-the-art trajectory-matching based distillation…

    Submitted 28 May, 2024; originally announced June 2024.

    Comments: ICML 2024

  6. arXiv:2406.17963  [pdf, other]

    cs.LG cs.HC cs.SI

    Empowering Interdisciplinary Insights with Dynamic Graph Embedding Trajectories

    Authors: Yiqiao Jin, Andrew Zhao, Yeon-Chang Lee, Meng Ye, Ajay Divakaran, Srijan Kumar

    Abstract: We developed DyGETViz, a novel framework for effectively visualizing dynamic graphs (DGs) that are ubiquitous across diverse real-world systems. This framework leverages recent advancements in discrete-time dynamic graph (DTDG) models to adeptly handle the temporal dynamics inherent in dynamic graphs. DyGETViz effectively captures both micro- and macro-level structural shifts within these graphs,…

    Submitted 28 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 27 pages, 11 figures

  7. arXiv:2406.17808  [pdf, other]

    cs.CL cs.AI cs.LG

    Training-Free Exponential Extension of Sliding Window Context with Cascading KV Cache

    Authors: Jeffrey Willette, Heejun Lee, Youngwan Lee, Myeongjae Jeon, Sung Ju Hwang

    Abstract: The context window within a transformer provides a form of active memory for the current task, which can be useful for few-shot learning and conditional generation, both of which depend heavily on previous context tokens. However, as the context length grows, the computational cost increases quadratically. Recent works have shown that saving a few initial tokens along with a fixed-sized sliding windo…

    Submitted 23 June, 2024; originally announced June 2024.

  8. arXiv:2406.15959  [pdf, other]

    math.NA cs.CE cs.LG

    A Nonoverlapping Domain Decomposition Method for Extreme Learning Machines: Elliptic Problems

    Authors: Chang-Ock Lee, Youngkyu Lee, Byungeun Ryoo

    Abstract: Extreme learning machine (ELM) is a methodology for solving partial differential equations (PDEs) using a single hidden layer feed-forward neural network. It presets the weight/bias coefficients in the hidden layer with random values, which remain fixed throughout the computation, and uses a linear least squares method for training the parameters of the output layer of the neural network. It is kn…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 18 pages, 4 figures, 7 tables

    MSC Class: 65N55 (Primary); 35J25; 68T07 (Secondary)

  9. arXiv:2406.14703  [pdf, other]

    cs.CL cs.AI

    Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

    Authors: Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

    Abstract: The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliabilit…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint; Under review

  10. arXiv:2406.14571  [pdf, other]

    cs.AR cs.AI cs.LG

    PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models

    Authors: Yunjae Lee, Hyeseong Kim, Minsoo Rhu

    Abstract: Training recommendation systems (RecSys) faces several challenges as it requires the "data preprocessing" stage to preprocess an ample amount of raw data and feed them to the GPU for training in a seamless manner. To sustain high training throughput, state-of-the-art solutions reserve a large fleet of CPU servers for preprocessing which incurs substantial deployment cost and power consumption. Our…

    Submitted 11 June, 2024; originally announced June 2024.

    Journal ref: Published at 51st IEEE/ACM International Symposium on Computer Architecture (ISCA-51), 2024

  11. arXiv:2406.13214  [pdf, other]

    cs.LG

    Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck

    Authors: Sangwoo Seo, Sungwon Kim, Jihyeong Jung, Yoonho Lee, Chanyoung Park

    Abstract: Temporal Graph Neural Networks (TGNN) have the ability to capture both the graph topology and dynamic dependencies of interactions within a graph over time. There has been a growing need to explain the predictions of TGNN models due to the difficulty in identifying how past events influence their predictions. Since the explanation model for a static graph cannot be readily applied to temporal grap…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  12. arXiv:2406.12356  [pdf, other]

    cs.IR

    A Gradient Accumulation Method for Dense Retriever under Memory Constraint

    Authors: Jaehee Kim, Yukyung Lee, Pilsung Kang

    Abstract: InfoNCE loss is commonly used to train dense retrievers in information retrieval tasks. It is well known that a large batch is essential to stable and effective training with InfoNCE loss, which requires significant hardware resources. Because of this dependency on large batches, dense retrievers face a bottleneck in both application and research. Recently, memory reduction methods have been broadly adopted to res…

    Submitted 18 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  13. arXiv:2406.10997  [pdf, other]

    math.NA cs.LG math.OC

    Two-level overlapping additive Schwarz preconditioner for training scientific machine learning applications

    Authors: Youngkyu Lee, Alena Kopaničáková, George Em Karniadakis

    Abstract: We introduce a novel two-level overlapping additive Schwarz preconditioner for accelerating the training of scientific machine learning applications. The design of the proposed preconditioner is motivated by the nonlinear two-level overlapping additive Schwarz preconditioner. The neural network parameters are decomposed into groups (subdomains) with overlapping regions. In addition, the network's…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 24 pages, 9 figures

    MSC Class: 90C30; 90C26; 90C06; 65M55; 68T07

  14. arXiv:2406.10920  [pdf, other]

    math.OC cs.AI cs.LG math.NA

    Hamilton-Jacobi Based Policy-Iteration via Deep Operator Learning

    Authors: Jae Yong Lee, Yeoneung Kim

    Abstract: The framework of deep operator network (DeepONet) has been widely exploited thanks to its capability of solving high dimensional partial differential equations. In this paper, we incorporate DeepONet with a recently developed policy iteration scheme to numerically solve optimal control problems and the corresponding Hamilton-Jacobi-Bellman (HJB) equations. A notable feature of our approach is th…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures

    MSC Class: 68T20; 68U07; 35F21; 49L12; 49L25

  15. arXiv:2406.09894  [pdf, other]

    eess.AS cs.SD

    Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis

    Authors: Taewoo Kim, Choongsang Cho, Young Han Lee

    Abstract: In this paper, we present Period Singer, a novel end-to-end singing voice synthesis (SVS) model that utilizes variational inference for periodic and aperiodic components, aimed at producing natural-sounding waveforms. Recent end-to-end SVS models have demonstrated the capability of synthesizing high-fidelity singing voices. However, owing to deterministic pitch conditioning, they do not fully addr…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  16. arXiv:2406.09827  [pdf, other]

    cs.CL cs.CV cs.DC cs.LG

    HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning

    Authors: Heejun Lee, Geon Park, Youngwan Lee, Jina Kim, Wonyoung Jeong, Myeongjae Jeon, Sung Ju Hwang

    Abstract: In modern large language models (LLMs), increasing sequence lengths is a crucial challenge for enhancing their comprehension and coherence in handling complex tasks such as multi-modal question answering. However, handling long context sequences with LLMs is prohibitively costly due to the conventional attention mechanism's quadratic time and space complexity, and the context window size is limite…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 26 pages, 15 figures

  17. arXiv:2406.09400  [pdf, other]

    cs.CV cs.LG

    Yo'LLaVA: Your Personalized Language and Vision Assistant

    Authors: Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle personalized subjects (e.g., recognizing a user's pet dog). Human reasoning, in contrast, typically operates within the context of specific subjects in o…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://thaoshibe.github.io/YoLLaVA

  18. arXiv:2406.09117  [pdf, other]

    cs.CV cs.AI

    PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

    Authors: Injoon Hwang, Haewon Park, Youngwan Lee, Jooyoung Yang, SunJae Maeng

    Abstract: Low-rank adaptation (LoRA) is a prominent method that adds a small number of learnable parameters to the frozen pre-trained weights for parameter-efficient fine-tuning. Prompted by the question, "Can we make its representation enough with LoRA weights solely at the final phase of finetuning without the pre-trained weights?", we introduce Progressive Compression LoRA (PC-LoRA), which u…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at T4V@CVPR

  19. arXiv:2406.08719  [pdf, other]

    cs.CR

    TikTag: Breaking ARM's Memory Tagging Extension with Speculative Execution

    Authors: Juhee Kim, Jinbum Park, Sihyeon Roh, Jaeyoung Chung, Youngjoo Lee, Taesoo Kim, Byoungyoung Lee

    Abstract: ARM Memory Tagging Extension (MTE) is a new hardware feature introduced in ARMv8.5-A architecture, aiming to detect memory corruption vulnerabilities. The low overhead of MTE makes it an attractive solution to mitigate memory corruption attacks in modern software systems and is considered the most promising path forward for improving C/C++ software security. This paper explores the potential secur…

    Submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.08497  [pdf, other]

    cs.CC cs.ET

    On the Simulation Power of Surface Chemical Reaction Networks

    Authors: Yi-Xuan Lee, Ho-Lin Chen

    Abstract: The Chemical Reaction Network (CRN) is a well-studied model that describes the interaction of molecules in well-mixed solutions. In 2014, Qian and Winfree [22] proposed the abstract surface chemical reaction network model (sCRN), which takes advantage of spatial separation by placing molecules on a structured surface, limiting the interaction between molecules. In this model, molecules can onl…

    Submitted 26 April, 2024; originally announced June 2024.

    Comments: 46 pages, 8 figures

  21. arXiv:2406.06004  [pdf, other]

    cs.CV cs.AI cs.CL

    FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model

    Authors: Yebin Lee, Imseong Park, Myungjoo Kang

    Abstract: Most existing image captioning evaluation metrics focus on assigning a single numerical score to a caption by comparing it with reference captions. However, these methods do not provide an explanation for the assigned score. Moreover, reference captions are expensive to acquire. In this paper, we propose FLEUR, an explainable reference-free metric to introduce explainability into image captioning…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL (Main) 2024

  22. arXiv:2406.05963  [pdf, other]

    cs.CV cs.AI

    Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024

    Authors: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, Eun-Sol Kim

    Abstract: In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two m…

    Submitted 9 June, 2024; originally announced June 2024.

  23. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  24. arXiv:2406.04364  [pdf]

    cs.CV cs.HC cs.LG

    Use of a Multiscale Vision Transformer to predict Nursing Activities Score from Low Resolution Thermal Videos in an Intensive Care Unit

    Authors: Isaac YL Lee, Thanh Nguyen-Duc, Ryo Ueno, Jesse Smith, Peter Y Chan

    Abstract: Excessive caregiver workload in hospital nurses has been implicated in poorer patient care and increased worker burnout. Measurement of this workload in the Intensive Care Unit (ICU) is often done using the Nursing Activities Score (NAS), but this is usually recorded manually and sporadically. Previous work has made use of Ambient Intelligence (AmI) by using computer vision to passively derive car…

    Submitted 30 May, 2024; originally announced June 2024.

    Comments: 4 pages, 1 figure

  25. arXiv:2406.00160  [pdf, other]

    cs.GT

    Robustness of Online Proportional Response in Stochastic Online Fisher Markets: a Decentralized Approach

    Authors: Yongge Yang, Yu-Ching Lee, Po-An Chen, Chuang-Chieh Lin

    Abstract: This study is focused on periodic Fisher markets where items with time-dependent and stochastic values are regularly replenished and buyers aim to maximize their utilities by spending budgets on these items. Traditional approaches of finding a market equilibrium in the single-period Fisher market rely on complete information about buyers' utility functions and budgets. However, it is impractical t…

    Submitted 31 May, 2024; originally announced June 2024.

  26. arXiv:2405.20334  [pdf, other]

    cs.CV cs.GR

    VividDream: Generating 3D Scene with Ambient Dynamics

    Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

    Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://vivid-dream-4d.github.io

  27. arXiv:2405.19806  [pdf, other]

    cs.LG

    Preference Alignment with Flow Matching

    Authors: Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Seyoung Yun

    Abstract: We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require fine-tuning pre-trained models, which presents challenges such as scalability, inefficiency, and the need for model modifications, especially with black-box APIs lik…

    Submitted 30 May, 2024; originally announced May 2024.

  28. arXiv:2405.19691  [pdf, other]

    cs.HC

    Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing

    Authors: Minsun Kim, SeonGyeom Kim, Suyoun Lee, Yoosang Yoon, Junho Myung, Haneul Yoo, Hyungseung Lim, Jieun Han, Yoonsu Kim, So-Yeon Ahn, Juho Kim, Alice Oh, Hwajung Hong, Tak Yeon Lee

    Abstract: While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises sur…

    Submitted 30 May, 2024; originally announced May 2024.

  29. arXiv:2405.18784  [pdf, other]

    cs.CV

    LP-3DGS: Learning to Prune 3D Gaussian Splatting

    Authors: Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang Fan

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical and preset prunin…

    Submitted 29 May, 2024; originally announced May 2024.

  30. arXiv:2405.18727  [pdf, other]

    cs.CL cs.AI cs.IR

    CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control

    Authors: Huanshuo Liu, Hao Zhang, Zhijiang Guo, Kuicai Dong, Xiangyang Li, Yi Quan Lee, Cong Zhang, Yong Liu

    Abstract: Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. Adaptive RAG enhances this approach by dynamically assessing the retrieval necessity, aiming to balance external and internal knowledge usage. However, existing adaptive RAG methods primarily realize retrieval on demand by relying…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures, 9 tables

  31. arXiv:2405.18623  [pdf]

    cs.HC

    I See You: Teacher Analytics with GPT-4 Vision-Powered Observational Assessment

    Authors: Unggi Lee, Yeil Jeong, Junbo Koh, Gyuri Byun, Yunseo Lee, Hyunwoong Lee, Seunmin Eun, Jewoong Moon, Cheolil Lim, Hyeoncheol Kim

    Abstract: This preliminary study explores the integration of GPT-4 Vision (GPT-4V) technology into teacher analytics, focusing on its applicability in observational assessment to enhance reflective teaching practice. This research is grounded in developing a Video-based Automatic Assessment System (VidAAS) empowered by GPT-4V. Our approach aims to revolutionize teachers' assessment of students' practices by…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 27 pages, 5 figures, 4 tables

  32. arXiv:2405.18042  [pdf, other]

    cs.CV cs.LG

    Visualizing the loss landscape of Self-supervised Vision Transformer

    Authors: Youngwan Lee, Jeffrey Ryan Willette, Jonghee Kim, Sung Ju Hwang

    Abstract: The Masked autoencoder (MAE) has drawn attention as a representative self-supervised approach for masked image modeling with vision transformers. However, even though MAE shows better generalization capability than fully supervised training from scratch, the reason why has not been explored. In another line of work, the Reconstruction Consistent Masked Auto Encoder (RC-MAE) has been proposed whic…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice

  33. arXiv:2405.17793  [pdf, other]

    cs.CV

    SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction

    Authors: Yongjae Lee, Zhaoliang Zhang, Deliang Fan

    Abstract: 3D Gaussian Splatting (3DGS) has made a significant stride in novel view synthesis, demonstrating top-notch rendering quality while achieving real-time rendering speed. However, the excessively large number of Gaussian primitives resulting from 3DGS' suboptimal densification process poses a major challenge, slowing down frame-per-second (FPS) and demanding considerable memory cost, making it unfav…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Comprehensive experiments are in progress

  34. arXiv:2405.17430  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Matryoshka Multimodal Models

    Authors: Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed images into a fixed large number of visual tokens and then feed them into a Large Language Model (LLM). However, this design causes an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, leading to great inefficiency. While…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://matryoshka-mm.github.io/

  35. arXiv:2405.16766  [pdf, other]

    cs.CV cs.AI cs.LG

    Reframing the Relationship in Out-of-Distribution Detection

    Authors: YuXiao Lee, Xiaofeng Cao

    Abstract: The remarkable achievements of Large Language Models (LLMs) have captivated the attention of both academia and industry, transcending their initial role in dialogue generation. The utilization of LLMs as intermediary agents in various tasks has yielded promising results, sparking a wave of innovation in artificial intelligence. Building on these breakthroughs, we introduce a novel approach that in…

    Submitted 26 May, 2024; originally announced May 2024.

  36. arXiv:2405.14782  [pdf, other]

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  37. arXiv:2405.12713  [pdf, other]

    cs.CV

    Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification

    Authors: Peng Gao, Yujian Lee, Hui Zhang, Xubo Liu, Yiyang Hu, Guquan Jing

    Abstract: Visible-infrared person re-identification (VI-ReID) aims to match people with the same identity between visible and infrared modalities. VI-ReID is a challenging task due to the large differences in individual appearance under different modalities. Existing methods generally try to bridge the cross-modal differences at image or feature level, which lacks exploring the discriminative embeddings. Ef…

    Submitted 21 May, 2024; originally announced May 2024.

  38. arXiv:2405.10135  [pdf, other]

    cs.CE cond-mat.mtrl-sci

    Self-supervised feature distillation and design of experiments for efficient training of micromechanical deep learning surrogates

    Authors: Patxi Fernandez-Zelaia, Jason Mayeur, Jiahao Cheng, Yousub Lee, Kevin Knipe, Kai Kadau

    Abstract: Machine learning surrogate emulators are needed in engineering design and optimization tasks to rapidly emulate computationally expensive physics-based models. In micromechanics problems the local full-field response variables are desired at microstructural length scales. While there has been a great deal of work on establishing architectures for these tasks there has been relatively little work o…

    Submitted 16 May, 2024; originally announced May 2024.

  39. arXiv:2405.09805  [pdf, other]

    cs.CL cs.CR

    SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data

    Authors: Abdulrahman Alabdulkareem, Christian M Arnold, Yerim Lee, Pieter M Feenstra, Boris Katz, Andrei Barbu

    Abstract: Traditional security mechanisms isolate resources from users who should not access them. We reflect the compositional nature of such security mechanisms back into the structure of LLMs to build a provably secure LLM, which we term SecureLLM. Other approaches to LLM safety attempt to protect against bad actors or bad outcomes, but can only do so to an extent, making them inappropriate for sensitive d…

    Submitted 13 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  40. arXiv:2405.08597  [pdf, other]

    cs.LG

    Risks and Opportunities of Open-Source Generative AI

    Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster

    Abstract: Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This reg…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Extension of arXiv:2404.17047

  41. arXiv:2405.08424  [pdf, other]

    cs.LG math.OC

    Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More

    Authors: Fanchen Bu, Hyeonsoo Jo, Soo Yong Lee, Sungsoo Ahn, Kijung Shin

    Abstract: Combinatorial optimization (CO) is naturally discrete, making machine learning based on differentiable optimization inapplicable. Karalias & Loukas (2020) adapted the probabilistic method to incorporate CO into differentiable optimization. Their work ignited the research on unsupervised learning for CO, composed of two main components: probabilistic objectives and derandomization. However, each co…

    Submitted 23 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  42. arXiv:2405.07520  [pdf, ps, other

    cs.CV

    Dehazing Remote Sensing and UAV Imagery: A Review of Deep Learning, Prior-based, and Hybrid Approaches

    Authors: Gao Yu Lee, Jinkuan Chen, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N Duong

    Abstract: High-quality images are crucial in remote sensing and UAV applications, but atmospheric haze can severely degrade image quality, making image dehazing a critical research area. Since the introduction of deep convolutional neural networks, numerous approaches have been proposed, and even more have emerged with the development of vision transformers and contrastive/few-shot learning. Simultaneously,…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Submitted to a journal and under review; once the paper is accepted, the copyright will be transferred to the corresponding journal

  43. arXiv:2405.05758  [pdf, other

    cs.HC cs.CL cs.CY

    Exploring the Potential of Human-LLM Synergy in Advancing Qualitative Analysis: A Case Study on Mental-Illness Stigma

    Authors: Han Meng, Yitian Yang, Yunan Li, Jungup Lee, Yi-Chieh Lee

    Abstract: Qualitative analysis is a challenging, yet crucial aspect of advancing research in the field of Human-Computer Interaction (HCI). Recent studies show that large language models (LLMs) can perform qualitative coding within existing schemes, but their potential for collaborative human-LLM discovery and new insight generation in qualitative analysis is still underexplored. To bridge this gap and adva…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 55 pages

  44. arXiv:2405.05581  [pdf, other

    cs.HC cs.AI cs.CL

    One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations

    Authors: Yoonjoo Lee, Kihoon Son, Tae Soo Kim, Jisu Kim, John Joon Young Chung, Eytan Adar, Juho Kim

    Abstract: As Large Language Models (LLMs) are nondeterministic, the same input can generate different outputs, some of which may be incorrect or hallucinated. If run again, the LLM may correct itself and produce the correct answer. Unfortunately, most LLM-powered systems resort to single results which, correct or not, users accept. Having the LLM produce multiple outputs may help identify disagreements or a…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to FAccT 2024

  45. arXiv:2405.04746  [pdf, other

    cs.IR cs.AI cs.LG

    SVD-AE: Simple Autoencoders for Collaborative Filtering

    Authors: Seoyoung Hong, Jeongwhan Choi, Yeon-Chang Lee, Srijan Kumar, Noseong Park

    Abstract: Collaborative filtering (CF) methods for recommendation systems have been extensively researched, ranging from matrix factorization and autoencoder-based to graph filtering-based methods. Recently, lightweight methods that require almost no training have been proposed to reduce overall computation. However, existing methods still have room to improve the trade-offs among accuracy, efficie…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  46. arXiv:2405.02803  [pdf, other

    cs.LG cs.DC

    Is Flash Attention Stable?

    Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads. Recently, many organizations training state-of-the-art Generative AI models have reported cases of instability during training, often taking the form of loss spikes. Numeric deviation has emerged as a potential cause of this training instability, although quantify…

    Submitted 4 May, 2024; originally announced May 2024.

  47. arXiv:2405.02762  [pdf, other

    cs.CV cs.LG cs.RO

    TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes

    Authors: Christopher Maxey, Jaehoon Choi, Yonghan Lee, Hyungtae Lee, Dinesh Manocha, Heesung Kwon

    Abstract: In this paper, we present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception. Our formulation is designed for dynamic scenes, consisting of moving objects or human actions, where the goal is to recognize the pose or actions. We propose an extension of K-Planes Neural Radiance Field (NeRF), wherein our algorithm store…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 8 pages, submitted to IROS2024

  48. arXiv:2404.19336  [pdf

    cs.AI cs.PL

    Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts

    Authors: Yanggyu Lee, Suchae Jeong, Jihie Kim

    Abstract: LLMs trained in the understanding of programming syntax are now providing effective assistance to developers and are being used in programming education, such as in the generation of coding problem examples or in providing code explanations. A key aspect of programming education is understanding and dealing with error messages. However, 'logical errors' in which the program operates against the programmer'…

    Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted in ITS 2024

  49. arXiv:2404.17047  [pdf, other

    cs.LG

    Near to Mid-term Risks and Opportunities of Open-Source Generative AI

    Authors: Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Jackson, Paul Röttger, Philip H. S. Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster

    Abstract: In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation i…

    Submitted 24 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML'24 as a position paper

  50. arXiv:2404.15103  [pdf, other

    cs.CL

    Multi-view Content-aware Indexing for Long Document Retrieval

    Authors: Kuicai Dong, Derrick Goh Xin Deik, Yi Quan Lee, Hao Zhang, Xiangyang Li, Cong Zhang, Yong Liu

    Abstract: Long document question answering (DocQA) aims to answer questions from long documents over 10k words. They usually contain content structures such as sections, sub-sections, and paragraph demarcations. However, the indexing methods of long documents remain under-explored, while existing systems generally employ fixed-length chunking. As they do not consider content structures, the resultant chunks…

    Submitted 23 April, 2024; originally announced April 2024.