Showing 1–28 of 28 results for author: Ahn, D

Searching in archive cs.
  1. arXiv:2406.17261  [pdf, other]

    cs.CL

    TRAWL: Tensor Reduced and Approximated Weights for Large Language Models

    Authors: Yiran Luo, Het Patel, Yu Fu, Dawon Ahn, Jia Chen, Yue Dong, Evangelos E. Papalexakis

    Abstract: Large language models (LLMs) have fundamentally transformed artificial intelligence, catalyzing recent advancements while imposing substantial environmental and computational burdens. We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a novel methodology for optimizing LLMs through tensor decomposition. TRAWL leverages diverse strategies to exploit matrices wit…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures. Submitted to EMNLP 2024 and under review

    MSC Class: 68T50 (Primary); 65F55 (Secondary) ACM Class: I.2.7

  2. arXiv:2406.16695  [pdf, other]

    cs.CV

    Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

    Authors: Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-Hwa Kim, Seungryong Kim

    Abstract: Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into 3D representation, has recently brought significant advancements in text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may…

    Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.11280  [pdf, other]

    cs.CV

    i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment

    Authors: Daechul Ahn, Yura Choi, San Kim, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

    Abstract: Aligning Video Large Multimodal Models (VLMMs) face challenges such as modality misalignment and verbose responses. Although iterative approaches such as self-rewarding or iterative direct preference optimization (DPO) recently showed a significant improvement in language model alignment, particularly on reasoning tasks, self-aligned models applied to large video-language models often result in le…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Technical report

  4. arXiv:2406.09799  [pdf, other]

    cs.CY

    GeoSEE: Regional Socio-Economic Estimation With a Large Language Model

    Authors: Sungwon Han, Donghyun Ahn, Seungeon Lee, Minhyuk Song, Sungwon Park, Sangyoon Park, Jihee Kim, Meeyoung Cha

    Abstract: Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. The current research presents GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Pre…

    Submitted 14 June, 2024; originally announced June 2024.

  5. arXiv:2405.12648  [pdf, other]

    cs.CV cs.AI

    Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency

    Authors: Hyeongjin Kim, Sangwon Kim, Dasom Ahn, Jong Taek Lee, Byoung Chul Ko

    Abstract: Scene graph generation (SGG) is an important task in image understanding because it represents the relationships between objects in an image as a graph structure, making it possible to understand the semantic relationships between objects intuitively. Previous SGG studies used a message-passing neural networks (MPNN) to update features, which can effectively reflect information about surrounding o…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  6. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  7. arXiv:2403.17377  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

    Authors: Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

    Abstract: Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project page is available at https://ku-cvlab.github.io/Perturbed-Attention-Guidance

  8. arXiv:2402.10076  [pdf, other]

    cs.LG cs.AI cs.CL

    QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

    Authors: Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim

    Abstract: We introduce QUICK, a group of novel optimized CUDA kernels for the efficient inference of quantized Large Language Models (LLMs). QUICK addresses the shared memory bank-conflict problem of state-of-the-art mixed precision matrix multiplication kernels. Our method interleaves the quantized weight matrices of LLMs offline to skip the shared memory write-back after the dequantization. We demonstrate…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 9 pages, 8 figures

  9. arXiv:2402.03746  [pdf, other]

    cs.CV

    Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback

    Authors: Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang, Jonghyun Choi

    Abstract: Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs). The previous approaches for VLMMs involved Supervised Fine-Tuning (SFT) with instruction-tuned datasets, integrating LLM with visual encoders, and adding additional learnable modules. Video and text multimodal alignment remains challenging, primarily due to the deficient volume an…

    Submitted 17 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  10. Fine-Grained Socioeconomic Prediction from Satellite Images with Distributional Adjustment

    Authors: Donghyun Ahn, Minhyuk Song, Seungeon Lee, Yubin Choi, Jihee Kim, Sangyoon Park, Hyunjoo Yang, Meeyoung Cha

    Abstract: While measuring socioeconomic indicators is critical for local governments to make informed policy decisions, such measurements are often unavailable at fine-grained levels like municipality. This study employs deep learning-based predictions from satellite images to close the gap. We propose a method that assigns a socioeconomic score to each satellite image by capturing the distributional behavi…

    Submitted 4 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    ACM Class: J.4

  11. arXiv:2308.07575  [pdf, other]

    cs.CV cs.AI cs.LG

    Story Visualization by Online Text Augmentation with Context Memory

    Authors: Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi

    Abstract: Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convin…

    Submitted 19 August, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: ICCV 2023, Project page: https://dcahn12.github.io/projects/CMOTA/

  12. arXiv:2307.01193  [pdf, other]

    cs.LG cs.AI

    Squeezing Large-Scale Diffusion Models for Mobile

    Authors: Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, Hyungjun Kim

    Abstract: The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion with more t…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 7 pages, 8 figures, ICML 2023 Workshop on Challenges in Deployable Generative AI

  13. arXiv:2306.02316  [pdf, other]

    cs.CV

    Temporal Dynamic Quantization for Diffusion Models

    Authors: Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park

    Abstract: The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even in 8-bit precision due to the diffusion model's unique property o…

    Submitted 11 December, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

  14. arXiv:2303.15413  [pdf, other]

    cs.CV cs.CL cs.GR cs.LG

    Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation

    Authors: Susung Hong, Donghoon Ahn, Seungryong Kim

    Abstract: Existing score-distilling text-to-3D generation techniques, despite their considerable promise, often encounter the view inconsistency problem. One of the most notable issues is the Janus problem, where the most canonical view of an object (e.g., face or head) appears in other views. In this work, we explore existing frameworks for score-distilling text-to-3D generation and identify the m…

    Submitted 19 December, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023. Project Page: https://susunghong.github.io/Debiased-Score-Distillation-Sampling/

  15. arXiv:2212.05638  [pdf, other]

    cs.CV

    Cross-Modal Learning with 3D Deformable Attention for Action Recognition

    Authors: Sangwon Kim, Dasom Ahn, Byoung Chul Ko

    Abstract: An important challenge in vision-based action recognition is the embedding of spatiotemporal features with two or more heterogeneous modalities into a single feature. In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. The 3D deformable transformer consists of three attention modules: 3D d…

    Submitted 17 August, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: Accepted by ICCV2023

  16. arXiv:2210.07503  [pdf, other]

    cs.CV cs.AI

    STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition

    Authors: Dasom Ahn, Sangwon Kim, Hyunsu Hong, Byoung Chul Ko

    Abstract: In action recognition, although the combination of spatio-temporal videos and skeleton features can improve the recognition performance, a separate model and balancing feature representation for cross-modal data are required. To solve these problems, we propose Spatio-TemporAl cRoss (STAR)-transformer, which can effectively represent two cross-modal features as a recognizable vector. First, from t…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted by WACV 2023

    MSC Class: 68T07

  17. arXiv:2209.02696  [pdf, other]

    cs.SD cs.MM eess.AS

    Instrument Separation of Symbolic Music by Explicitly Guided Diffusion Model

    Authors: Sangjun Han, Hyeongrae Ihm, DaeHan Ahn, Woohyung Lim

    Abstract: Similar to colorization in computer vision, instrument separation is to assign instrument labels (e.g. piano, guitar...) to notes from unlabeled mixtures which contain only performance information. To address the problem, we adopt diffusion models and explicitly guide them to preserve consistency between mixtures and music. The quantitative results show that our proposed model can generate high-fi…

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: Submitted to NeurIPS 2022 Workshop on Machine Learning for Creativity and Design

  18. arXiv:2205.01472  [pdf, other]

    cs.CY

    Learning Economic Indicators by Aggregating Multi-Level Geospatial Information

    Authors: Sungwon Park, Sungwon Han, Donghyun Ahn, Jaeyeon Kim, Jeasurk Yang, Susang Lee, Seunghoon Hong, Jihee Kim, Sangyoon Park, Hyunjoo Yang, Meeyoung Cha

    Abstract: High-resolution daytime satellite imagery has become a promising source to study economic activities. These images display detailed terrain over large areas and allow zooming into smaller neighborhoods. Existing methods, however, have utilized images only in a single-level geographical unit. This research presents a deep learning model to predict economic indicators via aggregating traits observed…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted at AAAI2022

  19. Workflows Community Summit: Tightening the Integration between Computing Facilities and Scientific Workflows

    Authors: Rafael Ferreira da Silva, Kyle Chard, Henri Casanova, Dan Laney, Dong Ahn, Shantenu Jha, William E. Allcock, Gregory Bauer, Dmitry Duplyakin, Bjoern Enders, Todd M. Heer, Eric Lancon, Sergiu Sanielevici, Kevin Sayers

    Abstract: The importance of workflows is highlighted by the fact that they have underpinned some of the most significant discoveries of the past decades. Many of these workflows have significant computational, storage, and communication demands, and thus must execute on a range of large-scale computer systems, from local clusters to public clouds and upcoming exascale HPC platforms. Historically, infrastruc…

    Submitted 19 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2110.02168

  20. arXiv:2111.08222  [pdf]

    cs.AI cs.HC

    Will We Trust What We Don't Understand? Impact of Model Interpretability and Outcome Feedback on Trust in AI

    Authors: Daehwan Ahn, Abdullah Almaatouq, Monisha Gulabani, Kartik Hosanagar

    Abstract: Despite AI's superhuman performance in a variety of domains, humans are often unwilling to adopt AI systems. The lack of interpretability inherent in many modern AI techniques is believed to be hurting their adoption, as users may not trust systems whose decision processes they do not understand. We investigate this proposition with a novel experiment in which we use an interactive prediction task…

    Submitted 15 November, 2021; originally announced November 2021.

  21. arXiv:2110.00428  [pdf, other]

    cs.CL cs.AI cs.CV

    Zero-shot Natural Language Video Localization

    Authors: Jinwoo Nam, Daechul Ahn, Dongyeop Kang, Seong Jong Ha, Jonghyun Choi

    Abstract: Understanding videos to localize moments with natural language often requires large expensive annotated video regions paired with language queries. To eliminate the annotation costs, we make a first attempt to train a natural language video localization model in zero-shot manner. Inspired by unsupervised image captioning setup, we merely require random text corpora, unlabeled video collections, an…

    Submitted 29 August, 2021; originally announced October 2021.

    Comments: 10 pages, 7 figures

  22. arXiv:2109.03739  [pdf, other]

    cs.DC

    A Dynamic, Hierarchical Resource Model for Converged Computing

    Authors: Daniel J. Milroy, Claudia Misale, Stephen Herbein, Dong H. Ahn

    Abstract: Extreme dynamic heterogeneity in high performance computing systems and the convergence of traditional HPC with new simulation, analysis, and data science approaches impose increasingly more complex requirements on resource and job management software (RJMS). However, there is a paucity of RJMS techniques that can solve key technical challenges associated with those new requirements, particularly…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: 11 pages, four figures, five tables

  23. arXiv:2108.13521  [pdf, other]

    cs.DC

    ExaWorks: Workflows for Exascale

    Authors: Aymen Al-Saadi, Dong H. Ahn, Yadu Babuji, Kyle Chard, James Corbett, Mihael Hategan, Stephen Herbein, Shantenu Jha, Daniel Laney, Andre Merzky, Todd Munson, Michael Salim, Mikhail Titov, Matteo Turilli, Justin M. Wozniak

    Abstract: Exascale computers will offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. These software combinations and integrations, however, are difficult to achieve due to challenges of coordination and deployment of heterogeneous software components on diverse and massive platforms.…

    Submitted 30 August, 2021; originally announced August 2021.

  24. Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

    Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Tainã Coleman, Dan Laney, Dong Ahn, Shantenu Jha, Dorran Howell, Stian Soiland-Reys, Ilkay Altintas, Douglas Thain, Rosa Filgueira, Yadu Babuji, Rosa M. Badia, Bartosz Balis, Silvina Caino-Lores, Scott Callaghan, Frederik Coppens, Michael R. Crusoe, Kaushik De, Frank Di Natale, Tu M. A. Do, Bjoern Enders, Thomas Fahringer, Anne Fouilloux, et al. (33 additional authors not shown)

    Abstract: Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role i…

    Submitted 9 June, 2021; originally announced June 2021.

  25. Workflows Community Summit: Bringing the Scientific Workflows Community Together

    Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Dan Laney, Dong Ahn, Shantenu Jha, Carole Goble, Lavanya Ramakrishnan, Luc Peterson, Bjoern Enders, Douglas Thain, Ilkay Altintas, Yadu Babuji, Rosa M. Badia, Vivien Bonazzi, Taina Coleman, Michael Crusoe, Ewa Deelman, Frank Di Natale, Paolo Di Tommaso, Thomas Fahringer, Rosa Filgueira, Grigori Fursin, Alex Ganose, Bjorn Gruning, et al. (20 additional authors not shown)

    Abstract: Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) pla…

    Submitted 16 March, 2021; originally announced March 2021.

  26. arXiv:2012.08855  [pdf, other]

    cs.LG stat.ML

    Time-Aware Tensor Decomposition for Missing Entry Prediction

    Authors: Dawon Ahn, Jun-Gi Jang, U Kang

    Abstract: Given a time-evolving tensor with missing entries, how can we effectively factorize it for precisely predicting the missing entries? Tensor factorization has been extensively utilized for analyzing various multi-dimensional real-world data. However, existing models for tensor factorization have disregarded the temporal property for tensor factorization while most real-world data are closely relate…

    Submitted 16 December, 2020; originally announced December 2020.

    Comments: 20 pages

  27. arXiv:1912.08197  [pdf, other]

    cs.CV cs.CY

    Lightweight and Robust Representation of Economic Scales from Satellite Imagery

    Authors: Sungwon Han, Donghyun Ahn, Hyunji Cha, Jeasurk Yang, Sungwon Park, Meeyoung Cha

    Abstract: Satellite imagery has long been an attractive data source that provides a wealth of information on human-inhabited areas. While super resolution satellite images are rapidly becoming available, little study has focused on how to extract meaningful information about human habitation patterns and economic scales from such data. We present READ, a new approach for obtaining essential spatial represen…

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: Accepted for oral presentation at AAAI 2020

  28. Multi-level analysis of compiler induced variability and performance tradeoffs

    Authors: Michael Bentley, Ian Briggs, Ganesh Gopalakrishnan, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Holger E. Jones

    Abstract: Successful HPC software applications are long-lived. When ported across machines and their compilers, these applications often produce different numerical results, many of which are unacceptable. Such variability is also a concern while optimizing the code more aggressively to gain performance. Efficient tools that help locate the program units (files and functions) within which most of the variab…

    Submitted 24 June, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: 12 pages, 11 figures, accepted in HPDC 2019

    Report number: LLNL-CONF-759867