
Showing 1–50 of 223 results for author: Choo, H

Searching in archive cs.
  1. arXiv:2406.16535  [pdf, other]

    cs.CL cs.AI cs.LG

    Token-based Decision Criteria Are Suboptimal in In-context Learning

    Authors: Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue

    Abstract: In-Context Learning (ICL) typically utilizes classification criteria from probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which renounces token probabilities and u…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 21 pages, 14 figures, 8 tables

  2. arXiv:2406.16275  [pdf, other]

    cs.CL

    Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

    Authors: Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo

    Abstract: AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 13 tables, under review

  3. arXiv:2406.15225  [pdf, other]

    cs.AI cs.RO eess.SP

    Deep UAV Path Planning with Assured Connectivity in Dense Urban Setting

    Authors: Jiyong Oh, Syed M. Raza, Lusungu J. Mwasinga, Moonseong Kim, Hyunseung Choo

    Abstract: Unmanned Aerial Vehicle (UAV) services with 5G connectivity are an emerging field with numerous applications. Operator-controlled UAV flights and manual static flight configurations are major limitations to the wide adoption and scalability of UAV services. Several services depend on excellent UAV connectivity with a cellular network, and maintaining it along predetermined flight paths is challenging. T…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, Published in the 2024 IEEE Network Operations and Management Symposium (NOMS 2024)

  4. arXiv:2406.07923  [pdf, other]

    cs.SD cs.AI eess.AS

    CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting

    Authors: Sichen Jin, Youngmoon Jung, Seungjin Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho

    Abstract: This paper introduces a novel approach for streaming open-vocabulary keyword spotting (KWS) with text-based keyword enrollment. For every input frame, the proposed method finds the optimal alignment ending at the frame using connectionist temporal classification (CTC) and aggregates the frame-level acoustic embedding (AE) to obtain higher-level (i.e., character, word, or phrase) AE that aligns with…

    Submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.06111  [pdf, other]

    eess.AS cs.AI cs.SD eess.SP

    JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

    Authors: Hyunjae Cho, Junhyeok Lee, Wonbin Jung

    Abstract: Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent alia…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  6. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  7. arXiv:2406.05314  [pdf, other]

    eess.AS cs.AI eess.SP

    Relational Proxy Loss for Audio-Text based Keyword Spotting

    Authors: Youngmoon Jung, Seungjin Lee, Joon-Young Yang, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho

    Abstract: In recent years, there has been an increasing focus on user convenience, leading to increased interest in text-based keyword enrollment systems for keyword spotting (KWS). Since the system utilizes text input during the enrollment phase and audio input during actual usage, we call this task audio-text based KWS. To enable this task, both acoustic and text encoders are typically trained using deep…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, Accepted by Interspeech 2024

  8. arXiv:2406.02596  [pdf, other]

    cs.LG cs.AI

    Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

    Authors: Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, Clare Lyle

    Abstract: This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical analysis reveals that common methods designed to enhance plasticity by maintaining trainability provide limited benefits to generalization. While reinitializing the network can be effective, it also risks losing valuable prior knowledge. To this end, w…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: accepted to ICML 2024

  9. arXiv:2406.01468  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding Token Probability Encoding in Output Embeddings

    Authors: Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue

    Abstract: In this paper, we investigate the output token probability information in the output embedding of language models. We provide an approximate common log-linear encoding of output token probabilities within the output embedding vectors and demonstrate that it is accurate and sparse when the output space is large and output logits are concentrated. Based on such findings, we edit the encoding in outp…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages, 17 figures, 3 tables

  10. arXiv:2405.20671  [pdf, other]

    cs.LG cs.AI cs.CL

    Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

    Authors: Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun

    Abstract: Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absol…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 73 pages, 20 figures, 90 tables

  11. arXiv:2405.15737  [pdf]

    cs.SE

    More Insight from Being More Focused: Analysis of Clustered Market Apps

    Authors: Maleknaz Nayebi, Homayoon Farrahi, Ada Lee, Henry Cho, Guenther Ruhe

    Abstract: The increasing attraction of mobile apps has inspired researchers to analyze apps from different perspectives. As with any software product, apps have different attributes such as size, content maturity, rating, category, or number of downloads. Current research studies mostly consider sampling across all apps. This often results in comparisons between apps that are quite different in nature and category…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Authors pre-print

  12. arXiv:2405.07414  [pdf, other]

    cs.LG cs.AI

    Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

    Authors: Kyungeun Lee, Ye Seul Sim, Hye-Seung Cho, Moonjung Eo, Suhee Yoon, Sanghyu Yoon, Woohyung Lim

    Abstract: The ability of deep networks to learn superior representations hinges on leveraging the proper inductive biases, considering the inherent properties of datasets. In tabular domains, it is critical to effectively handle heterogeneous features (both categorical and numerical) in a unified manner and to grasp irregular functions like piecewise constant functions. To address the challenges in the self…

    Submitted 13 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: ICML 2024, 18 pages (including supplementary materials)

  13. arXiv:2405.03685  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Language-Image Models with 3D Understanding

    Authors: Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

    Abstract: Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formu…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Project page: https://janghyuncho.github.io/Cube-LLM

  14. Multi-intent-aware Session-based Recommendation

    Authors: Minjin Choi, Hye-young Kim, Hyunsouk Cho, Jongwuk Lee

    Abstract: Session-based recommendation (SBR) aims to predict the following item a user will interact with during an ongoing session. Most existing SBR models focus on designing sophisticated neural-based encoders to learn a session representation, capturing the relationship among session items. However, they tend to focus on the last item, neglecting diverse user intents that may exist within a session. Thi…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: SIGIR 2024. 5 pages

  15. arXiv:2404.17598  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI

    Revealing and Utilizing In-group Favoritism for Graph-based Collaborative Filtering

    Authors: Hoin Jung, Hyunsoo Cho, Myungje Choi, Joowon Lee, Jung Ho Park, Myungjoo Kang

    Abstract: When it comes to a personalized item recommendation system, it is essential to extract users' preferences and purchasing patterns. Assuming that users in the real world form clusters and that there is common favoritism in each cluster, in this work, we introduce Co-Clustering Wrapper (CCW). We compute co-clusters of users and items with co-clustering algorithms and add CF subnetworks for each cluster…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 7 pages, 6 figures

  16. arXiv:2404.10355  [pdf, other]

    cs.AR

    AERO: Adaptive Erase Operation for Improving Lifetime and Performance of Modern NAND Flash-Based SSDs

    Authors: Sungjun Cho, Beomjun Kim, Hyunuk Cho, Gyeongseob Seo, Onur Mutlu, Myungsuk Kim, Jisung Park

    Abstract: This work investigates a new erase scheme in NAND flash memory to improve the lifetime and performance of modern solid-state drives (SSDs). In NAND flash memory, an erase operation applies a high voltage (e.g., > 20 V) to flash cells for a long time (e.g., > 3.5 ms), which degrades cell endurance and potentially delays user I/O requests. While a large body of prior work has proposed various techni…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in the Proceedings of the 29th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

  17. arXiv:2404.09717  [pdf, other]

    cs.CL cs.AI cs.LG

    Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model

    Authors: Hyunsoo Cho

    Abstract: Many recent studies endeavor to improve open-source language models through imitation learning, i.e., re-training on synthetic instruction data from state-of-the-art proprietary models like ChatGPT and GPT-4. However, synthetic data is inherently noisy, giving rise to a substantial presence of low-quality data replete with erroneous responses and flawed reasoning…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Under review @ *ACL

  18. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  19. arXiv:2403.18771  [pdf, other]

    cs.CL

    CheckEval: Robust Evaluation Framework using Large Language Model via Checklist

    Authors: Yukyung Lee, Joonghoon Kim, Jaehee Kim, Hyowon Cho, Pilsung Kang

    Abstract: We introduce CheckEval, a novel evaluation framework using Large Language Models, addressing the challenges of ambiguity and inconsistency in current evaluation methods. CheckEval addresses these challenges by dividing evaluation criteria into detailed sub-aspects and constructing a checklist of Boolean questions for each, simplifying the evaluation. This approach not only renders the process more…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: HEAL at CHI 2024

  20. arXiv:2403.17377  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

    Authors: Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

    Abstract: Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project page is available at https://ku-cvlab.github.io/Perturbed-Attention-Guidance

  21. arXiv:2403.09022  [pdf, ps, other]

    cs.IT eess.SP

    Smart Resource Allocation at mmWave/THz Frequencies with Cooperative Rate-Splitting

    Authors: Hyesang Cho, Junil Choi

    Abstract: In this paper, we propose algorithms to minimize the energy consumption in millimeter wave/terahertz multi-user downlink communication systems. To ensure coverage in blockage-vulnerable high frequency systems, we consider cooperative rate-splitting (CRS) and transmission over multiple time blocks, where via CRS, multiple users cooperate to assist a blocked user. Moreover, we show that transmission…

    Submitted 19 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 13 pages, 7 figures, accepted to IEEE Transactions on Wireless Communications (TWC)

  22. MineXR: Mining Personalized Extended Reality Interfaces

    Authors: Hyunsung Cho, Yukang Yan, Kashyap Todi, Mark Parent, Missie Smith, Tanya R. Jonker, Hrvoje Benko, David Lindlbauer

    Abstract: Extended Reality (XR) interfaces offer engaging user experiences, but their effective design requires a nuanced understanding of user behavior and preferences. This knowledge is challenging to obtain without the widespread adoption of XR devices. We introduce MineXR, a design mining workflow and data analysis platform for collecting and analyzing personalized XR user interaction and experience dat…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 17 pages, 18 figures, Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

    ACM Class: H.5.2

  23. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  24. arXiv:2403.02966  [pdf, other]

    cs.CL cs.AI cs.LG

    Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering

    Authors: Sungho Ko, Hyunjin Cho, Hyungjoo Chae, Jinyoung Yeo, Dongha Lee

    Abstract: Recent studies have investigated utilizing Knowledge Graphs (KGs) to enhance the Question Answering (QA) performance of Large Language Models (LLMs), yet structured KG verbalization remains challenging. Existing methods, such as triple-form or free-form textual conversion of triple-form facts, encounter several issues. These include reduced evidence density due to duplicated entities or relationships,…

    Submitted 19 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  25. Personalizing Smart Home Privacy Protection With Individuals' Regulatory Focus: Would You Preserve or Enhance Your Information Privacy?

    Authors: Reza Ghaiumy Anaraky, Yao Li, Hichang Cho, Danny Yuxing Huang, Kaileigh A. Byrne, Bart Knijnenburg, Oded Nov

    Abstract: In this study, we explore the effectiveness of persuasive messages endorsing the adoption of a privacy protection technology (IoT Inspector) tailored to individuals' regulatory focus (promotion or prevention). We explore if and how regulatory fit (i.e., tuning the goal-pursuit mechanism to individuals' internal regulatory focus) can increase persuasion and adoption. We conducted a between-subject…

    Submitted 27 February, 2024; originally announced February 2024.

    Journal ref: ACM Conference on Human Factors in Computing Systems (CHI 2024)

  26. arXiv:2402.17323  [pdf, other]

    cs.CV

    SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection

    Authors: Junsu Kim, Hoseong Cho, Jihyeon Kim, Yihalem Yimolal Tiruneh, Seungryul Baek

    Abstract: In the field of class incremental learning (CIL), generative replay has become increasingly prominent as a method to mitigate catastrophic forgetting, alongside the continuous improvements in generative models. However, its application in class incremental object detection (CIOD) has been significantly limited, primarily due to the complexities of scenes involving multiple labels. In this pape…

    Submitted 7 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Camera-ready version.

  27. arXiv:2402.17275  [pdf, other]

    cs.CV

    One-Shot Structure-Aware Stylized Image Synthesis

    Authors: Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong

    Abstract: While GAN-based models have been successful in image stylization tasks, they often struggle with structure preservation while stylizing a wide range of input images. Recently, diffusion models have been adopted for image stylization but still lack the capability to maintain the original quality of input images. Building on this, we propose OSASIS: a novel one-shot stylization method that is robust…

    Submitted 1 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  28. arXiv:2402.15180  [pdf, other]

    cs.LG cs.CL cs.CR

    Break the Breakout: Reinventing LM Defense Against Jailbreak Attacks with Self-Refinement

    Authors: Heegyu Kim, Sehyun Yuk, Hyunsouk Cho

    Abstract: Caution: This paper includes offensive words that could potentially cause unpleasantness. Language models (LMs) are vulnerable to exploitation for adversarial misuse. Training LMs for safety alignment is resource-intensive and makes it hard to respond immediately to fast-developing attacks, such as jailbreaks. We propose self-refine with formatting that achieves outstanding safety even in non-safety-aligne…

    Submitted 26 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: under review

  29. arXiv:2402.14395  [pdf, other]

    cs.CV

    Semantic Image Synthesis with Unconditional Generator

    Authors: Jungwoo Chae, Hyunin Cho, Sooyeon Go, Kyungmook Choi, Youngjung Uh

    Abstract: Semantic image synthesis (SIS) aims to generate realistic images that match given semantic masks. Despite recent advances allowing high-quality results and precise spatial control, they require a massive semantic segmentation dataset for training the models. Instead, we propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks. The proxy masks…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2023, Project Page: https://hhyunn2.github.io/SIS_UncondG/

  30. arXiv:2402.13211  [pdf, other]

    cs.CL

    Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

    Authors: Dongjin Kang, Sunghwan Kim, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Emotional Support Conversation (ESC) is a task aimed at alleviating individuals' emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, the ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have sug…

    Submitted 5 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  31. arXiv:2402.10475  [pdf, other]

    math.OC cs.LG

    Fundamental Benefit of Alternating Updates in Minimax Optimization

    Authors: Jaewook Lee, Hanseul Cho, Chulhee Yun

    Abstract: The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood theoretically, especially in terms of global convergence rates. To address this theory-practice…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 77 pages, 2 figures

  32. Making a prototype of Seoul historical sites chatbot using Langchain

    Authors: Jae Young Suh, Minsoo Kwak, Soo Yong Kim, Hyoungseo Cho

    Abstract: In this paper, we share a draft of the development of a conversational agent created to disseminate information about historical sites located in Seoul. The primary objective of the agent is to increase awareness among visitors who are not familiar with Seoul about the presence and precise locations of valuable cultural heritage sites. It aims to promote a basic understanding of…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 4 pages, 4 figures, draft

  33. arXiv:2402.04625  [pdf, other]

    cs.CV

    Noise Map Guidance: Inversion with Spatial Context for Real Image Editing

    Authors: Hansam Cho, Jonghyun Lee, Seoung Bum Kim, Tae-Hyun Oh, Yonghyun Jeong

    Abstract: Text-guided diffusion models have become a popular tool in image synthesis, known for producing high-quality and diverse images. However, their application to editing real images often encounters hurdles primarily due to the text condition deteriorating the reconstruction quality and subsequently affecting editing fidelity. Null-text Inversion (NTI) has made strides in this area, but it fails to c…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  34. arXiv:2402.03277  [pdf, other]

    cs.IR

    Event-based Product Carousel Recommendation with Query-Click Graph

    Authors: Luyi Ma, Nimesh Sinha, Parth Vajge, Jason HD Cho, Sushant Kumar, Kannan Achan

    Abstract: Many current recommender systems mainly focus on product-to-product and user-to-product recommendations, even during events, rather than modeling the typical recommendations for the target event (e.g., festivals, seasonal activities, or social activities) and addressing the multiple aspects of the shopping demands for that event. Product recommendations for…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures, 2021 IEEE International Conference on Big Data (Big Data)

  35. arXiv:2401.17005  [pdf, other]

    cs.AR

    SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation

    Authors: Wontak Han, Hyunjun Cho, Donghyuk Kim, Joo-Young Kim

    Abstract: Text generation is a compelling sub-field of natural language processing, aiming to generate human-readable text from input words. In particular, the decoder-only generative models, such as generative pre-trained transformer (GPT), are widely used for text generation, with two major computational stages: summarization and generation. Unlike the summarization stage, which can process the input toke…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 14 pages, 15 figures

  36. arXiv:2401.14587  [pdf, other]

    cs.CV

    CNA-TTA: Clean and Noisy Region Aware Feature Learning within Clusters for Online-Offline Test-Time Adaptation

    Authors: Hyeonwoo Cho, Chanmin Park, Jinyoung Kim, Won Hwa Kim

    Abstract: A domain shift occurs when training (source) and test (target) data diverge in their distribution. Test-time adaptation (TTA) addresses the domain shift problem, aiming to adapt a model trained on the source domain to the target domain in a scenario where only a well-trained source model and unlabeled target data are available. In this scenario, handling false labels in the target domain is crucia…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures

  37. arXiv:2401.11505  [pdf, other]

    cs.CL cs.IR

    CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling

    Authors: Jawook Gu, Han-Cheol Cho, Jiho Kim, Kihyun You, Eun Kyoung Hong, Byungseok Roh

    Abstract: Free-text radiology reports present a rich data source for various medical tasks, but effectively labeling these texts remains challenging. Traditional rule-based labeling methods fall short of capturing the nuances of diverse free-text patterns. Moreover, models using expert-annotated data are limited by data scarcity and pre-defined classes, impacting their performance, flexibility and scalabili…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 16 pages, 3 figures

  38. ContextMix: A context-aware data augmentation method for industrial visual inspection systems

    Authors: Hyungmin Kim, Donghun Kim, Pyunghwan Ahn, Sungho Suh, Hansang Cho, Junmo Kim

    Abstract: While deep neural networks have achieved remarkable performance, data augmentation has emerged as a crucial strategy to mitigate overfitting and enhance network performance. These techniques hold particular significance in industrial manufacturing contexts. Recently, image mixing-based methods have been introduced, exhibiting improved performance on public benchmark datasets. However, their applic…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to EAAI

  39. arXiv:2401.09048  [pdf, other]

    cs.CV

    Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

    Authors: Jonghyun Lee, Hansam Cho, Youngjoon Yoo, Seoung Bum Kim, Yonghyun Jeong

    Abstract: Addressing the limitations of text as a source of accurate layout representation in text-conditional diffusion models, many works incorporate additional signals to condition certain attributes within a generated image. Although successful, previous works do not account for the specific localization of said attributes extended into the three-dimensional plane. In this context, we present a conditio…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  40. arXiv:2401.05659  [pdf, other]

    cs.HC cs.SE

    Engineering Adaptive Information Graphics for Disabled Communities: A Case Study with Public Space Indoor Maps

    Authors: Anuradha Madugalla, Yutan Huang, John Grundy, Min Hee Cho, Lasith Koswatta Gamage, Tristan Leao, Sam Thiele

    Abstract: Most software applications contain graphics such as charts, diagrams and maps. Currently, these graphics are designed with a "one size fits all" approach and do not cater to the needs of people with disabilities. Therefore, when using software with graphics, a colour-impaired user may struggle to interpret graphics with certain colours, and a person with dyslexia may struggle to read the text lab…

    Submitted 10 January, 2024; originally announced January 2024.

  41. arXiv:2312.12491  [pdf, other]

    cs.CV cs.GR cs.LG

    StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

    Authors: Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Kurt Keutzer

    Abstract: We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. This limitation becomes particularly evident in scenarios involving continuous input, such as Metaverse, live video streaming, and broadcasting, where high throu…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: tech report, the code is available at https://github.com/cumulo-autumn/StreamDiffusion

  42. arXiv:2312.12488  [pdf, other]

    cs.LG cs.CR cs.CV

    Foreseeing Reconstruction Quality of Gradient Inversion: An Optimization Perspective

    Authors: HyeongGwon Hong, Yooshin Cho, Hanbyel Cho, Jaesung Ahn, Junmo Kim

    Abstract: Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our empirical investigation shows that the vulnerability ranking varies with the loss function used. Gradient norm, which is commonly used as a vulnerability proxy f…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: To appear in AAAI 2024

  43. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  44. arXiv:2312.06279  [pdf, other]

    cs.LG cs.AI

    Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

    Authors: JeongJun Park, Lusungu J. Mwasinga, Huigyu Yang, Syed M. Raza, Duc-Tai Le, Moonseong Kim, Min Young Chung, Hyunseung Choo

    Abstract: Mobile traffic data in urban regions shows differentiated patterns during different hours of the day. The exploitation of these patterns enables highly accurate mobile traffic prediction for proactive network management. However, recent Deep Learning (DL) driven studies have only exploited spatiotemporal features and have ignored the geographical correlations, causing high complexity and erroneous…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 4 pages, 5 figures, 1 table. This paper has been accepted at the IEEE Consumer Communications & Networking Conference (CCNC) 2024

  45. arXiv:2312.06122  [pdf, other]

    cs.CL cs.LG

    GTA: Gated Toxicity Avoidance for LM Performance Preservation

    Authors: Heegyu Kim, Hyunsouk Cho

    Abstract: Caution: This paper includes offensive words that could potentially cause unpleasantness. The fast-paced evolution of generative language models such as GPT-4 has demonstrated outstanding results in various NLP generation tasks. However, due to the potential generation of offensive words related to race or gender, various Controllable Text Generation (CTG) methods have been proposed to mitigate th…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to Findings of EMNLP 2023

  46. arXiv:2312.05528  [pdf, other]

    eess.IV cs.CV

    Exploring 3D U-Net Training Configurations and Post-Processing Strategies for the MICCAI 2023 Kidney and Tumor Segmentation Challenge

    Authors: Kwang-Hyun Uhm, Hyunjun Cho, Zhixin Xu, Seohoon Lim, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: In 2023, it is estimated that 81,800 kidney cancer cases will be newly diagnosed, and 14,890 people will die from this cancer in the United States. Preoperative dynamic contrast-enhanced abdominal computed tomography (CT) is often used for detecting lesions. However, there exists inter-observer variability due to subtle differences in the imaging features of kidney and kidney tumors. In this paper…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: MICCAI 2023, KITS 2023 challenge 2nd place

  47. arXiv:2312.03798  [pdf, other]

    cs.CV

    Single Image Reflection Removal with Reflection Intensity Prior Knowledge

    Authors: Dongshen Han, Seungkyu Lee, Chaoning Zhang, Heechan Yoon, Hyukmin Kwon, HyunCheol Kim, HyonGon Choo

    Abstract: Single Image Reflection Removal (SIRR) in real-world images is a challenging task due to diverse image degradations occurring on the glass surface during light transmission and reflection. Many existing methods rely on specific prior assumptions to resolve the problem. In this paper, we propose a general reflection intensity prior that captures the intensity of the reflection phenomenon and demons…

    Submitted 6 December, 2023; originally announced December 2023.

  48. arXiv:2311.17902  [pdf, other]

    cs.CV

    Language-conditioned Detection Transformer

    Authors: Jang Hyun Cho, Philipp Krähenbühl

    Abstract: We present a new open-vocabulary detection framework. Our framework uses both image-level labels and detailed detection annotations when available. Our framework proceeds in three steps. We first train a language-conditioned object detector on fully-supervised detection data. This detector gets to see the presence or absence of ground truth classes during training, and conditions prediction on the…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Code is at https://github.com/janghyuncho/DECOLA

  49. arXiv:2311.11212  [pdf, other]

    cs.AI cs.LG

    Can We Utilize Pre-trained Language Models within Causal Discovery Algorithms?

    Authors: Chanhui Lee, Juhyeon Kim, Yongjun Jeong, Juhyun Lyu, Junghee Kim, Sangmin Lee, Sangjun Han, Hyeokjun Choe, Soyeon Park, Woohyung Lim, Sungbin Lim, Sanghack Lee

    Abstract: Scaling laws have brought Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning of PLMs relies solely on text-based descriptions, in contrast to causal discovery, which aims to determine the causal relationships between variables using data. Recent research has explored a method that mimics causal discovery by aggregating the outcomes of r…

    Submitted 18 November, 2023; originally announced November 2023.

    ACM Class: I.2

  50. arXiv:2311.11169  [pdf]

    eess.IV cs.AI cs.LG eess.SP

    Deep Coherence Learning: An Unsupervised Deep Beamformer for High Quality Single Plane Wave Imaging in Medical Ultrasound

    Authors: Hyunwoo Cho, Seongjun Park, **bum Kang, Yangmo Yoo

    Abstract: Plane wave imaging (PWI) in medical ultrasound is becoming an important reconstruction method with high frame rates and new clinical applications. Recently, single PWI based on deep learning (DL) has been studied to overcome lowered frame rates of traditional PWI with multiple PW transmissions. However, due to the lack of appropriate ground truth images, DL-based PWI still remains challenging for…

    Submitted 18 November, 2023; originally announced November 2023.