Showing 1–50 of 116 results for author: Kim, H J

Searching in archive cs.
  1. arXiv:2406.16275  [pdf, other]

    cs.CL

    Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

    Authors: Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo

    Abstract: AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 13 tables, under review

  2. arXiv:2406.09905  [pdf, other]

    cs.CV cs.GR

    Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

    Authors: Lingni Ma, Yuting Ye, Fangzhou Hong, Vladimir Guzov, Yifeng Jiang, Rowan Postyeni, Luis Pesqueira, Alexander Gamino, Vijay Baiyya, Hyo Jin Kim, Kevin Bailey, David Soriano Fosas, C. Karen Liu, Ziwei Liu, Jakob Engel, Renzo De Nardi, Richard Newcombe

    Abstract: We introduce Nymeria - a large-scale, diverse, richly annotated human motion dataset collected in the wild with multiple multimodal egocentric devices. The dataset comes with a) full-body 3D motion ground truth; b) egocentric multimodal recordings from Project Aria devices with RGB, grayscale, eye-tracking cameras, IMUs, magnetometer, barometer, and microphones; and c) an additional "observer" dev…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2406.08176  [pdf, other]

    cs.CV cs.RO

    Category-level Neural Field for Reconstruction of Partially Observed Objects in Indoor Environment

    Authors: Taekbeom Lee, Youngseok Jang, H. Jin Kim

    Abstract: Neural implicit representation has attracted attention in 3D reconstruction through various success cases. For further applications such as scene understanding or editing, several works have shown progress towards object compositional reconstruction. Despite their superior performance in observed regions, their performance is still limited in reconstructing objects that are partially observed. To…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: RA-L. 8 pages, 8 figures, 4 tables

  4. arXiv:2405.01361  [pdf, other]

    cs.RO

    Haptic-Based Bilateral Teleoperation of Aerial Manipulator for Extracting Wedged Object with Compensation of Human Reaction Time

    Authors: Jeonghyun Byun, Dohyun Eom, H. Jin Kim

    Abstract: Bilateral teleoperation of an aerial manipulator facilitates the execution of industrial missions thanks to the combination of the aerial platform's maneuverability and the ability to conduct complex tasks with human supervision. Heretofore, research on such operations has focused on flying without any physical interaction or exerting a pushing force on a contact surface that does not involve abru…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: to be presented in 2024 IEEE International Conference on Unmanned Aircraft Systems (ICUAS), Chania, Crete, Greece, 2024

  5. arXiv:2404.11972  [pdf, other]

    cs.CL

    Aligning Language Models to Explicitly Handle Ambiguity

    Authors: Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

    Abstract: In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure…

    Submitted 16 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  6. arXiv:2404.11320  [pdf, other]

    cs.RO

    Saturated RISE control for considering rotor thrust saturation of fully actuated multirotor

    Authors: Dongjae Lee, H. Jin Kim

    Abstract: This work proposes a saturated robust controller for a fully actuated multirotor that takes disturbance rejection and rotor thrust saturation into account. A disturbance rejection controller is required to prevent performance degradation in the presence of parametric uncertainty and external disturbance. Furthermore, rotor saturation should be properly addressed in a controller to avoid performanc…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures, 2024 International Conference on Unmanned Aircraft Systems (ICUAS) accepted

  7. arXiv:2404.11310  [pdf, other]

    cs.RO

    Autonomous aerial perching and unperching using omnidirectional tiltrotor and switching controller

    Authors: Dongjae Lee, Sunwoo Hwang, Jeonghyun Byun, Seung Jae Lee, H. Jin Kim

    Abstract: Aerial unperching of multirotors has received little attention as opposed to perching that has been investigated to elongate operation time. This study presents a new aerial robot capable of both perching and unperching autonomously on/from a ferromagnetic surface during flight, and a switching controller to avoid rotor saturation and mitigate overshoot during transition between free-flight and pe…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 7 pages, 10 figures, 2024 IEEE International Conference on Robotics and Automation (ICRA) accepted

  8. arXiv:2404.11104  [pdf, other]

    cs.CV

    Object Remover Performance Evaluation Methods using Class-wise Object Removal Images

    Authors: Changsuk Oh, Dongseok Shim, Taekbeom Lee, H. Jin Kim

    Abstract: Object removal refers to the process of erasing designated objects from an image while preserving the overall appearance, and it is one area where image inpainting is widely used in real-world applications. The performance of an object remover is quantitatively evaluated by measuring the quality of object removal results, similar to how the performance of an image inpainter is gauged. Current work…

    Submitted 17 April, 2024; originally announced April 2024.

  9. arXiv:2404.05687  [pdf, other]

    cs.CV

    Retrieval-Augmented Open-Vocabulary Object Detection

    Authors: Jooyeon Kim, Eulrang Cho, Sehyung Kim, Hyunwoo J. Kim

    Abstract: Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose R…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted paper at CVPR 2024

  10. arXiv:2404.00851  [pdf, other]

    cs.CV

    Prompt Learning via Meta-Regularization

    Authors: Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim

    Abstract: Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt the vision-language models to a variety of downstream tasks. However, most existing prompt learning methods suffer from task overfitting since the general knowledge of th…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  11. arXiv:2403.17709  [pdf, other]

    cs.CV

    Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

    Authors: Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim

    Abstract: Visual Relationship Detection (VRD) has seen significant advancements with Transformer-based architectures recently. However, we identify two key limitations in a conventional label assignment for training Transformer-based VRD models, which is a process of mapping a ground-truth (GT) to a prediction. Under the conventional assignment, an unspecialized query is trained since a query is expected to…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  12. arXiv:2403.13347  [pdf, other]

    cs.CV

    vid-TLDR: Training Free Token merging for Light-weight Video Transformer

    Authors: Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim

    Abstract: Video Transformers have become the prevalent solution for various video downstream tasks with superior expressive power and flexibility. However, these video transformers suffer from heavy computational costs induced by the massive number of tokens across the entire video frames, which has been the major barrier to training the model. Further, the patches irrelevant to the main contents, e.g., bac…

    Submitted 30 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  13. arXiv:2403.10030  [pdf, other]

    cs.CV

    Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers

    Authors: Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim

    Abstract: Vision Transformer (ViT) has emerged as a prominent backbone for computer vision. For more efficient ViTs, recent works lessen the quadratic cost of the self-attention layer by pruning or fusing the redundant tokens. However, these works faced the speed-accuracy trade-off caused by the loss of information. Here, we argue that token fusion needs to consider diverse relations between tokens to minim…

    Submitted 1 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  14. arXiv:2403.06397  [pdf, other]

    cs.LG cs.AI eess.SY

    DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning

    Authors: Xuefeng Wang, Henglin Pu, Hyung Jun Kim, Husheng Li

    Abstract: Safe Multi-agent reinforcement learning (safe MARL) has increasingly gained attention in recent years, emphasizing the need for agents to not only optimize the global return but also adhere to safety requirements through behavioral constraints. Some recent work has integrated control theory with multi-agent reinforcement learning to address the challenge of ensuring safety. However, there have bee…

    Submitted 11 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures

  15. arXiv:2403.03181  [pdf, other]

    cs.LG cs.AI cs.RO

    Behavior Generation with Latent Actions

    Authors: Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. Jin Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

    Abstract: Generative modeling of complex behaviors from labeled datasets has been a longstanding problem in decision making. Unlike language or image generation, decision making requires modeling actions - continuous-valued vectors that are multimodal in their distribution, potentially drawn from uncurated sources, where generation errors can compound in sequential prediction. A recent class of models calle…

    Submitted 28 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Github repo: https://github.com/jayLEE0301/vq_bet_official

  16. arXiv:2402.16506  [pdf, other]

    cs.CV

    Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis

    Authors: Juyeon Ko, Inho Kong, Dogyun Park, Hyunwoo J. Kim

    Abstract: Semantic image synthesis (SIS) is a task to generate realistic images corresponding to semantic maps (labels). However, in real-world applications, SIS often encounters noisy user inputs. To address this, we propose Stochastic Conditional Diffusion Model (SCDM), which is a robust conditional diffusion model that features novel forward and generation processes tailored for SIS with noisy labels. It…

    Submitted 3 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  17. arXiv:2402.14579  [pdf, other]

    cs.CV cs.CL cs.LG

    Text Role Classification in Scientific Charts Using Multimodal Transformers

    Authors: Hye Jin Kim, Nicolas Lell, Ansgar Scherp

    Abstract: Text role classification involves classifying the semantic role of textual elements within scientific charts. For this task, we propose to finetune two pretrained multimodal document layout analysis models, LayoutLMv3 and UDOP, on chart datasets. The transformers utilize the three modalities of text, image, and layout as input. We further investigate whether data augmentation and balancing methods…

    Submitted 8 February, 2024; originally announced February 2024.

  18. arXiv:2401.12517  [pdf, other]

    cs.LG stat.ML

    DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations

    Authors: Dogyun Park, Sihyeon Kim, Sojin Lee, Hyunwoo J. Kim

    Abstract: Recent studies have introduced a new class of generative models for synthesizing implicit neural representations (INRs) that capture arbitrary continuous signals in various domains. These models opened the door for domain-agnostic generative models, but they often fail to achieve high-quality generation. We observed that the existing methods generate the weights of neural networks to parameterize…

    Submitted 20 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  19. arXiv:2312.12664  [pdf, other]

    cs.CV

    UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection

    Authors: Bumsoo Kim, Taeho Choi, Jaewoo Kang, Hyunwoo J. Kim

    Abstract: Recent advances in deep neural networks have achieved significant progress in detecting individual objects from an image. However, object detection is not sufficient to fully understand a visual scene. Towards a deeper visual understanding, the interactions between objects, especially humans and objects are essential. Most prior works have obtained this information with a bottom-up approach, where…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: ECCV 2020

  20. arXiv:2311.09762  [pdf, other]

    cs.CL cs.AI cs.LG

    Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models

    Authors: Jinyoung Park, Ameen Patel, Omar Zia Khan, Hyunwoo J. Kim, Joo-Kyung Kim

    Abstract: Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To deal with them, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-q…

    Submitted 22 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Preprint

  21. arXiv:2311.03784  [pdf, other]

    cs.CV cs.LG

    UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields

    Authors: Injae Kim, Minhyuk Choi, Hyunwoo J. Kim

    Abstract: Neural Radiance Field (NeRF) has enabled novel view synthesis with high fidelity given images and camera poses. Subsequent works even succeeded in eliminating the necessity of pose priors by jointly optimizing NeRF and camera pose. However, these works are limited to relatively simple settings such as photometrically consistent and occluder-free image collections or a sequence of images from a vid…

    Submitted 7 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: Neural Information Processing Systems (NeurIPS), 2023. The code is available at https://github.com/mlvlab/UP-NeRF

  22. arXiv:2310.20258  [pdf, other]

    cs.LG

    Advancing Bayesian Optimization via Learning Correlated Latent Space

    Authors: Seunghun Lee, Jaewon Chu, Sihyeon Kim, Juyeon Ko, Hyunwoo J. Kim

    Abstract: Bayesian optimization is a powerful method for optimizing black-box functions with limited function evaluations. Recent works have shown that optimization in a latent space through deep generative models such as variational autoencoders leads to effective and efficient Bayesian optimization for structured or discrete data. However, as the optimization does not take place in the input space, it lea…

    Submitted 19 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

  23. arXiv:2310.19261  [pdf, other]

    cs.LG

    Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement

    Authors: Daesol Cho, Seungjae Lee, H. Jin Kim

    Abstract: Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C r…

    Submitted 30 October, 2023; originally announced October 2023.

  24. arXiv:2310.19185  [pdf, other]

    cs.RO

    Robotic Barrier Construction through Weaved, Inflatable Tubes

    Authors: H. J. Kim, H. Abdel-Raziq, X. Liu, A. Y. Siskovic, S. Patil, K. H. Petersen, H. L. Kao

    Abstract: In this article, we present a mechanism and related path planning algorithm to construct light-duty barriers out of extruded, inflated tubes weaved around existing environmental features. Our extruded tubes are based on everted vine-robots and in this context, we present a new method to steer their growth. We characterize the mechanism in terms of accuracy resilience, and, towards their use as bar…

    Submitted 29 October, 2023; originally announced October 2023.

  25. arXiv:2310.17330  [pdf, other]

    cs.LG cs.AI

    CQM: Curriculum Reinforcement Learning with a Quantized World Model

    Authors: Seungjae Lee, Daesol Cho, Jonghae Park, H. Jin Kim

    Abstract: Recent curriculum Reinforcement Learning (RL) has shown notable progress in solving complex tasks by proposing sequences of surrogate tasks. However, the previous approaches often face challenges when they generate curriculum goals in a high-dimensional space. Thus, they usually rely on manually specified goal spaces. To alleviate this limitation and improve the scalability of the curriculum, we p…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  26. arXiv:2310.15747  [pdf, other]

    cs.CV

    Large Language Models are Temporal and Causal Reasoners for Video Question Answering

    Authors: Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim

    Abstract: Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA). However, such priors often cause suboptimal results on VideoQA by leading the model to over-rel…

    Submitted 6 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted paper at EMNLP 2023 Main

  27. arXiv:2310.15484  [pdf, other]

    cs.CL cs.AI

    NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA

    Authors: Hyeong Kyu Choi, Seunghun Lee, Jaewon Chu, Hyunwoo J. Kim

    Abstract: Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the f…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Neural Information Processing Systems (NeurIPS) 2023

  28. arXiv:2310.14849  [pdf, other]

    cs.CL

    Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP

    Authors: Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim

    Abstract: When deploying machine learning systems to the wild, it is highly desirable for them to effectively leverage prior knowledge to the unfamiliar domain while also firing alarms to anomalous inputs. In order to address these requirements, Universal Domain Adaptation (UniDA) has emerged as a novel research area in computer vision, focusing on achieving both adaptation ability and robustness (i.e., the…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  29. arXiv:2309.03406  [pdf, other]

    cs.CV

    Distribution-Aware Prompt Tuning for Vision-Language Models

    Authors: Eulrang Cho, Jooyeon Kim, Hyunwoo J. Kim

    Abstract: Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large data. In general, the performance of VLMs on target tasks can be further improved by prompt tuning, which adds context to the input image or text. By leveraging data from target tasks, various prompt-tuning methods have been studied in the literature. A…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV2023

  30. Distributed multi-agent target search and tracking with Gaussian process and reinforcement learning

    Authors: Jigang Kim, Dohyun Jang, H. Jin Kim

    Abstract: Deploying multiple robots for target search and tracking has many practical applications, yet the challenge of planning over unknown or partially known targets remains difficult to address. With recent advances in deep learning, intelligent control techniques such as reinforcement learning have enabled agents to learn autonomously from environment interactions with little to no prior knowledge. Su…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: 10 pages, 6 figures; preprint submitted to IJCAS; first two authors contributed equally

    Journal ref: International Journal of Control, Automation, and Systems 2023 21(9): 3057-3067

  31. arXiv:2308.14960  [pdf, other]

    cs.CV

    Read-only Prompt Optimization for Vision-Language Few-shot Learning

    Authors: Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyung Choi, Sanghyeok Lee, Hyunwoo J. Kim

    Abstract: In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods aim to adapt the pre-trained models by introducing learnable prompts while keeping pre-trained weights frozen. However, learnable prompts can affect the internal representation within the self-attention module, which may negatively impact performance variance and ge…

    Submitted 9 November, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV2023

  32. arXiv:2308.13561  [pdf, other]

    cs.HC cs.CV

    Project Aria: A New Tool for Egocentric Multi-Modal AI Research

    Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira, et al. (49 additional authors not shown)

    Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, mul…

    Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  33. arXiv:2308.11920  [pdf, other]

    cs.CV cs.AI

    Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification

    Authors: Injae Kim, Jongha Kim, Joonmyung Choi, Hyunwoo J. Kim

    Abstract: Interpretability is a crucial factor in building reliable models for various medical applications. Concept Bottleneck Models (CBMs) enable interpretable image classification by utilizing human-understandable concepts as intermediate targets. Unlike conventional methods that require extensive human labor to construct the concept set, recent works leveraging Large Language Models (LLMs) for generati…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to MedAGI Workshop at MICCAI 2023 (Oral Presentation)

  34. arXiv:2308.11916  [pdf, other]

    cs.CV

    Semantic-Aware Implicit Template Learning via Part Deformation Consistency

    Authors: Sihyeon Kim, Minseok Joo, Jaewon Lee, Juyeon Ko, Juhan Cha, Hyunwoo J. Kim

    Abstract: Learning implicit templates as neural fields has recently shown impressive performance in unsupervised shape correspondence. Despite the success, we observe current approaches, which solely rely on geometric information, often learn suboptimal deformation across generic object shapes, which have high structural variability. In this paper, we highlight the importance of part deformation consistency…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: ICCV camera-ready version

  35. arXiv:2308.09363  [pdf, other]

    cs.CV

    Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

    Authors: Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim

    Abstract: Video Question Answering (VideoQA) is a challenging task that entails complex multi-modal reasoning. In contrast to multiple-choice VideoQA which aims to predict the answer given several options, the goal of open-ended VideoQA is to answer questions without restricting candidate answers. However, the majority of previous VideoQA models formulate open-ended VideoQA as a classification task to class…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted paper at ICCV 2023

  36. arXiv:2308.05334  [pdf, other]

    cs.RO

    Visibility-Constrained Control of Multirotor via Reference Governor

    Authors: Dabin Kim, Matthias Pezzutto, Luca Schenato, H. Jin Kim

    Abstract: For safe vision-based control applications, perception-related constraints have to be satisfied in addition to other state constraints. In this paper, we deal with the problem where a multirotor equipped with a camera needs to maintain the visibility of a point of interest while tracking a reference given by a high-level planner. We devise a method based on reference governor that, differently fro…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: 8 pages, 6 figures, Accepted to 62nd IEEE Conference on Decision and Control (CDC 2023)

  37. arXiv:2306.14425  [pdf, other]

    cs.RO

    Minimally actuated tiltrotor for perching and normal force exertion

    Authors: Dongjae Lee, Sunwoo Hwang, Changhyeon Kim, Seung Jae Lee, H. Jin Kim

    Abstract: This study presents a new hardware design and control of a minimally actuated 5 control degrees of freedom (CDoF) quadrotor-based tiltrotor. The proposed tiltrotor possesses several characteristics distinct from those found in existing works, including: 1) minimal number of actuators for 5 CDoF, 2) large margin to generate interaction force during aerial physical interaction (APhI), and 3) no mech…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: 7 pages, 10 figures, 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) accepted

  38. arXiv:2305.09943  [pdf, other]

    cs.LG cs.AI cs.RO

    Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum

    Authors: Jigang Kim, Daesol Cho, H. Jin Kim

    Abstract: While reinforcement learning (RL) has achieved great success in acquiring complex skills solely from environmental interactions, it assumes that resets to the initial state are readily available at the end of each episode. Such an assumption hinders the autonomous learning of embodied agents due to the time-consuming and cumbersome workarounds for resetting in the physical world. Hence, there has…

    Submitted 8 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: ICML 2023, first two authors contributed equally

  39. arXiv:2305.07857  [pdf, other]

    cs.CV

    AURA : Automatic Mask Generator using Randomized Input Sampling for Object Removal

    Authors: Changsuk Oh, Dongseok Shim, H. Jin Kim

    Abstract: The objective of the image inpainting task is to fill missing regions of an image in a visually plausible way. Recently, deep-learning-based image inpainting networks have generated outstanding results, and some utilize their models as object removers by masking unwanted objects in an image. However, while trying to better remove objects using their networks, the previous works pay less attention…

    Submitted 13 May, 2023; originally announced May 2023.

  40. arXiv:2303.16450  [pdf, other]

    cs.CV

    Self-positioning Point-based Transformer for Point Cloud Understanding

    Authors: Jinyoung Park, Sanghyeok Lee, Sihyeon Kim, Yunyang Xiong, Hyunwoo J. Kim

    Abstract: Transformers have shown superior performance on various computer vision tasks with their capabilities to capture long-range dependencies. Despite the success, it is challenging to directly apply Transformers on point clouds due to their quadratic cost in the number of points. In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and g…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted paper at CVPR 2023

  41. arXiv:2303.13009  [pdf, other]

    cs.CV

    MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

    Authors: Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim

    Abstract: Foundation models have shown outstanding performance and generalization capabilities across domains. Since most studies on foundation models mainly focus on the pretraining phase, a naive strategy to minimize a single task-specific loss is adopted for fine-tuning. However, such fine-tuning methods do not fully leverage other losses that are potentially beneficial for the target task. Therefore, we…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted paper at CVPR 2023

  42. k-SALSA: k-anonymous synthetic averaging of retinal images via local style alignment

    Authors: Minkyu Jeon, Hyeonjin Park, Hyunwoo J. Kim, Michael Morley, Hyunghoon Cho

    Abstract: The application of modern machine learning to retinal image analyses offers valuable insights into a broad range of human health conditions beyond ophthalmic diseases. Additionally, data sharing is key to fully realizing the potential of machine learning models by providing a rich and diverse collection of training data. However, the personally-identifying nature of retinal images, encompassing th…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: European Conference on Computer Vision (ECCV), 2022

  43. arXiv:2303.07872  [pdf, other]

    cs.CV

    Object-based SLAM utilizing unambiguous pose parameters considering general symmetry types

    Authors: Taekbeom Lee, Youngseok Jang, H. Jin Kim

    Abstract: Existence of symmetric objects, whose observation at different viewpoints can be identical, can deteriorate the performance of simultaneous localization and mapping (SLAM). This work proposes a system for robustly optimizing the pose of cameras and objects even in the presence of symmetric objects. We classify objects into three categories depending on their symmetry characteristics, which is effic…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: This paper has been accepted to ICRA 2023

  44. arXiv:2303.03966  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Semantic-aware Occlusion Filtering Neural Radiance Fields in the Wild

    Authors: Jaewon Lee, Injae Kim, Hwan Heo, Hyunwoo J. Kim

    Abstract: We present a learning framework for reconstructing neural scene representations from a small number of unconstrained tourist photos. Since each image contains transient occluders, decomposing the static and transient components is necessary to construct radiance fields with such in-the-wild photographs where existing methods require a lot of training data. We introduce SF-NeRF, aiming to disentang…

    Submitted 5 March, 2023; originally announced March 2023.

    Comments: 11 pages, 5 figures

  45. arXiv:2302.14273  [pdf, other]

    cs.RO eess.SY

    QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking

    Authors: Yunwoo Lee, Jungwon Park, Seungwoo Jung, Boseong Jeon, Dahyun Oh, H. Jin Kim

    Abstract: Maintaining the visibility of the targets is one of the major objectives of aerial tracking applications. This paper proposes QP Chaser, a trajectory planning pipeline that can enhance the visibility of single- and dual-target in both static and dynamic environments. As the name suggests, the proposed planner generates a target-visible trajectory via quadratic programming problems. First, the pred…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: 15 pages, 13 figures

  46. arXiv:2302.01571  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Robust Camera Pose Refinement for Multi-Resolution Hash Encoding

    Authors: Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, Jin-Hwa Kim

    Abstract: Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural renderings, such as NeRF. This method requires accurate camera poses for the neural renderings of given scenes. However, contrary to previous methods jointly optimizing camera poses and 3D scenes, the naive gradient-based camera pose refinement method using multi-resolution hash encoding severely d…

    Submitted 3 February, 2023; originally announced February 2023.

  47. arXiv:2302.00980  [pdf, other]

    cs.CV cs.AI cs.LG

    Domain Generalization Emerges from Dreaming

    Authors: Hwan Heo, Youngjin Oh, Jaewon Lee, Hyunwoo J. Kim

    Abstract: Recent studies have proven that DNNs, unlike human vision, tend to exploit texture information rather than shape. Such texture bias is one of the factors for the poor generalization performance of DNNs. We observe that the texture bias negatively affects not only in-domain generalization but also out-of-distribution generalization, i.e., Domain Generalization. Motivated by the observation, we prop…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 23 pages, 4 figures

  48. arXiv:2301.11741  [pdf, other]

    cs.LG cs.AI cs.RO

    Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation

    Authors: Daesol Cho, Seungjae Lee, H. Jin Kim

    Abstract: Current reinforcement learning (RL) often suffers when solving a challenging exploration problem where the desired outcomes or high rewards are rarely observed. Even though curriculum RL, a framework that solves complex tasks by proposing a sequence of surrogate tasks, shows reasonable results, most of the previous works still have difficulty in proposing curriculum due to the absence of a mechani…

    Submitted 20 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: ICLR 2023 Spotlight. First two authors contributed equally

  49. arXiv:2301.11660  [pdf, other]

    cs.CL

    Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning

    Authors: Hyunsoo Cho, Choonghyun Park, Junyeop Kim, Hyuhng Joon Kim, Kang Min Yoo, Sang-goo Lee

    Abstract: As the size of the pre-trained language model (PLM) continues to increase, numerous parameter-efficient transfer learning methods have been proposed recently to compensate for the tremendous cost of fine-tuning. Despite the impressive results achieved by large pre-trained language models (PLMs) and various parameter-efficient transfer learning (PETL) methods on sundry benchmarks, it remains unclea…

    Submitted 13 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: *SEM 2023

  50. arXiv:2301.11520  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning

    Authors: Dongseok Shim, Seungjae Lee, H. Jin Kim

    Abstract: As previous representations for reinforcement learning cannot effectively incorporate a human-intuitive understanding of the 3D environment, they usually suffer from sub-optimal performances. In this paper, we present Semantic-aware Neural Radiance Fields for Reinforcement Learning (SNeRL), which jointly optimizes semantic-aware neural radiance fields (NeRF) with a convolutional encoder to learn 3…

    Submitted 31 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: ICML 2023. First two authors contributed equally. Order was determined by coin flip