Showing 1–15 of 15 results for author: Song, C H

  1. arXiv:2404.01524  [pdf, other]

    cs.CV cs.AI

    On Train-Test Class Overlap and Detection for Image Retrieval

    Authors: Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang, Shunghyun Choi, Yeong Hyeon Gu, Yannis Avrithis

    Abstract: How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. By comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Accepted

  2. arXiv:2404.01156  [pdf, other]

    cs.CV cs.AI

    SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

    Authors: Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, Shunghyun Choi, Yeong Hyeon Gu

    Abstract: Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in fashion domain, datasets often exhibit a disparity between the information conveyed in image and text. This issue stems from datasets containing multiple images of a single fashion item all paired with one text, leading to cases where some textual details are no…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Accepted

  3. arXiv:2402.04476  [pdf, other]

    cs.CV cs.AI cs.CL

    Dual-View Visual Contextualization for Web Navigation

    Authors: Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao

    Abstract: Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, mak…

    Submitted 30 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  4. arXiv:2311.18803  [pdf, other]

    cs.CV cs.CL cs.LG

    BioCLIP: A Vision Foundation Model for the Tree of Life

    Authors: Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su

    Abstract: Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specif…

    Submitted 14 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: CVPR 2024 (oral) camera-ready version; data released

  5. arXiv:2307.13254  [pdf, other]

    cs.CV

    Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network

    Authors: Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, Shunghyun Choi, Yeong Hyeon Gu

    Abstract: Many studies in vision tasks have aimed to create effective embedding spaces for single-label object prediction within an image. However, in reality, most objects possess multiple specific attributes, such as shape, color, and length, with each attribute composed of various classes. To apply models in real-world scenarios, it is essential to be able to distinguish between the granular components o…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: ICCV 2023 Accepted

  6. arXiv:2305.08174  [pdf, other]

    math.NA

    ReSDF: Redistancing Implicit Surfaces using Neural Networks

    Authors: Yesom Park, Chang hoon Song, Jooyoung Hahn, Myungjoo Kang

    Abstract: This paper proposes a deep-learning-based method for recovering a signed distance function (SDF) of a given hypersurface represented by an implicit level set function. Using the flexibility of constructing a neural network, we use an augmented network by defining an auxiliary output to represent the gradient of the SDF. There are three advantages of the augmented network; (i) the target interface…

    Submitted 14 May, 2023; originally announced May 2023.

  7. arXiv:2302.00854  [pdf, other]

    cs.LG

    Learning PDE Solution Operator for Continuous Modeling of Time-Series

    Authors: Yesom Park, Jaemoo Choi, Changyeon Yoon, Chang hoon Song, Myungjoo Kang

    Abstract: Learning underlying dynamics from data is important and challenging in many real-world scenarios. Incorporating differential equations (DEs) to design continuous networks has drawn much attention recently, however, most prior works make specific assumptions on the type of DEs, making the model specialized for particular problems. This work presents a partial differential equation (PDE) based frame…

    Submitted 1 February, 2023; originally announced February 2023.

  8. arXiv:2212.04088  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

    Authors: Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su

    Abstract: This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinders the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a…

    Submitted 30 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: 14 pages, 5 figures

    Report number: ICCV 2023

  9. arXiv:2211.13866  [pdf, ps, other]

    stat.ML cs.LG

    Minimal Width for Universal Property of Deep RNN

    Authors: Chang hoon Song, Geonho Hwang, Jun ho Lee, Myungjoo Kang

    Abstract: A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to…

    Submitted 28 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  10. arXiv:2210.11909  [pdf, other]

    cs.CV cs.IR cs.LG

    Boosting vision transformers for image retrieval

    Authors: Chull Hwan Song, Jooyoung Yoon, Shunghyun Choi, Yannis Avrithis

    Abstract: Vision transformers have achieved remarkable progress in vision tasks such as image classification and detection. However, in instance-level image retrieval, transformers have not yet shown good performance compared to convolutional networks. We propose a number of improvements that make transformers outperform the state of the art for the first time. (1) We show that a hybrid architecture is more…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: WACV 2023

  11. arXiv:2202.07028  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

    Authors: Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su

    Abstract: We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in…

    Submitted 10 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: 10 pages, 5 figures. Accepted to CVPR 2022

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15482-15491

  12. hSDB-instrument: Instrument Localization Database for Laparoscopic and Robotic Surgeries

    Authors: Jihun Yoon, Jiwon Lee, Sunghwan Heo, Hayeong Yu, Jayeon Lim, Chi Hyun Song, SeulGi Hong, Seungbum Hong, Bokyung Park, SungHyun Park, Woo Jin Hyung, Min-Kook Choi

    Abstract: Automated surgical instrument localization is an important technology to understand the surgical process and in order to analyze them to provide meaningful guidance during surgery or surgical index after surgery to the surgeon. We introduce a new dataset that reflects the kinematic characteristics of surgical instruments for automated surgical instrument localization of surgical videos. The hSDB(h…

    Submitted 25 October, 2021; v1 submitted 24 October, 2021; originally announced October 2021.

    Comments: https://hsdb-instrument.github.io

    Journal ref: MICCAI 2021 pp 393-402

  13. arXiv:2107.08000  [pdf, other]

    cs.CV

    All the attention you need: Global-local, spatial-channel attention for image retrieval

    Authors: Chull Hwan Song, Hye Joo Han, Yannis Avrithis

    Abstract: We address representation learning for large-scale instance-level image retrieval. Apart from backbone, training pipelines and loss functions, popular approaches have focused on different spatial pooling and attention mechanisms, which are at the core of learning a powerful global image representation. There are different forms of attention according to the interaction of elements of the feature t…

    Submitted 16 July, 2021; originally announced July 2021.

  14. arXiv:2003.03072  [pdf, other]

    cs.CL cs.LG

    Improving Neural Named Entity Recognition with Gazetteers

    Authors: Chan Hee Song, Dawn Lawrie, Tim Finin, James Mayfield

    Abstract: The goal of this work is to improve the performance of a neural named entity recognition system by adding input features that indicate a word is part of a name included in a gazetteer. This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system. Experiments reveal that the approach yields performance gains…

    Submitted 6 March, 2020; originally announced March 2020.

    Comments: Short version accepted to the 33rd FLAIRS conference

  15. arXiv:1909.09922  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Using Chinese Glyphs for Named Entity Recognition

    Authors: Arijit Sehanobish, Chan Hee Song

    Abstract: Most Named Entity Recognition (NER) systems use additional features like part-of-speech (POS) tags, shallow parsing, gazetteers, etc. Such kind of information requires external knowledge like unlabeled texts and trained taggers. Adding these features to NER systems have been shown to have a positive impact. However, sometimes creating gazetteers or taggers can take a lot of time and may require ex…

    Submitted 11 February, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

    Comments: Extended abstract accepted to AAAI-2020, student track