
Showing 1–16 of 16 results for author: Im, W

  1. arXiv:2406.15755  [pdf, other]

    cs.CV cs.AI

    Fine-grained Background Representation for Weakly Supervised Semantic Segmentation

    Authors: Xu Yin, Woobin Im, Dongbo Min, Yuchi Huo, Fei Pan, Sung-Eui Yoon

    Abstract: Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discriminate the foreground (FG) objects from the suspicious background (BG) pixels (a.k.a. co-occurring) and learn the integral object regions. This paper pr…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.06163  [pdf, other]

    cs.CV

    Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation

    Authors: Juhyeong Seon, Woobin Im, Sebin Lee, Jumin Lee, Sung-Eui Yoon

    Abstract: Audio-visual segmentation (AVS) aims to segment sound sources in the video sequence, requiring a pixel-level understanding of audio-visual correspondence. As the Segment Anything Model (SAM) has strongly impacted extensive fields of dense prediction problems, prior works have investigated the introduction of SAM into AVS with audio as a new modality of the prompt. Nevertheless, constrained by SAM'…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to ICIP 2024

  3. arXiv:2403.07773  [pdf, other]

    cs.CV

    SemCity: Semantic Scene Generation with Triplane Diffusion

    Authors: Jumin Lee, Sebin Lee, Changho Jo, Woobin Im, Juhyeong Seon, Sung-Eui Yoon

    Abstract: We present "SemCity," a 3D diffusion model for semantic scene generation in real-world outdoor environments. Most 3D diffusion models focus on generating a single object, synthetic indoor scenes, or synthetic outdoor scenes, while the generation of real-world outdoor scenes is rarely addressed. In this paper, we concentrate on generating a real-outdoor scene through learning a diffusion model on a…

    Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  4. arXiv:2403.03662  [pdf, other]

    cs.CV

    Harnessing Meta-Learning for Improving Full-Frame Video Stabilization

    Authors: Muhammad Kashif Ali, Eun Woo Im, Dong** Kim, Tae Hyun Kim

    Abstract: Video stabilization is a longstanding computer vision problem; in particular, pixel-level synthesis solutions for video stabilization, which synthesize full frames, add to the complexity of this task. These techniques aim to stabilize videos by synthesizing full frames while enhancing the stability of the considered video. This intensifies the complexity of the task due to the distinct mix of unique m…

    Submitted 8 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: CVPR 2024. Code will be made available at: http://github.com/MKashifAli/MetaVideoStab

  5. arXiv:2310.12189  [pdf, other]

    cs.CV

    Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation

    Authors: Bosang Kim, Jonghyun Kim, Hyotae Lee, Lanying **, Jeongwon Ha, Dowoo Kwon, Jungpyo Kim, Wonhyeok Im, KyungMin **, Jungho Lee

    Abstract: In general, hand pose estimation aims to improve the robustness of model performance in the real-world scenes. However, it is difficult to enhance the robustness since existing datasets are obtained in restricted environments to annotate 3D information. Although neural networks quantitatively achieve a high estimation accuracy, unsatisfactory results can be observed in visual quality. This discrepanc…

    Submitted 18 October, 2023; originally announced October 2023.

  6. arXiv:2305.01167  [pdf, other]

    cs.CV

    Hybrid model for Single-Stage Multi-Person Pose Estimation

    Authors: Jonghyun Kim, Bosang Kim, Hyotae Lee, Jungpyo Kim, Wonhyeok Im, Lanying **, Dowoo Kwon, Jungho Lee

    Abstract: In general, human pose estimation methods are categorized into two approaches according to their architectures: regression (i.e., heatmap-free) and heatmap-based methods. The former one directly estimates precise coordinates of each keypoint using convolutional and fully-connected layers. Although this approach is able to detect overlapped and dense keypoints, unexpected results can be obtained by…

    Submitted 18 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  7. arXiv:2301.00527  [pdf, other]

    cs.CV

    Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data

    Authors: Jumin Lee, Woobin Im, Sebin Lee, Sung-Eui Yoon

    Abstract: In this paper, we learn a diffusion model to generate 3D data on a scene-scale. Specifically, our model crafts a 3D scene consisting of multiple objects, while recent diffusion research has focused on a single object. To realize our goal, we represent a scene with discrete class labels, i.e., categorical distribution, to assign multiple objects into semantic categories. Thus, we extend discrete di…

    Submitted 2 January, 2023; originally announced January 2023.

  8. arXiv:2212.05904  [pdf]

    physics.optics

    Chiral Metafilms and Surface Enhanced Raman Scattering For Enantiomeric Discrimination of Helicoid Nanoparticles

    Authors: Martin Kartau, Anastasia Skvortsova, Victor Tabouillot, Rahul Kumar, Polina Bainovab, Vasilii Burtsev, Vaclav Svorcik, Nikolaj Gadegaard, Sang Won Im, Marie Urbanova, Oleksiy Lyutakov, Malcolm Kadodwala, Affar S. Karimullah

    Abstract: Chiral nanophotonic platforms provide a means of creating near fields with both enhanced asymmetric properties and intensities. They can be exploited for optical measurements that allow enantiomeric discrimination at detection levels more than 6 orders of magnitude better than those achieved with conventional chirally sensitive spectroscopic methods based on circularly polarized light. The optimal approa…

    Submitted 12 December, 2022; originally announced December 2022.

  9. arXiv:2207.10314  [pdf, other]

    cs.CV

    Semi-Supervised Learning of Optical Flow by Flow Supervisor

    Authors: Woobin Im, Sebin Lee, Sung-Eui Yoon

    Abstract: A training pipeline for optical flow CNNs consists of a pretraining stage on a synthetic dataset followed by a fine tuning stage on a target dataset. However, obtaining ground truth flows from a target video requires a tremendous effort. This paper proposes a practical fine tuning method to adapt a pretrained model to a target dataset without ground truth flows, which has not been explored extensi…

    Submitted 21 July, 2022; originally announced July 2022.

  10. arXiv:2207.01213  [pdf]

    cond-mat.mtrl-sci

    Strain and Crystallographic Identification of the Helically Concaved Surfaces of Nanoparticles

    Authors: Sungwook Choi, Sang Won Im, Ji-Hyeok Huh, Sungwon Kim, Jaeseung Kim, Yae-Chan Lim, Ryeong Myeong Kim, Jeong Hyun Han, Hyeohn Kim, Michael Sprung, Su Yong Lee, Wonsuk Cha, Ross Harder, Seungwoo Lee, Ki Tae Nam, Hyunjung Kim

    Abstract: Identifying the three-dimensional (3D) crystal-plane and strain-field distributions of nanocrystals is essential for optical, catalytic, and electronic applications. Here, we developed a methodology for visualizing the 3D information of chiral gold nanoparticles with concave gap structures by Bragg coherent X-ray diffraction imaging. The distribution of the high-Miller-index planes constituting th…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Sungwook Choi and Sang Won Im contributed equally to this work.

  11. arXiv:2106.13953  [pdf, other]

    cs.CV

    In-N-Out: Towards Good Initialization for Inpainting and Outpainting

    Authors: Changho Jo, Woobin Im, Sung-Eui Yoon

    Abstract: In computer vision, recovering spatial information by filling in masked regions, e.g., inpainting, has been widely investigated for its usability and wide applicability to other various applications: image inpainting, image extrapolation, and environment map estimation. Most of them are studied separately depending on the applications. Our focus, however, is on accommodating the opposite task, e.g…

    Submitted 17 September, 2021; v1 submitted 26 June, 2021; originally announced June 2021.

    Comments: 14 pages (10 pages without references), 7 figures

  12. arXiv:1907.05006  [pdf, other]

    cs.CV

    Two-stream Spatiotemporal Feature for Video QA Task

    Authors: Chiwan Song, Woobin Im, Sung-eui Yoon

    Abstract: Understanding the content of videos is one of the core techniques for developing various helpful applications in the real world, such as recognizing various human actions for surveillance systems or customer behavior analysis in an autonomous shop. However, understanding the content or story of the video still remains a challenging problem due to its sheer amount of data and temporal structure. In…

    Submitted 11 July, 2019; originally announced July 2019.

    Comments: 8 pages

  13. arXiv:1704.06761  [pdf, other]

    cs.CV

    Content-Based Video-Music Retrieval Using Soft Intra-Modal Structure Constraint

    Authors: Sungeun Hong, Woobin Im, Hyun S. Yang

    Abstract: Up to now, only limited research has been conducted on cross-modal retrieval of suitable music for a specified video or vice versa. Moreover, much of the existing research relies on metadata such as keywords, tags, or associated description that must be individually produced and attached posterior. This paper introduces a new content-based, cross-modal retrieval method for video and music that is…

    Submitted 1 September, 2017; v1 submitted 22 April, 2017; originally announced April 2017.

    Comments: 13 pages, 9 figures, 4 tables. Supplementary material: https://youtu.be/ZyINqDMo3Fg

  14. arXiv:1702.04479  [pdf, other]

    cs.CV

    Recognizing Dynamic Scenes with Deep Dual Descriptor based on Key Frames and Key Segments

    Authors: Sungeun Hong, Jongbin Ryu, Woobin Im, Hyun S. Yang

    Abstract: Recognizing dynamic scenes is one of the fundamental problems in scene understanding, which categorizes moving scenes such as a forest fire, landslide, or avalanche. While existing methods focus on reliable capturing of static and dynamic information, few works have explored frame selection from a dynamic scene sequence. In this paper, we propose dynamic scene recognition using a deep dual descrip…

    Submitted 16 February, 2017; v1 submitted 15 February, 2017; originally announced February 2017.

    Comments: 10 pages, 7 figures, 8 tables

  15. arXiv:1702.04069  [pdf, other]

    cs.CV

    SSPP-DAN: Deep Domain Adaptation Network for Face Recognition with Single Sample Per Person

    Authors: Sungeun Hong, Woobin Im, Jongbin Ryu, Hyun S. Yang

    Abstract: Real-world face recognition using a single sample per person (SSPP) is a challenging task. The problem is exacerbated if the conditions under which the gallery image and the probe set are captured are completely different. To address these issues from the perspective of domain adaptation, we introduce an SSPP domain adaptation network (SSPP-DAN). In the proposed approach, domain adaptation, featur…

    Submitted 28 April, 2018; v1 submitted 13 February, 2017; originally announced February 2017.

    Comments: Accepted to ICIP 2017 (oral). Code is available at https://github.com/csehong/SSPP-DAN

  16. arXiv:1612.08354  [pdf, other]

    cs.CV cs.CL cs.LG

    Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation

    Authors: Gwangbeen Park, Woobin Im

    Abstract: We present a novel method for image-text multi-modal representation learning. To our knowledge, this work is the first approach to apply the adversarial learning concept to multi-modal learning without exploiting image-text pair information to learn multi-modal features. We use only category information, in contrast with most previous methods, which use image-text pair information for multi-modal embedding.…

    Submitted 26 December, 2016; originally announced December 2016.

    Comments: 8 pages, 5 figures