Showing 1–19 of 19 results for author: Sugimoto, A

  1. arXiv:2311.15879  [pdf, other]

    cs.CV

    EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

    Authors: Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama

    Abstract: Large language models (LLMs)-based image captioning has the capability of describing objects not explicitly observed in training data; yet novel objects occur frequently, necessitating the requirement of sustaining up-to-date object knowledge for open-world comprehension. Instead of relying on large amounts of data and/or scaling up network parameters, we introduce a highly effective retrieval-aug…

    Submitted 7 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  2. arXiv:2304.06602  [pdf, other]

    cs.CV

    A-CAP: Anticipation Captioning with Commonsense Knowledge

    Authors: Duc Minh Vo, Quoc-An Luong, Akihiro Sugimoto, Hideki Nakayama

    Abstract: Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time. In order to emulate this ability, we introduce a novel task called Anticipation Captioning, which generates a caption for an unseen oracle image using a sparsely temporally-ordered set of images. To tackle this new task, we propose a model called A-CAP, which incorporates commonse…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2023

  3. arXiv:2304.06053  [pdf, other]

    cs.CV

    TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval

    Authors: Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, Tuan-Anh Yang, Kim-Phat Tran, Nhu-Vinh Hoang, Minh-Quang Nguyen, E-Ro Nguyen, Minh-Khoi Nguyen-Nhat, Tuan-An To, Trung-Truc Huynh-Le, Nham-Tan Nguyen, Hoang-Chau Luong, et al. (8 additional authors not shown)

    Abstract: 3D object retrieval is an important yet challenging task that has drawn more and more attention in recent years. While existing approaches have made strides in addressing this issue, they are often limited to restricted settings such as image and sketch queries, which are often unfriendly interactions for common users. In order to overcome these limitations, this paper presents a novel SHREC chall…

    Submitted 9 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted to Computers and Graphics (3DOR, Journal Track)

  4. arXiv:2304.05731  [pdf, other]

    cs.CV

    SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval

    Authors: Trung-Nghia Le, Tam V. Nguyen, Minh-Quan Le, Trong-Thuan Nguyen, Viet-Tham Huynh, Trong-Le Do, Khanh-Duy Le, Mai-Khiem Tran, Nhat Hoang-Xuan, Thang-Long Nguyen-Ho, Vinh-Tiep Nguyen, Nhat-Quynh Le-Pham, Huu-Phuc Pham, Trong-Vu Hoang, Quang-Binh Nguyen, Trong-Hieu Nguyen-Mau, Tuan-Luc Huynh, Thanh-Danh Le, Ngoc-Linh Nguyen-Ha, Tuong-Vy Truong-Thuy, Truong Hoai Phong, Tuong-Nghiem Diep, Khanh-Duy Ho, Xuan-Hieu Nguyen, Thien-Phuc Tran, et al. (9 additional authors not shown)

    Abstract: The retrieval of 3D objects has gained significant importance in recent years due to its broad range of applications in computer vision, computer graphics, virtual reality, and augmented reality. However, the retrieval of 3D objects presents significant challenges due to the intricate nature of 3D models, which can vary in shape, size, and texture, and have numerous polygons and vertices. To this…

    Submitted 9 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted to Computers & Graphics (3DOR 2023, Journal track)

  5. arXiv:2203.14499  [pdf, other]

    cs.CV

    NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

    Authors: Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama

    Abstract: Novel object captioning aims at describing objects absent from training data, with the key ingredient being the provision of object vocabulary to the model. Although existing methods heavily rely on an object detection model, we view the detection step as vocabulary retrieval from an external knowledge in the form of embeddings for any object's definition from Wiktionary, where we use in the retri…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  6. ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation

    Authors: Khoa Vo, Kashu Yamazaki, Sang Truong, Minh-Triet Tran, Akihiro Sugimoto, Ngan Le

    Abstract: Temporal action proposal generation (TAPG) aims to estimate temporal intervals of actions in untrimmed videos, which is a challenging yet plays an important role in many tasks of video analysis and understanding. Despite the great achievement in TAPG, most existing works ignore the human perception of interaction between agents and the surrounding environment by applying a deep learning model as a…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted in the journal of IEEE Access Vol. 9

  7. arXiv:2203.08456  [pdf, other]

    cs.CV

    PPCD-GAN: Progressive Pruning and Class-Aware Distillation for Large-Scale Conditional GANs Compression

    Authors: Duc Minh Vo, Akihiro Sugimoto, Hideki Nakayama

    Abstract: We push forward neural network compression research by exploiting a novel challenging task of large-scale conditional generative adversarial networks (GANs) compression. To this end, we propose a gradually shrinking GAN (PPCD-GAN) by introducing progressive pruning residual block (PP-Res) and class-aware distillation. The PP-Res is an extension of the conventional residual block where each convolu…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: accepted at WACV 2022

  8. arXiv:2107.08323  [pdf, other]

    cs.CV

    Agent-Environment Network for Temporal Action Proposal Generation

    Authors: Viet-Khoa Vo-Ho, Ngan Le, Kashu Yamazaki, Akihiro Sugimoto, Minh-Triet Tran

    Abstract: Temporal action proposal generation is an essential and challenging task that aims at localizing temporal intervals containing human actions in untrimmed videos. Most of existing approaches are unable to follow the human cognitive process of understanding the video context due to lack of attention mechanism to express the concept of an action or an agent who performs the action or the interaction…

    Submitted 16 March, 2022; v1 submitted 17 July, 2021; originally announced July 2021.

    Comments: Accepted in ICASSP 2021

  9. Anabranch Network for Camouflaged Object Segmentation

    Authors: Trung-Nghia Le, Tam V. Nguyen, Zhongliang Nie, Minh-Triet Tran, Akihiro Sugimoto

    Abstract: Camouflaged objects attempt to conceal their texture into the background and discriminating them from the background is hard even for human beings. The main objective of this paper is to explore the camouflaged object segmentation problem, namely, segmenting the camouflaged object(s) for a given image. This problem has not been well studied in spite of a wide range of potential applications includ…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: Published in CVIU 2019. Project page: https://sites.google.com/view/ltnghia/research/camo

    Journal ref: Computer Vision and Image Understanding 184 (2019) 45-56

  10. arXiv:2004.14052  [pdf, other]

    cs.CV

    Minimal Rolling Shutter Absolute Pose with Unknown Focal Length and Radial Distortion

    Authors: Zuzana Kukelova, Cenek Albl, Akihiro Sugimoto, Konrad Schindler, Tomas Pajdla

    Abstract: The internal geometry of most modern consumer cameras is not adequately described by the perspective projection. Almost all cameras exhibit some radial lens distortion and are equipped with an electronic rolling shutter that induces distortions when the camera moves during the image capture. When focal length has not been calibrated offline, the parameters that describe the radial and rolling shut…

    Submitted 29 April, 2020; originally announced April 2020.

  11. arXiv:2004.10534  [pdf, other]

    cs.CV eess.IV

    TetraTSDF: 3D human reconstruction from a single image with a tetrahedral outer shell

    Authors: Hayato Onizuka, Zehra Hayirci, Diego Thomas, Akihiro Sugimoto, Hideaki Uchiyama, Rin-ichiro Taniguchi

    Abstract: Recovering the 3D shape of a person from its 2D appearance is ill-posed due to ambiguities. Nevertheless, with the help of convolutional neural networks (CNN) and prior knowledge on the 3D human body, it is possible to overcome such ambiguities to recover detailed 3D shapes of human bodies from single images. Current solutions, however, fail to reconstruct all the details of a person wearing loose…

    Submitted 22 April, 2020; originally announced April 2020.

  12. arXiv:2002.11313  [pdf, ps, other]

    cs.DS cs.CC

    Computational Aspects of Geometric Algebra Products of Two Homogeneous Multivectors

    Authors: Stephane Breuils, Vincent Nozick, Akihiro Sugimoto

    Abstract: Studies on time and memory costs of products in geometric algebra have been limited to cases where multivectors with multiple grades have only non-zero elements. This allows to design efficient algorithms for a generic purpose; however, it does not reflect the practical usage of geometric algebra. Indeed, in applications related to geometry, multivectors are likely to be full homogeneous, having t…

    Submitted 26 February, 2020; originally announced February 2020.

  13. Two-Stream FCNs to Balance Content and Style for Style Transfer

    Authors: Duc Minh Vo, Akihiro Sugimoto

    Abstract: Style transfer is to render given image contents in given styles, and it has an important role in both computer vision fundamental research and industrial applications. Following the success of deep learning based approaches, this problem has been re-launched recently, but still remains a difficult task because of trade-off between preserving contents and faithful rendering of styles. Indeed, how…

    Submitted 7 May, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: published in Machine Vision and Applications

  14. arXiv:1908.01741  [pdf, other]

    cs.CV

    Visual-Relation Conscious Image Generation from Structured-Text

    Authors: Duc Minh Vo, Akihiro Sugimoto

    Abstract: We propose an end-to-end network for image generation from given structured-text that consists of the visual-relation layout module and the pyramid of GANs, namely stacking-GANs. Our visual-relation layout module uses relations among entities in the structured-text in two ways: comprehensive usage and individual usage. We comprehensively use all available relations together to localize initial bou…

    Submitted 18 July, 2020; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: accepted at ECCV 2020

  15. arXiv:1903.02444  [pdf, ps, other]

    cs.CG

    Efficient representation and manipulation of quadratic surfaces using Geometric Algebras

    Authors: Stéphane Breuils, Vincent Nozick, Laurent Fuchs, Akihiro Sugimoto

    Abstract: Quadratic surfaces gain more and more attention among the Geometric Algebra community and some frameworks were proposed in order to represent, transform, and intersect these quadratic surfaces. As far as the authors know, none of these frameworks support all the operations required to completely handle these surfaces. Some frameworks do not allow the construction of quadratic surfaces from control…

    Submitted 5 March, 2019; originally announced March 2019.

  16. arXiv:1812.11532  [pdf, other]

    cs.CV

    Linear solution to the minimal absolute pose rolling shutter problem

    Authors: Zuzana Kukelova, Cenek Albl, Akihiro Sugimoto, Tomas Pajdla

    Abstract: This paper presents new efficient solutions to the rolling shutter camera absolute pose problem. Unlike the state-of-the-art polynomial solvers, we approach the problem using simple and fast linear solvers in an iterative scheme. We present several solutions based on fixing different sets of variables and investigate the performance of them thoroughly. We design a new alternation strategy that est…

    Submitted 30 December, 2018; originally announced December 2018.

    Comments: 14th Asian Conference on Computer Vision (ACCV 2018)

  17. arXiv:1807.01452  [pdf, other]

    cs.CV

    Semantic Instance Meets Salient Object: Study on Video Semantic Salient Instance Segmentation

    Authors: Trung-Nghia Le, Akihiro Sugimoto

    Abstract: Focusing on only semantic instances that only salient in a scene gains more benefits for robot navigation and self-driving cars than looking at all objects in the whole scene. This paper pushes the envelope on salient regions in a video to decompose them into semantically meaningful components, namely, semantic salient instances. We provide the baseline for the new task of video semantic salient i…

    Submitted 22 November, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

    Comments: accepted in WACV 2019

  18. arXiv:1708.01589  [pdf, other]

    cs.CV

    Region-Based Multiscale Spatiotemporal Saliency for Video

    Authors: Trung-Nghia Le, Akihiro Sugimoto

    Abstract: Detecting salient objects from a video requires exploiting both spatial and temporal knowledge included in the video. We propose a novel region-based multiscale spatiotemporal saliency detection method for videos, where static features and dynamic features computed from the low and middle levels are combined together. Our method utilizes such combined features spatially over each frame and, at the…

    Submitted 4 August, 2017; originally announced August 2017.

  19. Video Salient Object Detection Using Spatiotemporal Deep Features

    Authors: Trung-Nghia Le, Akihiro Sugimoto

    Abstract: This paper presents a method for detecting salient objects in videos where temporal information in addition to spatial information is fully taken into account. Following recent reports on the advantage of deep features over conventional hand-crafted features, we propose a new set of SpatioTemporal Deep (STD) features that utilize local and global contexts over frames. We also propose new SpatioTem…

    Submitted 17 June, 2018; v1 submitted 4 August, 2017; originally announced August 2017.

    Comments: accepted at TIP