Showing 1–4 of 4 results for author: Azuma, D

  1. arXiv:2406.14240

    cs.CV cs.AI

    CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

    Authors: Jungdae Lee, Taiki Miyanishi, Shuhei Kurita, Koya Sakamoto, Daichi Azuma, Yutaka Matsuo, Nakamasa Inoue

    Abstract: Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world environments by integrating visual and linguistic cues. While substantial progress has been made in understanding these interactive modalities in ground-level navigation, aerial navigation remains largely underexplored. This is primarily due to the scarcity of resources suitable for real-world, city-scale aeria…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The first two authors contributed equally

  2. arXiv:2405.16559

    cs.RO cs.CV

    Map-based Modular Approach for Zero-shot Embodied Question Answering

    Authors: Koya Sakamoto, Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe

    Abstract: Building robots capable of interacting with humans through natural language in the visual world presents a significant challenge in the field of robotics. To overcome this challenge, Embodied Question Answering (EQA) has been proposed as a benchmark task to measure the ability to identify an object navigating through a previously unseen environment in response to human-posed questions. Although so…

    Submitted 26 May, 2024; originally announced May 2024.

  3. arXiv:2305.13876

    cs.CV

    Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans

    Authors: Taiki Miyanishi, Daichi Azuma, Shuhei Kurita, Motoaki Kawanabe

    Abstract: We present a novel task for cross-dataset visual grounding in 3D scenes (Cross3DVG), which overcomes limitations of existing 3D visual grounding models, specifically their restricted 3D resources and consequent tendencies of overfitting a specific 3D dataset. We created RIORefer, a large-scale 3D visual grounding dataset, to facilitate Cross3DVG. It includes more than 63k diverse descriptions of 3…

    Submitted 7 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 3DV 2024

  4. arXiv:2112.10482

    cs.CV

    ScanQA: 3D Question Answering for Spatial Scene Understanding

    Authors: Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoaki Kawanabe

    Abstract: We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the 3D-QA task, models receive visual information from the entire 3D scene of the rich RGB-D indoor scan and answer the given textual questions about the 3D scene. Unlike the 2D-question answering of VQA, the conventional 2D-QA models suffer from problems with spatial understanding of object alignment and direction…

    Submitted 7 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: CVPR 2022. The first three authors contributed equally. Project page: https://github.com/ATR-DBI/ScanQA