Showing 1–9 of 9 results for author: Dang, R

  1. arXiv:2404.10241

    cs.CV cs.AI

    Vision-and-Language Navigation via Causal Learning

    Authors: Liuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen

    Abstract: In the pursuit of robust and generalizable environment perception and language understanding, the ubiquitous challenge of dataset bias continues to plague vision-and-language navigation (VLN) agents, hindering their performance in unseen environments. This paper introduces the generalized cross-modal causal transformer (GOAT), a pioneering solution rooted in the paradigm of causal inference. By de…

    Submitted 15 April, 2024; originally announced April 2024.

  2. arXiv:2403.03405

    cs.CV

    Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Ronghao Dang, Huiyi Chen, Chengju Liu, Qijun Chen

    Abstract: Vision-and-Language Navigation (VLN) has gained significant research interest in recent years due to its potential applications in real-world scenarios. However, existing VLN methods struggle with the issue of spurious associations, resulting in poor generalization with a significant performance gap between seen and unseen environments. In this paper, we tackle this challenge by proposing a unifie…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 16 pages

  3. CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge

    Authors: Xiao Lin, Minghao Zhu, Ronghao Dang, Guangliang Zhou, Shaolong Shu, Feng Lin, Chengju Liu, Qijun Chen

    Abstract: Most existing category-level object pose estimation methods focus on learning the object category information from the point cloud modality. However, the scale of 3D datasets is limited due to the high cost of 3D data collection and annotation. Consequently, the category features extracted from these limited point cloud samples may not be comprehensive. This motivates us to investigate whether we…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 14 pages, 4 figures, 9 tables

  4. arXiv:2310.05136

    cs.AI cs.CV

    InstructDET: Diversifying Referring Object Detection with Generalized Instructions

    Authors: Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song

    Abstract: We propose InstructDET, a data-centric method for referring object detection (ROD) that localizes target objects based on user instructions. While derived from referring expressions (REC), the instructions we leverage are greatly diversified to encompass common user intentions related to object detection. For one image, we produce a tremendous number of instructions that refer to every single object and diff…

    Submitted 11 March, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 29 pages (including Appendix). Published in ICLR

  5. arXiv:2309.00297

    cs.CV

    Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning

    Authors: Minghao Zhu, Xiao Lin, Ronghao Dang, Chengju Liu, Qijun Chen

    Abstract: As the most essential property in a video, motion information is critical to a robust and generalized video representation. To inject motion dynamics, recent works have adopted frame difference as the source of motion information in video contrastive learning, considering the trade-off between quality and cost. However, existing works align motion features at the instance level, which suffers from…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: ACM MM 2023 Camera Ready

  6. A Dual Semantic-Aware Recurrent Global-Adaptive Network For Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Jiagui Tang, Ronghao Dang, Naijia Wang, Chengju Liu, Qijun Chen

    Abstract: Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate the target region using verbal and visual cues. While significant advancements have been achieved recently, there are still two broad limitations: (1) The explicit information mining for significant guiding semantics concealed in both vision and language is still under-explored; (2) The previo…

    Submitted 29 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCAI 2023

    Journal ref: International Joint Conferences on Artificial Intelligence Organization 2023

  7. arXiv:2302.01520

    cs.RO cs.AI

    Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation

    Authors: Ronghao Dang, Lu Chen, Liuyi Wang, Zongtao He, Chengju Liu, Qijun Chen

    Abstract: We propose a meta-ability decoupling (MAD) paradigm, which brings together various object navigation methods in an architecture system, allowing them to mutually enhance each other and evolve together. Based on the MAD paradigm, we design a multiple thinking (MT) model that leverages distinct thinking to abstract various meta-abilities. Our method decouples meta-abilities from three aspects: input…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 17 pages

  8. arXiv:2208.00553

    cs.AI cs.RO

    Search for or Navigate to? Dual Adaptive Thinking for Object Navigation

    Authors: Ronghao Dang, Liuyi Wang, Zongtao He, Shuai Su, Chengju Liu, Qijun Chen

    Abstract: "Search for" or "Navigate to"? When finding an object, these two choices always come up in our subconscious mind. Before seeing the target, we search for the target based on experience. After seeing the target, we remember the target location and navigate to it. However, recent methods in the object navigation field almost exclusively consider using object association to enhance the "search for" phase while neglect…

    Submitted 13 August, 2022; v1 submitted 31 July, 2022; originally announced August 2022.

    Comments: 12 pages, ready for AAAI 2023

  9. arXiv:2204.04421

    cs.CV cs.AI

    Unbiased Directed Object Attention Graph for Object Navigation

    Authors: Ronghao Dang, Zhuofan Shi, Liuyi Wang, Zongtao He, Chengju Liu, Qijun Chen

    Abstract: Object navigation tasks require agents to locate specific objects in unknown environments based on visual information. Previously, graph convolutions were used to implicitly explore the relationships between objects. However, due to differences in visibility among objects, it is easy to generate biases in object attention. Thus, in this paper, we propose a directed object attention (DOA) graph to…

    Submitted 7 July, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

    Comments: 13 pages, accepted by ACM Multimedia 2022