Showing 1–8 of 8 results for author: Cheang, C

Searching in archive cs.
  1. arXiv:2406.14540

    cs.RO cs.AI cs.CV

    IRASim: Learning Interactive Real-Robot Action Simulators

    Authors: Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, Tao Kong

    Abstract: Scalable robot learning in the real world is limited by the cost and safety issues of real robots. In addition, rolling out robot trajectories in the real world can be time-consuming and labor-intensive. In this paper, we propose to learn an interactive real-robot action simulator as an alternative. We introduce a novel method, IRASim, which leverages the power of generative models to generate ext…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Opensource, project website: https://gen-irasim.github.io

  2. arXiv:2312.13139

    cs.RO cs.CV

    Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

    Authors: Hongtao Wu, Ya Jing, Chilam Cheang, Guangzeng Chen, Jiafeng Xu, Xinghang Li, Minghuan Liu, Hang Li, Tao Kong

    Abstract: Generative pre-trained models have demonstrated remarkable effectiveness in language and vision domains by learning useful representations. In this paper, we extend the scope of this effectiveness by showing that visual robot manipulation can significantly benefit from large-scale video generative pre-training. We introduce GR-1, a straightforward GPT-style model designed for multi-task language-c…

    Submitted 21 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Project page: https://GR1-Manipulation.github.io

  3. arXiv:2311.01378

    cs.RO cs.AI cs.LG

    Vision-Language Foundation Models as Effective Robot Imitators

    Authors: Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong

    Abstract: Recent progress in vision language foundation models has shown their ability to understand multimodal data and resolve complicated vision language tasks, including robotics manipulation. We seek a straightforward way of making use of existing vision-language models (VLMs) with simple fine-tuning on robotics data. To this end, we derive a simple and novel vision-language manipulation framework, dub…

    Submitted 4 February, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Fix typos. Project page: https://roboflamingo.github.io

  4. arXiv:2305.01951

    cs.CL

    Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization

    Authors: Chi Seng Cheang, Hou Pong Chan, Derek F. Wong, Xuebo Liu, Zhaocong Li, Yanming Sun, Shudong Liu, Lidia S. Chao

    Abstract: Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets. However, existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets. Hence, the strong performance of PLMs may rely on the parametric knowledge that is memorized during pre-training and fine-tuning. Moreover, the knowledge memoriz…

    Submitted 2 November, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023

  5. arXiv:2205.04028

    cs.RO cs.CV

    Learning 6-DoF Object Poses to Grasp Category-level Objects by Language Instructions

    Authors: Chilam Cheang, Haitao Lin, Yanwei Fu, Xiangyang Xue

    Abstract: This paper studies the task of any objects grasping from the known categories by free-form language instructions. This task demands the technique in computer vision, natural language processing, and robotics. We bring these disciplines together on this open challenge, which is essential to human-robot interaction. Critically, the key challenge lies in inferring the category of objects from linguis…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: accepted by ICRA2022

  6. arXiv:2205.04026

    cs.RO cs.CV

    I Know What You Draw: Learning Grasp Detection Conditioned on a Few Freehand Sketches

    Authors: Haitao Lin, Chilam Cheang, Yanwei Fu, Xiangyang Xue

    Abstract: In this paper, we are interested in the problem of generating target grasps by understanding freehand sketches. The sketch is useful for the persons who cannot formulate language and the cases where a textual description is not available on the fly. However, very few works are aware of the usability of this novel interactive way between humans and robots. To this end, we propose a method to genera…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: accepted by ICRA2022

  7. Complex Network Analysis of the Bitcoin Transaction Network

    Authors: Bishenghui Tao, Hong-Ning Dai, Jiajing Wu, Ivan Wang-Hei Ho, Zibin Zheng, Chak Fong Cheang

    Abstract: In this brief, we conduct a complex-network analysis of the Bitcoin transaction network. In particular, we design a new sampling method, namely random walk with flying-back (RWFB), to conduct effective data sampling. We then conduct a comprehensive analysis of the Bitcoin network in terms of the degree distribution, clustering coefficient, the shortest-path length, connected component, centrality,…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: 6 pages, 4 figures

    MSC Class: 05C40; 05C81; 05C82; 05C90

    ACM Class: H.3.3; E.1

    Journal ref: IEEE Transactions on Circuits and Systems II: Express Briefs, 2022

  8. arXiv:2106.14193

    cs.CV cs.RO

    SAR-Net: Shape Alignment and Recovery Network for Category-level 6D Object Pose and Size Estimation

    Authors: Haitao Lin, Zichang Liu, Chilam Cheang, Yanwei Fu, Guodong Guo, Xiangyang Xue

    Abstract: Given a single scene image, this paper proposes a method of Category-level 6D Object Pose and Size Estimation (COPSE) from the point cloud of the target object, without external real pose-annotated training data. Specifically, beyond the visual cues in RGB images, we rely on the shape information predominately from the depth (D) channel. The key idea is to explore the shape alignment of each insta…

    Submitted 11 April, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

    Comments: accepted by CVPR2022