Showing 1–50 of 65 results for author: Cai, P

  1. arXiv:2406.16026  [pdf]

    physics.med-ph cs.LG eess.IV

    CEST-KAN: Kolmogorov-Arnold Networks for CEST MRI Data Analysis

    Authors: Jiawen Wang, Pei Cai, Ziyan Wang, Huabin Zhang, Jianpan Huang

    Abstract: Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same ne…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.11633  [pdf, other]

    cs.CV

    DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

    Authors: Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao

    Abstract: Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page scientific document extract…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Homepage of DocGenome: https://unimodal4reasoning.github.io/DocGenome_page 22 pages, 11 figures

  3. arXiv:2406.08418  [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  5. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  6. arXiv:2404.01359  [pdf]

    quant-ph cs.AI cs.NE

    Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification

    Authors: Zuyu Xu, Kang Shen, Pengnian Cai, Tao Yang, Yuanming Hu, Shixian Chen, Yunlai Zhu, Zuheng Wu, Yuehua Dai, Jun Wang, Fei Yang

    Abstract: The recent emergence of the hybrid quantum-classical neural network (HQCNN) architecture has garnered considerable attention due to the potential advantages associated with integrating quantum principles to enhance various facets of machine learning algorithms and computations. However, the current investigated serial structure of HQCNN, wherein information sequentially passes from one network to…

    Submitted 1 April, 2024; originally announced April 2024.

  7. arXiv:2403.10101  [pdf, other]

    cs.RO

    Agile and Safe Trajectory Planning for Quadruped Navigation with Motion Anisotropy Awareness

    Authors: Wentao Zhang, Shaohang Xu, Peiyuan Cai, Lijun Zhu

    Abstract: Quadruped robots demonstrate robust and agile movements in various terrains; however, their navigation autonomy is still insufficient. One of the challenges is that the motion capabilities of the quadruped robot are anisotropic along different directions, which significantly affects the safety of quadruped robot navigation. This paper proposes a navigation framework that takes into account the mot…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  8. arXiv:2402.03830  [pdf, other]

    cs.CV

    OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

    Authors: Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, Xing Gao, Xinyu Cai, Bo Zhang, Xuemeng Yang, Yeqi Bai, Hongbin Zhou, Botian Shi

    Abstract: With deep learning and computer vision technology development, autonomous driving provides new solutions to improve traffic safety and efficiency. The importance of building high-quality datasets is self-evident, especially with the rise of end-to-end autonomous driving algorithms in recent years. Data plays a core role in the algorithm closed-loop system. However, collecting real-world data is ex…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 10 pages, 9 figures

  9. arXiv:2402.03047  [pdf, other]

    cs.CV cs.LG

    PFDM: Parser-Free Virtual Try-on via Diffusion Model

    Authors: Yunfang Niu, Dong Yi, Lingxiang Wu, Zhiwei Liu, Pengxiang Cai, Jinqiao Wang

    Abstract: Virtual try-on can significantly improve the garment shopping experiences in both online and in-store scenarios, attracting broad interest in computer vision. However, to achieve high-fidelity try-on performance, most state-of-the-art methods still rely on accurate segmentation masks, which are often produced by near-perfect parsers or manual labeling. To overcome the bottleneck, we propose a pars…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE ICASSP 2024

  10. arXiv:2402.01246  [pdf, other]

    cs.RO eess.SY

    LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving

    Authors: Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao

    Abstract: The emergence of Multimodal Large Language Models ((M)LLMs) has ushered in new avenues in artificial intelligence, particularly for autonomous driving by offering enhanced understanding and reasoning capabilities. This paper introduces LimSim++, an extended version of LimSim designed for the application of (M)LLMs in autonomous driving. Acknowledging the limitations of existing simulation platform…

    Submitted 12 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted by 35th IEEE Intelligent Vehicles Symposium (IV 2024)

  11. arXiv:2401.00722  [pdf, other]

    cs.CV

    BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation

    Authors: Libin Lan, Pengzhou Cai, Lu Jiang, Xiaojuan Liu, Yongmei Li, Yudong Zhang

    Abstract: Accurate medical image segmentation is essential for clinical quantification, disease diagnosis, treatment planning and many other applications. Both convolution-based and transformer-based u-shaped architectures have made significant success in various medical image segmentation tasks. The former can efficiently learn local information of images while requiring much more image-specific inductive…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, 9 tables code: https://github.com/Caipengzhou/BRAU-Netplusplus

  12. arXiv:2312.13156  [pdf, other]

    cs.CE cs.AI

    AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model

    Authors: Lening Wang, Yilong Ren, Han Jiang, Pinlong Cai, Daocheng Fu, Tianqi Wang, Zhiyong Cui, Haiyang Yu, Xuesong Wang, Hanchu Zhou, Helai Huang, Yinhai Wang

    Abstract: Traffic accidents, being a significant contributor to both human casualties and property damage, have long been a focal point of research for many scholars in the field of traffic safety. However, previous studies, whether focusing on static environmental assessments or dynamic driving analyses, as well as pre-accident predictions or post-accident rule analyses, have typically been conducted in is…

    Submitted 28 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 21 pages, 19 figures

  13. arXiv:2312.08177  [pdf]

    cs.CV

    Advanced Image Segmentation Techniques for Neural Activity Detection via C-fos Immediate Early Gene Expression

    Authors: Peilin Cai

    Abstract: This paper investigates the application of advanced image segmentation techniques to analyze C-fos immediate early gene expression, a crucial marker for neural activity. Due to the complexity and high variability of neural circuits, accurate segmentation of C-fos images is paramount for the development of new insights into neural function. Amidst this backdrop, this research aims to improve accura…

    Submitted 13 December, 2023; originally announced December 2023.

  14. arXiv:2312.04316  [pdf, other]

    cs.RO cs.AI cs.CV

    Towards Knowledge-driven Autonomous Driving

    Authors: Xin Li, Yeqi Bai, Pinlong Cai, Licheng Wen, Daocheng Fu, Bo Zhang, Xuemeng Yang, Xinyu Cai, Tao Ma, Jianfei Guo, Xing Gao, Min Dou, Yikang Li, Botian Shi, Yong Liu, Liang He, Yu Qiao

    Abstract: This paper explores the emerging knowledge-driven autonomous driving technologies. Our investigation highlights the limitations of current autonomous driving systems, in particular their sensitivity to data bias, difficulty in handling long-tail scenarios, and lack of interpretability. Conversely, knowledge-driven methods with the abilities of cognition, generalization and life-long learning emerg…

    Submitted 27 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

  15. arXiv:2312.03408  [pdf, other]

    cs.CV

    Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

    Authors: Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

    Abstract: With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. Current autonomous driving datasets can broadly be categorized into two generations. The first-generation autonomous driving datasets are characterized by relatively sim…

    Submitted 22 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: This article is a simplified English translation of corresponding Chinese article. Please refer to Chinese version for the complete content

  16. arXiv:2312.02519  [pdf, other]

    cs.AI cs.LG

    Creative Agents: Empowering Agents with Imagination for Creative Tasks

    Authors: Chi Zhang, Penglin Cai, Yuhui Fu, Haoqi Yuan, Zongqing Lu

    Abstract: We study building embodied agents for open-ended creative tasks. While existing methods build instruction-following agents that can perform diverse open-ended tasks, none of them demonstrates creativity -- the ability to give novel and diverse task solutions implicit in the language instructions. This limitation comes from their inability to convert abstract language instructions into concrete tas…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: The first two authors contribute equally

  17. arXiv:2311.05332  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

    Authors: Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

    Abstract: The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuance of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of…

    Submitted 28 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

  18. arXiv:2310.00289  [pdf, other]

    eess.IV cs.CV

    Pubic Symphysis-Fetal Head Segmentation Using Pure Transformer with Bi-level Routing Attention

    Authors: Pengzhou Cai, Jiang Lu, Yanxin Li, Libin Lan

    Abstract: In this paper, we propose a method, named BRAU-Net, to solve the pubic symphysis-fetal head segmentation task. The method adopts a U-Net-like pure Transformer architecture with bi-level routing attention and skip connections, which effectively learns local-global semantic information. The proposed BRAU-Net was evaluated on transperineal Ultrasound images dataset from the pubic symphysis-fetal head…

    Submitted 7 October, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  19. arXiv:2309.16292  [pdf, other]

    cs.RO cs.CL

    DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

    Authors: Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao

    Abstract: Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore the question of how to instill similar capabilities into autonomous driving systems and summarize a paradigm that integrates an int…

    Submitted 21 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Published as a conference paper at ICLR 2024

  20. arXiv:2309.06719  [pdf, other]

    cs.AI cs.HC

    TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models

    Authors: Siyao Zhang, Daocheng Fu, Zhao Zhang, Bin Yu, Pinlong Cai

    Abstract: With the promotion of chatgpt to the public, Large language models indeed showcase remarkable common sense, reasoning, and planning skills, frequently providing insightful guidance. These capabilities hold significant promise for their application in urban traffic management and control. However, LLMs struggle with addressing traffic issues, especially processing numerical data and interacting wit…

    Submitted 13 September, 2023; originally announced September 2023.

  21. arXiv:2308.16008  [pdf, other]

    cs.RO cs.AI cs.LG

    EnsembleFollower: A Hybrid Car-Following Framework Based On Reinforcement Learning and Hierarchical Planning

    Authors: Xu Han, Xianda Chen, Meixin Zhu, Pinlong Cai, Jianshan Zhou, Xiaowen Chu

    Abstract: Car-following models have made significant contributions to our understanding of longitudinal driving behavior. However, they often exhibit limited accuracy and flexibility, as they cannot fully capture the complexity inherent in car-following processes, or may falter in unseen scenarios due to their reliance on confined driving skills present in training data. It is worth noting that each car-fol…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: 12 pages, 10 figures

  22. arXiv:2308.12797  [pdf, other]

    cs.RO cs.MA eess.SY

    TrafficMCTS: A Closed-Loop Traffic Flow Generation Framework with Group-Based Monte Carlo Tree Search

    Authors: Licheng Wen, Ze Fu, Pinlong Cai, Daocheng Fu, Song Mao, Botian Shi

    Abstract: Digital twins for intelligent transportation systems are currently attracting great interests, in which generating realistic, diverse, and human-like traffic flow in simulations is a formidable challenge. Current approaches often hinge on predefined driver models, objective optimization, or reliance on pre-recorded driving datasets, imposing limitations on their scalability, versatility, and adapt…

    Submitted 31 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  23. arXiv:2308.03253  [pdf, other]

    cs.CL cs.AI

    PaniniQA: Enhancing Patient Education Through Interactive Question Answering

    Authors: Pengshan Cai, Zonghai Yao, Fei Liu, Dakuo Wang, Meghan Reilly, Huixue Zhou, Lingxi Li, Yi Cao, Alok Kapoor, Adarsha Bajracharya, Dan Berlowitz, Hong Yu

    Abstract: Patient portal allows discharged patients to access their personalized discharge instructions in electronic health records (EHRs). However, many patients have difficulty understanding or memorizing their discharge instructions. In this paper, we present PaniniQA, a patient-centric interactive question answering system designed to help patients understand their discharge instructions. PaniniQA firs…

    Submitted 20 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Accepted to TACL 2023. Equal contribution for the first two authors. This arXiv version is a pre-MIT Press publication version

  24. arXiv:2307.07162  [pdf, other]

    cs.RO cs.CL

    Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

    Authors: Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao

    Abstract: In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner and analyze its ability to reason, interpret, and memorize when facing complex scenarios. We argue that traditional optimization-based and modular autonomous driving (AD) systems face inherent performance limitations when dealing with long-tail corner cases. To…

    Submitted 14 July, 2023; originally announced July 2023.

  25. arXiv:2307.06648  [pdf, other]

    eess.SY cs.RO

    LimSim: A Long-term Interactive Multi-scenario Traffic Simulator

    Authors: Licheng Wen, Daocheng Fu, Song Mao, Pinlong Cai, Min Dou, Yikang Li, Yu Qiao

    Abstract: With the growing popularity of digital twin and autonomous driving in transportation, the demand for simulation systems capable of generating high-fidelity and reliable scenarios is increasing. Existing simulation systems suffer from a lack of support for different types of scenarios, and the vehicle models used in these systems are too simplistic. Thus, such systems fail to represent driving styl…

    Submitted 26 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted by 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)

  26. arXiv:2306.17456  [pdf, other]

    cs.RO cs.HC

    Human-like Decision-making at Unsignalized Intersection using Social Value Orientation

    Authors: Yan Tong, Licheng Wen, Pinlong Cai, Daocheng Fu, Song Mao, Yikang Li

    Abstract: With the commercial application of automated vehicles (AVs), the sharing of roads between AVs and human-driven vehicles (HVs) becomes a common occurrence in the future. While research has focused on improving the safety and reliability of autonomous driving, it's also crucial to consider collaboration between AVs and HVs. Human-like interaction is a required capability for AVs, especially at commo…

    Submitted 30 June, 2023; originally announced June 2023.

  27. arXiv:2306.15136  [pdf, other]

    cs.RO cs.AI

    What Truly Matters in Trajectory Prediction for Autonomous Driving?

    Authors: Phong Tran, Haoran Wu, Cunjun Yu, Panpan Cai, Sifa Zheng, David Hsu

    Abstract: Trajectory prediction plays a vital role in the performance of autonomous driving systems, and prediction accuracy, such as average displacement error (ADE) or final displacement error (FDE), is widely used as a performance metric. However, a significant disparity exists between the accuracy of predictors on fixed datasets and driving performance when the predictors are used downstream for vehicle…

    Submitted 6 November, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  28. arXiv:2305.10640  [pdf, other]

    cs.CV

    Learning Restoration is Not Enough: Transfering Identical Mapping for Single-Image Shadow Removal

    Authors: Xiaoguang Li, Qing Guo, Pingping Cai, Wei Feng, Ivor Tsang, Song Wang

    Abstract: Shadow removal is to restore shadow regions to their shadow-free counterparts while leaving non-shadow regions unchanged. State-of-the-art shadow removal methods train deep neural networks on collected shadow & shadow-free image pairs, which are desired to complete two distinct tasks via shared weights, i.e., data restoration for shadow regions and identical mapping for non-shadow regions. We find…

    Submitted 17 May, 2023; originally announced May 2023.

  29. arXiv:2305.03308  [pdf]

    eess.SP cs.LG

    Tiny-PPG: A Lightweight Deep Neural Network for Real-Time Detection of Motion Artifacts in Photoplethysmogram Signals on Edge Devices

    Authors: Yali Zheng, Chen Wu, Peizheng Cai, Zhiqiang Zhong, Hongda Huang, Yuqi Jiang

    Abstract: Photoplethysmogram (PPG) signals are easily contaminated by motion artifacts in real-world settings, despite their widespread use in Internet-of-Things (IoT) based wearable and smart health devices for cardiovascular health monitoring. This study proposed a lightweight deep neural network, called Tiny-PPG, for accurate and real-time PPG artifact segmentation on IoT edge devices. The model was trai…

    Submitted 10 October, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

  30. arXiv:2303.16563  [pdf, other]

    cs.LG cs.AI

    Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

    Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu

    Abstract: We study building multi-task agents in open-world environments. Without human demonstrations, learning to accomplish long-horizon tasks in a large open-world environment with reinforcement learning (RL) is extremely inefficient. To tackle this challenge, we convert the multi-task learning problem into learning basic skills and planning over the skills. Using the popular open-world game Minecraft a…

    Submitted 4 December, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: 24 pages, presented in Foundation Models for Decision Making Workshop at NeurIPS 2023

  31. Parametric Surface Constrained Upsampler Network for Point Cloud

    Authors: Pingping Cai, Zhenyao Wu, Xinyi Wu, Song Wang

    Abstract: Designing a point cloud upsampler, which aims to generate a clean and dense point cloud given a sparse point representation, is a fundamental and challenging problem in computer vision. A line of attempts achieves this goal by establishing a point-to-point mapping function via deep neural networks. However, these approaches are prone to produce outlier points due to the lack of explicit surface-le…

    Submitted 3 December, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Update Supplementary Files

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 37, 1 (Jun. 2023), 250-258

  32. arXiv:2303.06768  [pdf, other]

    cs.AI cs.RO

    The Planner Optimization Problem: Formulations and Frameworks

    Authors: Yiyuan Lee, Katie Lee, Panpan Cai, David Hsu, Lydia E. Kavraki

    Abstract: Identifying internal parameters for planning is crucial to maximizing the performance of a planner. However, automatically tuning internal parameters which are conditioned on the problem instance is especially challenging. A recent line of work focuses on learning planning parameter generators, but lack a consistent problem definition and software framework. This work proposes the unified planner…

    Submitted 14 March, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: 4 pages (+2 pages references, +6 pages appendix)

  33. arXiv:2302.06803  [pdf, other]

    cs.RO cs.MA

    Bringing Diversity to Autonomous Vehicles: An Interpretable Multi-vehicle Decision-making and Planning Framework

    Authors: Licheng Wen, Pinlong Cai, Daocheng Fu, Song Mao, Yikang Li

    Abstract: With the development of autonomous driving, it is becoming increasingly common for autonomous vehicles (AVs) and human-driven vehicles (HVs) to travel on the same roads. Existing single-vehicle planning algorithms on board struggle to handle sophisticated social interactions in the real world. Decisions made by these methods are difficult to understand for humans, raising the risk of crashes and m…

    Submitted 13 February, 2023; originally announced February 2023.

    ACM Class: I.2.9

  34. arXiv:2209.11422  [pdf, other]

    cs.LG cs.RO

    LEADER: Learning Attention over Driving Behaviors for Planning under Uncertainty

    Authors: Mohamad H. Danesh, Panpan Cai, David Hsu

    Abstract: Uncertainty on human behaviors poses a significant challenge to autonomous driving in crowded urban environments. The partially observable Markov decision processes (POMDPs) offer a principled framework for planning under uncertainty, often leveraging Monte Carlo sampling to achieve online performance for complex tasks. However, sampling also raises safety concerns by potentially missing critical…

    Submitted 29 October, 2022; v1 submitted 23 September, 2022; originally announced September 2022.

    Comments: CoRL 2022 (oral)

  35. arXiv:2207.06300  [pdf, other]

    cs.CL cs.AI cs.IR

    Re2G: Retrieve, Rerank, Generate

    Authors: Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Ankita Rajaram Naik, Pengshan Cai, Alfio Gliozzo

    Abstract: As demonstrated by GPT-3 and T5, transformers grow in capability as parameter spaces become larger and larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows models to grow dramatically with a sub-linear increase in computational cost and GPU memory requirements. Recent models such as RAG and REALM have introduced retrieval into conditional generation. Th…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: Accepted at NAACL 2022

  36. arXiv:2205.14748  [pdf, other]

    cs.CL

    Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition

    Authors: Pengshan Cai, Hui Wan, Fei Liu, Mo Yu, Hong Yu, Sachindra Joshi

    Abstract: We propose novel AI-empowered chat bots for learning as conversation where a user does not read a passage but gains information and knowledge through conversation with a teacher bot. Our information-acquisition-oriented dialogue system employs a novel adaptation of reinforced self-play so that the system can be transferred to various domains without in-domain dialogue data, and can carry out conve…

    Submitted 29 May, 2022; originally announced May 2022.

    Comments: 10 pages, accepted by NAACL 2022

  37. arXiv:2109.08473  [pdf, other]

    cs.RO cs.AI eess.SY

    Carl-Lead: Lidar-based End-to-End Autonomous Driving with Contrastive Deep Reinforcement Learning

    Authors: Peide Cai, Sukai Wang, Hengli Wang, Ming Liu

    Abstract: Autonomous driving in urban crowds at unregulated intersections is challenging, where dynamic occlusions and uncertain behaviors of other vehicles should be carefully considered. Traditional methods are heuristic and based on hand-engineered rules and parameters, but scale poorly in new situations. Therefore, they require high labor cost to design and maintain rules in all foreseeable scenarios. R…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: 8 pages, 6 figures, submitted to RA-L with ICRA presentation option

  38. arXiv:2109.07717  [pdf, other]

    cs.RO

    R-PCC: A Baseline for Range Image-based Point Cloud Compression

    Authors: Sukai Wang, Jianhao Jiao, Peide Cai, Ming Liu

    Abstract: In autonomous vehicles or robots, point clouds from LiDAR can provide accurate depth information of objects compared with 2D images, but they also suffer a large volume of data, which is inconvenient for data storage or transmission. In this paper, we propose a Range image-based Point Cloud Compression method, R-PCC, which can reconstruct the point cloud with uniform or non-uniform accuracy loss.…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: Submitted to ICRA2022

  39. arXiv:2108.05030  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    DQ-GAT: Towards Safe and Efficient Autonomous Driving with Deep Q-Learning and Graph Attention Networks

    Authors: Peide Cai, Hengli Wang, Yuxiang Sun, Ming Liu

    Abstract: Autonomous driving in multi-agent dynamic traffic scenarios is challenging: the behaviors of road users are uncertain and are hard to model explicitly, and the ego-vehicle should apply complicated negotiation skills with them, such as yielding, merging and taking turns, to achieve both safe and efficient driving in various settings. Traditional planning methods are largely rule-based and scale poo…

    Submitted 18 June, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2022

  40. arXiv:2107.14599  [pdf, other]

    cs.CV cs.RO

    SNE-RoadSeg+: Rethinking Depth-Normal Translation and Deep Supervision for Freespace Detection

    Authors: Hengli Wang, Rui Fan, Peide Cai, Ming Liu

    Abstract: Freespace detection is a fundamental component of autonomous driving perception. Recently, deep convolutional neural networks (DCNNs) have achieved impressive performance for this task. In particular, SNE-RoadSeg, our previously proposed method based on a surface normal estimator (SNE) and a data-fusion DCNN (RoadSeg), has achieved impressive performance in freespace detection. However, SNE-RoadSe… ▽ More

    Submitted 19 September, 2021; v1 submitted 30 July, 2021; originally announced July 2021.

    Comments: Fix a mistake in Equation 3. 7 pages, 5 figures and 2 tables. This paper is accepted by IROS 2021

  41. arXiv:2107.08325  [pdf, other

    cs.RO cs.AI eess.SY

    Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning

    Authors: Peide Cai, Hengli Wang, Huaiyang Huang, Yuxuan Liu, Ming Liu

    Abstract: Autonomous car racing is a challenging task in the robotic control area. Traditional modular methods require accurate mapping, localization and planning, which makes them computationally inefficient and sensitive to environmental changes. Recently, deep-learning-based end-to-end systems have shown promising results for autonomous driving/racing. However, they are commonly implemented by supervised… ▽ More

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: 8 pages, 8 figures. IEEE Robotics and Automation Letters (RA-L) & IROS 2021

  42. arXiv:2104.12861  [pdf, other

    cs.CV cs.RO

    Learning Interpretable End-to-End Vision-Based Motion Planning for Autonomous Driving with Optical Flow Distillation

    Authors: Hengli Wang, Peide Cai, Yuxiang Sun, Lujia Wang, Ming Liu

    Abstract: Recently, deep-learning based approaches have achieved impressive performance for autonomous driving. However, end-to-end vision-based methods typically have limited interpretability, making the behaviors of the deep networks difficult to explain. Hence, their potential applications could be limited in practice. To address this problem, we propose an interpretable end-to-end vision-based motion pl… ▽ More

    Submitted 18 April, 2021; originally announced April 2021.

    Comments: 7 pages, 5 figures and 1 table. This paper is accepted by ICRA 2021. arXiv admin note: text overlap with arXiv:2104.08862

  43. arXiv:2104.08862  [pdf, other

    cs.RO cs.CV

    End-to-End Interactive Prediction and Planning with Optical Flow Distillation for Autonomous Driving

    Authors: Hengli Wang, Peide Cai, Rui Fan, Yuxiang Sun, Ming Liu

    Abstract: With the recent advancement of deep learning technology, data-driven approaches for autonomous car prediction and planning have achieved extraordinary performance. Nevertheless, most of these approaches follow a non-interactive prediction and planning paradigm, hypothesizing that a vehicle's behaviors do not affect others. The approaches based on such a non-interactive philosophy typically perform… ▽ More

    Submitted 18 April, 2021; originally announced April 2021.

    Comments: 10 pages, 5 figures and 4 tables. This paper is accepted by CVPRW 2021

  44. PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching

    Authors: Hengli Wang, Rui Fan, Peide Cai, Ming Liu

    Abstract: Supervised learning with deep convolutional neural networks (DCNNs) has seen huge adoption in stereo matching. However, the acquisition of large-scale datasets with well-labeled ground truth is cumbersome and labor-intensive, making supervised learning-based approaches often hard to implement in practice. To overcome this drawback, we propose a robust and effective self-supervised stereo matching… ▽ More

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: 8 pages, 8 figures and 2 tables. This paper is accepted by IEEE RA-L with ICRA 2021

  45. arXiv:2101.03834  [pdf, other

    cs.RO

    Closing the Planning-Learning Loop with Application to Autonomous Driving

    Authors: Panpan Cai, David Hsu

    Abstract: Real-time planning under uncertainty is critical for robots operating in complex dynamic environments. Consider, for example, an autonomous robot vehicle driving in dense, unregulated urban traffic of cars, motorcycles, buses, etc. The robot vehicle has to plan in both short and long terms, in order to interact with many traffic participants with uncertain intentions and drive effectively. Plannin… ▽ More

    Submitted 9 August, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Conditionally accepted by T-RO

  46. Learning Collision-Free Space Detection from Stereo Images: Homography Matrix Brings Better Data Augmentation

    Authors: Rui Fan, Hengli Wang, Peide Cai, Jin Wu, Mohammud Junaid Bocus, Lei Qiao, Ming Liu

    Abstract: Collision-free space detection is a critical component of autonomous vehicle perception. The state-of-the-art algorithms are typically based on supervised learning. The performance of such approaches is always dependent on the quality and amount of labeled training data. Additionally, it remains an open challenge to train deep convolutional neural networks (DCNNs) using only a small quantity of tr… ▽ More

    Submitted 12 March, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: accepted to IEEE/ASME Transactions on Mechatronics

  47. arXiv:2011.06775  [pdf, other

    cs.RO cs.AI cs.LG eess.SY

    DiGNet: Learning Scalable Self-Driving Policies for Generic Traffic Scenarios with Graph Neural Networks

    Authors: Peide Cai, Hengli Wang, Yuxiang Sun, Ming Liu

    Abstract: Traditional decision and planning frameworks for self-driving vehicles (SDVs) scale poorly in new scenarios, thus they require tedious hand-tuning of rules and parameters to maintain acceptable performance in all foreseeable cases. Recently, self-driving methods based on deep learning have shown promising results with better generalization capability but less hand engineering effort. However, most… ▽ More

    Submitted 29 July, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: IROS 2021, 6 pages

  48. arXiv:2011.05767  [pdf, other

    cs.RO

    Simulating Autonomous Driving in Massive Mixed Urban Traffic

    Authors: Yuanfu Luo, Panpan Cai, Yiyuan Lee, David Hsu

    Abstract: Autonomous driving in an unregulated urban crowd is an outstanding challenge, especially, in the presence of many aggressive, high-speed traffic participants. This paper presents SUMMIT, a high-fidelity simulator that facilitates the development and testing of crowd-driving algorithms. SUMMIT simulates dense, unregulated urban traffic at any worldwide locations as supported by the OpenStreetMap. T… ▽ More

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Journal extension of the ICRA2020 paper (arXiv:1911.04074)

  49. arXiv:2011.03813  [pdf, other

    cs.RO cs.AI cs.LG

    MAGIC: Learning Macro-Actions for Online POMDP Planning

    Authors: Yiyuan Lee, Panpan Cai, David Hsu

    Abstract: The partially observable Markov decision process (POMDP) is a principled general framework for robot decision making under uncertainty, but POMDP planning suffers from high computational complexity, when long-term planning is required. While temporally-extended macro-actions help to cut down the effective planning horizon and significantly improve computational efficiency, how do we acquire good m… ▽ More

    Submitted 1 July, 2021; v1 submitted 7 November, 2020; originally announced November 2020.

    Comments: 9 pages (+ 2 page references, + 2 page appendix)

  50. SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection

    Authors: Rui Fan, Hengli Wang, Peide Cai, Ming Liu

    Abstract: Freespace detection is an essential component of visual perception for self-driving cars. The recent efforts made in data-fusion convolutional neural networks (CNNs) have significantly improved semantic driving scene segmentation. Freespace can be hypothesized as a ground plane, on which the points have similar surface normals. Hence, in this paper, we first introduce a novel module, named surface… ▽ More

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: ECCV 2020