Showing 1–10 of 10 results for author: Gou, C

  1. arXiv:2406.19875  [pdf, other]

    cs.CV

    InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding

    Authors: Kirolos Ataallah, Chenhui Gou, Eslam Abdelrahman, Khushbu Pahwa, Jian Ding, Mohamed Elhoseiny

    Abstract: Understanding long videos, ranging from tens of minutes to several hours, presents unique challenges in video comprehension. Despite the increasing importance of long-form video content, existing benchmarks primarily focus on shorter clips. To address this gap, we introduce InfiniBench, a comprehensive benchmark for very long video understanding, which presents 1) the longest video duration, averagin…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 16 pages, 17 figures

  2. arXiv:2406.12846  [pdf, other]

    cs.CV

    DrVideo: Document Retrieval Based Long Video Understanding

    Authors: Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai

    Abstract: Existing methods for long video understanding primarily focus on videos only lasting tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: difficulty in locating key information and performing long-range reasoning. Thus, we propose DrVideo, a document-retrieval-based system designed for long…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 11 pages

  3. arXiv:2405.09931  [pdf, other]

    cs.CV

    Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition

    Authors: Yuchen Zhou, Linkai Liu, Chao Gou

    Abstract: Most existing attention prediction research focuses on salient instances like humans and objects. However, the more complex interaction-oriented attention, arising from the comprehension of interactions between instances by human observers, remains largely unexplored. This is equally crucial for advancing human-machine interaction and human-centered artificial intelligence. To bridge this gap, we…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024. Project HomePage: https://yuchen2199.github.io/Interactive-Gaze/

  4. arXiv:2404.01686  [pdf, other]

    cs.CV

    JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

    Authors: Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi

    Abstract: Autonomous robot systems have attracted increasing research attention in recent years, where environment understanding is a crucial step for robot navigation, human-robot interaction, and decision. Real-world robot systems usually collect visual data from multiple sensors and are required to recognize numerous objects and their movements in complex human-crowded settings. Traditional benchmarks, w…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  5. arXiv:2403.10520  [pdf, other]

    cs.CV cs.LG eess.IV

    Strong and Controllable Blind Image Decomposition

    Authors: Zeyu Zhang, Junlin Han, Chenhui Gou, Hongdong Li, Liang Zheng

    Abstract: Blind image decomposition aims to decompose all components present in an image, typically used to restore a multi-degraded input image. While fully recovering the clean image is appealing, in some scenarios, users might want to retain certain degradations, such as watermarks, for copyright protection. To address this need, we add controllability to the blind image decomposition process, allowing u…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/Zhangzeyu97/CBD.git

  6. arXiv:2402.01217  [pdf, other]

    cs.CV

    ID-NeRF: Indirect Diffusion-guided Neural Radiance Fields for Generalizable View Synthesis

    Authors: Yaokun Li, Chao Gou, Guang Tan

    Abstract: Implicit neural representations, represented by Neural Radiance Fields (NeRF), have dominated research in 3D computer vision by virtue of high-quality visual results and data-driven benefits. However, their realistic applications are hindered by the need for dense inputs and per-scene optimization. To solve this problem, previous methods implement generalizable NeRFs by extracting local features f…

    Submitted 18 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  7. arXiv:2210.07124  [pdf, other]

    cs.CV

    RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

    Authors: Jian Wang, Chenhui Gou, Qiman Wu, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang

    Abstract: Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate in this field, due to the time-consuming computation mechanism of transformer. We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation, which achieves better trade-off between performa…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: NeurIPS2022

  8. arXiv:1903.04855  [pdf, other]

    cs.CV

    Parallel Medical Imaging for Intelligent Medical Image Analysis: Concepts, Methods, and Applications

    Authors: Chao Gou, Tianyu Shen, Wenbo Zheng, Huadan Xue, Hui Yu, Qiang Ji, Zhengyu Jin, Fei-Yue Wang

    Abstract: There has been much progress in data-driven artificial intelligence technology for medical image analysis in the last decades. However, it still remains challenging due to its distinctive complexity of acquiring and annotating image data, extracting medical domain knowledge, and explaining the diagnostic decision for medical image analysis. In this paper, we propose a data-knowledge-driven framewo…

    Submitted 29 June, 2021; v1 submitted 12 March, 2019; originally announced March 2019.

  9. arXiv:1802.04979  [pdf]

    cs.CV

    M4CD: A Robust Change Detection Method for Intelligent Visual Surveillance

    Authors: Kunfeng Wang, Chao Gou, Fei-Yue Wang

    Abstract: In this paper, we propose a robust change detection method for intelligent visual surveillance. This method, named M4CD, includes three major steps. Firstly, a sample-based background model that integrates color and texture cues is built and updated over time. Secondly, multiple heterogeneous features (including brightness variation, chromaticity variation, and texture variation) are extracted by…

    Submitted 14 February, 2018; originally announced February 2018.

  10. arXiv:1709.08130  [pdf, other]

    cs.CV

    Simultaneous Facial Landmark Detection, Pose and Deformation Estimation under Facial Occlusion

    Authors: Yue Wu, Chao Gou, Qiang Ji

    Abstract: Facial landmark detection, head pose estimation, and facial deformation analysis are typical facial behavior analysis tasks in computer vision. The existing methods usually perform each task independently and sequentially, ignoring their interactions. To tackle this problem, we propose a unified framework for simultaneous facial landmark detection, head pose estimation, and facial deformation anal…

    Submitted 23 September, 2017; originally announced September 2017.

    Comments: International Conference on Computer Vision and Pattern Recognition, 2017