Showing 1–6 of 6 results for author: Lei, S W

Searching in archive cs.
  1. arXiv:2305.20087  [pdf, other]

    cs.CV

    Too Large; Data Reduction for Vision-Language Pre-Training

    Authors: Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou

    Abstract: This paper examines the problems of severe image-text misalignment and high redundancy in the widely-used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. Our approach consists of two major s…

    Submitted 18 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: ICCV2023. Code: https://github.com/showlab/datacentric.vlp

  2. arXiv:2208.12037  [pdf, other]

    cs.CV

    Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

    Authors: Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou

    Abstract: VQA is an ambitious task aiming to answer any image-related question. However, in reality, it is hard to build such a system once for all since the needs of users are continuously updated, and the system has to implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneer work split a VQA dataset into disjoint answer sets to study…

    Submitted 29 August, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: 18 pages, 13 figures

  3. arXiv:2204.00486  [pdf, other]

    cs.CV

    GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

    Authors: Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou

    Abstract: Cognitive science has shown that humans perceive videos in terms of events separated by the state changes of dominant subjects. State changes trigger new events and are one of the most useful among the large amount of redundant information perceived. However, previous research focuses on the overall understanding of segments without evaluating the fine-grained status changes inside. In this paper,…

    Submitted 10 August, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: In Proceedings of the European Conference on Computer Vision 2022 [ECCV 2022]

  4. arXiv:2203.04203  [pdf, other]

    cs.CV

    AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

    Authors: Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou

    Abstract: A long-standing goal of intelligent assistants such as AR glasses/robots has been to assist users in affordance-centric real-world scenarios, such as "how can I run the microwave for 1 minute?". However, there is still no clear task definition and suitable benchmarks. In this paper, we define a new task called Affordance-centric Question-driven Task Completion, where the AI assistant should learn…

    Submitted 20 July, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted by ECCV 2022. Equal contribution: Benita Wong, Joya Chen, You Wu; Corresponding author: Mike Zheng Shou

  5. arXiv:2111.15050  [pdf, other]

    cs.CV

    AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant

    Authors: Stan Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, Lingmin Ran, Mike Zheng Shou

    Abstract: It is still a pipe dream that personal AI assistants on the phone and AR glasses can assist our daily life in addressing our questions like "how to adjust the date for this watch?" and "how to set its heating duration? (while pointing at an oven)". The queries used in conventional tasks (i.e. Video Question Answering, Video Retrieval, Moment Localization) are often factoid and based on pure te…

    Submitted 10 October, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: 20 pages, 12 figures

  6. arXiv:2101.10511  [pdf, other]

    cs.CV

    Generic Event Boundary Detection: A Benchmark for Event Segmentation

    Authors: Mike Zheng Shou, Stan Weixian Lei, Weiyao Wang, Deepti Ghadiyaram, Matt Feiszli

    Abstract: This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks. Conventional work in temporal video segmentation and action detection focuses on localizing pre-defined action categories and thus does not scale to generic videos. Cognitive Science has known since last century that humans consistently segmen…

    Submitted 19 August, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

    Comments: ICCV 2021