Showing 1–30 of 30 results for author: Shuai, B

  1. arXiv:2404.05136  [pdf, other]

    cs.CV cs.AI

    Self-Supervised Multi-Object Tracking with Path Consistency

    Authors: Zijia Lu, Bing Shuai, Yanbei Chen, Zhenlin Xu, Davide Modolo

    Abstract: In this paper, we propose a novel concept of path consistency to learn robust object matching without using manual object identity supervision. Our key idea is that, to track an object through frames, we can obtain multiple different association results from a model by varying the frames it can observe, i.e., skipping frames in observation. As the differences in observations do not alter the identi…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  2. arXiv:2309.11445  [pdf, other]

    cs.CV

    SkeleTR: Towards Skeleton-based Action Recognition in the Wild

    Authors: Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo

    Abstract: We present SkeleTR, a new framework for skeleton-based action recognition. In contrast to prior work, which focuses mainly on controlled environments, we target more general scenarios that typically involve a variable number of people and various forms of interaction between people. SkeleTR works with a two-stage paradigm. It first models the intra-person skeleton dynamics for each skeleton sequen…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  3. arXiv:2309.00233  [pdf, other]

    cs.CV

    Object-Centric Multiple Object Tracking

    Authors: Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

    Abstract: Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art model…

    Submitted 5 September, 2023; v1 submitted 31 August, 2023; originally announced September 2023.

    Comments: ICCV 2023 camera-ready version

  4. arXiv:2308.14602  [pdf]

    eess.SY cs.LG

    Recent Progress in Energy Management of Connected Hybrid Electric Vehicles Using Reinforcement Learning

    Authors: Min Hua, Bin Shuai, Quan Zhou, Jinhai Wang, Yinglong He, Hongming Xu

    Abstract: The growing adoption of hybrid electric vehicles (HEVs) presents a transformative opportunity for revolutionizing transportation energy systems. The shift towards electrifying transportation aims to curb environmental concerns related to fossil fuel consumption. This necessitates efficient energy management systems (EMS) to optimize energy efficiency. The evolution of EMS from HEVs to connected hy…

    Submitted 23 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

  5. arXiv:2308.11185  [pdf, other]

    cs.CV

    MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation

    Authors: Najmeh Sadoughi, Xinyu Li, Avijit Vajpayee, David Fan, Bing Shuai, Hector Santos-Villalobos, Vimal Bhat, Rohith MV

    Abstract: Previous research has studied the task of segmenting cinematic videos into scenes and into narrative acts. However, these studies have overlooked the essential task of multimodal alignment and fusion for effectively and efficiently processing long-form videos (>60min). In this paper, we introduce Multimodal alignmEnt aGgregation and distillAtion (MEGA) for cinematic long-video segmentation. MEGA t…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 accepted

  6. arXiv:2307.10229  [pdf, other]

    cs.RO cs.HC

    Study on the Impacts of Hazardous Behaviors on Autonomous Vehicle Collision Rates Based on Humanoid Scenario Generation in CARLA

    Authors: Longfei Mo, Min Hua, Hongyu Sun, Hongming Xu, Bin Shuai, Quan Zhou

    Abstract: Testing of functional safety and Safety Of The Intended Functionality (SOTIF) is important for autonomous vehicles (AVs). It is hard to test the AV's hazard response in the real world because it would involve hazards to passengers and other road users. This paper studies virtual testing of AVs on the CARLA platform and proposes a Humanoid Scenario Generation (HSG) scheme to investigate the impacts…

    Submitted 15 July, 2023; originally announced July 2023.

  7. arXiv:2303.08981  [pdf, other]

    cs.RO eess.SY

    Optimal Energy Management of Plug-in Hybrid Vehicles Through Exploration-to-Exploitation Ratio Control in Ensemble Reinforcement Learning

    Authors: Bin Shuai, Min Hua, Yanfei Li, Shijin Shuai, Hongming Xu, Quan Zhou

    Abstract: Developing intelligent energy management systems with high adaptability and superiority is necessary and significant for Hybrid Electric Vehicles (HEVs). This paper proposed an ensemble learning-based scheme based on a learning automata module (LAM) to enhance vehicle energy efficiency. Two parallel base learners following two exploration-to-exploitation ratios (E2E) methods are used to generate a…

    Submitted 15 March, 2023; originally announced March 2023.

  8. arXiv:2303.05665  [pdf, other]

    cs.RO

    A Systematic Survey of Control Techniques and Applications in Connected and Automated Vehicles

    Authors: Wei Liu, Min Hua, Zhiyun Deng, Zonglin Meng, Yanjun Huang, Chuan Hu, Shunhui Song, Letian Gao, Changsheng Liu, Bin Shuai, Amir Khajepour, Lu Xiong, Xin Xia

    Abstract: Vehicle control is one of the most critical challenges in autonomous vehicles (AVs) and connected and automated vehicles (CAVs), and it is paramount in vehicle safety, passenger comfort, transportation efficiency, and energy saving. This survey attempts to provide a comprehensive and thorough overview of the current state of vehicle control technology, focusing on the evolution from vehicle state…

    Submitted 11 April, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  9. arXiv:2211.02175  [pdf, other]

    cs.CV

    Large Scale Real-World Multi-Person Tracking

    Authors: Bing Shuai, Alessandro Bergamo, Uta Buechler, Andrew Berneshawi, Alyssa Boden, Joseph Tighe

    Abstract: This paper presents a new large scale multi-person tracking dataset -- \texttt{PersonPath22}, which is over an order of magnitude larger than currently available high quality multi-object tracking datasets such as MOT17, HiEve, and MOT20 datasets. The lack of large scale training and test data for this task has limited the community's ability to understand the performance of their tracking systems…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: ECCV 2022

  10. arXiv:2210.00129  [pdf, other]

    cs.CV

    An In-depth Study of Stochastic Backpropagation

    Authors: Jun Fang, Mingze Xu, Hao Chen, Bing Shuai, Zhuowen Tu, Joseph Tighe

    Abstract: In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks. During backward propagation, SBP calculates the gradients by only using a subset of feature maps to save the GPU memory and computational cost. We interpret SBP as an efficient way to implement stochastic gradient descent by…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  11. arXiv:2203.05553  [pdf, other]

    cs.CV

    Transfer of Representations to Video Label Propagation: Implementation Factors Matter

    Authors: Daniel McKee, Zitong Zhan, Bing Shuai, Davide Modolo, Joseph Tighe, Svetlana Lazebnik

    Abstract: This work studies feature representations for dense label propagation in video, with a focus on recently proposed methods that learn video correspondence using self-supervised signals such as colorization or temporal cycle consistency. In the literature, these methods have been evaluated with an array of inconsistent settings, making it difficult to discern trends or compare performance fairly. St…

    Submitted 10 March, 2022; originally announced March 2022.

  12. arXiv:2108.08836  [pdf, other]

    cs.CV

    Multi-Object Tracking with Hallucinated and Unlabeled Videos

    Authors: Daniel McKee, Bing Shuai, Andrew Berneshawi, Manchen Wang, Davide Modolo, Svetlana Lazebnik, Joseph Tighe

    Abstract: In this paper, we explore learning end-to-end deep neural trackers without tracking annotations. This is important as large-scale training data is essential for training deep neural trackers while tracking annotations are expensive to acquire. In place of tracking annotations, we first hallucinate videos from images with bounding box annotations using zoom-in/out motion transformations to obtain f…

    Submitted 19 August, 2021; originally announced August 2021.

  13. arXiv:2105.11595  [pdf, other]

    cs.CV

    SiamMOT: Siamese Multi-Object Tracking

    Authors: Bing Shuai, Andrew Berneshawi, Xinyu Li, Davide Modolo, Joseph Tighe

    Abstract: In this paper, we focus on improving online multi-object tracking (MOT). In particular, we introduce a region-based Siamese Multi-Object Tracking network, which we name SiamMOT. SiamMOT includes a motion model that estimates the instance's movement between two frames such that detected instances are associated. To explore how the motion modelling affects its tracking capability, we present two var…

    Submitted 24 May, 2021; originally announced May 2021.

    Journal ref: CVPR2021

  14. arXiv:2104.11746  [pdf, other]

    cs.CV

    VidTr: Video Transformer Without Convolutions

    Authors: Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe

    Abstract: We introduce Video Transformer (VidTr) with separable-attention for video classification. Compared with commonly used 3D networks, VidTr is able to aggregate spatio-temporal information via stacked attentions and provide better performance with higher efficiency. We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw…

    Submitted 15 October, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: ICCV 2021 Accepted

  15. arXiv:2012.08041  [pdf, other]

    cs.CV

    NUTA: Non-uniform Temporal Aggregation for Action Recognition

    Authors: Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe

    Abstract: In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video. These methods typically uniformly sample a segment of an input clip (along the temporal dimension). However, not all parts of a video are equally important to determine the action in the clip. In this work, we focus instead on learni…

    Submitted 14 December, 2020; originally announced December 2020.

  16. arXiv:2007.11040  [pdf, other]

    cs.CV

    Directional Temporal Modeling for Action Recognition

    Authors: Xinyu Li, Bing Shuai, Joseph Tighe

    Abstract: Many current activity recognition models use 3D convolutional neural networks (e.g. I3D, I3D-NL) to generate local spatial-temporal features. However, such features do not encode clip-level ordered temporal information. In this paper, we introduce a channel independent directional convolution (CIDC) operation, which learns to model the temporal evolution among local features. By applying multiple…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  17. arXiv:2004.07786  [pdf, other]

    cs.CV

    Multi-Object Tracking with Siamese Track-RCNN

    Authors: Bing Shuai, Andrew G. Berneshawi, Davide Modolo, Joseph Tighe

    Abstract: Multi-object tracking systems often consist of a combination of a detector, a short term linker, a re-identification feature extractor and a solver that takes the output from these separate components and makes a final prediction. Differently, this work aims to unify all these in a single tracking system. Towards this, we propose Siamese Track-RCNN, a two stage detect-and-track framework which con…

    Submitted 16 April, 2020; originally announced April 2020.

  18. arXiv:2003.13759  [pdf, other]

    cs.CV

    Understanding the impact of mistakes on background regions in crowd counting

    Authors: Davide Modolo, Bing Shuai, Rahul Rama Varior, Joseph Tighe

    Abstract: Every crowd counting researcher has likely observed their model output wrong positive predictions on image regions not containing any person. But how often do these mistakes happen? Are our models negatively affected by this? In this paper we analyze this problem in depth. In order to understand its magnitude, we present an extensive analysis on five of the most important crowd counting datasets.…

    Submitted 30 March, 2020; originally announced March 2020.

  19. arXiv:1909.02651  [pdf, other]

    cs.CV

    Semantic Correlation Promoted Shape-Variant Context for Segmentation

    Authors: Henghui Ding, Xudong Jiang, Bing Shuai, Ai Qun Liu, Gang Wang

    Abstract: Context is essential for semantic segmentation. Due to the diverse shapes of objects and their complex layout in various scene images, the spatial scales and shapes of contexts for different objects have very large variation. It is thus ineffective or inefficient to aggregate various context information from a predefined fixed region. In this work, we propose to generate a scale- and shape-variant…

    Submitted 5 September, 2019; originally announced September 2019.

    Comments: CVPR 2019, Oral

  20. arXiv:1901.06026  [pdf, other]

    cs.CV

    Multi-Scale Attention Network for Crowd Counting

    Authors: Rahul Rama Varior, Bing Shuai, Joseph Tighe, Davide Modolo

    Abstract: In crowd counting datasets, people appear at different scales, depending on their distance from the camera. To address this issue, we propose a novel multi-branch scale-aware attention network that exploits the hierarchical structure of convolutional neural networks and generates, in a single forward pass, multi-scale density predictions from different layers of the architecture. To aggregate thes…

    Submitted 25 July, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

  21. arXiv:1810.08476  [pdf, ps, other]

    cs.CV

    Improving Fast Segmentation With Teacher-student Learning

    Authors: Jiafeng Xie, Bing Shuai, Jian-Fang Hu, Jingyang Lin, Wei-Shi Zheng

    Abstract: Recently, segmentation neural networks have been significantly improved by demonstrating very promising accuracies on public benchmarks. However, these models are very heavy and generally suffer from low inference speed, which limits their application scenarios in practice. Meanwhile, existing fast segmentation models usually fail to obtain satisfactory segmentation accuracies on public benchmarks…

    Submitted 19 October, 2018; originally announced October 2018.

    Comments: 13 pages, 3 figures, conference

    Journal ref: BMVC 2018

  22. Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling

    Authors: Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap-Pui Chau, Gang Wang

    Abstract: This paper proposes a new method called Multimodal RNNs for RGB-D scene semantic segmentation. It is optimized to classify image pixels given two input sources: RGB color channels and Depth maps. It simultaneously performs training of two recurrent neural networks (RNNs) that are crossly connected through information transfer layers, which are learnt to adaptively extract relevant cross-modality f…

    Submitted 13 March, 2018; originally announced March 2018.

    Comments: 15 pages, 13 figures, IEEE TMM 2017

    Journal ref: IEEE Transactions on Multimedia 2017

  23. arXiv:1611.08986  [pdf, other]

    cs.CV

    Improving Fully Convolution Network for Semantic Segmentation

    Authors: Bing Shuai, Ting Liu, Gang Wang

    Abstract: Fully Convolution Networks (FCN) have achieved great success in dense prediction tasks including semantic segmentation. In this paper, we start from discussing FCN by understanding its architecture limitations in building a strong segmentation network. Next, we present our Improved Fully Convolution Network (IFCN). In contrast to FCN, IFCN introduces a context network that progressively expands th…

    Submitted 28 November, 2016; originally announced November 2016.

  24. arXiv:1607.08381  [pdf, other]

    cs.CV

    A Siamese Long Short-Term Memory Architecture for Human Re-Identification

    Authors: Rahul Rama Varior, Bing Shuai, Jiwen Lu, Dong Xu, Gang Wang

    Abstract: Matching pedestrians across multiple camera views known as human re-identification (re-identification) is a challenging problem in visual surveillance. In the existing works concentrating on feature extraction, representations are formed locally and independent of other regions. We present a novel siamese Long Short-Term Memory (LSTM) architecture that can process image regions sequentially and en…

    Submitted 28 July, 2016; originally announced July 2016.

  25. arXiv:1605.04502  [pdf, other]

    cs.CV

    Joint Learning of Siamese CNNs and Temporally Constrained Metrics for Tracklet Association

    Authors: Bing Wang, Li Wang, Bing Shuai, Zhen Zuo, Ting Liu, Kap Luk Chan, Gang Wang

    Abstract: In this paper, we study the challenging problem of multi-object tracking in a complex scene captured by a single camera. Different from the existing tracklet association-based tracking methods, we propose a novel and efficient way to obtain discriminative appearance-based tracklet affinity models. Our proposed method jointly learns the convolutional neural networks (CNNs) and temporally constraine…

    Submitted 25 September, 2016; v1 submitted 15 May, 2016; originally announced May 2016.

  26. Scene Parsing with Integration of Parametric and Non-parametric Models

    Authors: Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang

    Abstract: We adopt Convolutional Neural Networks (CNNs) to be our parametric model to learn discriminative features and classifiers for local patch classification. Based on the occurrence frequency distribution of classes, an ensemble of CNNs (CNN-Ensemble) are learned, in which each CNN component focuses on learning different and complementary visual patterns. The local beliefs of pixels are output by CNN-…

    Submitted 20 April, 2016; originally announced April 2016.

    Comments: 13 Pages, 6 figures, IEEE Transactions on Image Processing (T-IP) 2016

  27. arXiv:1512.07108  [pdf, other]

    cs.CV cs.LG cs.NE

    Recent Advances in Convolutional Neural Networks

    Authors: Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

    Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging on the rapid growth in the amount of the annotated data and the great improvements in the strengths…

    Submitted 19 October, 2017; v1 submitted 22 December, 2015; originally announced December 2015.

    Comments: Pattern Recognition, Elsevier

  28. Learning Contextual Dependencies with Convolutional Hierarchical Recurrent Neural Networks

    Authors: Zhen Zuo, Bing Shuai, Gang Wang, Xiao Liu, Xingxing Wang, Bing Wang

    Abstract: Existing deep convolutional neural networks (CNNs) have shown their great success on image classification. CNNs mainly consist of convolutional and pooling layers, both of which are performed on local image areas without considering the dependencies among different image regions. However, such dependencies are very important for generating explicit image representation. In contrast, recurrent neur…

    Submitted 7 February, 2016; v1 submitted 13 September, 2015; originally announced September 2015.

  29. arXiv:1509.00552  [pdf, other]

    cs.CV

    DAG-Recurrent Neural Networks For Scene Labeling

    Authors: Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang

    Abstract: In image labeling, local representations for image units are usually generated from their surrounding image patches, thus long-range contextual information is not effectively encoded. In this paper, we introduce recurrent neural networks (RNNs) to address this issue. Specifically, directed acyclic graph RNNs (DAG-RNNs) are proposed to process DAG-structured images, which enables the network to mod…

    Submitted 23 November, 2015; v1 submitted 1 September, 2015; originally announced September 2015.

  30. arXiv:1508.05306  [pdf, other]

    cs.CV

    Exemplar Based Deep Discriminative and Shareable Feature Learning for Scene Image Classification

    Authors: Zhen Zuo, Gang Wang, Bing Shuai, Lifan Zhao, Qingxiong Yang

    Abstract: In order to encode the class correlation and class specific information in image representation, we propose a new local feature learning approach named Deep Discriminative and Shareable Feature Learning (DDSFL). DDSFL aims to hierarchically learn feature transformation filter banks to transform raw pixel image patches to features. The learned filter banks are expected to: (1) encode common visual…

    Submitted 21 August, 2015; originally announced August 2015.

    Comments: Pattern Recognition, Elsevier, 2015