
Showing 1–14 of 14 results for author: Zia, Z

  1. arXiv:2309.06462  [pdf, other]

    cs.CV

    Action Segmentation Using 2D Skeleton Heatmaps and Multi-Modality Fusion

    Authors: Syed Waleed Hyder, Muhammad Usama, Anas Zafar, Muhammad Naufil, Fawad Javed Fateh, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran

    Abstract: This paper presents a 2D skeleton-based action segmentation method with applications in fine-grained human activity recognition. In contrast with state-of-the-art methods which directly take sequences of 3D skeleton coordinates as inputs and apply Graph Convolutional Networks (GCNs) for spatiotemporal feature learning, our main idea is to use sequences of 2D skeleton heatmaps as inputs and employ…

    Submitted 25 April, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted to ICRA 2024

  2. arXiv:2305.19480  [pdf, other]

    cs.CV

    Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion

    Authors: Quoc-Huy Tran, Muhammad Ahmed, Murad Popattia, M. Hassan Ahmed, Andrey Konin, M. Zeeshan Zia

    Abstract: This paper presents a self-supervised temporal video alignment framework which is useful for several fine-grained human activity understanding applications. In contrast with the state-of-the-art method of CASA, where sequences of 3D skeleton coordinates are taken directly as input, our key idea is to use sequences of 2D skeleton heatmaps as input. Unlike CASA which performs self-attention in the t…

    Submitted 26 April, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

  3. arXiv:2305.19478  [pdf, other]

    cs.CV

    Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment

    Authors: Quoc-Huy Tran, Ahmed Mehmood, Muhammad Ahmed, Muhammad Naufil, Anas Zafar, Andrey Konin, M. Zeeshan Zia

    Abstract: This paper presents an unsupervised transformer-based framework for temporal activity segmentation which leverages not only frame-level cues but also segment-level cues. This is in contrast with previous methods which often rely on frame-level information only. Our approach begins with a frame-level prediction module which estimates framewise action classes via a transformer encoder. The frame-lev…

    Submitted 26 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to WACV 2024

  4. arXiv:2206.15031  [pdf, other]

    cs.CV

    Timestamp-Supervised Action Segmentation with Graph Convolutional Networks

    Authors: Hamza Khan, Sanjay Haresh, Awais Ahmed, Shakeeb Siddiqui, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran

    Abstract: We introduce a novel approach for temporal activity segmentation with timestamp supervision. Our main contribution is a graph convolutional network, which is learned in an end-to-end manner to exploit both frame features and connections between neighboring frames to generate dense framewise labels from sparse timestamp labels. The generated dense framewise labels can then be used to train the segm…

    Submitted 2 August, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Accepted to IROS 2022

  5. arXiv:2105.13353  [pdf, other]

    cs.CV

    Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering

    Authors: Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran

    Abstract: We present a novel approach for unsupervised activity segmentation which uses video frame clustering as a pretext task and simultaneously performs representation learning and online clustering. This is in contrast with prior works where representation learning and clustering are often performed sequentially. We leverage temporal information in videos by employing temporal optimal transport. In par…

    Submitted 17 August, 2023; v1 submitted 27 May, 2021; originally announced May 2021.

    Comments: Presented at CVPR 2022

  6. arXiv:2103.17260  [pdf, other]

    cs.CV

    Learning by Aligning Videos in Time

    Authors: Sanjay Haresh, Sateesh Kumar, Huseyin Coskun, Shahram Najam Syed, Andrey Konin, Muhammad Zeeshan Zia, Quoc-Huy Tran

    Abstract: We present a self-supervised approach for learning video representations using temporal video alignment as a pretext task, while exploiting both frame-level and video-level information. We leverage a novel combination of temporal alignment loss and temporal regularization terms, which can be used as supervision signals for training an encoder network. Specifically, the temporal alignment loss (i.e…

    Submitted 17 August, 2023; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: Presented at CVPR 2021

  7. arXiv:2004.05261  [pdf, other]

    cs.CV

    Towards Anomaly Detection in Dashcam Videos

    Authors: Sanjay Haresh, Sateesh Kumar, M. Zeeshan Zia, Quoc-Huy Tran

    Abstract: Inexpensive sensing and computation, as well as insurance innovations, have made smart dashboard cameras ubiquitous. Increasingly, simple model-driven computer vision algorithms focused on lane departures or safe following distances are finding their way into these devices. Unfortunately, the long-tailed distribution of road hazards means that these hand-crafted pipelines are inadequate for driver…

    Submitted 11 May, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

    Comments: To appear at IV 2020

  8. Domain-Specific Priors and Meta Learning for Few-Shot First-Person Action Recognition

    Authors: Huseyin Coskun, Zeeshan Zia, Bugra Tekin, Federica Bogo, Nassir Navab, Federico Tombari, Harpreet Sawhney

    Abstract: The lack of large-scale real datasets with annotations makes transfer learning a necessity for video activity understanding. We aim to develop an effective method for few-shot transfer learning for first-person action classification. We leverage independently trained local visual cues to learn representations that can be transferred from a source domain, which provides primitive action labels, to…

    Submitted 7 December, 2021; v1 submitted 22 July, 2019; originally announced July 2019.

    Comments: Paper has been accepted in Transactions on Pattern Analysis and Machine Intelligence

    Journal ref: ISSN 1939-3539, no. 01, pp. 1-1

  9. arXiv:1803.07231  [pdf, other]

    cs.CV

    Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences

    Authors: Mohammed E. Fathy, Quoc-Huy Tran, M. Zeeshan Zia, Paul Vernaza, Manmohan Chandraker

    Abstract: Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do not optimally leverage the feature hierarchies learned in a Convolutional Neural Network (CNN), es…

    Submitted 1 August, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: Accepted to ECCV 2018

  10. arXiv:1801.03399  [pdf, other]

    cs.CV

    Deep Supervision with Intermediate Concepts

    Authors: Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker

    Abstract: Recent data-driven approaches to scene interpretation predominantly pose inference as an end-to-end black-box mapping, commonly performed by a Convolutional Neural Network (CNN). However, decades of work on perceptual organization in both human and machine vision suggests that there are often intermediate representations that are intrinsic to an inference task, and which provide essential structur…

    Submitted 20 July, 2018; v1 submitted 8 January, 2018; originally announced January 2018.

    Comments: Submitted to TPAMI, first revision. arXiv admin note: text overlap with arXiv:1612.02699

  11. arXiv:1612.02699  [pdf, other]

    cs.CV

    Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

    Authors: Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker

    Abstract: Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deepl…

    Submitted 20 April, 2017; v1 submitted 8 December, 2016; originally announced December 2016.

    Comments: Accepted in CVPR 2017

  12. Comparative Design Space Exploration of Dense and Semi-Dense SLAM

    Authors: M. Zeeshan Zia, Luigi Nardi, Andrew Jack, Emanuele Vespa, Bruno Bodin, Paul H. J. Kelly, Andrew J. Davison

    Abstract: SLAM has matured significantly over the past few years, and is beginning to appear in serious commercial products. While new SLAM systems are being proposed at every conference, evaluation is often restricted to qualitative visualizations or accuracy estimation against a ground truth. This is due to the lack of benchmarking methodologies which can holistically and quantitatively evaluate these sys…

    Submitted 3 March, 2016; v1 submitted 15 September, 2015; originally announced September 2015.

    Comments: IEEE International Conference on Robotics and Automation 2016

  13. Towards Scene Understanding with Detailed 3D Object Representations

    Authors: M. Zeeshan Zia, Michael Stark, Konrad Schindler

    Abstract: Current approaches to semantic image and scene understanding typically employ rather simple object representations such as 2D or 3D bounding boxes. While such coarse models are robust and allow for reliable object detection, they discard much of the information about objects' 3D shape and pose, and thus do not lend themselves well to higher-level reasoning. Here, we propose to base scene understan…

    Submitted 18 November, 2014; originally announced November 2014.

    Comments: International Journal of Computer Vision (appeared online on 4 November 2014). Online version: http://link.springer.com/article/10.1007/s11263-014-0780-y

  14. arXiv:1410.2167  [pdf, other]

    cs.RO cs.CV cs.DC cs.PF

    Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

    Authors: Luigi Nardi, Bruno Bodin, M. Zeeshan Zia, John Mawer, Andy Nisbet, Paul H. J. Kelly, Andrew J. Davison, Mikel Luján, Michael F. P. O'Boyle, Graham Riley, Nigel Topham, Steve Furber

    Abstract: Real-time dense computer vision and SLAM offer great potential for a new level of scene modelling, tracking and real environmental interaction for many types of robot, but their high computational requirements mean that use on mass market embedded platforms is challenging. Meanwhile, trends in low-cost, low-power processing are towards massive parallelism and heterogeneity, making it difficult for…

    Submitted 26 February, 2015; v1 submitted 8 October, 2014; originally announced October 2014.

    Comments: 8 pages, ICRA 2015 conference paper

    Journal ref: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7140009 IEEE Xplore 2015