
Showing 1–20 of 20 results for author: Wildes, R P

  1. arXiv:2404.02233  [pdf, other]

    cs.CV

    Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models

    Authors: Matthew Kowal, Richard P. Wildes, Konstantinos G. Derpanis

    Abstract: Understanding what deep network models capture in their learned representations is a fundamental challenge in computer vision. We present a new methodology to understanding such vision models, the Visual Concept Connectome (VCC), which discovers human interpretable concepts and their interlayer connections in a fully unsupervised manner. Our approach simultaneously reveals fine-grained concepts at…

    Submitted 10 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 (Highlight)

  2. arXiv:2403.12710  [pdf, other]

    cs.CV cs.LG

    Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition

    Authors: Filip Ilic, He Zhao, Thomas Pock, Richard P. Wildes

    Abstract: Concerns for the privacy of individuals captured in public imagery have led to privacy-preserving action recognition. Existing approaches often suffer from issues arising through obfuscation being applied globally and a lack of interpretability. Global obfuscation hides privacy sensitive regions, but also contextual regions important for action recognition. Lack of interpretability erodes trust in…

    Submitted 19 March, 2024; originally announced March 2024.

  3. arXiv:2310.12296  [pdf, other]

    cs.CV

    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Authors: Rezaul Karim, Richard P. Wildes

    Abstract: Video segmentation encompasses a wide range of categories of problem formulation, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area shifted from concentrating on ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared…

    Submitted 18 October, 2023; originally announced October 2023.

  4. arXiv:2304.13265  [pdf, other]

    cs.CV

    StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

    Authors: Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, Allan D. Jepson

    Abstract: Instructional videos are an important resource to learn procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates the need to temporally localize the instruction steps in such videos, i.e. the task called key-step localization. Traditional methods for key-step lo…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: CVPR'23

  5. arXiv:2304.05930  [pdf, other]

    cs.CV

    A Unified Multiscale Encoder-Decoder Transformer for Video Segmentation

    Authors: Rezaul Karim, He Zhao, Richard P. Wildes, Mennatullah Siam

    Abstract: In this paper, we present an end-to-end trainable unified multiscale encoder-decoder transformer that is focused on dense prediction tasks in video. The presented Multiscale Encoder-Decoder Video Transformer (MED-VT) uses multiscale representation throughout and employs an optional input beyond video (e.g., audio), when available, for multimodal processing (MED-VT++). Multiscale representation at…

    Submitted 26 February, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Extension of CVPR'23 paper for journal submission

  6. arXiv:2211.01783  [pdf, other]

    cs.CV

    Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks

    Authors: Matthew Kowal, Mennatullah Siam, Md Amirul Islam, Neil D. B. Bruce, Richard P. Wildes, Konstantinos G. Derpanis

    Abstract: There is limited understanding of the information captured by deep spatiotemporal models in their intermediate representations. For example, while evidence suggests that action recognition algorithms are heavily influenced by visual appearance in single frames, no quantitative methodology exists for evaluating such static bias in the latent representation compared to bias toward dynamics. We tackl…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.02846

  7. arXiv:2208.04897  [pdf, other]

    cs.CV

    Sports Video Analysis on Large-Scale Data

    Authors: Dekun Wu, He Zhao, Xingce Bao, Richard P. Wildes

    Abstract: This paper investigates the modeling of automated machine description on sports video, which has seen much progress recently. Nevertheless, state-of-the-art approaches fall quite short of capturing how human experts analyze sports scenes. There are several major reasons: (1) The used dataset is collected from non-official providers, which naturally creates a gap between models trained on those dat…

    Submitted 9 August, 2022; originally announced August 2022.

  8. arXiv:2207.06261  [pdf, other]

    cs.CV cs.LG

    Is Appearance Free Action Recognition Possible?

    Authors: Filip Ilic, Thomas Pock, Richard P. Wildes

    Abstract: Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absenc…

    Submitted 13 July, 2022; originally announced July 2022.

  9. arXiv:2206.02846  [pdf, other]

    cs.CV

    A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information

    Authors: Matthew Kowal, Mennatullah Siam, Md Amirul Islam, Neil D. B. Bruce, Richard P. Wildes, Konstantinos G. Derpanis

    Abstract: Deep spatiotemporal models are used in a variety of computer vision tasks, such as action recognition and video object segmentation. Currently, there is a limited understanding of what information is captured by these models in their intermediate representations. For example, while it has been observed that action recognition algorithms are heavily influenced by visual appearance in single static…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: CVPR 2022

  10. arXiv:2205.02300  [pdf, other]

    cs.CV

    P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision

    Authors: He Zhao, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Richard P. Wildes, Allan D. Jepson

    Abstract: In this paper, we study the problem of procedure planning in instructional videos. Here, an agent must produce a plausible sequence of actions that can transform the environment from a given start to a desired goal state. When learning procedure planning from instructional videos, most recent work leverages intermediate visual observations as supervision, which requires expensive annotation effort…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted as an oral paper at CVPR 2022

  11. arXiv:2203.14308  [pdf, other]

    cs.CV

    Temporal Transductive Inference for Few-Shot Video Object Segmentation

    Authors: Mennatullah Siam, Konstantinos G. Derpanis, Richard P. Wildes

    Abstract: Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training. In this paper, we present a simple but effective temporal transductive inference (TTI) approach that leverages temporal consistency in the unlabelled video frames during few-shot inference. Key to our approach is the use of both global and local tem…

    Submitted 16 July, 2023; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: IJCV submission under review

  12. arXiv:2107.05140  [pdf, other]

    cs.CV

    Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction

    Authors: He Zhao, Richard P. Wildes

    Abstract: Video predictive understanding encompasses a wide range of efforts that are concerned with the anticipation of the unobserved future from the current as well as historical video observations. Action prediction is a major sub-area of video predictive understanding and is the focus of this review. This sub-area has two major subdivisions: early action recognition and future action prediction. Early…

    Submitted 16 July, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

  13. arXiv:2107.05122  [pdf, other]

    cs.CV

    Interpretable Deep Feature Propagation for Early Action Recognition

    Authors: He Zhao, Richard P. Wildes

    Abstract: Early action recognition (action prediction) from limited preliminary observations plays a critical role for streaming vision systems that demand real-time inference, as video actions often possess elongated temporal spans which cause undesired latency. In this study, we address action prediction by investigating how action patterns evolve over time in a spatial feature space. There are three key…

    Submitted 11 July, 2021; originally announced July 2021.

  14. arXiv:2105.12661  [pdf, other]

    cs.CV

    Detecting Biological Locomotion in Video: A Computational Approach

    Authors: Soo Min Kang, Richard P. Wildes

    Abstract: Animals locomote for various reasons: to search for food, find suitable habitat, pursue prey, escape from predators, or seek a mate. The grand scale of biodiversity contributes to the great locomotory design and mode diversity. Various creatures make use of legs, wings, fins and other means to move through the world. In this report, we refer to the locomotion of general biological species as biolo…

    Submitted 26 May, 2021; originally announced May 2021.

  15. arXiv:2011.14665  [pdf, other]

    cs.CV

    Why Convolutional Networks Learn Oriented Bandpass Filters: Theory and Empirical Support

    Authors: Isma Hadji, Richard P. Wildes

    Abstract: It has been repeatedly observed that convolutional architectures when applied to image understanding tasks learn oriented bandpass filters. A standard explanation of this result is that these filters reflect the structure of the images that they have been exposed to during training: Natural images typically are locally composed of oriented contours at various scales and oriented bandpass filters a…

    Submitted 30 November, 2020; originally announced November 2020.

  16. arXiv:1803.08834  [pdf, other]

    cs.CV

    What Do We Understand About Convolutional Networks?

    Authors: Isma Hadji, Richard P. Wildes

    Abstract: This document will review the most prominent proposals using multilayer convolutional architectures. Importantly, the various components of a typical convolutional network will be discussed through a review of different approaches that base their design decisions on biological findings and/or sound theoretical bases. In addition, the different attempts at understanding ConvNets via visualizations…

    Submitted 23 March, 2018; originally announced March 2018.

  17. arXiv:1801.01415  [pdf, other]

    cs.CV

    What have we learned from deep representations for action recognition?

    Authors: Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes, Andrew Zisserman

    Abstract: As the success of deep models has led to their deployment in all areas of computer vision, it is increasingly important to understand how these representations work and what they are capturing. In this paper, we shed light on deep spatiotemporal representations by visualizing what two-stream models have learned in order to recognize actions in video. We show that local detectors for appearance and…

    Submitted 4 January, 2018; originally announced January 2018.

    Comments: This document is best viewed in Adobe Reader where figures play on click. Supplementary material can be downloaded at http://feichtenhofer.github.io/action_vis.pdf

  18. arXiv:1708.06690  [pdf, other]

    cs.CV

    A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition

    Authors: Isma Hadji, Richard P. Wildes

    Abstract: This paper presents a novel hierarchical spatiotemporal orientation representation for spacetime image analysis. It is designed to combine the benefits of the multilayer architecture of ConvNets and a more controlled approach to spacetime analysis. A distinguishing aspect of the approach is that unlike most contemporary convolutional networks no learning is involved; rather, all design decisions a…

    Submitted 22 August, 2017; originally announced August 2017.

    Comments: accepted at ICCV 2017

  19. arXiv:1611.02155  [pdf, other]

    cs.CV

    Spatiotemporal Residual Networks for Video Action Recognition

    Authors: Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes

    Abstract: Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. Our novel architecture generalizes ResNets for the spatiotemporal domain by introduc…

    Submitted 7 November, 2016; originally announced November 2016.

    Comments: NIPS 2016

  20. arXiv:1610.06906  [pdf, other]

    cs.CV

    Review of Action Recognition and Detection Methods

    Authors: Soo Min Kang, Richard P. Wildes

    Abstract: In computer vision, action recognition refers to the act of classifying an action that is present in a given video and action detection involves locating actions of interest in space and/or time. Videos, which contain photometric information (e.g. RGB, intensity values) in a lattice structure, contain information that can assist in identifying the action that has been imaged. The process of action…

    Submitted 1 November, 2016; v1 submitted 21 October, 2016; originally announced October 2016.

    Report number: EECS-2016-04