Activity Detection in Long Surgical Videos using Spatio-Temporal Models

Sharghi, Aidean; He, Zooey; Mohareri, Omid

arXiv:2205.02805v1 (cs)

[Submitted on 5 May 2022 (this version), latest version 7 Sep 2022 (v3)]

Abstract: Automatic activity detection is an important component for developing technologies that enable next-generation surgical devices and workflow monitoring systems. In many applications, the videos of interest are long and include several activities; hence, the deep models designed for such purposes consist of a backbone and a temporal sequence modeling architecture. In this paper, we investigate both state-of-the-art activity recognition models and temporal models to find the architectures that yield the highest performance. We first benchmark these models on a large-scale activity recognition dataset in the operating room with over 800 full-length surgical videos. However, since most other medical applications lack such a large dataset, we further evaluate our models on the Cholec80 surgical phase segmentation dataset, consisting of only 40 training videos. For backbone architectures, we investigate both 3D ConvNets and the most recent transformer-based models; for temporal modeling, we include temporal ConvNets, RNNs, and transformer models for a comprehensive study. We show that even in the case of limited labeled data, we can outperform existing work by benefiting from models pre-trained on other tasks.

Comments:	10 pages, excluding references
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2205.02805 [cs.CV]
	(or arXiv:2205.02805v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2205.02805

Submission history

From: Aidean Sharghi
[v1] Thu, 5 May 2022 17:34:33 UTC (37 KB)
[v2] Sun, 4 Sep 2022 23:27:38 UTC (5,701 KB)
[v3] Wed, 7 Sep 2022 02:14:23 UTC (5,702 KB)
