-
Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos
Authors:
Zhimin Shao,
Jialang Xu,
Danail Stoyanov,
Evangelos B. Mazomenos,
Yueming Jin
Abstract:
Despite significant advancements in robotic systems and surgical data science, ensuring safe and optimal execution in robot-assisted minimally invasive surgery (RMIS) remains a complex challenge. Current surgical error detection methods involve two parts: identifying surgical gestures and then detecting errors within each gesture clip. These methods seldom consider the rich contextual and semantic information inherent in surgical videos, limiting their performance due to reliance on accurate gesture identification. Motivated by the chain-of-thought prompting in natural language processing, this letter presents a novel and real-time end-to-end error detection framework, Chain-of-Gesture (COG) prompting, leveraging contextual information from surgical videos. This encompasses two reasoning modules designed to mimic the decision-making processes of expert surgeons. Concretely, we first design a Gestural-Visual Reasoning module, which utilizes transformer and attention architectures for gesture prompting, while the second, a Multi-Scale Temporal Reasoning module, employs a multi-stage temporal convolutional network with both slow and fast paths for temporal information extraction. We extensively validate our method on the public benchmark RMIS dataset JIGSAWS. Our method encapsulates the reasoning processes inherent to surgical activities, enabling it to outperform the state-of-the-art by 4.6% in F1 score, 4.6% in Accuracy, and 5.9% in Jaccard index while processing each frame in 6.69 milliseconds on average, demonstrating the great potential of our approach in enhancing the safety and efficacy of RMIS procedures and surgical education. The code will be available.
Submitted 27 June, 2024;
originally announced June 2024.
-
SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery
Authors:
Jialang Xu,
Nazir Sirajudeen,
Matthew Boal,
Nader Francis,
Danail Stoyanov,
Evangelos Mazomenos
Abstract:
Automated detection of surgical errors can improve robotic-assisted surgery. Despite promising progress, existing methods still face challenges in capturing rich temporal context to establish long-term dependencies while maintaining computational efficiency. In this paper, we propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection, facilitating efficient long sequence modelling with linear complexity. SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos. The bottleneck mechanism compresses and restores features within their spatial dimension, thereby reducing computational complexity. FCTF utilizes multiple dilated 1D convolutional layers to merge temporal information across diverse scale ranges, accommodating errors of varying durations. In addition, we deploy an established observational clinical human reliability assessment tool (OCHRA) to annotate the errors of suturing tasks in an open-source radical prostatectomy dataset (SAR-RARP50), constructing the first frame-level in-vivo surgical error detection dataset to support error detection in real-world scenarios. Experimental results demonstrate that our SEDMamba outperforms state-of-the-art methods with at least 1.82% AUC and 3.80% AP performance gain with significantly reduced computational complexity.
Submitted 22 June, 2024;
originally announced June 2024.
-
Shifted-Windows Transformers for the Detection of Cerebral Aneurysms in Microsurgery
Authors:
Jinfan Zhou,
William Muirhead,
Simon C. Williams,
Danail Stoyanov,
Hani J. Marcus,
Evangelos B. Mazomenos
Abstract:
Purpose: Microsurgical Aneurysm Clipping Surgery (MACS) carries a high risk for intraoperative aneurysm rupture. Automated recognition of instances when the aneurysm is exposed in the surgical video would be a valuable reference point for neuronavigation, indicating phase transitioning and more importantly designating moments of high risk for rupture. This article introduces the MACS dataset containing 16 surgical videos with frame-level expert annotations and proposes a learning methodology for surgical scene understanding identifying video frames with the aneurysm present in the operating microscope's field-of-view. Methods: Despite the dataset imbalance (80% no presence, 20% presence) and developed without explicit annotations, we demonstrate the applicability of Transformer-based deep learning architectures (MACSSwin-T, vidMACSSwin-T) to detect the aneurysm and classify MACS frames accordingly. We evaluate the proposed models in multiple-fold cross-validation experiments with independent sets and in an unseen set of 15 images against 10 human experts (neurosurgeons). Results: Average (across folds) accuracy of 80.8% (range 78.5%-82.4%) and 87.1% (range 85.1%-91.3%) is obtained for the image- and video-level approach respectively, demonstrating that the models effectively learn the classification task. Qualitative evaluation of the models' class activation maps shows these to be localized on the aneurysm's actual location. Depending on the decision threshold, MACSSwin-T achieves 66.7% to 86.7% accuracy in the unseen images, compared to 82% of human raters, with moderate to strong correlation.
Submitted 16 March, 2023;
originally announced March 2023.
-
Objective Surgical Skills Assessment and Tool Localization: Results from the MICCAI 2021 SimSurgSkill Challenge
Authors:
Aneeq Zia,
Kiran Bhattacharyya,
Xi Liu,
Ziheng Wang,
Max Berniker,
Satoshi Kondo,
Emanuele Colleoni,
Dimitris Psychogyios,
Yueming Jin,
Jinfan Zhou,
Evangelos Mazomenos,
Lena Maier-Hein,
Danail Stoyanov,
Stefanie Speidel,
Anthony Jarc
Abstract:
Timely and effective feedback within surgical training plays a critical role in developing the skills required to perform safe and efficient surgery. Feedback from expert surgeons, while especially valuable in this regard, is challenging to acquire due to their typically busy schedules, and may be subject to biases. Formal assessment procedures like OSATS and GEARS attempt to provide objective measures of skill, but remain time-consuming. With advances in machine learning there is an opportunity for fast and objective automated feedback on technical skills. The SimSurgSkill 2021 challenge (hosted as a sub-challenge of EndoVis at MICCAI 2021) aimed to promote and foster work in this endeavor. Using virtual reality (VR) surgical tasks, competitors were tasked with localizing instruments and predicting surgical skill. Here we summarize the winning approaches and how they performed. Using this publicly available dataset and results as a springboard, future work may enable more efficient training of surgeons with advances in surgical data science. The dataset can be accessed from https://console.cloud.google.com/storage/browser/isi-simsurgskill-2021.
Submitted 8 December, 2022;
originally announced December 2022.
-
Towards a Computed-Aided Diagnosis System in Colonoscopy: Automatic Polyp Segmentation Using Convolution Neural Networks
Authors:
Patrick Brandao,
Odysseas Zisimopoulos,
Evangelos Mazomenos,
Gastone Ciuti,
Jorge Bernal,
Marco Visentini-Scarzanella,
Arianna Menciassi,
Paolo Dario,
Anastasios Koulaouzidis,
Alberto Arezzo,
David J Hawkes,
Danail Stoyanov
Abstract:
Early diagnosis is essential for the successful treatment of bowel cancers including colorectal cancer (CRC), and capsule endoscopic imaging with robotic actuation can be a valuable diagnostic tool when combined with automated image analysis. We present a deep learning rooted detection and segmentation framework for recognizing lesions in colonoscopy and capsule endoscopy images. We restructure established convolution architectures, such as VGG and ResNets, by converting them into fully convolutional networks (FCNs), fine-tune them and study their capabilities for polyp segmentation and detection. We additionally use Shape-from-Shading (SfS) to recover depth and provide a richer representation of the tissue's structure in colonoscopy images. Depth is incorporated into our network models as an additional input channel to the RGB information and we demonstrate that the resulting network yields improved performance. Our networks are tested on publicly available datasets and the most accurate segmentation model achieved a mean segmentation IU of 47.78% and 56.95% on the ETIS-Larib and CVC-Colon datasets, respectively. For polyp detection, the top performing models we propose surpass the current state of the art with detection recalls superior to 90% for all datasets tested. To our knowledge, we present the first work to use FCNs for polyp segmentation in addition to proposing a novel combination of SfS and RGB that boosts performance.
Submitted 15 January, 2021;
originally announced January 2021.
-
Automated Performance Assessment in Transoesophageal Echocardiography with Convolutional Neural Networks
Authors:
Evangelos B. Mazomenos,
Kamakshi Bansal,
Bruce Martin,
Andrew Smith,
Susan Wright,
Danail Stoyanov
Abstract:
Transoesophageal echocardiography (TEE) is a valuable diagnostic and monitoring imaging modality. Proper image acquisition is essential for diagnosis, yet current assessment techniques are solely based on manual expert review. This paper presents a supervised deep learning framework for automatically evaluating and grading the quality of TEE images. To obtain the necessary dataset, 38 participants of varied experience performed TEE exams with a high-fidelity virtual reality (VR) platform. Two Convolutional Neural Network (CNN) architectures, AlexNet and VGG, structured to perform regression, were fine-tuned and validated on manually graded images from three evaluators. Two different scoring strategies, a criteria-based percentage and an overall general impression, were used. The developed CNN models estimate the average score with a root mean square accuracy ranging between 84%-93%, indicating the ability to replicate expert valuation. Proposed strategies for automated TEE assessment can have a significant impact on the training process of new TEE operators, providing direct feedback and facilitating the development of the necessary dexterous skills.
Submitted 13 June, 2018;
originally announced June 2018.
-
Widening siamese architectures for stereo matching
Authors:
Patrick Brandao,
Evangelos Mazomenos,
Danail Stoyanov
Abstract:
Computational stereo is one of the classical problems in computer vision. Numerous algorithms and solutions have been reported in recent years focusing on developing methods for computing similarity, aggregating it to obtain spatial support and finally optimizing an energy function to find the final disparity. In this paper, we focus on the feature extraction component of stereo matching architectures and we show that standard CNN operations can be used to improve the quality of the features used to find point correspondences. Furthermore, we propose a simple space aggregation that hugely simplifies the correlation learning problem. Our results on benchmark data are compelling and show promising potential even without refining the solution.
Submitted 1 November, 2017;
originally announced November 2017.
-
A Statistical Index for Early Diagnosis of Ventricular Arrhythmia from the Trend Analysis of ECG Phase-portraits
Authors:
Grazia Cappiello,
Saptarshi Das,
Evangelos B. Mazomenos,
Koushik Maharatna,
George Koulaouzidis,
John Morgan,
Paolo Emilio Puddu
Abstract:
In this paper, we propose a novel statistical index for the early diagnosis of ventricular arrhythmia (VA) using the time delay phase-space reconstruction (PSR) technique, from the electrocardiogram (ECG) signal. Patients with two classes of fatal VA, with preceding ventricular premature beats (VPBs) and with no VPBs, have been analysed using extensive simulations. Three subclasses of VA with VPBs viz. ventricular tachycardia (VT), ventricular fibrillation (VF) and VT followed by VF are analyzed using the proposed technique. Measures of descriptive statistics like mean (μ), standard deviation (σ), coefficient of variation (CV = σ/μ), skewness (γ) and kurtosis (β) in phase-space diagrams are studied for a sliding window of 10 beats of ECG signal using the box-counting technique. Subsequently, a hybrid prediction index which is composed of a weighted sum of CV and kurtosis has been proposed for predicting the impending arrhythmia before its actual occurrence. The early diagnosis involves crossing the upper bound of a hybrid index which is capable of predicting an impending arrhythmia 356 ECG beats, on average (with 192 beats standard deviation) before its onset when tested with 32 VA patients (both with and without VPBs). The early diagnosis result is also verified using a leave-one-out cross-validation (LOOCV) scheme with 96.88% sensitivity, 100% specificity and 98.44% accuracy.
Submitted 29 November, 2016;
originally announced November 2016.
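The descriptive statistics named in the abstract above (μ, σ, CV = σ/μ, skewness γ, kurtosis β) and a weighted sum of CV and kurtosis over a sliding window of 10 beats can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one scalar feature per beat (for example, a box count taken from that beat's phase-space diagram), uses population moments, and the function names and the weights `w_cv` and `w_kurt` are hypothetical placeholders; the abstract does not give the actual weights or the upper bound.

```python
import math


def window_stats(values):
    """Population mean, std, CV, skewness and kurtosis of one window."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    sigma = math.sqrt(var)
    if sigma == 0:
        # Constant window: higher moments are undefined, report zeros.
        return {"mean": mu, "std": 0.0, "cv": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    third = sum((v - mu) ** 3 for v in values) / n
    fourth = sum((v - mu) ** 4 for v in values) / n
    return {
        "mean": mu,
        "std": sigma,
        "cv": sigma / mu if mu != 0 else math.inf,  # CV = sigma / mu
        "skewness": third / (var * sigma),          # third standardized moment
        "kurtosis": fourth / (var * var),           # fourth standardized moment
    }


def hybrid_index(per_beat_feature, window=10, w_cv=0.5, w_kurt=0.5):
    """Weighted sum of CV and kurtosis for each sliding window.

    The weights are illustrative assumptions, not values from the paper.
    """
    out = []
    for start in range(len(per_beat_feature) - window + 1):
        stats = window_stats(per_beat_feature[start:start + window])
        out.append(w_cv * stats["cv"] + w_kurt * stats["kurtosis"])
    return out
```

A series of N per-beat values yields N - 9 index values for the default window of 10; an alarm rule would then compare each value against an upper bound, which would have to be chosen from data.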
-
A novel approach for the diagnosis of ventricular tachycardia based on phase space reconstruction of ECG
Authors:
George Koulaouzidis,
Saptarshi Das,
Grazia Cappiello,
Evangelos B. Mazomenos,
Koushik Maharatna,
John Morgan
Abstract:
Ventricular arrhythmias comprise a group of disorders which manifest clinically in a variety of ways, from ventricular premature beats (VPB) and non-sustained ventricular tachycardia (in healthy subjects) to sudden cardiac death due to ventricular tachyarrhythmia in patients with and/or without structural heart disease. Ventricular fibrillation (VF) and ventricular tachycardia (VT) are the most common electrical mechanisms for cardiac arrest. Accurate and automatic recognition of these arrhythmias from electrocardiography (ECG) is a crucial task for medical professionals. The purpose of this research is to develop a new index for the differential diagnosis of normal sinus rhythm (SR) and ventricular arrhythmias, based on phase space reconstruction (PSR).
Submitted 20 October, 2014;
originally announced October 2014.