-
The Endoscapes Dataset for Surgical Scene Segmentation, Object Detection, and Critical View of Safety Assessment: Official Splits and Benchmark
Authors:
Aditya Murali,
Deepak Alapatt,
Pietro Mascagni,
Armine Vardazaryan,
Alain Garcia,
Nariaki Okamoto,
Guido Costamagna,
Didier Mutter,
Jacques Marescaux,
Bernard Dallemagne,
Nicolas Padoy
Abstract:
This technical report provides a detailed overview of Endoscapes, a dataset of laparoscopic cholecystectomy (LC) videos with highly intricate annotations targeted at automated assessment of the Critical View of Safety (CVS). Endoscapes comprises 201 LC videos with frames annotated sparsely but regularly with segmentation masks, bounding boxes, and CVS assessment by three different clinical experts…
▽ More
This technical report provides a detailed overview of Endoscapes, a dataset of laparoscopic cholecystectomy (LC) videos with highly intricate annotations targeted at automated assessment of the Critical View of Safety (CVS). Endoscapes comprises 201 LC videos with frames annotated sparsely but regularly with segmentation masks, bounding boxes, and CVS assessment by three different clinical experts. Altogether, there are 11090 frames annotated with CVS and 1933 frames annotated with tool and anatomy bounding boxes from the 201 videos, as well as an additional 422 frames from 50 of the 201 videos annotated with tool and anatomy segmentation masks. In this report, we provide detailed dataset statistics (size, class distribution, dataset splits, etc.) and a comprehensive performance benchmark for instance segmentation, object detection, and CVS prediction. The dataset and model checkpoints are publically available at https://github.com/CAMMA-public/Endoscapes.
△ Less
Submitted 26 January, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Encoding Surgical Videos as Latent Spatiotemporal Graphs for Object and Anatomy-Driven Reasoning
Authors:
Aditya Murali,
Deepak Alapatt,
Pietro Mascagni,
Armine Vardazaryan,
Alain Garcia,
Nariaki Okamoto,
Didier Mutter,
Nicolas Padoy
Abstract:
Recently, spatiotemporal graphs have emerged as a concise and elegant manner of representing video clips in an object-centric fashion, and have shown to be useful for downstream tasks such as action recognition. In this work, we investigate the use of latent spatiotemporal graphs to represent a surgical video in terms of the constituent anatomical structures and tools and their evolving properties…
▽ More
Recently, spatiotemporal graphs have emerged as a concise and elegant manner of representing video clips in an object-centric fashion, and have shown to be useful for downstream tasks such as action recognition. In this work, we investigate the use of latent spatiotemporal graphs to represent a surgical video in terms of the constituent anatomical structures and tools and their evolving properties over time. To build the graphs, we first predict frame-wise graphs using a pre-trained model, then add temporal edges between nodes based on spatial coherence and visual and semantic similarity. Unlike previous approaches, we incorporate long-term temporal edges in our graphs to better model the evolution of the surgical scene and increase robustness to temporary occlusions. We also introduce a novel graph-editing module that incorporates prior knowledge and temporal coherence to correct errors in the graph, enabling improved downstream task performance. Using our graph representations, we evaluate two downstream tasks, critical view of safety prediction and surgical phase recognition, obtaining strong results that demonstrate the quality and flexibility of the learned representations. Code is available at github.com/CAMMA-public/SurgLatentGraph.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
Authors:
Chinedu Innocent Nwoye,
Tong Yu,
Saurav Sharma,
Aditya Murali,
Deepak Alapatt,
Armine Vardazaryan,
Kun Yuan,
Jonas Hajek,
Wolfgang Reiter,
Amine Yamlahi,
Finn-Henri Smidt,
Xiaoyang Zou,
Guoyan Zheng,
Bruno Oliveira,
Helena R. Torres,
Satoshi Kondo,
Satoshi Kasai,
Felix Holm,
Ege Özsoy,
Shuangchun Gui,
Han Li,
Sista Raviteja,
Rachana Sathish,
Pranav Poudel,
Binod Bhattarai
, et al. (24 additional authors not shown)
Abstract:
Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier effor…
▽ More
Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery.
△ Less
Submitted 14 July, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Preserving Privacy in Surgical Video Analysis Using Artificial Intelligence: A Deep Learning Classifier to Identify Out-of-Body Scenes in Endoscopic Videos
Authors:
Joël L. Lavanchy,
Armine Vardazaryan,
Pietro Mascagni,
AI4SafeChole Consortium,
Didier Mutter,
Nicolas Padoy
Abstract:
Objective: To develop and validate a deep learning model for the identification of out-of-body images in endoscopic videos. Background: Surgical video analysis facilitates education and research. However, video recordings of endoscopic surgeries can contain privacy-sensitive information, especially if out-of-body scenes are recorded. Therefore, identification of out-of-body scenes in endoscopic vi…
▽ More
Objective: To develop and validate a deep learning model for the identification of out-of-body images in endoscopic videos. Background: Surgical video analysis facilitates education and research. However, video recordings of endoscopic surgeries can contain privacy-sensitive information, especially if out-of-body scenes are recorded. Therefore, identification of out-of-body scenes in endoscopic videos is of major importance to preserve the privacy of patients and operating room staff. Methods: A deep learning model was trained and evaluated on an internal dataset of 12 different types of laparoscopic and robotic surgeries. External validation was performed on two independent multicentric test datasets of laparoscopic gastric bypass and cholecystectomy surgeries. All images extracted from the video datasets were annotated as inside or out-of-body. Model performance was evaluated compared to human ground truth annotations measuring the receiver operating characteristic area under the curve (ROC AUC). Results: The internal dataset consisting of 356,267 images from 48 videos and the two multicentric test datasets consisting of 54,385 and 58,349 images from 10 and 20 videos, respectively, were annotated. Compared to ground truth annotations, the model identified out-of-body images with 99.97% ROC AUC on the internal test dataset. Mean $\pm$ standard deviation ROC AUC on the multicentric gastric bypass dataset was 99.94$\pm$0.07% and 99.71$\pm$0.40% on the multicentric cholecystectomy dataset, respectively. Conclusion: The proposed deep learning model can reliably identify out-of-body images in endoscopic videos. The trained model is publicly shared. This facilitates privacy preservation in surgical video analysis.
△ Less
Submitted 7 June, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Biomedical image analysis competitions: The state of current participation practice
Authors:
Matthias Eisenmann,
Annika Reinke,
Vivienn Weru,
Minu Dietlinde Tizabi,
Fabian Isensee,
Tim J. Adler,
Patrick Godau,
Veronika Cheplygina,
Michal Kozubek,
Sharib Ali,
Anubha Gupta,
Jan Kybic,
Alison Noble,
Carlos Ortiz de Solórzano,
Samiksha Pachade,
Caroline Petitjean,
Daniel Sage,
Donglai Wei,
Elizabeth Wilden,
Deepak Alapatt,
Vincent Andrearczyk,
Ujjwal Baid,
Spyridon Bakas,
Niranjan Balu,
Sophia Bano
, et al. (331 additional authors not shown)
Abstract:
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,…
▽ More
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
△ Less
Submitted 12 September, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Real-Time Artificial Intelligence Assistance for Safe Laparoscopic Cholecystectomy: Early-Stage Clinical Evaluation
Authors:
Pietro Mascagni,
Deepak Alapatt,
Alfonso Lapergola,
Armine Vardazaryan,
Jean-Paul Mazellier,
Bernard Dallemagne,
Didier Mutter,
Nicolas Padoy
Abstract:
Artificial intelligence is set to be deployed in operating rooms to improve surgical care. This early-stage clinical evaluation shows the feasibility of concurrently attaining real-time, high-quality predictions from several deep neural networks for endoscopic video analysis deployed for assistance during three laparoscopic cholecystectomies.
Artificial intelligence is set to be deployed in operating rooms to improve surgical care. This early-stage clinical evaluation shows the feasibility of concurrently attaining real-time, high-quality predictions from several deep neural networks for endoscopic video analysis deployed for assistance during three laparoscopic cholecystectomies.
△ Less
Submitted 13 December, 2022;
originally announced December 2022.
-
Latent Graph Representations for Critical View of Safety Assessment
Authors:
Aditya Murali,
Deepak Alapatt,
Pietro Mascagni,
Armine Vardazaryan,
Alain Garcia,
Nariaki Okamoto,
Didier Mutter,
Nicolas Padoy
Abstract:
Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using predicted segmentation masks to then…
▽ More
Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. Prior works have approached this task by including semantic segmentation as an intermediate step, using predicted segmentation masks to then predict the CVS. While these methods are effective, they rely on extremely expensive ground-truth segmentation annotations and tend to fail when the predicted segmentation is incorrect, limiting generalization. In this work, we propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network. Our graph representations explicitly encode semantic information - object location, class information, geometric relations - to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors. Finally, to address annotation cost, we propose to train our method using only bounding box annotations, incorporating an auxiliary image reconstruction objective to learn fine-grained object boundaries. We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
△ Less
Submitted 19 December, 2023; v1 submitted 8 December, 2022;
originally announced December 2022.
-
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition
Authors:
Chinedu Innocent Nwoye,
Deepak Alapatt,
Tong Yu,
Armine Vardazaryan,
Fangfang Xia,
Zixuan Zhao,
Tong Xia,
Fucang Jia,
Yuxuan Yang,
Hao Wang,
Derong Yu,
Guoyan Zheng,
Xiaotian Duan,
Neil Getty,
Ricardo Sanchez-Matilla,
Maria Robu,
Li Zhang,
Huabin Chen,
Jiacheng Wang,
Liansheng Wang,
Bokai Zhang,
Beerend Gerats,
Sista Raviteja,
Rachana Sathish,
Rong Tao
, et al. (37 additional authors not shown)
Abstract:
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in…
▽ More
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.
△ Less
Submitted 29 December, 2022; v1 submitted 10 April, 2022;
originally announced April 2022.
-
Temporally Constrained Neural Networks (TCNN): A framework for semi-supervised video semantic segmentation
Authors:
Deepak Alapatt,
Pietro Mascagni,
Armine Vardazaryan,
Alain Garcia,
Nariaki Okamoto,
Didier Mutter,
Jacques Marescaux,
Guido Costamagna,
Bernard Dallemagne,
Nicolas Padoy
Abstract:
A major obstacle to building models for effective semantic segmentation, and particularly video semantic segmentation, is a lack of large and well annotated datasets. This bottleneck is particularly prohibitive in highly specialized and regulated fields such as medicine and surgery, where video semantic segmentation could have important applications but data and expert annotations are scarce. In t…
▽ More
A major obstacle to building models for effective semantic segmentation, and particularly video semantic segmentation, is a lack of large and well annotated datasets. This bottleneck is particularly prohibitive in highly specialized and regulated fields such as medicine and surgery, where video semantic segmentation could have important applications but data and expert annotations are scarce. In these settings, temporal clues and anatomical constraints could be leveraged during training to improve performance. Here, we present Temporally Constrained Neural Networks (TCNN), a semi-supervised framework used for video semantic segmentation of surgical videos. In this work, we show that autoencoder networks can be used to efficiently provide both spatial and temporal supervisory signals to train deep learning models. We test our method on a newly introduced video dataset of laparoscopic cholecystectomy procedures, Endoscapes, and an adaptation of a public dataset of cataract surgeries, CaDIS. We demonstrate that lower-dimensional representations of predicted masks can be leveraged to provide a consistent improvement on both sparsely labeled datasets with no additional computational cost at inference time. Further, the TCNN framework is model-agnostic and can be used in conjunction with other model design choices with minimal additional complexity.
△ Less
Submitted 27 December, 2021;
originally announced December 2021.
-
Comparative Validation of Machine Learning Algorithms for Surgical Workflow and Skill Analysis with the HeiChole Benchmark
Authors:
Martin Wagner,
Beat-Peter Müller-Stich,
Anna Kisilenko,
Duc Tran,
Patrick Heger,
Lars Mündermann,
David M Lubotsky,
Benjamin Müller,
Tornike Davitashvili,
Manuela Capek,
Annika Reinke,
Tong Yu,
Armine Vardazaryan,
Chinedu Innocent Nwoye,
Nicolas Padoy,
Xinyang Liu,
Eung-Joo Lee,
Constantin Disch,
Hans Meine,
Tong Xia,
Fucang Jia,
Satoshi Kondo,
Wolfgang Reiter,
Yueming **,
Yonghao Long
, et al. (16 additional authors not shown)
Abstract:
PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported fo…
▽ More
PURPOSE: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center dataset. In this work we investigated the generalizability of phase recognition algorithms in a multi-center setting including more difficult recognition tasks such as surgical action and surgical skill. METHODS: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 hours was created. Labels included annotation of seven surgical phases with 250 phase transitions, 5514 occurences of four surgical actions, 6980 occurences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 teams submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. RESULTS: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n=9 teams), for instrument presence detection between 38.5% and 63.8% (n=8 teams), but for action recognition only between 21.8% and 23.3% (n=5 teams). The average absolute error for skill assessment was 0.78 (n=1 team). CONCLUSION: Surgical workflow and skill analysis are promising technologies to support the surgical team, but are not solved yet, as shown by our comparison of algorithms. This novel benchmark can be used for comparable evaluation and validation of future work.
△ Less
Submitted 30 September, 2021;
originally announced September 2021.
-
Surgical data science for safe cholecystectomy: a protocol for segmentation of hepatocystic anatomy and assessment of the critical view of safety
Authors:
Pietro Mascagni,
Deepak Alapatt,
Alain Garcia,
Nariaki Okamoto,
Armine Vardazaryan,
Guido Costamagna,
Bernard Dallemagne,
Nicolas Padoy
Abstract:
Minimally invasive image-guided surgery heavily relies on vision. Deep learning models for surgical video analysis could therefore support visual tasks such as assessing the critical view of safety (CVS) in laparoscopic cholecystectomy (LC), potentially contributing to surgical safety and efficiency. However, the performance, reliability and reproducibility of such models are deeply dependent on t…
▽ More
Minimally invasive image-guided surgery heavily relies on vision. Deep learning models for surgical video analysis could therefore support visual tasks such as assessing the critical view of safety (CVS) in laparoscopic cholecystectomy (LC), potentially contributing to surgical safety and efficiency. However, the performance, reliability and reproducibility of such models are deeply dependent on the quality of data and annotations used in their development. Here, we present a protocol, checklists, and visual examples to promote consistent annotation of hepatocystic anatomy and CVS criteria. We believe that sharing annotation guidelines can help build trustworthy multicentric datasets for assessing generalizability of performance, thus accelerating the clinical translation of deep learning models for surgical video analysis.
△ Less
Submitted 20 September, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Weakly-Supervised Learning for Tool Localization in Laparoscopic Videos
Authors:
Armine Vardazaryan,
Didier Mutter,
Jacques Marescaux,
Nicolas Padoy
Abstract:
Surgical tool localization is an essential task for the automatic analysis of endoscopic videos. In the literature, existing methods for tool localization, tracking and segmentation require training data that is fully annotated, thereby limiting the size of the datasets that can be used and the generalization of the approaches. In this work, we propose to circumvent the lack of annotated data with…
▽ More
Surgical tool localization is an essential task for the automatic analysis of endoscopic videos. In the literature, existing methods for tool localization, tracking and segmentation require training data that is fully annotated, thereby limiting the size of the datasets that can be used and the generalization of the approaches. In this work, we propose to circumvent the lack of annotated data with weak supervision. We propose a deep architecture, trained solely on image level annotations, that can be used for both tool presence detection and localization in surgical videos. Our architecture relies on a fully convolutional neural network, trained end-to-end, enabling us to localize surgical tools without explicit spatial annotations. We demonstrate the benefits of our approach on a large public dataset, Cholec80, which is fully annotated with binary tool presence information and of which 5 videos have been fully annotated with bounding boxes and tool centers for the evaluation.
△ Less
Submitted 18 July, 2018; v1 submitted 14 June, 2018;
originally announced June 2018.