Search | arXiv e-print repository

Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery

Authors: Sebastian Bodenstedt, Max Allan, Anthony Agustinos, Xiaofei Du, Luis Garcia-Peraza-Herrera, Hannes Kenngott, Thomas Kurmann, Beat Müller-Stich, Sebastien Ourselin, Daniil Pakhomov, Raphael Sznitman, Marvin Teichmann, Martin Thoma, Tom Vercauteren, Sandrine Voros, Martin Wagner, Pamela Wochner, Lena Maier-Hein, Danail Stoyanov, Stefanie Speidel

Abstract: Intraoperative segmentation and tracking of minimally invasive instruments is a prerequisite for computer- and robotic-assisted surgery. Since additional hardware like tracking systems or the robot encoders are cumbersome and lack accuracy, surgical vision is evolving as promising techniques to segment and track the instruments using only the endoscopic images. However, what is missing so far are… ▽ More Intraoperative segmentation and tracking of minimally invasive instruments is a prerequisite for computer- and robotic-assisted surgery. Since additional hardware like tracking systems or the robot encoders are cumbersome and lack accuracy, surgical vision is evolving as promising techniques to segment and track the instruments using only the endoscopic images. However, what is missing so far are common image data sets for consistent evaluation and benchmarking of algorithms against each other. The paper presents a comparative validation study of different vision-based methods for instrument segmentation and tracking in the context of robotic as well as conventional laparoscopic surgery. The contribution of the paper is twofold: we introduce a comprehensive validation data set that was provided to the study participants and present the results of the comparative validation study. Based on the results of the validation study, we arrive at the conclusion that modern deep learning approaches outperform other methods in instrument segmentation tasks, but the results are still not perfect. Furthermore, we show that merging results from different methods actually significantly increases accuracy in comparison to the best stand-alone method. On the other hand, the results of the instrument tracking task show that this is still an open challenge, especially during challenging scenarios in conventional laparoscopic surgery. △ Less

Submitted 7 May, 2018; originally announced May 2018.

arXiv:1801.07779 [pdf, other]

The WiLI benchmark dataset for written language identification

Authors: Martin Thoma

Abstract: This paper describes the WiLI-2018 benchmark dataset for monolingual written natural language identification. WiLI-2018 is a publicly available, free of charge dataset of short text extracts from Wikipedia. It contains 1000 paragraphs of 235 languages, totaling in 23500 paragraphs. WiLI is a classification dataset: Given an unknown paragraph written in one dominant language, it has to be decided w… ▽ More This paper describes the WiLI-2018 benchmark dataset for monolingual written natural language identification. WiLI-2018 is a publicly available, free of charge dataset of short text extracts from Wikipedia. It contains 1000 paragraphs of 235 languages, totaling in 23500 paragraphs. WiLI is a classification dataset: Given an unknown paragraph written in one dominant language, it has to be decided which language it is. △ Less

Submitted 23 January, 2018; originally announced January 2018.

Comments: {"pages": 12, "figures": 4, "language": "English", "author-ORCiD": ["https://orcid.org/0000-0002-6517-1690"]}

arXiv:1707.09725 [pdf, other]

Analysis and Optimization of Convolutional Neural Network Architectures

Authors: Martin Thoma

Abstract: Convolutional Neural Networks (CNNs) dominate various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare.… ▽ More Convolutional Neural Networks (CNNs) dominate various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step to close this gap. A comprehensive overview over existing techniques for CNN analysis and topology construction is provided. A novel way to visualize classification errors with confusion matrices was developed. Based on this method, hierarchical classifiers are described and evaluated. Additionally, some results are confirmed and quantified for CIFAR-100. For example, the positive impact of smaller batch sizes, averaging ensembles, data augmentation and test-time transformations on the accuracy. Other results, such as the positive impact of learned color transformation on the test accuracy could not be confirmed. A model which has only one million learned parameters for an input size of 32x32x3 and 100 classes and which beats the state of the art on the benchmark dataset Asirra, GTSRB, HASYv2 and STL-10 was developed. △ Less

Submitted 31 July, 2017; originally announced July 2017.

Comments: Master's thesis. 73 pages + 24 pages appendix; 39 figures; 33 tables

arXiv:1701.08380 [pdf, other]

The HASYv2 dataset

Authors: Martin Thoma

Abstract: This paper describes the HASYv2 dataset. HASY is a publicly available, free of charge dataset of single symbols similar to MNIST. It contains 168233 instances of 369 classes. HASY contains two challenges: A classification challenge with 10 pre-defined folds for 10-fold cross-validation and a verification challenge. This paper describes the HASYv2 dataset. HASY is a publicly available, free of charge dataset of single symbols similar to MNIST. It contains 168233 instances of 369 classes. HASY contains two challenges: A classification challenge with 10 pre-defined folds for 10-fold cross-validation and a verification challenge. △ Less

Submitted 29 January, 2017; originally announced January 2017.

arXiv:1602.06541 [pdf, other]

A Survey of Semantic Segmentation

Authors: Martin Thoma

Abstract: This survey gives an overview over different techniques used for pixel-level semantic segmentation. Metrics and datasets for the evaluation of segmentation algorithms and traditional approaches for segmentation such as unsupervised methods, Decision Forests and SVMs are described and pointers to the relevant papers are given. Recently published approaches with convolutional neural networks are men… ▽ More This survey gives an overview over different techniques used for pixel-level semantic segmentation. Metrics and datasets for the evaluation of segmentation algorithms and traditional approaches for segmentation such as unsupervised methods, Decision Forests and SVMs are described and pointers to the relevant papers are given. Recently published approaches with convolutional neural networks are mentioned and typical problematic situations for segmentation algorithms are examined. A taxonomy of segmentation algorithms is given. △ Less

Submitted 11 May, 2016; v1 submitted 21 February, 2016; originally announced February 2016.

Comments: Fixed typo in accuracy metrics formula; added value range of accuracy metrics; consistent naming of variables

arXiv:1601.03642 [pdf, other]

Creativity in Machine Learning

Authors: Martin Thoma

Abstract: Recent machine learning techniques can be modified to produce creative results. Those results did not exist before; it is not a trivial combination of the data which was fed into the machine learning system. The obtained results come in multiple forms: As images, as text and as audio. This paper gives a high level overview of how they are created and gives some examples. It is meant to be a summ… ▽ More Recent machine learning techniques can be modified to produce creative results. Those results did not exist before; it is not a trivial combination of the data which was fed into the machine learning system. The obtained results come in multiple forms: As images, as text and as audio. This paper gives a high level overview of how they are created and gives some examples. It is meant to be a summary of the current work and give people who are new to machine learning some starting points. △ Less

Submitted 12 January, 2016; originally announced January 2016.

Comments: 5 pages, 4 figures

arXiv:1512.04469 [pdf, other]

Über die Klassifizierung von Knoten in dynamischen Netzwerken mit Inhalt

Authors: Martin Thoma

Abstract: This paper explains the DYCOS-Algorithm as it was introduced in by Aggarwal and Li in 2011. It operates on graphs whichs nodes are partially labeled and automatically adds missing labels to nodes. To do so, the DYCOS algorithm makes use of the structure of the graph as well as content which is assigned to the node. Aggarwal and Li measured in an experimental analysis that DYCOS adds the missing la… ▽ More This paper explains the DYCOS-Algorithm as it was introduced in by Aggarwal and Li in 2011. It operates on graphs whichs nodes are partially labeled and automatically adds missing labels to nodes. To do so, the DYCOS algorithm makes use of the structure of the graph as well as content which is assigned to the node. Aggarwal and Li measured in an experimental analysis that DYCOS adds the missing labels to a Graph with 19396 nodes of which 14814 are labeled and another Graph with 806635 nodes of which 18999 are labeld on one core of an Intel Xeon 2.5 GHz CPU with 32 G RAM within less than a minute. Additionally, extensions of the DYCOS algorithm are proposed. ----- In dieser Arbeit wird der DYCOS-Algorithmus, wie er 2011 von Aggarwal und Li vorgestellt wurde, erklärt. Er arbeitet auf Graphen, deren Knoten teilweise mit Beschriftungen versehen sind und ergänzt automatisch Beschriftungen für Knoten, die bisher noch keine Beschriftung haben. Dieser Vorgang wird "Klassifizierung" genannt. Dazu verwendet er die Struktur des Graphen sowie textuelle Informationen, die den Knoten zugeordnet sind. Die von Aggarwal und Li beschriebene experimentelle Analyse ergab, dass er auch auf dynamischen Graphen mit 19396 bzw. 806635 Knoten, von denen nur 14814 bzw. 18999 beschriftet waren, innerhalb von weniger als einer Minute auf einem Kern einer Intel Xeon 2.5 GHz CPU mit 32 G RAM ausgeführt werden kann. Zusätzlich wird die Veröffentlichung von Aggarwal und Li kritisch erörtert und und es werden mögliche Erweiterungen des DYCOS-Algorithmus vorgeschlagen. △ Less

Submitted 23 November, 2015; originally announced December 2015.

Comments: in German. This term paper was handed in on 17.01.2014

arXiv:1511.09030 [pdf, other]

doi 10.5445/IR/1000048047

On-line Recognition of Handwritten Mathematical Symbols

Authors: Martin Thoma

Abstract: Finding the name of an unknown symbol is often hard, but writing the symbol is easy. This bachelor's thesis presents multiple systems that use the pen trajectory to classify handwritten symbols. Five preprocessing steps, one data augmentation algorithm, five features and five variants for multilayer Perceptron training were evaluated using 166898 recordings which were collected with two crowdsourc… ▽ More Finding the name of an unknown symbol is often hard, but writing the symbol is easy. This bachelor's thesis presents multiple systems that use the pen trajectory to classify handwritten symbols. Five preprocessing steps, one data augmentation algorithm, five features and five variants for multilayer Perceptron training were evaluated using 166898 recordings which were collected with two crowdsourcing projects. The evaluation results of these 21 experiments were used to create an optimized recognizer which has a TOP1 error of less than 17.5% and a TOP3 error of 4.0%. This is an improvement of 18.5% for the TOP1 error and 29.7% for the TOP3 error. △ Less

Submitted 29 November, 2015; originally announced November 2015.

arXiv:1511.00513 [pdf, other]

Pixel-wise Segmentation of Street with Neural Networks

Authors: Sebastian Bittel, Vitali Kaiser, Marvin Teichmann, Martin Thoma

Abstract: Pixel-wise street segmentation of photographs taken from a drivers perspective is important for self-driving cars and can also support other object recognition tasks. A framework called SST was developed to examine the accuracy and execution time of different neural networks. The best neural network achieved an $F_1$-score of 89.5% with a simple feedforward neural network which trained to solve a… ▽ More Pixel-wise street segmentation of photographs taken from a drivers perspective is important for self-driving cars and can also support other object recognition tasks. A framework called SST was developed to examine the accuracy and execution time of different neural networks. The best neural network achieved an $F_1$-score of 89.5% with a simple feedforward neural network which trained to solve a regression task. △ Less

Submitted 2 November, 2015; originally announced November 2015.

Showing 1–9 of 9 results for author: Thoma, M