On the Value of PHH3 for Mitotic Figure Detection on H&E-stained Images
Authors:
Jonathan Ganz,
Christian Marzahl,
Jonas Ammeling,
Barbara Richter,
Chloé Puget,
Daniela Denk,
Elena A. Demeter,
Flaviu A. Tabaran,
Gabriel Wasinger,
Karoline Lipnik,
Marco Tecilla,
Matthew J. Valentine,
Michael J. Dark,
Niklas Abele,
Pompei Bolfa,
Ramona Erber,
Robert Klopfleisch,
Sophie Merz,
Taryn A. Donovan,
Samir Jabari,
Christof A. Bertram,
Katharina Breininger,
Marc Aubreville
Abstract:
The count of mitotic figures (MFs) observed in hematoxylin and eosin (H&E)-stained slides is an important prognostic marker as it is a measure for tumor cell proliferation. However, the identification of MFs has a known low inter-rater agreement. Deep learning algorithms can standardize this task, but they require large amounts of annotated data for training and validation. Furthermore, label nois…
▽ More
The count of mitotic figures (MFs) observed in hematoxylin and eosin (H&E)-stained slides is an important prognostic marker as it is a measure for tumor cell proliferation. However, the identification of MFs has a known low inter-rater agreement. Deep learning algorithms can standardize this task, but they require large amounts of annotated data for training and validation. Furthermore, label noise introduced during the annotation process may impede the algorithm's performance. Unlike H&E, the mitosis-specific antibody phospho-histone H3 (PHH3) specifically highlights MFs. Counting MFs on slides stained against PHH3 leads to higher agreement among raters and has therefore recently been used as a ground truth for the annotation of MFs in H&E. However, as PHH3 facilitates the recognition of cells indistinguishable from H&E stain alone, the use of this ground truth could potentially introduce noise into the H&E-related dataset, impacting model performance. This study analyzes the impact of PHH3-assisted MF annotation on inter-rater reliability and object level agreement through an extensive multi-rater experiment. We found that the annotators' object-level agreement increased when using PHH3-assisted labeling. Subsequently, MF detectors were evaluated on the resulting datasets to investigate the influence of PHH3-assisted labeling on the models' performance. Additionally, a novel dual-stain MF detector was developed to investigate the interpretation-shift of PHH3-assisted labels used in H&E, which clearly outperformed single-stain detectors. However, the PHH3-assisted labels did not have a positive effect on solely H&E-based models. The high performance of our dual-input detector reveals an information mismatch between the H&E and PHH3-stained images as the cause of this effect.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
Nuclear Pleomorphism in Canine Cutaneous Mast Cell Tumors: Comparison of Reproducibility and Prognostic Relevance between Estimates, Manual Morphometry and Algorithmic Morphometry
Authors:
Andreas Haghofer,
Eda Parlak,
Alexander Bartel,
Taryn A. Donovan,
Charles-Antoine Assenmacher,
Pompei Bolfa,
Michael J. Dark,
Andrea Fuchs-Baumgartinger,
Andrea Klang,
Kathrin Jäger,
Robert Klopfleisch,
Sophie Merz,
Barbara Richter,
F. Yvonne Schulman,
Hannah Janout,
Jonathan Ganz,
Josef Scharinger,
Marc Aubreville,
Stephan M. Winkler,
Matti Kiupel,
Christof A. Bertram
Abstract:
Variation in nuclear size and shape is an important criterion of malignancy for many tumor types; however, categorical estimates by pathologists have poor reproducibility. Measurements of nuclear characteristics (morphometry) can improve reproducibility, but manual methods are time consuming. The aim of this study was to explore the limitations of estimates and develop alternative morphometric sol…
▽ More
Variation in nuclear size and shape is an important criterion of malignancy for many tumor types; however, categorical estimates by pathologists have poor reproducibility. Measurements of nuclear characteristics (morphometry) can improve reproducibility, but manual methods are time consuming. The aim of this study was to explore the limitations of estimates and develop alternative morphometric solutions for canine cutaneous mast cell tumors (ccMCT). We assessed the following nuclear evaluation methods for measurement accuracy, reproducibility, and prognostic utility: 1) anisokaryosis (karyomegaly) estimates by 11 pathologists; 2) gold standard manual morphometry of at least 100 nuclei; 3) practicable manual morphometry with stratified sampling of 12 nuclei by 9 pathologists; and 4) automated morphometry using a deep learning-based segmentation algorithm. The study dataset comprised 96 ccMCT with available outcome information. The study dataset comprised 96 ccMCT with available outcome information. Inter-rater reproducibility of karyomegaly estimates was low ($κ$ = 0.226), while it was good (ICC = 0.654) for practicable morphometry of the standard deviation (SD) of nuclear size. As compared to gold standard manual morphometry (AUC = 0.839, 95% CI: 0.701 - 0.977), the prognostic value (tumor-specific survival) of SDs of nuclear area for practicable manual morphometry (12 nuclei) and automated morphometry were high with an area under the ROC curve (AUC) of 0.868 (95% CI: 0.737 - 0.991) and 0.943 (95% CI: 0.889 - 0.996), respectively. This study supports the use of manual morphometry with stratified sampling of 12 nuclei and algorithmic morphometry to overcome the poor reproducibility of estimates.
△ Less
Submitted 23 May, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.