CHAOS Challenge -- Combined (CT-MR) Healthy Abdominal Organ Segmentation
Authors:
A. Emre Kavur,
N. Sinem Gezer,
Mustafa Barış,
Sinem Aslan,
Pierre-Henri Conze,
Vladimir Groza,
Duc Duy Pham,
Soumick Chatterjee,
Philipp Ernst,
Savaş Özkan,
Bora Baydar,
Dmitry Lachinov,
Shuo Han,
Josef Pauli,
Fabian Isensee,
Matthias Perkonigg,
Rachana Sathish,
Ronnie Rajan,
Debdoot Sheet,
Gurbandurdy Dovletov,
Oliver Speck,
Andreas Nürnberger,
Klaus H. Maier-Hein,
Gözde Bozdağı Akar,
Gözde Ünal, et al. (2 additional authors not shown)
Abstract:
Segmentation of abdominal organs has been a comprehensive, yet unresolved, research field for many years. In the last decade, intensive developments in deep learning (DL) have introduced new state-of-the-art segmentation systems. To expand the knowledge on these topics, the CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation challenge was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2019 in Venice, Italy. CHAOS provides both abdominal CT and MR data from healthy subjects for single and multiple abdominal organ segmentation. Five different but complementary tasks were designed to analyze the capabilities of current approaches from multiple perspectives. The results are investigated thoroughly and compared with manual annotations and interactive methods. The analysis shows that DL models for a single modality (CT / MR) can deliver reliable volumetric analysis performance (DICE: 0.98 $\pm$ 0.00 / 0.95 $\pm$ 0.01), but the best MSSD performance remains limited (21.89 $\pm$ 13.94 / 20.85 $\pm$ 10.63 mm). The performance of participating models decreases significantly in cross-modality tasks, both for the liver (DICE: 0.88 $\pm$ 0.15, MSSD: 36.33 $\pm$ 21.97 mm) and for all organs (DICE: 0.85 $\pm$ 0.21, MSSD: 33.17 $\pm$ 38.93 mm). Despite contrary examples in other applications, multi-tasking DL models designed to segment all organs seem to perform worse than organ-specific ones (performance drop of around 5\%). Further research in such directions for cross-modality segmentation would significantly support real-world clinical applications. Moreover, because the challenge attracted more than 1500 participants, another important contribution of the paper is its analysis of the shortcomings of challenge organization, such as the effects of multiple submissions and peeking phenomena.
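
The abstract contrasts a high DICE score with a comparatively poor MSSD. The following is a minimal sketch of how two such metrics could be computed, not the challenge's evaluation code. It assumes that DICE denotes the usual Dice overlap coefficient and that MSSD denotes a maximum symmetric surface distance (the abstract does not expand the acronym). The function names `dice` and `max_symmetric_surface_distance`, the flat-list mask representation, and the point-list surface representation are all hypothetical choices made for illustration.

```python
from math import dist


def dice(pred, truth):
    """Dice overlap of two binary masks given as equal-length sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pred = [bool(p) for p in pred]
    truth = [bool(t) for t in truth]
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention (an assumption of this sketch).
    return 1.0 if total == 0 else 2.0 * intersection / total


def max_symmetric_surface_distance(surface_a, surface_b):
    """Largest nearest-neighbour distance between two surfaces, taken in both directions.

    Each surface is a non-empty list of coordinate tuples, assumed to be in mm.
    """
    if not surface_a or not surface_b:
        raise ValueError("both surfaces must contain at least one point")

    def directed(src, dst):
        # For every source point, find its closest destination point; keep the worst case.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(surface_a, surface_b), directed(surface_b, surface_a))


# Hypothetical toy data: masks that agree on 99 of 100 foreground voxels,
# and surfaces that differ by one far-away outlier point.
truth_mask = [1] * 100 + [0] * 100
pred_mask = [1] * 99 + [0] * 100 + [1]
truth_surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (31.0, 0.0, 0.0)]

print(dice(pred_mask, truth_mask))
print(max_symmetric_surface_distance(pred_surface, truth_surface))
```

In this toy case the overlap score stays close to 1 while the maximum distance is dominated by the single outlier, which is one plausible reading of why a volumetric score and a worst-case surface score can diverge as the abstract reports.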
Submitted 7 January, 2021; v1 submitted 17 January, 2020;
originally announced January 2020.