-
CohortFinder: an open-source tool for data-driven partitioning of biomedical image cohorts to yield robust machine learning models
Authors:
Fan Fan,
Georgia Martinez,
Thomas Desilvio,
John Shin,
Yijiang Chen,
Bangchen Wang,
Takaya Ozeki,
Maxime W. Lafarge,
Viktor H. Koelzer,
Laura Barisoni,
Anant Madabhushi,
Satish E. Viswanath,
Andrew Janowczyk
Abstract:
Batch effects (BEs) refer to systematic technical differences in data collection unrelated to biological variations whose noise is shown to negatively impact machine learning (ML) model generalizability. Here we release CohortFinder, an open-source tool aimed at mitigating BEs via data-driven cohort partitioning. We demonstrate CohortFinder improves ML model performance in downstream medical image processing tasks. CohortFinder is freely available for download at cohortfinder.com.
Submitted 17 July, 2023;
originally announced July 2023.
-
PatchSorter: A High Throughput Deep Learning Digital Pathology Tool for Object Labeling
Authors:
Cedric Walker,
Tasneem Talawalla,
Robert Toth,
Akhil Ambekar,
Kien Rea,
Oswin Chamian,
Fan Fan,
Sabina Berezowska,
Sven Rottenberg,
Anant Madabhushi,
Marie Maillard,
Laura Barisoni,
Hugo Mark Horlings,
Andrew Janowczyk
Abstract:
The discovery of patterns associated with diagnosis, prognosis, and therapy response in digital pathology images often requires intractable labeling of large quantities of histological objects. Here we release an open-source labeling tool, PatchSorter, which integrates deep learning with an intuitive web interface. Using >100,000 objects, we demonstrate a >7x improvement in labels per second over unaided labeling, with minimal impact on labeling accuracy, thus enabling high-throughput labeling of large datasets.
Submitted 13 July, 2023;
originally announced July 2023.
-
Panoptic segmentation with highly imbalanced semantic labels
Authors:
Josef Lorenz Rumberger,
Elias Baumann,
Peter Hirsch,
Andrew Janowczyk,
Inti Zlobec,
Dagmar Kainmueller
Abstract:
We describe here the panoptic segmentation method we devised for our participation in the CoNIC: Colon Nuclei Identification and Counting Challenge at ISBI 2022. Key features of our method are a weighted loss specifically engineered for semantic segmentation of highly imbalanced cell types, and a state-of-the-art nuclei instance segmentation model, which we combine in a Hovernet-like architecture.
Submitted 19 April, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
MRQy: An Open-Source Tool for Quality Control of MR Imaging Data
Authors:
Amir Reza Sadri,
Andrew Janowczyk,
Ren Zou,
Ruchika Verma,
Niha Beig,
Jacob Antunes,
Anant Madabhushi,
Pallavi Tiwari,
Satish E. Viswanath
Abstract:
We sought to develop a quantitative tool to quickly determine relative differences in MRI volumes both within and between large MR imaging cohorts (such as available in The Cancer Imaging Archive (TCIA)), in order to help determine the generalizability of radiomics and machine learning schemes to unseen datasets. The tool is intended to help quantify the presence of (a) site- or scanner-specific variations in image resolution, field-of-view, or image contrast, or (b) imaging artifacts such as noise, motion, inhomogeneity, ringing, or aliasing, which can adversely affect relative image quality between data cohorts. We present MRQy, a new open-source quality control tool to (a) interrogate MRI cohorts for site- or equipment-based differences, and (b) quantify the impact of MRI artifacts on relative image quality, to help determine how to correct for these variations prior to model development. MRQy extracts a series of quality measures (e.g. noise ratios, variation metrics, entropy and energy criteria) and MR image metadata (e.g. voxel resolution, image dimensions) for subsequent interrogation via a specialized HTML5-based front-end designed for real-time filtering and trend visualization. MRQy was used to evaluate (a) n=133 brain MRIs from TCIA (7 sites), and (b) n=104 rectal MRIs (3 local sites). MRQy measures revealed significant site-specific variations in both cohorts, indicating potential batch effects. Marked differences in specific MRQy measures were also able to identify outlier MRI datasets that needed to be corrected for common MR imaging artifacts. MRQy is designed to be a standalone, unsupervised tool that can be efficiently run on a standard desktop computer. It has been made freely accessible at http://github.com/ccipd/MRQy for wider community use and feedback.
Submitted 17 August, 2020; v1 submitted 9 April, 2020;
originally announced April 2020.