-
Functional Programming Paradigm of Python for Scientific Computation Pipeline Integration
Authors:
Chen Zhang,
Lecheng Jia,
Wei Zhang,
Ning Wen
Abstract:
The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data control system to facilitate the integration of varying libraries. This integration is of profound significance in accelerating prototype verification, optimising alg…
▽ More
The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data control system to facilitate the integration of varying libraries. This integration is of profound significance in accelerating prototype verification, optimising algorithm performance and minimising maintenance costs. This paper presents a novel functional programming (FP) paradigm based on the Python architecture and associated suites in programming practice, designed for the integration of pipelines of different data map** operations. In particular, the solution is intended for the integration of scientific computation flows, which affords a robust yet flexible solution for the aforementioned challenges.
△ Less
Submitted 3 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Secret Keepers: The Impact of LLMs on Linguistic Markers of Personal Traits
Authors:
Zhivar Sourati,
Meltem Ozcan,
Colin McDaniel,
Alireza Ziabari,
Nuan Wen,
Ala Tak,
Fred Morstatter,
Morteza Dehghani
Abstract:
Prior research has established associations between individuals' language usage and their personal traits; our linguistic patterns reveal information about our personalities, emotional states, and beliefs. However, with the increasing adoption of Large Language Models (LLMs) as writing assistants in everyday writing, a critical question emerges: are authors' linguistic patterns still predictive of…
▽ More
Prior research has established associations between individuals' language usage and their personal traits; our linguistic patterns reveal information about our personalities, emotional states, and beliefs. However, with the increasing adoption of Large Language Models (LLMs) as writing assistants in everyday writing, a critical question emerges: are authors' linguistic patterns still predictive of their personal traits when LLMs are involved in the writing process? We investigate the impact of LLMs on the linguistic markers of demographic and psychological traits, specifically examining three LLMs - GPT3.5, Llama 2, and Gemini - across six different traits: gender, age, political affiliation, personality, empathy, and morality. Our findings indicate that although the use of LLMs slightly reduces the predictive power of linguistic patterns over authors' personal traits, the significant changes are infrequent, and the use of LLMs does not fully diminish the predictive power of authors' linguistic patterns over their personal traits. We also note that some theoretically established lexical-based linguistic markers lose their reliability as predictors when LLMs are used in the writing process. Our findings have important implications for the study of linguistic markers of personal traits in the age of LLMs.
△ Less
Submitted 3 April, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
Can Language Model Moderators Improve the Health of Online Discourse?
Authors:
Hyundong Cho,
Shuai Liu,
Taiwei Shi,
Darpan Jain,
Basem Rizk,
Yuyang Huang,
Zixun Lu,
Nuan Wen,
Jonathan Gratch,
Emilio Ferrara,
Jonathan May
Abstract:
Conversational moderation of online communities is crucial to maintaining civility for a constructive environment, but it is challenging to scale and harmful to moderators. The inclusion of sophisticated natural language generation modules as a force multiplier to aid human moderators is a tantalizing prospect, but adequate evaluation approaches have so far been elusive. In this paper, we establis…
▽ More
Conversational moderation of online communities is crucial to maintaining civility for a constructive environment, but it is challenging to scale and harmful to moderators. The inclusion of sophisticated natural language generation modules as a force multiplier to aid human moderators is a tantalizing prospect, but adequate evaluation approaches have so far been elusive. In this paper, we establish a systematic definition of conversational moderation effectiveness grounded on moderation literature and establish design criteria for conducting realistic yet safe evaluation. We then propose a comprehensive evaluation framework to assess models' moderation capabilities independently of human intervention. With our framework, we conduct the first known study of language models as conversational moderators, finding that appropriately prompted models that incorporate insights from social science can provide specific and fair feedback on toxic behavior but struggle to influence users to increase their levels of respect and cooperation.
△ Less
Submitted 6 May, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways
Authors:
Mingyu Derek Ma,
Alexander K. Taylor,
Nuan Wen,
Yanchen Liu,
Po-Nien Kung,
Wenna Qin,
Shicheng Wen,
Azure Zhou,
Diyi Yang,
Xuezhe Ma,
Nanyun Peng,
Wei Wang
Abstract:
We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights, including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users,…
▽ More
We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights, including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users, we construct communities among users and develop the propagation forecasting capability, enabling tracing and understanding of how information is disseminated at a higher level.
△ Less
Submitted 20 February, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Biomedical image analysis competitions: The state of current participation practice
Authors:
Matthias Eisenmann,
Annika Reinke,
Vivienn Weru,
Minu Dietlinde Tizabi,
Fabian Isensee,
Tim J. Adler,
Patrick Godau,
Veronika Cheplygina,
Michal Kozubek,
Sharib Ali,
Anubha Gupta,
Jan Kybic,
Alison Noble,
Carlos Ortiz de Solórzano,
Samiksha Pachade,
Caroline Petitjean,
Daniel Sage,
Donglai Wei,
Elizabeth Wilden,
Deepak Alapatt,
Vincent Andrearczyk,
Ujjwal Baid,
Spyridon Bakas,
Niranjan Balu,
Sophia Bano
, et al. (331 additional authors not shown)
Abstract:
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis,…
▽ More
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
△ Less
Submitted 12 September, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer
, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc…
▽ More
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
△ Less
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
Authors:
Sarik Ghazarian,
Nuan Wen,
Aram Galstyan,
Nanyun Peng
Abstract:
Automatic evaluation metrics are essential for the rapid development of open-domain dialogue systems as they facilitate hyper-parameter tuning and comparison between models. Although recently proposed trainable conversation-level metrics have shown encouraging results, the quality of the metrics is strongly dependent on the quality of training data. Prior works mainly resort to heuristic text-leve…
▽ More
Automatic evaluation metrics are essential for the rapid development of open-domain dialogue systems as they facilitate hyper-parameter tuning and comparison between models. Although recently proposed trainable conversation-level metrics have shown encouraging results, the quality of the metrics is strongly dependent on the quality of training data. Prior works mainly resort to heuristic text-level manipulations (e.g. utterances shuffling) to bootstrap incoherent conversations (negative examples) from coherent dialogues (positive examples). Such approaches are insufficient to appropriately reflect the incoherence that occurs in interactions between advanced dialogue models and humans. To tackle this problem, we propose DEAM, a Dialogue coherence Evaluation metric that relies on Abstract Meaning Representation (AMR) to apply semantic-level Manipulations for incoherent (negative) data generation. AMRs naturally facilitate the injection of various types of incoherence sources, such as coreference inconsistency, irrelevancy, contradictions, and decrease engagement, at the semantic level, thus resulting in more natural incoherent samples. Our experiments show that DEAM achieves higher correlations with human judgments compared to baseline methods on several dialog datasets by significant margins. We also show that DEAM can distinguish between coherent and incoherent dialogues generated by baseline manipulations, whereas those baseline models cannot detect incoherent examples generated by DEAM. Our results demonstrate the potential of AMR-based semantic manipulations for natural negative example generation.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences
Authors:
Shikhar Singh,
Nuan Wen,
Yu Hou,
Pegah Alipoormolabashi,
Te-Lin Wu,
Xuezhe Ma,
Nanyun Peng
Abstract:
Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets. However, the reliability and comprehensiveness of these benchmarks towards assessing model's commonsense reasoning ability remains unclear. To this end, we introdu…
▽ More
Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets. However, the reliability and comprehensiveness of these benchmarks towards assessing model's commonsense reasoning ability remains unclear. To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs. We propose a pairwise accuracy metric to reliably measure an agent's ability to perform commonsense reasoning over a given situation. The dataset is crowdsourced and enhanced with an adversarial model-in-the-loop setup to incentivize challenging samples. To facilitate a systematic analysis of commonsense capabilities, we design our dataset along the dimensions of knowledge domains, reasoning scenarios and numeracy. Experimental results demonstrate that our strongest baseline (UnifiedQA-3B), after fine-tuning, achieves ~71% standard accuracy and ~51% pairwise accuracy, well below human performance (~95% for both metrics). The dataset is available at https://github.com/PlusLabNLP/Com2Sense.
△ Less
Submitted 2 June, 2021;
originally announced June 2021.
-
EventPlus: A Temporal Event Understanding Pipeline
Authors:
Mingyu Derek Ma,
Jiao Sun,
Mu Yang,
Kung-Hsiang Huang,
Nuan Wen,
Shikhar Singh,
Rujun Han,
Nanyun Peng
Abstract:
We present EventPlus, a temporal event understanding pipeline that integrates various state-of-the-art event understanding components including event trigger and type detection, event argument detection, event duration and temporal relation extraction. Event information, especially event temporal knowledge, is a type of common sense knowledge that helps people understand how stories evolve and pro…
▽ More
We present EventPlus, a temporal event understanding pipeline that integrates various state-of-the-art event understanding components including event trigger and type detection, event argument detection, event duration and temporal relation extraction. Event information, especially event temporal knowledge, is a type of common sense knowledge that helps people understand how stories evolve and provides predictive hints for future events. EventPlus as the first comprehensive temporal event understanding pipeline provides a convenient tool for users to quickly obtain annotations about events and their temporal information for any user-provided document. Furthermore, we show EventPlus can be easily adapted to other domains (e.g., biomedical domain). We make EventPlus publicly available to facilitate event-related information extraction and downstream applications.
△ Less
Submitted 25 April, 2021; v1 submitted 13 January, 2021;
originally announced January 2021.
-
Accurate Prostate Cancer Detection and Segmentation on Biparametric MRI using Non-local Mask R-CNN with Histopathological Ground Truth
Authors:
Zhenzhen Dai,
Ivan Jambor,
Pekka Taimen,
Milan Pantelic,
Mohamed Elshaikh,
Craig Rogers,
Otto Ettala,
Peter Boström,
Hannu Aronen,
Harri Merisaari,
Ning Wen
Abstract:
Purpose: We aimed to develop deep machine learning (DL) models to improve the detection and segmentation of intraprostatic lesions (IL) on bp-MRI by using whole amount prostatectomy specimen-based delineations. We also aimed to investigate whether transfer learning and self-training would improve results with small amount labelled data.
Methods: 158 patients had suspicious lesions delineated on…
▽ More
Purpose: We aimed to develop deep machine learning (DL) models to improve the detection and segmentation of intraprostatic lesions (IL) on bp-MRI by using whole amount prostatectomy specimen-based delineations. We also aimed to investigate whether transfer learning and self-training would improve results with small amount labelled data.
Methods: 158 patients had suspicious lesions delineated on MRI based on bp-MRI, 64 patients had ILs delineated on MRI based on whole mount prostatectomy specimen sections, 40 patients were unlabelled. A non-local Mask R-CNN was proposed to improve the segmentation accuracy. Transfer learning was investigated by fine-tuning a model trained using MRI-based delineations with prostatectomy-based delineations. Two label selection strategies were investigated in self-training. The performance of models was evaluated by 3D detection rate, dice similarity coefficient (DSC), 95 percentile Hausdrauff (95 HD, mm) and true positive ratio (TPR).
Results: With prostatectomy-based delineations, the non-local Mask R-CNN with fine-tuning and self-training significantly improved all evaluation metrics. For the model with the highest detection rate and DSC, 80.5% (33/41) of lesions in all Gleason Grade Groups (GGG) were detected with DSC of 0.548[0.165], 95 HD of 5.72[3.17] and TPR of 0.613[0.193]. Among them, 94.7% (18/19) of lesions with GGG > 2 were detected with DSC of 0.604[0.135], 95 HD of 6.26[3.44] and TPR of 0.580[0.190].
Conclusion: DL models can achieve high prostate cancer detection and segmentation accuracy on bp-MRI based on annotations from histologic images. To further improve the performance, more data with annotations of both MRI and whole amount prostatectomy specimens are required.
△ Less
Submitted 28 October, 2020;
originally announced October 2020.
-
Improvement of Multiparametric MR Image Segmentation by Augmenting the Data with Generative Adversarial Networks for Glioma Patients
Authors:
Eric Carver,
Zhenzhen Dai,
Evan Liang,
James Snyder,
Ning Wen
Abstract:
Every year thousands of patients are diagnosed with a glioma, a type of malignant brain tumor. Physicians use MR images as a key tool in the diagnosis and treatment of these patients. Neural networks show great potential to aid physicians in the medical image analysis. This study investigates the use of varying amounts of synthetic brain T1-weighted (T1), post-contrast T1-weighted (T1Gd), T2-weigh…
▽ More
Every year thousands of patients are diagnosed with a glioma, a type of malignant brain tumor. Physicians use MR images as a key tool in the diagnosis and treatment of these patients. Neural networks show great potential to aid physicians in the medical image analysis. This study investigates the use of varying amounts of synthetic brain T1-weighted (T1), post-contrast T1-weighted (T1Gd), T2-weighted (T2), and T2 Fluid Attenuated Inversion Recovery (FLAIR) MR images created by a generative adversarial network to overcome the lack of annotated medical image data in training separate 2D U-Nets to segment enhancing tumor, peritumoral edema, and necrosis (non-enhancing tumor core) regions on gliomas. These synthetic MR images were assessed quantitively (SSIM=0.79) and qualitatively by a physician who found that the synthetic images seem stronger for delineation of structural boundaries but struggle more when gradient is significant, (e.g. edema signal in T2 modalities). Multiple 2D U-Nets were trained with original BraTS data and differing subsets of a quarter, half, three-quarters, and all synthetic MR images. There was not an obvious correlation between the improvement of values of the metrics in separate validation dataset for each structure and amount of synthetic data added, there is a strong correlation between the amount of synthetic data added and the number of best overall validation metrics. In summary, this study showed ability to generate high quality synthetic Flair, T2, T1, and T1CE MR images using the GAN. Using the synthetic MR images showed encouraging results to improve the U-Net segmentation performance which has the potential to address the scarcity of readily available medical images.
△ Less
Submitted 1 October, 2019;
originally announced October 2019.
-
Segmentation of the Prostatic Gland and the Intraprostatic Lesions on Multiparametic MRI Using Mask-RCNN
Authors:
Zhenzhen Dai,
Eric Carver,
Chang Liu,
Joon Lee,
Aharon Feldman,
Weiwei Zong,
Milan Pantelic,
Mohamed Elshaikh,
Ning Wen
Abstract:
Prostate cancer (PCa) is the most common cancer in men in the United States. Multiparametic magnetic resonance imaging (mp-MRI) has been explored by many researchers to targeted prostate biopsies and radiation therapy. However, assessment on mp-MRI can be subjective, development of computer-aided diagnosis systems to automatically delineate the prostate gland and the intraprostratic lesions (ILs)…
▽ More
Prostate cancer (PCa) is the most common cancer in men in the United States. Multiparametic magnetic resonance imaging (mp-MRI) has been explored by many researchers to targeted prostate biopsies and radiation therapy. However, assessment on mp-MRI can be subjective, development of computer-aided diagnosis systems to automatically delineate the prostate gland and the intraprostratic lesions (ILs) becomes important to facilitate with radiologists in clinical practice. In this paper, we first study the implementation of the Mask-RCNN model to segment the prostate and ILs. We trained and evaluated models on 120 patients from two different cohorts of patients. We also used 2D U-Net and 3D U-Net as benchmarks to segment the prostate and compared the model's performance. The contour variability of ILs using the algorithm was also benchmarked against the interobserver variability between two different radiation oncologists on 19 patients. Our results indicate that the Mask-RCNN model is able to reach state-of-art performance in the prostate segmentation and outperforms several competitive baselines in ILs segmentation.
△ Less
Submitted 4 April, 2019;
originally announced April 2019.
-
A Deep Dive into Understanding Tumor Foci Classification using Multiparametric MRI Based on Convolutional Neural Network
Authors:
Weiwei Zong,
Joon Lee,
Chang Liu,
Eric Carver,
Aharon Feldman,
Branislava Janic,
Mohamed Elshaikh,
Milan Pantelic,
David Hearshen,
Indrin Chetty,
Benjamin Movsas,
Ning Wen
Abstract:
Deep learning models have had a great success in disease classifications using large data pools of skin cancer images or lung X-rays. However, data scarcity has been the roadblock of applying deep learning models directly on prostate multiparametric MRI (mpMRI). Although model interpretation has been heavily studied for natural images for the past few years, there has been a lack of interpretation…
▽ More
Deep learning models have had a great success in disease classifications using large data pools of skin cancer images or lung X-rays. However, data scarcity has been the roadblock of applying deep learning models directly on prostate multiparametric MRI (mpMRI). Although model interpretation has been heavily studied for natural images for the past few years, there has been a lack of interpretation of deep learning models trained on medical images. This work designs a customized workflow for the small and imbalanced data set of prostate mpMRI where features were extracted from a deep learning model and then analyzed by a traditional machine learning classifier. In addition, this work contributes to revealing how deep learning models interpret mpMRI for prostate cancer patients stratification.
△ Less
Submitted 14 May, 2020; v1 submitted 28 March, 2019;
originally announced March 2019.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem…
▽ More
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
△ Less
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.