-
Machine Learning-Assisted Recurrence Prediction for Early-Stage Non-Small-Cell Lung Cancer Patients
Authors:
Adrianna Janik,
Maria Torrente,
Luca Costabello,
Virginia Calvo,
Brian Walsh,
Carlos Camps,
Sameh K. Mohamed,
Ana L. Ortega,
Vít Nováček,
Bartomeu Massutí,
Pasquale Minervini,
M. Rosario Garcia Campelo,
Edel del Barco,
Joaquim Bosch-Barrera,
Ernestina Menasalvas,
Mohan Timilsina,
Mariano Provencio
Abstract:
Background: Stratifying cancer patients according to their risk of relapse can personalize their care. In this work, we address the following research question: how can machine learning be used to estimate the probability of relapse in patients with early-stage non-small-cell lung cancer?
Methods: To predict relapse in 1,387 patients with early-stage (I-II) non-small-cell lung cancer (NSCLC) from the Spanish Lung Cancer Group data (average age 65.7 years; 24.8% female, 75.2% male), we train tabular and graph machine learning models and generate automatic explanations for their predictions. For models trained on tabular data, we adopt SHAP local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients.
Results: Among the models trained on tabular data, the Random Forest model reaches 76% accuracy at predicting relapse under 10-fold cross-validation. The model was trained 10 times, each time with different, independent sets of patients in the training, validation, and test sets, and the reported metrics are averaged over these 10 test sets. Graph machine learning reaches 68% accuracy on a held-out test set of 200 patients, calibrated on a held-out set of 100 patients.
Conclusions: Our results show that machine learning models trained on tabular and graph data can enable objective, personalized, and reproducible prediction of relapse, and therefore of disease outcome, in patients with early-stage NSCLC. With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding on the use of adjuvant treatments in early-stage lung cancer.
Keywords: Non-Small-Cell Lung Cancer, Tumor Recurrence Prediction, Machine Learning
Submitted 17 November, 2022;
originally announced November 2022.
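The Methods above rely on SHAP local explanations, whose attributions are based on Shapley values. The following minimal sketch computes exact Shapley values for a single prediction by enumerating every coalition of features. Features outside a coalition take the value of a single reference patient, which is one common simplification. The model `toy_risk` and its features `age`, `stage` and `smoker` are hypothetical. This is neither the SHAP library's API nor the authors' Random Forest.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet

Features = Dict[str, float]


def shapley_values(
    model: Callable[[Features], float], x: Features, baseline: Features
) -> Features:
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by the baseline (reference) value.
    """
    names = list(x)
    n = len(names)

    def value(coalition: FrozenSet[str]) -> float:
        # Coalition members take the patient's values; the rest take the baseline.
        z = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(z)

    phi: Features = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy risk score with a stage x smoker interaction (not the paper's model).
def toy_risk(z: Features) -> float:
    return 0.02 * z["age"] + 0.3 * z["stage"] + 0.1 * z["stage"] * z["smoker"]


patient = {"age": 70.0, "stage": 2.0, "smoker": 1.0}
reference = {"age": 65.0, "stage": 1.0, "smoker": 0.0}
phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The attributions add up to `toy_risk(patient) - toy_risk(reference)`, which is the efficiency property of Shapley values. The interaction term is split between `stage` and `smoker`. Exact enumeration needs model evaluations on all 2^n coalitions. For that reason, practical SHAP implementations rely on approximations or on model-specific algorithms.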
-
Lung Cancer Concept Annotation from Spanish Clinical Narratives
Authors:
Marjan Najafabadipour,
Juan Manuel Tuñas,
Alejandro Rodríguez-González,
Ernestina Menasalvas
Abstract:
The recent rapid increase in the generation of clinical data, together with the rapid development of computational science, enables us to extract new insights from massive datasets in the healthcare industry. Oncological clinical notes are creating rich databases that document patients' histories, and they potentially contain many patterns that could help in better management of the disease. However, these patterns are locked within the free-text (unstructured) portions of clinical documents. This limits health professionals' ability to extract useful information from them and, ultimately, to perform the Query and Answering (QA) process accurately. The Information Extraction (IE) process requires Natural Language Processing (NLP) techniques to assign semantics to these patterns. Therefore, in this paper, we analyze the design of annotators for specific lung cancer concepts that can be integrated into the Apache Unstructured Information Management Architecture (UIMA) framework. In addition, we detail how annotation outcomes are generated and stored.
Submitted 18 September, 2018;
originally announced September 2018.
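The annotators in the abstract above are integrated with Apache UIMA, which is a Java framework. The following sketch is a Python illustration of the general idea only and is not UIMA code. A concept annotator scans a clinical note and emits typed stand-off annotations with begin and end character offsets, similar in spirit to UIMA annotations. The concept types and regular expressions in `LEXICON` are hypothetical examples, not the paper's annotator rules.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    begin: int    # offset of the first character of the match
    end: int      # offset just past the last character (Python slice convention)
    concept: str  # concept type assigned by the annotator
    text: str     # covered text, kept for readability


# Hypothetical lexicon: concept type -> regular expression (illustrative only).
LEXICON: Dict[str, str] = {
    "Histology": r"\badenocarcinoma\b|\bcarcinoma escamoso\b",
    "Stage": r"\bestadio\s+(?:IV|I{1,3})[ABC]?\b",
    "Biomarker": r"\b(?:EGFR|ALK|ROS1|PD-L1)\b",
}


def annotate(text: str, lexicon: Dict[str, str] = LEXICON) -> List[Annotation]:
    """Return stand-off annotations (offsets into `text`) for every lexicon match."""
    found = [
        Annotation(m.start(), m.end(), concept, m.group(0))
        for concept, pattern in lexicon.items()
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]
    # Sort into document order, as a downstream consumer would expect.
    return sorted(found, key=lambda a: (a.begin, a.end))


note = "Paciente con adenocarcinoma de pulmón estadio IIA, mutación EGFR positiva."
for a in annotate(note):
    print(a.concept, a.begin, a.end, a.text)
```

The annotations only store offsets into the unchanged note, so later QA steps can query them without altering the source text. A real UIMA pipeline would also define these concept types in a type system descriptor.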
-
Combining complex networks and data mining: why and how
Authors:
M. Zanin,
D. Papo,
P. A. Sousa,
E. Menasalvas,
A. Nicchi,
E. Kubik,
S. Boccaletti
Abstract:
The increasing power of computer technology does not remove the need to extract meaningful information from data sets of ever-growing size; indeed, it typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems. Despite this, surprisingly few researchers use both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be attributed to contingent rather than conceptual differences, and that these two fields can in fact be used advantageously in a synergistic manner. An overview of both fields is first provided, illustrating some of their fundamental concepts. A variety of contexts in which complex network theory and data mining have been used synergistically are then presented. These include contexts in which the appropriate integration of complex network metrics improves classification rates relative to classical data mining algorithms. Conversely, they also include contexts in which data mining can be used to tackle important issues in applications of complex network theory. Finally, ways to achieve a tighter integration between complex networks and data mining, as well as open lines of research, are discussed.
Submitted 19 May, 2016; v1 submitted 29 April, 2016;
originally announced April 2016.
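The review above describes integrating complex network metrics into classical data mining. One minimal way to do this, assuming an undirected and unweighted graph given as an edge list, is to compute per-node metrics and append them to each node's attribute vector before training a classifier. This sketch uses degree and the local clustering coefficient. The function names and the toy graph are hypothetical, and the review covers many other metrics and integration schemes.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Adjacency:
    """Undirected, unweighted adjacency sets; self-loops are ignored."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def local_clustering(adj: Adjacency, node: str) -> float:
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; set to 0 by convention here
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def network_features(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Per-node [degree, clustering] vectors to append to each node's attributes."""
    adj = build_adjacency(edges)
    return {n: [float(len(adj[n])), local_clustering(adj, n)] for n in sorted(adj)}


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
features = network_features([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(features)
```

In this toy graph, node `c` is the only bridge to `d`. It has the highest degree but a lower clustering coefficient than `a` and `b`, and that structural distinction is invisible to a classifier that sees only raw node attributes.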
-
Information content: assessing meso-scale structures in complex networks
Authors:
Massimiliano Zanin,
Pedro A. Sousa,
Ernestina Menasalvas
Abstract:
We propose a novel measure to assess the presence of meso-scale structures in complex networks. This measure is based on identifying regular patterns in the network's adjacency matrix and on calculating the quantity of information lost when pairs of nodes are iteratively merged. We show how this measure can quantify several meso-scale structures, such as modularity, bipartite and core-periphery configurations, and motifs. Results on a large set of real networks are used to validate its ability to detect non-trivial topological patterns.
Submitted 17 May, 2014; v1 submitted 21 January, 2014;
originally announced January 2014.
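The abstract above measures the information lost as pairs of nodes are iteratively merged. The following sketch is a deliberately simplified proxy for that idea and is not the measure defined in the paper. It counts the "lost" information of a merge as the number of adjacency-matrix entries on which the two nodes' rows disagree, always merges the cheapest pair, and ORs together the merged nodes' links. All function names and example matrices are hypothetical.

```python
from itertools import combinations
from typing import List

Matrix = List[List[int]]


def row_distance(A: Matrix, i: int, j: int) -> int:
    """Entries (other than columns i and j) on which rows i and j disagree."""
    return sum(1 for k in range(len(A)) if k not in (i, j) and A[i][k] != A[j][k])


def merge(A: Matrix, i: int, j: int) -> Matrix:
    """Merge node j into node i (links OR-ed together), then drop node j."""
    n = len(A)
    B = [row[:] for row in A]
    for k in range(n):
        B[i][k] = B[k][i] = max(A[i][k], A[j][k])
    B[i][i] = 0  # no self-loop on the merged node
    keep = [k for k in range(n) if k != j]
    return [[B[r][c] for c in keep] for r in keep]


def greedy_loss_curve(A: Matrix) -> List[int]:
    """Repeatedly merge the cheapest pair; return the loss paid at each step."""
    losses = []
    while len(A) > 1:
        loss, i, j = min(
            (row_distance(A, i, j), i, j) for i, j in combinations(range(len(A)), 2)
        )
        losses.append(loss)
        A = merge(A, i, j)
    return losses


# Toy 4-node networks (hypothetical examples).
bipartite = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]  # K_{2,2}
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]       # 0-1-2-3
print(greedy_loss_curve(bipartite), greedy_loss_curve(path))
```

In the complete bipartite graph K_{2,2}, nodes on the same side have identical rows, so every merge in this proxy costs nothing. The four-node path lacks that regularity and pays a cost at its first merge. The paper's own definition should be consulted for the actual quantity it computes.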