-
iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine
Authors:
Anastasia Krithara,
Fotis Aisopos,
Vassiliki Rentoumi,
Anastasios Nentidis,
Konstantinos Bougatiotis,
Maria-Esther Vidal,
Ernestina Menasalvas,
Alejandro Rodriguez-Gonzalez,
Eleftherios G. Samaras,
Peter Garrard,
Maria Torrente,
Mariano Provencio Pulla,
Nikos Dimakopoulos,
Rui Mauricio,
Jordi Rambla De Argila,
Gian Gaetano Tartaglia,
George Paliouras
Abstract:
The vision of the iASiS project is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. This is achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. The goal is to turn large amounts of available data into actionable information that authorities can use to plan public health activities and policies. The integration and analysis of these heterogeneous sources of information will enable the best decisions to be made, allowing for diagnosis and treatment to be personalised to each individual. The project offers a common representation schema for the heterogeneous data sources. The iASiS infrastructure is able to convert clinical notes into usable data, combine them with genomic data, related bibliography, image data and more, and create a global knowledge base. This facilitates the use of intelligent methods to discover useful patterns across different resources. Semantic integration of data makes it possible to generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data, thus providing more insights and opportunities. Data resources for two disease categories, dementia and lung cancer, are explored within the iASiS use cases.
Submitted 9 July, 2024;
originally announced July 2024.
-
Machine Learning-Assisted Recurrence Prediction for Early-Stage Non-Small-Cell Lung Cancer Patients
Authors:
Adrianna Janik,
Maria Torrente,
Luca Costabello,
Virginia Calvo,
Brian Walsh,
Carlos Camps,
Sameh K. Mohamed,
Ana L. Ortega,
Vít Nováček,
Bartomeu Massutí,
Pasquale Minervini,
M. Rosario Garcia Campelo,
Edel del Barco,
Joaquim Bosch-Barrera,
Ernestina Menasalvas,
Mohan Timilsina,
Mariano Provencio
Abstract:
Background: Stratifying cancer patients according to risk of relapse can personalize their care. In this work, we provide an answer to the following research question: How to utilize machine learning to estimate probability of relapse in early-stage non-small-cell lung cancer patients?
Methods: For predicting relapse in 1,387 early-stage (I-II), non-small-cell lung cancer (NSCLC) patients from the Spanish Lung Cancer Group data (65.7 average age, 24.8% females, 75.2% males) we train tabular and graph machine learning models. We generate automatic explanations for the predictions of such models. For models trained on tabular data, we adopt SHAP local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients.
Results: Machine learning models trained on tabular data exhibit a 76% accuracy for the Random Forest model at predicting relapse evaluated with a 10-fold cross-validation (model was trained 10 times with different independent sets of patients in test, train and validation sets, the reported metrics are averaged over these 10 test sets). Graph machine learning reaches 68% accuracy over a 200-patient, held-out test set, calibrated on a held-out set of 100 patients.
Conclusions: Our results show that machine learning models trained on tabular and graph data can enable objective, personalised and reproducible prediction of relapse and therefore, disease outcome in patients with early-stage NSCLC. With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding the use of adjuvant treatments in early-stage lung cancer.
Keywords: Non-Small-Cell Lung Cancer, Tumor Recurrence Prediction, Machine Learning
Submitted 17 November, 2022;
originally announced November 2022.
-
Lung Cancer Concept Annotation from Spanish Clinical Narratives
Authors:
Marjan Najafabadipour,
Juan Manuel Tuñas,
Alejandro Rodríguez-González,
Ernestina Menasalvas
Abstract:
The recent rapid increase in the generation of clinical data and the rapid development of computational science enable us to extract new insights from massive datasets in the healthcare industry. Oncological clinical notes are creating rich databases that document patients' histories, and they potentially contain many patterns that could help in better management of the disease. However, these patterns are locked within the free-text (unstructured) portions of clinical documents, which limits the ability of health professionals to extract useful information from them and, ultimately, to perform the Query and Answering (QA) process accurately. The Information Extraction (IE) process requires Natural Language Processing (NLP) techniques to assign semantics to these patterns. Therefore, in this paper, we analyze the design of annotators for specific lung cancer concepts that can be integrated into the Apache Unstructured Information Management Architecture (UIMA) framework. In addition, we explain the details of the generation and storage of annotation outcomes.
Submitted 18 September, 2018;
originally announced September 2018.
-
Understanding diseases as increased heterogeneity: a complex network computational framework
Authors:
Massimiliano Zanin,
Juan Manuel Tuñas,
Ernestina Menasalvas
Abstract:
Due to the complexity of the human body, most diseases present a high inter-personal variability in the way they manifest, i.e. in their phenotype, which has important clinical repercussions - as for instance the difficulty in defining objective diagnostic rules. We here explore the hypothesis that signs and symptoms used to define a disease should be understood in terms of the dispersion (as opposed to the average) of physical observables. To that end, we propose a computational framework, based on complex networks theory, to map groups of subjects to a network structure, based on their pairwise phenotypical similarity. We demonstrate that the resulting structure can be used to improve the performance of classification algorithms, especially in the case of a limited number of instances, both with synthetic and real data sets. Beyond providing an alternative conceptual understanding of diseases, the proposed framework could be of special relevance in the growing field of personalised, or N-to-1, medicine.
Submitted 1 June, 2018;
originally announced June 2018.
-
From the difference of structures to the structure of the difference
Authors:
Massimiliano Zanin,
Ernestina Menasalvas,
Xiaoqian Sun,
Sebastian Wandelt
Abstract:
When dealing with evolving or multi-dimensional complex systems, network theory provides elegant ways of describing their constituent components, through time-varying and multi-layer complex networks, respectively. Nevertheless, the analysis of how these components are related is still an open problem. We here propose a framework for analysing the evolution of a (complex) system, by describing the structure created by the difference between multiple networks by means of the Information Content metric. As opposed to other approaches, such as the use of global overlap or entropies, the proposed one makes it possible to understand whether the observed changes are due to random noise or to structural (targeted) modifications. We validate the framework by means of sets of synthetic networks, as well as networks representing real technological, social and biological evolving systems. We further propose a way of reconstructing network correlograms, which allow the system's evolution to be converted to the frequency domain.
Submitted 12 February, 2018;
originally announced February 2018.
-
Combining complex networks and data mining: why and how
Authors:
M. Zanin,
D. Papo,
P. A. Sousa,
E. Menasalvas,
A. Nicchi,
E. Kubik,
S. Boccaletti
Abstract:
The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems too. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.
Submitted 19 May, 2016; v1 submitted 29 April, 2016;
originally announced April 2016.
-
Information content: assessing meso-scale structures in complex networks
Authors:
Massimiliano Zanin,
Pedro A. Sousa,
Ernestina Menasalvas
Abstract:
We propose a novel measure to assess the presence of meso-scale structures in complex networks. This measure is based on the identification of regular patterns in the adjacency matrix of the network, and on the calculation of the quantity of information lost when pairs of nodes are iteratively merged. We show how this measure is able to quantify several meso-scale structures, like the presence of modularity, bipartite and core-periphery configurations, or motifs. Results corresponding to a large set of real networks are used to validate its ability to detect non-trivial topological patterns.
Submitted 17 May, 2014; v1 submitted 21 January, 2014;
originally announced January 2014.
-
Parenclitic networks: a multilayer description of heterogeneous and static data-sets
Authors:
Massimiliano Zanin,
Joaquín Medina Alcazar,
Jesus Vicente Carbajosa,
David Papo,
M. Gomez Paez,
Pedro Sousa,
Ernestina Menasalvas,
Stefano Boccaletti
Abstract:
Describing a complex system is in many ways a problem akin to identifying an object, in that it involves defining boundaries, constituent parts and their relationships by the use of grouping laws. Here we propose a novel method which extends the use of complex networks theory to a generalized class of non-Gestaltic systems, taking the form of collections of isolated, possibly heterogeneous, scalars, e.g. sets of biomedical tests. The ability of the method to unveil relevant information is illustrated for the case of gene expression in the response to osmotic stress of Arabidopsis thaliana. The most important genes turn out to be the nodes with highest centrality in appropriately reconstructed networks. The method allows predicting a set of 15 genes whose relationship with such stress was previously unknown in the literature. The validity of such predictions is demonstrated by means of a target experiment, in which the predicted genes are one by one artificially induced, and the growth of the corresponding phenotypes turns out to feature statistically significant differences when compared to that of the wild-type.
Submitted 14 August, 2013; v1 submitted 6 April, 2013;
originally announced April 2013.