
doi 10.1109/CBMS.2019.00032

iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine

Authors: Anastasia Krithara, Fotis Aisopos, Vassiliki Rentoumi, Anastasios Nentidis, Konstantinos Bougatiotis, Maria-Esther Vidal, Ernestina Menasalvas, Alejandro Rodriguez-Gonzalez, Eleftherios G. Samaras, Peter Garrard, Maria Torrente, Mariano Provencio Pulla, Nikos Dimakopoulos, Rui Mauricio, Jordi Rambla De Argila, Gian Gaetano Tartaglia, George Paliouras

Abstract: The vision of IASIS project is to turn the wave of big biomedical data heading our way into actionable knowledge for decision makers. This is achieved by integrating data from disparate sources, including genomics, electronic health records and bibliography, and applying advanced analytics methods to discover useful patterns. The goal is to turn large amounts of available data into actionable information to authorities for planning public health activities and policies. The integration and analysis of these heterogeneous sources of information will enable the best decisions to be made, allowing for diagnosis and treatment to be personalised to each individual. The project offers a common representation schema for the heterogeneous data sources. The iASiS infrastructure is able to convert clinical notes into usable data, combine them with genomic data, related bibliography, image data and more, and create a global knowledge base. This facilitates the use of intelligent methods in order to discover useful patterns across different resources. Using semantic integration of data gives the opportunity to generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data, thus providing more insights and opportunities. Data resources for two different disease categories are explored within the iASiS use cases, dementia and lung cancer.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 6 pages, 2 figures, accepted at 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS)

Journal ref: 2019 IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 2019, pp. 106-111

arXiv:2211.09856 [pdf, other]

Machine Learning-Assisted Recurrence Prediction for Early-Stage Non-Small-Cell Lung Cancer Patients

Authors: Adrianna Janik, Maria Torrente, Luca Costabello, Virginia Calvo, Brian Walsh, Carlos Camps, Sameh K. Mohamed, Ana L. Ortega, Vít Nováček, Bartomeu Massutí, Pasquale Minervini, M. Rosario Garcia Campelo, Edel del Barco, Joaquim Bosch-Barrera, Ernestina Menasalvas, Mohan Timilsina, Mariano Provencio

Abstract: Background: Stratifying cancer patients according to risk of relapse can personalize their care. In this work, we provide an answer to the following research question: How to utilize machine learning to estimate probability of relapse in early-stage non-small-cell lung cancer patients? Methods: For predicting relapse in 1,387 early-stage (I-II), non-small-cell lung cancer (NSCLC) patients from the Spanish Lung Cancer Group data (65.7 average age, 24.8% females, 75.2% males) we train tabular and graph machine learning models. We generate automatic explanations for the predictions of such models. For models trained on tabular data, we adopt SHAP local explanations to gauge how each patient feature contributes to the predicted outcome. We explain graph machine learning predictions with an example-based method that highlights influential past patients. Results: Machine learning models trained on tabular data exhibit a 76% accuracy for the Random Forest model at predicting relapse evaluated with a 10-fold cross-validation (model was trained 10 times with different independent sets of patients in test, train and validation sets, the reported metrics are averaged over these 10 test sets). Graph machine learning reaches 68% accuracy over a 200-patient, held-out test set, calibrated on a held-out set of 100 patients. Conclusions: Our results show that machine learning models trained on tabular and graph data can enable objective, personalised and reproducible prediction of relapse and therefore, disease outcome in patients with early-stage NSCLC.
With further prospective and multisite validation, and additional radiological and molecular data, this prognostic model could potentially serve as a predictive decision support tool for deciding the use of adjuvant treatments in early-stage lung cancer. Keywords: Non-Small-Cell Lung Cancer, Tumor Recurrence Prediction, Machine Learning

Submitted 17 November, 2022; originally announced November 2022.

arXiv:1809.06639 [pdf]

Lung Cancer Concept Annotation from Spanish Clinical Narratives

Authors: Marjan Najafabadipour, Juan Manuel Tuñas, Alejandro Rodríguez-González, Ernestina Menasalvas

Abstract: Recent rapid increase in the generation of clinical data and rapid development of computational science make us able to extract new insights from massive datasets in healthcare industry. Oncological clinical notes are creating rich databases for documenting patients history and they potentially contain lots of patterns that could help in better management of the disease. However, these patterns are locked within free text (unstructured) portions of clinical documents and consequence in limiting health professionals to extract useful information from them and to finally perform Query and Answering (QA) process in an accurate way. The Information Extraction (IE) process requires Natural Language Processing (NLP) techniques to assign semantics to these patterns. Therefore, in this paper, we analyze the design of annotators for specific lung cancer concepts that can be integrated over Apache Unstructured Information Management Architecture (UIMA) framework. In addition, we explain the details of generation and storage of annotation outcomes.

Submitted 18 September, 2018; originally announced September 2018.

Comments: 10 pages, 6 figures

Journal ref: Data Integration in the Life Sciences (DILS 2018)

arXiv:1806.01367 [pdf, other]

Understanding diseases as increased heterogeneity: a complex network computational framework

Authors: Massimiliano Zanin, Juan Manuel Tuñas, Ernestina Menasalvas

Abstract: Due to the complexity of the human body, most diseases present a high inter-personal variability in the way they manifest, i.e. in their phenotype, which has important clinical repercussions - as for instance the difficulty in defining objective diagnostic rules. We here explore the hypothesis that signs and symptoms used to define a disease should be understood in terms of the dispersion (as opposed to the average) of physical observables. To that end, we propose a computational framework, based on complex networks theory, to map groups of subjects to a network structure, based on their pairwise phenotypical similarity. We demonstrate that the resulting structure can be used to improve the performance of classification algorithms, especially in the case of a limited number of instances, both with synthetic and real data sets. Beyond providing an alternative conceptual understanding of diseases, the proposed framework could be of special relevance in the growing field of personalised, or N-to-1, medicine.

Submitted 1 June, 2018; originally announced June 2018.

Comments: 4 figures, 2 tables, plus SI

arXiv:1802.03966 [pdf, other]

From the difference of structures to the structure of the difference

Authors: Massimiliano Zanin, Ernestina Menasalvas, Xiaoqian Sun, Sebastian Wandelt

Abstract: When dealing with evolving or multi-dimensional complex systems, network theory provides with elegant ways of describing their constituting components, through respectively time-varying and multi-layer complex networks. Nevertheless, the analysis of how these components are related is still an open problem. We here propose a framework for analysing the evolution of a (complex) system, by describing the structure created by the difference between multiple networks by means of the Information Content metric. As opposed to other approaches, as for instance the use of global overlap or entropies, the proposed one allows to understand if the observed changes are due to random noise, or to structural (targeted) modifications. We validate the framework by means of sets of synthetic networks, as well as networks representing real technological, social and biological evolving systems. We further propose a way of reconstructing network correlograms, which allow to convert the system's evolution to the frequency domain.

Submitted 12 February, 2018; originally announced February 2018.

Comments: 21 pages, 7 figures

arXiv:1604.08816 [pdf, other]

doi 10.1016/j.physrep.2016.04.005

Combining complex networks and data mining: why and how

Authors: M. Zanin, D. Papo, P. A. Sousa, E. Menasalvas, A. Nicchi, E. Kubik, S. Boccaletti

Abstract: The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems too. In the face of that, a surprisingly low number of researchers turn out to resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that these two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, some fundamental concepts of which are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented.
Contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications are illustrated. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research are discussed.

Submitted 19 May, 2016; v1 submitted 29 April, 2016; originally announced April 2016.

Comments: 58 pages, 19 figures

MSC Class: 05C82; 62-07; 92C42

arXiv:1401.5247 [pdf, ps, other]

doi 10.1209/0295-5075/106/30001

Information content: assessing meso-scale structures in complex networks

Authors: Massimiliano Zanin, Pedro A. Sousa, Ernestina Menasalvas

Abstract: We propose a novel measure to assess the presence of meso-scale structures in complex networks. This measure is based on the identification of regular patterns in the adjacency matrix of the network, and on the calculation of the quantity of information lost when pairs of nodes are iteratively merged. We show how this measure is able to quantify several meso-scale structures, like the presence of modularity, bipartite and core-periphery configurations, or motifs. Results corresponding to a large set of real networks are used to validate its ability to detect non-trivial topological patterns.

Submitted 17 May, 2014; v1 submitted 21 January, 2014; originally announced January 2014.

Comments: Published as: M. Zanin, P. A. Sousa and E. Menasalvas, Information content: assessing meso-scale structures in complex networks EPL 106 (3), (2014) 30001

Journal ref: EPL 106 (3), (2014) 30001

arXiv:1304.1896 [pdf, other]

Parenclitic networks: a multilayer description of heterogeneous and static data-sets

Authors: Massimiliano Zanin, Joaquín Medina Alcazar, Jesus Vicente Carbajosa, David Papo, M. Gomez Paez, Pedro Sousa, Ernestina Menasalvas, Stefano Boccaletti

Abstract: Describing a complex system is in many ways a problem akin to identifying an object, in that it involves defining boundaries, constituent parts and their relationships by the use of grouping laws. Here we propose a novel method which extends the use of complex networks theory to a generalized class of non-Gestaltic systems, taking the form of collections of isolated, possibly heterogeneous, scalars, e.g. sets of biomedical tests. The ability of the method to unveil relevant information is illustrated for the case of gene expression in the response to osmotic stress of Arabidopsis thaliana. The most important genes turn out to be the nodes with highest centrality in appropriately reconstructed networks. The method allows predicting a set of 15 genes whose relationship with such stress was previously unknown in the literature. The validity of such predictions is demonstrated by means of a target experiment, in which the predicted genes are one by one artificially induced, and the growth of the corresponding phenotypes turns out to feature statistically significant differences when compared to that of the wild-type.

Submitted 14 August, 2013; v1 submitted 6 April, 2013; originally announced April 2013.

Comments: 5 pages, 4 figures

Showing 1–8 of 8 results for author: Menasalvas, E