-
DCAE-SR: Design of a Denoising Convolutional Autoencoder for reconstructing Electrocardiograms signals at Super Resolution
Authors:
Ugo Lomoio,
Pierangelo Veltri,
Pietro Hiram Guzzi,
Pietro Lio'
Abstract:
Electrocardiogram (ECG) signals play a pivotal role in cardiovascular diagnostics, providing essential information on the electrical activity of the heart. However, the inherent noise and limited resolution of ECG recordings can hinder accurate interpretation and diagnosis. In this paper, we propose a novel model for ECG super resolution (SR) that uses a denoising convolutional autoencoder (DCAE) to enhance the temporal and frequency information in ECG signals. Our approach addresses the limitations of traditional ECG signal processing techniques. Our model takes as input 5-second ECG windows sampled at 50 Hz (very low resolution) and reconstructs a denoised super-resolution signal with a 10x upsampling rate (sampled at 500 Hz). We trained the proposed DCAE-SR on publicly available myocardial infarction ECG signals. Our method demonstrates superior performance in reconstructing high-resolution ECG signals from very low-resolution signals with a sampling rate of 50 Hz. We compared our results with current deep-learning approaches to ECG super-resolution from the literature and with some reproducible non-deep-learning methods that can perform both super-resolution and denoising. We obtained state-of-the-art performance in super-resolution of very low-resolution ECG signals frequently corrupted by ECG artifacts: a signal-to-noise ratio of 12.20 dB (versus the previous 4.68 dB), a mean squared error of 0.0044 (versus the previous 0.0154) and a root mean squared error of 4.86% (versus the previous 12.40%). In conclusion, our DCAE-SR model offers a robust (to artefact presence), versatile and explainable solution to enhance the quality of ECG signals. This advancement holds promise for the field of cardiovascular diagnostics, paving the way for improved patient care and high-quality clinical decisions.
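To make the figures in this abstract concrete, the following minimal sketch computes the sample counts implied by the stated window length and sampling rates, and shows one common way to define MSE, RMSE and SNR (in dB) between a reference signal and its reconstruction. The function names and the metric definitions are assumptions for illustration only; the paper may normalise its metrics differently (its RMSE is reported as a percentage).

```python
import math

WINDOW_SECONDS = 5      # window length stated in the abstract
LOW_RATE_HZ = 50        # input sampling rate stated in the abstract
HIGH_RATE_HZ = 500      # output sampling rate stated in the abstract


def samples_per_window(rate_hz, seconds=WINDOW_SECONDS):
    """Number of samples in one window at the given sampling rate."""
    return rate_hz * seconds


def mse(reference, estimate):
    """Mean squared error between two equal-length sequences."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def rmse(reference, estimate):
    """Root mean squared error (not expressed as a percentage here)."""
    return math.sqrt(mse(reference, estimate))


def snr_db(reference, estimate):
    """SNR in dB: reference power over the power of the reconstruction error.

    This is one common definition, assumed here for illustration.
    """
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


low_res_len = samples_per_window(LOW_RATE_HZ)     # samples per input window
high_res_len = samples_per_window(HIGH_RATE_HZ)   # samples per output window
upsampling_factor = HIGH_RATE_HZ // LOW_RATE_HZ   # ratio of the two rates
```

Under these definitions a 5-second input window holds 250 samples and the reconstructed window holds 2500, which matches the 10x upsampling rate quoted in the abstract.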
Submitted 29 March, 2024;
originally announced April 2024.
-
Leveraging graph neural networks for supporting Automatic Triage of Patients
Authors:
Annamaria Defilippo,
Pierangelo Veltri,
Pietro Lio',
Pietro Hiram Guzzi
Abstract:
Patient triage plays a crucial role in emergency departments, ensuring timely and appropriate care based on a correct evaluation of the emergency grade of patient conditions.
Triage is generally performed by a human operator on the basis of their own experience and of information gathered from the patient management process.
It is thus a process that can generate errors in emergency-level assignments. Traditional triage methods rely heavily on human decisions, which can be subjective and prone to errors.
Recently, there has been growing interest in leveraging artificial intelligence (AI) to develop algorithms able to maximize information gathering and minimize errors in patient triage processing.
We define and implement an AI-based module to manage patients' emergency code assignments in emergency departments. It uses historical emergency department data to train the medical decision process. Data containing relevant patient information, such as vital signs, symptoms, and medical history, are used to accurately classify patients into triage categories. Experimental results demonstrate that the proposed algorithm achieves high accuracy, outperforming traditional triage methods. We claim that, by using the proposed method, healthcare professionals can predict a severity index to guide patient management and resource allocation.
Submitted 11 March, 2024;
originally announced March 2024.
-
A novel Network Science Algorithm for Improving Triage of Patients
Authors:
Pietro Hiram Guzzi,
Annamaria De Filippo,
Pierangelo Veltri
Abstract:
Patient triage plays a crucial role in healthcare, ensuring timely and appropriate care based on the urgency of patient conditions. Traditional triage methods rely heavily on human judgment, which can be subjective and prone to errors. Recently, there has been growing interest in leveraging artificial intelligence (AI) to develop algorithms for triaging patients. This paper presents the development of a novel algorithm for triaging patients. It is based on the analysis of patient data to produce decisions regarding their prioritization. The algorithm was trained on a comprehensive data set containing relevant patient information, such as vital signs, symptoms, and medical history. The algorithm was designed to accurately classify patients into triage categories through rigorous preprocessing and feature engineering. Experimental results demonstrate that our algorithm achieved high accuracy and performance, outperforming traditional triage methods. By incorporating computer science into the triage process, healthcare professionals can benefit from improved efficiency, accuracy, and consistency, prioritizing patients effectively and optimizing resource allocation. Although further research is needed to address challenges such as biases in training data and model interpretability, the development of AI-based algorithms for triaging patients shows great promise in enhancing healthcare delivery and patient outcomes.
Submitted 9 October, 2023;
originally announced October 2023.
-
Current and future directions in network biology
Authors:
Marinka Zitnik,
Michelle M. Li,
Aydin Wells,
Kimberly Glass,
Deisy Morselli Gysi,
Arjun Krishnan,
T. M. Murali,
Predrag Radivojac,
Sushmita Roy,
Anaïs Baudot,
Serdar Bozdag,
Danny Z. Chen,
Lenore Cowen,
Kapil Devkota,
Anthony Gitter,
Sara Gosline,
Pengfei Gu,
Pietro H. Guzzi,
Heng Huang,
Meng Jiang,
Ziynet Nesibe Kesimoglu,
Mehmet Koyuturk,
Jian Ma,
Alexander R. Pico,
Nataša Pržulj
, et al. (12 additional authors not shown)
Abstract:
Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These challenges stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology and highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on the future directions of network biology. Additionally, we offer insights into scientific communities, educational initiatives, and the importance of fostering diversity within the field. This paper establishes a roadmap for an immediate and long-term vision for network biology.
Submitted 11 June, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
MuLaN: a MultiLayer Networks Alignment Algorithm
Authors:
Marianna Milano,
Pietro Cinaglia,
Pietro Hiram Guzzi,
Mario Cannataro
Abstract:
A Multilayer Network (MN) is a system consisting of several topological levels (i.e., layers) representing the interactions between the system's objects and the related interdependency. Therefore, it may be represented as a set of layers that can be assimilated to a set of networks of its own objects, connected by means of inter-layer edges (or inter-edges) linking the nodes of different layers; for instance, a biological MN may allow modeling of inter and intra interactions among diseases, genes, and drugs, using only its own structure. The analysis of MNs may reveal hidden knowledge, as demonstrated by several algorithms for the analysis. Recently, there has been growing interest in comparing two MNs by revealing local regions of similarity, as a counterpart of Network Alignment (NA) algorithms for simple networks. However, classical algorithms for NA such as Local NA (LNA) cannot be applied to multilayer networks, since they are not able to deal with inter-layer edges. Therefore, there is a need for novel algorithms. In this paper, we present MuLaN, an algorithm for the local alignment of multilayer networks. We first show, as a proof of concept, the performance of MuLaN on a set of synthetic multilayer networks. Then, we use as a case study a real multilayer network in the biomedical domain. Our results show that MuLaN is able to build high-quality alignments and can extract knowledge about the aligned multilayer networks. MuLaN is available at https://github.com/pietrocinaglia/mulan.
Submitted 14 September, 2023;
originally announced September 2023.
-
Towards a Recommender System for Profiling Users in a Renewable Energetic Community
Authors:
Pietro Hiram Guzzi,
Francesco Chiodo
Abstract:
Current energy systems in almost all nations are going through a radical transformation motivated by technological, environmental and institutional needs. The introduction of novel technologies for energy production and storage, the emergence of climate change and the attention to low-impact technologies in some countries are the main factors leading this transformation. Here we focus in particular on the introduction of relatively small community energy systems based on solar energy that aim to re-organize local energy systems to integrate distributed energy resources and engage local communities. In each community, there is a set of producers and a set of consumers (and a set of producers/consumers called prosumers). One of the key aims of energetic communities is to maximise the energy that is shared among users. Thus, it is crucial to select the best consumers/prosumers on the basis of their consumption profile, in order to minimize subsequent management of the energy once the community is built. Here we describe the design of a recommender system that is able to profile users on the basis of their past profile for subsequent admission into the energetic community. Experiments supporting this publication have been carried out under the BDTI (Big Data Test Infrastructure) of the European Union. The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the opinion of the European Union.
Submitted 6 September, 2022;
originally announced September 2022.
-
Beyond COVID-19 Pandemic: Topology-aware optimisation of vaccination strategy for minimising virus spreading
Authors:
Francesco Petrizzelli,
Pietro Hiram Guzzi,
Tommaso Mazza
Abstract:
The mitigation of infectious disease spreading has recently gained considerable attention from the research community. It may be obtained by adopting sanitary measures and social rules, together with an extensive vaccination campaign. Vaccination is currently the primary way of mitigating the Coronavirus Disease (COVID-19) outbreak without severe lockdown. Its effectiveness also depends on the number and timeliness of administrations and thus demands strict prioritization criteria. Almost all countries have prioritized similar classes of exposed workers: healthcare professionals and the elderly, aiming to maximize the survival of patients and the years of life saved. Nevertheless, the virus is currently spreading at high rates, and none of the prioritization criteria adopted so far accounts for the structural organization of the contact networks.
Submitted 15 February, 2022;
originally announced February 2022.
-
Design and Development of PCN-Miner: A tool for the Analysis of Protein Contact Networks
Authors:
Pietro Hiram Guzzi,
Luisa Di Paola,
Alessandro Giuliani,
Pierangelo Veltri
Abstract:
A Protein Contact Network (PCN) is a powerful tool for analysing the structure and function of proteins. In particular, PCNs have been used for disclosing the molecular features of allosteric regulation through PCN clustering. Such analysis is relevant in many applications, such as the recent study of the SARS-CoV-2 Spike protein. Despite its relevance, methods for the analysis of PCNs are spread across a set of different libraries and tools. Therefore, a tool that incorporates all these functions may help researchers. We present PCN-Miner, a software tool implemented in the Python programming language that is able to import proteins in the Protein Data Bank format and generate the corresponding protein contact networks. It then offers a set of algorithms for the analysis of PCNs that cover a large set of applications: from clustering to embedding and subsequent analysis.
Software is available at https://github.com/hguzzi/ProteinContactNetworks
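As a rough illustration of what generating a contact network from a protein structure involves, the sketch below links two residues whenever the distance between their representative atoms falls inside a distance window. It does not use PCN-Miner's API; the function name `contact_network`, the 4 to 8 angstrom window and the toy coordinates are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def contact_network(coords, min_dist=4.0, max_dist=8.0):
    """Build an undirected contact network from per-residue 3D coordinates.

    Residues i and j are linked when their Euclidean distance lies in
    [min_dist, max_dist]. The thresholds are illustrative assumptions.
    Returns a dict mapping each residue index to the set of its neighbours.
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        distance = math.dist(coords[i], coords[j])
        if min_dist <= distance <= max_dist:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Toy coordinates (in angstroms) for four hypothetical residues on a line.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_pcn = contact_network(toy_coords)
```

In a real workflow the coordinates would come from a Protein Data Bank file (for example one representative atom per residue), and the resulting adjacency structure would be the input to clustering or embedding.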
Submitted 12 January, 2022;
originally announced January 2022.
-
Modeling multi-scale data via a network of networks
Authors:
Shawn Gu,
Meng Jiang,
Pietro Hiram Guzzi,
Tijana Milenkovic
Abstract:
Node label prediction and graph label prediction are prominent network science tasks. Data analyzed in these tasks are sometimes related: entities represented by nodes in a higher-level (higher-scale) network can themselves be modeled as networks at a lower level. We argue that systems involving such entities should be integrated with a "network of networks" (NoN) representation. Then, we ask whether entity label prediction using multi-level NoN data via our proposed approaches is more accurate than using each of single-level node and graph data alone, i.e., than traditional node label prediction on the higher-level network and graph label prediction on the lower-level networks. To obtain data, we develop the first synthetic NoN generator and construct a real biological NoN. We evaluate the accuracy of the considered approaches when predicting artificial labels from the synthetic NoNs and proteins' functions from the biological NoN. For the synthetic NoNs, our NoN approaches outperform or are as good as node- and network-level ones depending on the NoN properties. For the biological NoN, our NoN approaches outperform the single-level approaches for just under half of the protein functions, and for 30% of the functions, only our NoN approaches make meaningful predictions, while node- and network-level ones achieve random accuracy. So, NoN-based data integration is important.
Submitted 25 May, 2021;
originally announced May 2021.
-
Using Dual-Network Analyser for extracting communities from Dual Networks
Authors:
Pietro Hiram Guzzi,
Giuseppe Tradigo,
Pierangelo Veltri
Abstract:
The representation of data and its relationships using networks is prevalent in many research fields such as computational biology, medical informatics and social networks. Recently, complex network models have been introduced to better capture the insights of the modelled scenarios. Among others, dual network-based models have been introduced, which consist of mapping information as a pair of networks containing the same nodes but different edges.
We focus on the use of a novel approach to visualise and analyse dual networks. The method uses two algorithms for community discovery, and it is provided as a Python-based tool with a graphical user interface. The tool is able to load dual networks and to extract both the densest connected subgraph and the common modular communities. The latter are obtained by using an adapted implementation of the Louvain algorithm.
The proposed algorithm and graphical tool have been tested by using social, biological, and co-authorship networks. Results demonstrate that the proposed approach is efficient and is able to extract meaningful information from dual networks. Finally, as a contribution, the proposed graphical user interface can be considered a valuable innovation in this context.
Submitted 5 March, 2021;
originally announced March 2021.
-
Analyzing Host-Viral Interactome of SARS-CoV-2 for Identifying Vulnerable Host Proteins during COVID-19 Pathogenesis
Authors:
Jayanta Kumar Das,
Swarup Roy,
Pietro Hiram Guzzi
Abstract:
The development of therapeutic targets for COVID-19 treatment is based on the understanding of the molecular mechanism of pathogenesis. The identification of genes and proteins involved in the infection mechanism is the key to shedding light on the complex molecular mechanisms. The combined effort of many laboratories distributed throughout the world has led to the accumulation of data on both protein and genetic interactions. In this work we integrate these available results and obtain a host protein-protein interaction network composed of 1432 human proteins. We calculate network centrality measures to identify key proteins. Then we perform functional enrichment of central proteins. We observe that the identified proteins are mostly associated with several crucial pathways, including cellular process, signalling transduction, and neurodegenerative disease. Finally, we focus on proteins involved in causing disease in the human respiratory tract. We conclude that COVID-19 is a complex disease, and we highlight many potential therapeutic targets, including RBX1, HSPA5, ITCH, RAB7A, RAB5A, RAB8A, PSMC5, CAPZB, CANX, IGF2R, HSPA1A, which are central and also associated with multiple diseases.
Submitted 5 February, 2021;
originally announced February 2021.
-
Evaluation of the Topological Agreement of Network Alignments
Authors:
Concettina Guerra,
Pietro Hiram Guzzi
Abstract:
Aligning protein-protein interaction (PPI) networks of two or more organisms consists of finding a mapping of the nodes (proteins) of the networks that captures important structural and functional associations (similarity). It is a well-studied but difficult problem. It is provably NP-hard in some instances and thus computationally very demanding. The problem comes in several versions: global versus local alignment; pairwise versus multiple alignment; one-to-one versus many-to-many alignment. Heuristics to address the various instances of the problem abound, and they achieve some degree of success when their performance is measured in terms of node and/or edge conservation. However, as the evolutionary distance between the organisms being considered increases, the results tend to degrade. Moreover, poor performance is achieved when the considered networks differ remarkably in their number of nodes and/or edges. Here we address the challenge of analyzing and comparing different approaches to global network alignment, when a one-to-one mapping is sought. We consider and propose various measures to evaluate the agreement between alignments obtained by existing approaches. We show that some such measures indicate an agreement that is often about the same as what would be obtained by chance. That tends to occur even when the mappings exhibit good performance based on standard measures.
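One simple way to picture "agreement between alignments" is to treat each one-to-one alignment as a set of aligned node pairs and measure how much two such sets overlap. The sketch below uses the Jaccard index of the pair sets; this particular measure, the function name `alignment_agreement` and the toy protein identifiers are assumptions for illustration, since the abstract does not specify the measures used in the paper.

```python
def alignment_agreement(mapping_a, mapping_b):
    """Jaccard index between two one-to-one alignments.

    Each alignment is a dict mapping a node of the first network to a node of
    the second network. The score is the number of aligned pairs the two
    alignments share, divided by the number of distinct pairs in either one.
    """
    pairs_a = set(mapping_a.items())
    pairs_b = set(mapping_b.items())
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # two empty alignments agree trivially
    return len(pairs_a & pairs_b) / len(union)


# Two hypothetical aligners that agree on only one of three proteins.
aligner_1 = {"p1": "q1", "p2": "q2", "p3": "q3"}
aligner_2 = {"p1": "q1", "p2": "q3", "p3": "q2"}
score = alignment_agreement(aligner_1, aligner_2)
```

Comparing such a score against the value obtained from randomly permuted mappings is one way to check whether an observed agreement exceeds what chance alone would produce.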
Submitted 9 October, 2020;
originally announced October 2020.
-
Using Network Embeddings for Improving Network Alignment
Authors:
Pietro Hiram Guzzi
Abstract:
Network (or graph) alignment algorithms aim to reveal structural similarities among graphs. In particular, Local Network Alignment (LNA) algorithms find local regions of similarity among two or more networks. Such algorithms are in general based on a set of seed nodes that are used to grow an alignment. Almost all LNA algorithms use as seed nodes a set of vertices based on context information (e.g. a set of biologically related nodes in biological network alignment), and this may cause a bias or a data-circularity problem. More recently, we demonstrated that the use of topological information in the choice of seed nodes may improve the quality of the alignments. We used some common approaches based on global alignment algorithms for capturing topological similarity among nodes. In parallel, it has been demonstrated that network embedding methods (or representation learning) may capture the structural similarity among nodes better than other methods. Therefore, we propose to use network embeddings to learn structural similarity among nodes and to use such similarity to improve LNA, extending our previous algorithms. We define a framework for LNA.
Submitted 11 August, 2020;
originally announced August 2020.
-
Top-k Connected Overlapping Densest Subgraphs in Dual Networks
Authors:
Riccardo Dondi,
Pietro Hiram Guzzi,
Mohammad Mehdi Hosseinzadeh
Abstract:
Networks are widely used for modelling and analysing data and the relations among them. Recently, it has been shown that the use of a single network may not be the optimal choice, since a single network may miss some aspects. Consequently, it has been proposed to use a pair of networks to better model all the aspects, and the main approach is referred to as dual networks (DNs). DNs are two related graphs (one weighted, the other unweighted) that share the same set of vertices and have two different edge sets. In DNs it is often interesting to extract common subgraphs among the two networks that are maximally dense in the conceptual network and connected in the physical one. The simplest instance of this problem is finding a common densest connected subgraph (DCS), while here we focus on the detection of the Top-k Densest Connected subgraphs, i.e. a set of k subgraphs having the largest density in the conceptual network which are also connected in the physical network. We formalise the problem and then propose a heuristic to find a solution, since the problem is computationally hard. A set of experiments on synthetic and real networks is also presented to support our approach.
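The two conditions a candidate subgraph must satisfy in a dual network can be sketched as follows: the node set must induce a connected subgraph in the unweighted physical network, and it is scored by its density in the weighted conceptual network. This is not the paper's heuristic, only a check for a single candidate. The density definition (total induced edge weight divided by the number of nodes), the function names and the toy graphs are assumptions for illustration.

```python
from collections import deque


def is_connected(nodes, physical_edges):
    """True if `nodes` induce a connected subgraph of the physical network."""
    nodes = set(nodes)
    if not nodes:
        return False
    adjacency = {n: set() for n in nodes}
    for u, v in physical_edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    # Breadth-first search from an arbitrary node of the candidate set.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen == nodes


def density(nodes, conceptual_edges):
    """Total weight of conceptual edges inside `nodes`, divided by |nodes|."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    total = sum(w for (u, v), w in conceptual_edges.items()
                if u in nodes and v in nodes)
    return total / len(nodes)


# Toy dual network: same vertices, two different edge sets.
physical = [("a", "b"), ("b", "c"), ("c", "d")]
conceptual = {("a", "b"): 3.0, ("b", "c"): 1.0, ("a", "c"): 2.0, ("c", "d"): 0.5}
candidate = {"a", "b", "c"}
```

Here `candidate` is connected in the physical network and has density 2.0 in the conceptual one, whereas the set `{"a", "c"}` would be rejected because its nodes are not directly linked in the physical network.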
Submitted 4 August, 2020;
originally announced August 2020.
-
Extracting Dense and Connected Subgraphs in Dual Networks by Network Alignment
Authors:
Pietro Hiram Guzzi,
Emanuel Salerno,
Giuseppe Tradigo,
Pierangelo Veltri
Abstract:
The use of network-based approaches to model and analyse large datasets is currently a growing research field. For instance, in biology and medicine, networks are used to model interactions among biological molecules as well as relations among patients. Similarly, data coming from social networks can be trivially modelled by using graphs. More recently, the use of dual networks has gained the attention of researchers. A dual network model uses a pair of graphs to model a scenario in which one of the two graphs is usually unweighted (a network representing physical associations among nodes) while the other one is edge-weighted (a network representing conceptual associations among nodes). In this paper we focus on the problem of finding the Densest Connected Subgraph (DCS), i.e. the subgraph having the largest density in the conceptual network which is also connected in the physical network. The problem is relevant but also computationally hard; therefore, the need to introduce novel algorithms arises. We formalise the problem and then map DCS into a graph alignment problem. Then we propose a possible solution. A set of experiments is also presented to support our approach.
Submitted 4 February, 2020;
originally announced February 2020.
-
HetNetAligner: Design and Implementation of an algorithm for heterogeneous network alignment on Apache Spark
Authors:
Pietro H Guzzi,
Marianna Milano,
Pierangelo Veltri,
Mario Cannataro
Abstract:
The importance of the use of networks to model and analyse biological data and the interplay of bio-molecules is widely recognised. Consequently, many algorithms for the analysis and the comparison of networks (such as alignment algorithms) have been developed in the past. Recently, many different approaches tried to integrate into a single model the interplay of different molecules, such as genes…
▽ More
The importance of the use of networks to model and analyse biological data and the interplay of bio-molecules is widely recognised. Consequently, many algorithms for the analysis and the comparison of networks (such as alignment algorithms) have been developed in the past. Recently, many different approaches tried to integrate into a single model the interplay of different molecules, such as genes, transcription factors and microRNAs. A possible formalism to model such scenario comes from node coloured networks (or heterogeneous networks) implemented as node/ edge-coloured graphs. Consequently, the need for the introduction of alignment algorithms able to analyse heterogeneous networks arises. To the best of our knowledge, all the existing algorithms are not able to mine heterogeneous networks. We propose a two-step alignment strategy that receives as input two heterogeneous networks (node-coloured graphs) and a similarity function among nodes of two networks extending the previous formulations. We first build a single alignment graph. Then we mine this graph extracting relevant subgraphs. Despite this simple approach, the analysis of such networks relies on graph and subgraph isomorphism and the size of the data is still growing. Therefore the use of high-performance data analytics framework is needed. We here present HetNetAligner a framework built on top of Apache Spark. We also implemented our algorithm, and we tested it on some selected heterogeneous biological networks. Preliminary results confirm that our method may extract relevant knowledge from biological data reducing the computational time.
Submitted 11 June, 2018;
originally announced June 2018.
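The two-step strategy described in this abstract (build one alignment graph, then mine it) can be illustrated with a minimal sketch. This is not the HetNetAligner implementation: the abstract does not specify how edges are defined or how subgraphs are mined, so the rules below are assumptions. Here an alignment node is a pair of same-coloured nodes whose similarity reaches a hypothetical `min_sim` cutoff, an alignment edge links two pairs whose members are adjacent in both input networks, and connected components stand in for the mining step. All function and variable names are hypothetical.

```python
from itertools import combinations


def build_alignment_graph(g1, g2, colour1, colour2, similarity, min_sim=0.5):
    """Step 1 (assumed rules): pair same-coloured, sufficiently similar nodes.

    g1, g2: dict mapping node -> set of neighbours (undirected).
    colour1, colour2: dict mapping node -> colour label.
    similarity: callable (u, v) -> float.
    """
    pairs = [(u, v) for u in g1 for v in g2
             if colour1[u] == colour2[v] and similarity(u, v) >= min_sim]
    graph = {pair: set() for pair in pairs}
    # Link two pairs only when the edge is present in both input networks.
    for (u1, v1), (u2, v2) in combinations(pairs, 2):
        if u1 != u2 and v1 != v2 and u2 in g1[u1] and v2 in g2[v1]:
            graph[(u1, v1)].add((u2, v2))
            graph[(u2, v2)].add((u1, v1))
    return graph


def connected_components(graph):
    """Step 2 (stand-in for subgraph mining): connected components."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Toy heterogeneous networks with three colours: gene, tf, mirna.
g1 = {"geneA": {"tfA"}, "tfA": {"geneA", "mirA"}, "mirA": {"tfA"}}
c1 = {"geneA": "gene", "tfA": "tf", "mirA": "mirna"}
g2 = {"geneB": {"tfB"}, "tfB": {"geneB"}, "mirB": set()}
c2 = {"geneB": "gene", "tfB": "tf", "mirB": "mirna"}

alignment = build_alignment_graph(g1, g2, c1, c2, lambda u, v: 1.0)
components = connected_components(alignment)
largest = max(components, key=len)
print(sorted(largest))
```

In this toy case the gene-tf interaction is present in both networks and ends up in one aligned component, while the tf-mirna interaction exists only in the first network and is not aligned. The pairwise loop is quadratic in the number of candidate pairs, which hints at why the abstract calls for a distributed framework.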
-
Learning Weighted Association Rules in Human Phenotype Ontology
Authors:
Pietro Hiram Guzzi,
Giuseppe Agapito,
Marianna Milano,
Mario Cannataro
Abstract:
The Human Phenotype Ontology (HPO) is a structured repository of concepts (HPO terms) that are associated with one or more diseases. The process of association is referred to as annotation. The relevance and the specificity of both HPO terms and annotations are evaluated by a measure defined as Information Content (IC). The analysis of annotated data is thus an important challenge for bioinformatics. Several approaches to this analysis exist. Among them, Association Rules (AR) may provide useful knowledge and have been used in some applications, e.g. improving the quality of annotations. Nevertheless, classical association rule algorithms take into account neither the source of an annotation nor its importance, leading to the generation of candidate rules with low IC. This paper presents HPO-Miner (Human Phenotype Ontology-based Weighted Association Rules), a methodology for extracting Weighted Association Rules. HPO-Miner can extract rules that are relevant from a biological point of view. A case study applying HPO-Miner to publicly available HPO annotation datasets demonstrates the effectiveness of our methodology.
Submitted 31 December, 2016;
originally announced January 2017.
-
The impact of Gene Ontology evolution on GO-Term Information Content
Authors:
Pietro Hiram Guzzi,
Giuseppe Agapito,
Marianna Milano,
Mario Cannataro
Abstract:
The Gene Ontology (GO) is a major bioinformatics ontology that provides structured controlled vocabularies to classify the function and role of genes and proteins. The GO and its annotations to gene products are now an integral part of functional analysis. Recently, the evaluation of similarity among gene products starting from their annotations (also referred to as semantic similarity) has become a growing area of bioinformatics. While much research has been done on updates to the structure of GO and to the annotation corpora, the impact of GO evolution on semantic similarities has gone largely unexamined. Here we extensively analyze GO changes, which should be carefully considered by all users of semantic similarities. In particular, GO changes have a large impact on the information content (IC) of GO terms. Since many semantic similarities rely on the calculation of IC, these changes deserve thorough investigation. We consider GO versions from 2005 to 2014 and calculate the IC of all GO terms using five different formulations. Then we compare the results. The analysis confirms that there is a statistically significant difference among different calculations on the same version of the ontology (as expected), and that there is a statistically significant difference among the results obtained with different GO versions using the same IC formula. The results show a remarkable bias due to GO evolution that has not been considered so far. Future work should take this into account.
Submitted 30 December, 2016;
originally announced December 2016.
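To make concrete why the IC of a term shifts when the ontology or its annotation corpus changes, here is a minimal sketch of one common annotation-based formulation, IC(t) = -log2 p(t), where p(t) is the fraction of annotated gene products carrying t or any of its descendants. The abstract does not name its five formulations, so treating this one as representative is an assumption; the toy ontology, gene names, and function names are hypothetical.

```python
import math


def descendants(children, term):
    """All terms reachable from `term` by following child links."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def information_content(children, annotations, term):
    """Annotation-based IC: -log2 of the share of products annotated
    with `term` or one of its descendants."""
    covered = descendants(children, term) | {term}
    hits = sum(1 for terms in annotations.values() if terms & covered)
    if hits == 0:
        return math.inf  # a never-used term is maximally specific
    return -math.log2(hits / len(annotations))


# Toy ontology: root has children A and B; A has child A1.
children = {"root": ["A", "B"], "A": ["A1"]}

# "Version 1" of the annotation corpus: 4 gene products.
annotations_v1 = {"g1": {"A1"}, "g2": {"B"}, "g3": {"B"}, "g4": {"A"}}
# "Version 2": four more products annotated to B; term A is untouched.
annotations_v2 = dict(annotations_v1,
                      g5={"B"}, g6={"B"}, g7={"B"}, g8={"B"})

ic_v1 = information_content(children, annotations_v1, "A")
ic_v2 = information_content(children, annotations_v2, "A")
print(ic_v1, ic_v2)
```

Term A has the same annotations in both toy versions, yet its IC rises from 1 bit to 2 bits purely because annotations were added elsewhere. Any semantic similarity built on IC inherits this version dependence.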
-
A web-based tool to Analyze Semantic Similarity Networks
Authors:
Mario Cannataro,
Pietro Hiram Guzzi,
Marianna Milano,
Pierangelo Veltri
Abstract:
In computational biology, biological entities such as genes or proteins are usually annotated with terms extracted from the Gene Ontology (GO). The functional similarity among terms of an ontology is evaluated using Semantic Similarity Measures (SSMs). More recently, the extensive application of SSMs has led to Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs in which the nodes are concepts (e.g. proteins) and each edge has an associated weight that represents the semantic similarity between the pair of nodes it connects. The analysis of SSNs may reveal biologically meaningful knowledge. For this purpose, tools able to manage and analyze SSNs are needed. Consequently, we developed SSN-Analyzer, a web-based tool able to build and preprocess SSNs. As a proof of concept, we demonstrate that community detection algorithms applied to filtered (thresholded) networks perform better, in terms of the biological relevance of the results, than when applied to raw unfiltered networks.
Submitted 21 December, 2014;
originally announced December 2014.
-
Thresholding of Semantic Similarity Networks using a Spectral Graph Based Technique
Authors:
Pietro Hiram Guzzi,
Simone Truglia,
Pierangelo Veltri,
Mario Cannataro
Abstract:
Semantic similarity measures (SSMs) refer to a set of algorithms used to quantify the similarity of two or more terms belonging to the same ontology. Ontology terms may be associated with concepts; for instance, in computational biology genes and proteins are associated with terms of biological ontologies. Thus, SSMs may be used to quantify the similarity of genes and proteins starting from the comparison of the associated annotations. SSMs have recently been used to compare genes and proteins even at a system-level scale. More recently, some works have focused on building and analysing Semantic Similarity Networks (SSNs), i.e. weighted networks in which nodes represent genes or proteins while weighted edges represent the semantic similarity score among them. SSNs are quasi-complete networks, so their analysis presents several challenges. For instance, reliable thresholds are needed for the elimination of meaningless edges. Nevertheless, global thresholding methods may eliminate meaningful nodes, while local thresholds may introduce biases. To address this, we introduce a novel technique based on spectral graph considerations and on a mixed global-local focus. The effectiveness of our technique is demonstrated by using Markov clustering for the extraction of biological modules. We applied clustering to the simplified networks, demonstrating a considerable improvement with respect to the original ones.
Submitted 21 May, 2013;
originally announced May 2013.
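The trade-off between global and local thresholds mentioned in this abstract can be seen on a toy network. The sketch below is not the spectral technique the paper proposes, whose details the abstract does not give. It only contrasts two plain baselines under assumed rules: a global cutoff `tau` applied to every edge, and a local rule that keeps an edge if its weight reaches the mean (plus `k` standard deviations) of the weights at either endpoint. Node names, weights, and function names are hypothetical.

```python
from statistics import mean, pstdev


def global_threshold(edges, tau):
    """Keep edges whose weight is at least the single cutoff `tau`."""
    return {edge: w for edge, w in edges.items() if w >= tau}


def local_threshold(edges, k=0.0):
    """Keep an edge if it is strong relative to either endpoint's own
    weight distribution (mean + k * population std of incident weights)."""
    incident = {}
    for (u, v), w in edges.items():
        incident.setdefault(u, []).append(w)
        incident.setdefault(v, []).append(w)
    cutoff = {n: mean(ws) + k * pstdev(ws) for n, ws in incident.items()}
    return {(u, v): w for (u, v), w in edges.items()
            if w >= cutoff[u] or w >= cutoff[v]}


def nodes_of(edges):
    """Set of nodes that still have at least one edge."""
    return {node for edge in edges for node in edge}


# Toy quasi-complete SSN: p1..p3 are tightly related, p4 is weakly
# related to everything but closest to p3.
ssn = {
    ("p1", "p2"): 0.90, ("p1", "p3"): 0.80, ("p2", "p3"): 0.85,
    ("p3", "p4"): 0.40, ("p1", "p4"): 0.20, ("p2", "p4"): 0.10,
}

global_kept = global_threshold(ssn, tau=0.5)
local_kept = local_threshold(ssn, k=0.0)
print(sorted(nodes_of(global_kept)), sorted(nodes_of(local_kept)))
```

With the global cutoff, p4 loses all its edges and drops out of the network, which is the node-elimination problem the abstract points to. The local rule retains p4 through its relatively strongest edge, at the price of keeping an edge that is weak in absolute terms.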