-
MuLaN: a MultiLayer Networks Alignment Algorithm
Authors:
Marianna Milano,
Pietro Cinaglia,
Pietro Hiram Guzzi,
Mario Cannataro
Abstract:
A Multilayer Network (MN) is a system consisting of several topological levels (i.e., layers) representing the interactions between the system's objects and the related interdependency. Therefore, it may be represented as a set of layers that can be assimilated to a set of networks of its own objects, by means of inter-layer edges (or inter-edges) linking the nodes of different layers; for instance, a biological MN may allow modeling of inter and intra interactions among diseases, genes, and drugs, only using its own structure. The analysis of MNs may reveal hidden knowledge, as demonstrated by several algorithms for the analysis. Recently, there has been growing interest in comparing two MNs by revealing local regions of similarity, as a counterpart of Network Alignment algorithms (NA) for simple networks. However, classical algorithms for NA such as Local NA (LNA) cannot be applied on multilayer networks, since they are not able to deal with inter-layer edges. Therefore, there is the need for the introduction of novel algorithms. In this paper, we present MuLaN, an algorithm for the local alignment of multilayer networks. We first show as proof of concept the performances of MuLaN on a set of synthetic multilayer networks. Then, we used as a case study a real multilayer network in the biomedical domain. Our results show that MuLaN is able to build high-quality alignments and can extract knowledge about the aligned multilayer networks. MuLaN is available at https://github.com/pietrocinaglia/mulan.
Submitted 14 September, 2023;
originally announced September 2023.
-
Invited Paper: Initial Steps Toward a Compiler for Distributed Programs
Authors:
Joseph M. Hellerstein,
Shadaj Laddad,
Mae Milano,
Conor Power,
Mingwei Samuel
Abstract:
In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program transformations can combine, step by step, to convert a single-node program into a variety of distributed design points that offer the same semantics with different performance and deployment characteristics.
Submitted 23 May, 2023;
originally announced May 2023.
-
UNIFY: a Unified Policy Designing Framework for Solving Constrained Optimization Problems with Machine Learning
Authors:
Mattia Silvestri,
Allegra De Filippo,
Michele Lombardi,
Michela Milano
Abstract:
The interplay between Machine Learning (ML) and Constrained Optimization (CO) has recently been the subject of increasing interest, leading to a new and prolific research area covering (e.g.) Decision Focused Learning and Constrained Reinforcement Learning. Such approaches strive to tackle complex decision problems under uncertainty over multiple stages, involving both explicit (cost function, constraints) and implicit knowledge (from data), and possibly subject to execution time restrictions. While a good degree of success has been achieved, the existing methods still have limitations in terms of both applicability and effectiveness. For problems in this class, we propose UNIFY, a unified framework to design a solution policy for complex decision-making problems. Our approach relies on a clever decomposition of the policy into two stages, namely an unconstrained ML model and a CO problem, to take advantage of the strength of each approach while compensating for its weaknesses. With a little design effort, UNIFY can generalize several existing approaches, thus extending their applicability. We demonstrate the method's effectiveness on two practical problems, namely an Energy Management System and the Set Multi-cover with stochastic coverage requirements. Finally, we highlight some current challenges of our method and future research directions that can benefit from the cross-fertilization of the two fields.
Submitted 25 October, 2022;
originally announced October 2022.
-
Keep CALM and CRDT On
Authors:
Shadaj Laddad,
Conor Power,
Mae Milano,
Alvin Cheung,
Natacha Crooks,
Joseph M. Hellerstein
Abstract:
Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for developing CRDT data stores that let developers safely and efficiently interact with replicated application state.
Submitted 22 October, 2022;
originally announced October 2022.
-
Katara: Synthesizing CRDTs with Verified Lifting
Authors:
Shadaj Laddad,
Conor Power,
Mae Milano,
Alvin Cheung,
Joseph M. Hellerstein
Abstract:
Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between non-commutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.
Submitted 21 September, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Copiloting Autonomous Multi-Robot Missions: A Game-inspired Supervisory Control Interface
Authors:
Marcel Kaufmann,
Robert Trybula,
Ryan Stonebraker,
Michael Milano,
Gustavo J. Correa,
Tiago S. Vaquero,
Kyohei Otsu,
Ali-akbar Agha-mohammadi,
Giovanni Beltrame
Abstract:
Real-world deployment of new technology and capabilities can be daunting. The recent DARPA Subterranean (SubT) Challenge, for instance, aimed at the advancement of robotic platforms and autonomy capabilities in three one-year development pushes. While multi-agent systems are traditionally deployed in controlled and structured environments that allow for controlled testing (e.g., warehouses), the SubT challenge targeted various types of unknown underground environments that imposed the risk of robot loss in the case of failure. In this work, we introduce a video game-inspired interface, an autonomous mission assistant, and test and deploy these using a heterogeneous multi-agent system in challenging environments. This work leads to improved human-supervisory control for a multi-agent system reducing overhead from application switching, task planning, execution, and verification while increasing available exploration time with this human-autonomy teaming platform.
Submitted 13 April, 2022;
originally announced April 2022.
-
The turbulent flow over the BARC rectangular cylinder: a DNS study
Authors:
Alessandro Chiarini,
Maurizio Quadrio (Politecnico di Milano)
Abstract:
A Direct Numerical Simulation (DNS) of the incompressible flow around a rectangular cylinder with chord-to-thickness ratio 5:1 (also known as the BARC benchmark) is presented. The work replicates the first DNS of this kind recently presented by Cimarelli et al. (2018), and intends to contribute to a solid numerical benchmark, albeit at a relatively low value of the Reynolds number. The study differs from previous work by using an in-house finite-differences solver instead of the finite-volumes toolbox OpenFOAM, and by employing finer spatial discretization and longer temporal average.
The main features of the flow are described, and quantitative differences with the existing results are highlighted. The complete set of terms appearing in the budget equation for the components of the Reynolds stress tensor is provided for the first time. The different regions of the flow where production, redistribution and dissipation of each component take place are identified, and the anisotropic and inhomogeneous nature of the flow is discussed. Such information is valuable for the verification and fine-tuning of turbulence models in this complex separating and reattaching flow.
Submitted 3 May, 2021;
originally announced May 2021.
-
Deep Learning for Virus-Spreading Forecasting: a Brief Survey
Authors:
Federico Baldo,
Lorenzo Dall'Olio,
Mattia Ceccarelli,
Riccardo Scheda,
Michele Lombardi,
Andrea Borghesi,
Stefano Diciotti,
Michela Milano
Abstract:
The advent of the coronavirus pandemic has sparked the interest in predictive models capable of forecasting virus-spreading, especially for boosting and supporting decision-making processes. In this paper, we will outline the main Deep Learning approaches aimed at predicting the spreading of a disease in space and time. The aim is to show the emerging trends in this area of research and provide a general perspective on the possible strategies to approach this problem. In doing so, we will mainly focus on two macro-categories: classical Deep Learning approaches and Hybrid models. Finally, we will discuss the main advantages and disadvantages of different models, and underline the most promising development directions to improve these approaches.
Submitted 3 March, 2021;
originally announced March 2021.
-
New Directions in Cloud Programming
Authors:
Alvin Cheung,
Natacha Crooks,
Joseph M. Hellerstein,
Mae Milano
Abstract:
Nearly twenty years after the launch of AWS, it remains difficult for most developers to harness the enormous potential of the cloud. In this paper we lay out an agenda for a new generation of cloud programming research aimed at bringing research ideas to programmers in an evolutionary fashion. Key to our approach is a separation of distributed programs into a PACT of four facets: Program semantics, Availability, Consistency and Targets of optimization. We propose to migrate developers gradually to PACT programming by lifting familiar code into our more declarative level of abstraction. We then propose a multi-stage compiler that emits human-readable code at each stage that can be hand-tuned by developers seeking more control. Our agenda raises numerous research challenges across multiple areas including language design, query optimization, transactions, distributed consistency, compilers and program synthesis.
Submitted 4 January, 2021;
originally announced January 2021.
-
Improving Deep Learning Models via Constraint-Based Domain Knowledge: a Brief Survey
Authors:
Andrea Borghesi,
Federico Baldo,
Michela Milano
Abstract:
Deep Learning (DL) models proved themselves to perform extremely well on a wide variety of learning tasks, as they can learn useful patterns from large data sets. However, purely data-driven models might struggle when very difficult functions need to be learned or when there is not enough available training data. Fortunately, in many domains prior information can be retrieved and used to boost the performance of DL models. This paper presents a first survey of the approaches devised to integrate domain knowledge, expressed in the form of constraints, in DL learning models to improve their performance, in particular targeting deep neural networks. We identify five (non-mutually exclusive) categories that encompass the main approaches to inject domain knowledge: 1) acting on the features space, 2) modifications to the hypothesis space, 3) data augmentation, 4) regularization schemes, 5) constrained learning.
Submitted 19 May, 2020;
originally announced May 2020.
-
An Analysis of Regularized Approaches for Constrained Machine Learning
Authors:
Michele Lombardi,
Federico Baldo,
Andrea Borghesi,
Michela Milano
Abstract:
Regularization-based approaches for injecting constraints in Machine Learning (ML) were introduced to improve a predictive model via expert knowledge. We tackle the issue of finding the right balance between the loss (the accuracy of the learner) and the regularization term (the degree of constraint satisfaction). The key result of this paper is the formal demonstration that this type of approach cannot guarantee to find all optimal solutions. In particular, in the non-convex case there might be optima for the constrained problem that do not correspond to any multiplier value.
Submitted 20 May, 2020;
originally announced May 2020.
-
Combining Learning and Optimization for Transprecision Computing
Authors:
Andrea Borghesi,
Giuseppe Tagliavini,
Michele Lombardi,
Luca Benini,
Michela Milano
Abstract:
The growing demands of the worldwide IT infrastructure stress the need for reduced power consumption, which is addressed in so-called transprecision computing by improving energy efficiency at the expense of precision. For example, reducing the number of bits for some floating-point operations leads to higher efficiency, but also to a non-linear decrease of the computation accuracy. Depending on the application, small errors can be tolerated, thus allowing the precision of the computation to be fine-tuned. Finding the optimal precision for all variables with respect to an error bound is a complex task, which is tackled in the literature via heuristics. In this paper, we report on a first attempt to address the problem by combining a Mathematical Programming (MP) model and a Machine Learning (ML) model, following the Empirical Model Learning methodology. The ML model learns the relation between the precision of the variables and the output error; this information is then embedded in the MP focused on minimizing the number of bits. An additional refinement phase is then added to improve the quality of the solution. The experimental results demonstrate an average speedup of 6.5% and a 3% increase in solution quality compared to the state-of-the-art. In addition, experiments on a hardware platform capable of mixed-precision arithmetic (PULPissimo) show the benefits of the proposed approach, with energy savings of around 40% compared to fixed-precision.
Submitted 24 February, 2020;
originally announced February 2020.
-
Teaching the Old Dog New Tricks: Supervised Learning with Constraints
Authors:
Fabrizio Detassis,
Michele Lombardi,
Michela Milano
Abstract:
Adding constraint support in Machine Learning has the potential to address outstanding issues in data-driven AI systems, such as safety and fairness. Existing approaches typically apply constrained optimization techniques to ML training, enforce constraint satisfaction by adjusting the model design, or use constraints to correct the output. Here, we investigate a different, complementary, strategy based on "teaching" constraint satisfaction to a supervised ML method via the direct use of a state-of-the-art constraint solver: this enables taking advantage of decades of research on constrained optimization with limited effort. In practice, we use a decomposition scheme alternating master steps (in charge of enforcing the constraints) and learner steps (where any supervised ML model and training algorithm can be employed). The process leads to approximate constraint satisfaction in general, and convergence properties are difficult to establish; despite this fact, we found empirically that even a naïve setup of our approach performs well on ML tasks with fairness constraints, and on classical datasets with synthetic constraints.
Submitted 26 February, 2021; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Injecting Domain Knowledge in Neural Networks: a Controlled Experiment on a Constrained Problem
Authors:
Mattia Silvestri,
Michele Lombardi,
Michela Milano
Abstract:
Given enough data, Deep Neural Networks (DNNs) are capable of learning complex input-output relations with high accuracy. In several domains, however, data is scarce or expensive to retrieve, while a substantial amount of expert knowledge is available. It seems reasonable that if we can inject this additional information in the DNN, we could ease the learning process. One such case is that of Constraint Problems, for which declarative approaches exist and pure ML solutions have obtained mixed success. Using a classical constrained problem as a case study, we perform controlled experiments to probe the impact of progressively adding domain and empirical knowledge in the DNN. Our results are very encouraging, showing that (at least in our setup) embedding domain knowledge at training time can have a considerable effect and that a small amount of empirical knowledge is sufficient to obtain practically useful results.
Submitted 25 February, 2020;
originally announced February 2020.
-
Injective Domain Knowledge in Neural Networks for Transprecision Computing
Authors:
Andrea Borghesi,
Federico Baldo,
Michele Lombardi,
Michela Milano
Abstract:
Machine Learning (ML) models are very effective in many learning tasks, due to the capability to extract meaningful information from large data sets. Nevertheless, there are learning problems that cannot be easily solved relying on pure data, e.g. scarce data or very complex functions to be approximated. Fortunately, in many contexts domain knowledge is explicitly available and can be used to train better ML models. This paper studies the improvements that can be obtained by integrating prior knowledge when dealing with a non-trivial learning task, namely precision tuning of transprecision computing applications. The domain information is injected in the ML models in different ways: I) additional features, II) ad-hoc graph-based network topology, III) regularization schemes. The results clearly show that ML models exploiting problem-specific information outperform the purely data-driven ones, with an average accuracy improvement around 38%.
Submitted 24 February, 2020;
originally announced February 2020.
-
Anomaly Detection using Autoencoders in High Performance Computing Systems
Authors:
Andrea Borghesi,
Andrea Bartolini,
Michele Lombardi,
Michela Milano,
Luca Benini
Abstract:
Anomaly detection in supercomputers is a very difficult problem due to the big scale of the systems and the high number of components. The current state of the art for automated anomaly detection employs Machine Learning methods or statistical regression models in a supervised fashion, meaning that the detection tool is trained to distinguish among a fixed set of behaviour classes (healthy and unhealthy states).
We propose a novel approach for anomaly detection in High Performance Computing systems based on a Machine (Deep) Learning technique, namely a type of neural network called autoencoder. The key idea is to train a set of autoencoders to learn the normal (healthy) behaviour of the supercomputer nodes and, after training, use them to identify abnormal conditions. This is different from previous approaches which were based on learning the abnormal condition, for which there are much smaller datasets (since it is very hard to identify them to begin with).
We test our approach on a real supercomputer equipped with a fine-grained, scalable monitoring infrastructure that can provide large amount of data to characterize the system behaviour. The results are extremely promising: after the training phase to learn the normal system behaviour, our method is capable of detecting anomalies that have never been seen before with a very good accuracy (values ranging between 88% and 96%).
Submitted 13 November, 2018;
originally announced November 2018.
-
Boosting Combinatorial Problem Modeling with Machine Learning
Authors:
Michele Lombardi,
Michela Milano
Abstract:
In the past few years, the area of Machine Learning (ML) has witnessed tremendous advancements, becoming a pervasive technology in a wide range of applications. One area that can significantly benefit from the use of ML is Combinatorial Optimization. The three pillars of constraint satisfaction and optimization problem solving, i.e., modeling, search, and optimization, can exploit ML techniques to boost their accuracy, efficiency and effectiveness. In this survey we focus on the modeling component, whose effectiveness is crucial for solving the problem. The modeling activity has been traditionally shaped by optimization and domain experts, interacting to provide realistic results. Machine Learning techniques can tremendously ease the process, and exploit the available data to either create models or refine expert-designed ones. In this survey we cover approaches that have been recently proposed to enhance the modeling process by learning either single constraints, objective functions, or the whole model. We highlight common themes to multiple approaches and draw connections with related fields of research.
Submitted 15 July, 2018;
originally announced July 2018.
-
Pricing Schemes for Energy-Efficient HPC Systems: Design and Exploration
Authors:
Andrea Borghesi,
Andrea Bartolini,
Michela Milano,
Luca Benini
Abstract:
Energy efficiency is of paramount importance for the sustainability of HPC systems. Energy consumption limits the peak performance of supercomputers and accounts for a large share of total cost of ownership. Consequently, system owners and final users have started exploring mechanisms to trade off performance for power consumption, for example through frequency and voltage scaling.
However, only a limited number of studies have been devoted to exploring the economic viability of performance scaling solutions and to devising pricing mechanisms fostering a more energy-conscious usage of resources, without adversely impacting return on investment for the HPC facility. We present a parametrized model to analyze the impact of frequency scaling on energy and to assess the potential total cost benefits for the HPC facility and the user. We evaluate four pricing schemes, considering both the facility manager and the user perspectives. We then perform a design space exploration considering current and near-future HPC systems and technologies.
Submitted 13 June, 2018;
originally announced June 2018.
-
HetNetAligner: Design and Implementation of an algorithm for heterogeneous network alignment on Apache Spark
Authors:
Pietro H Guzzi,
Marianna Milano,
Pierangelo Veltri,
Mario Cannataro
Abstract:
The importance of the use of networks to model and analyse biological data and the interplay of bio-molecules is widely recognised. Consequently, many algorithms for the analysis and the comparison of networks (such as alignment algorithms) have been developed in the past. Recently, many different approaches tried to integrate into a single model the interplay of different molecules, such as genes, transcription factors and microRNAs. A possible formalism to model such a scenario comes from node coloured networks (or heterogeneous networks) implemented as node/edge-coloured graphs. Consequently, the need for the introduction of alignment algorithms able to analyse heterogeneous networks arises. To the best of our knowledge, none of the existing algorithms is able to mine heterogeneous networks. We propose a two-step alignment strategy that receives as input two heterogeneous networks (node-coloured graphs) and a similarity function among nodes of two networks extending the previous formulations. We first build a single alignment graph. Then we mine this graph extracting relevant subgraphs. Despite this simple approach, the analysis of such networks relies on graph and subgraph isomorphism and the size of the data is still growing. Therefore the use of a high-performance data analytics framework is needed. We here present HetNetAligner, a framework built on top of Apache Spark. We also implemented our algorithm, and we tested it on some selected heterogeneous biological networks. Preliminary results confirm that our method may extract relevant knowledge from biological data, reducing the computational time.
Submitted 11 June, 2018;
originally announced June 2018.
-
Learning Weighted Association Rules in Human Phenotype Ontology
Authors:
Pietro Hiram Guzzi,
Giuseppe Agapito,
Marianna Milano,
Mario Cannataro
Abstract:
The Human Phenotype Ontology (HPO) is a structured repository of concepts (HPO Terms) that are associated with one or more diseases. The process of association is referred to as annotation. The relevance and the specificity of both HPO terms and annotations are evaluated by a measure defined as Information Content (IC). The analysis of annotated data is thus an important challenge for bioinformatics. Different approaches to this analysis exist. Among them, the use of Association Rules (AR) may provide useful knowledge, and it has been used in some applications, e.g. improving the quality of annotations. Nevertheless, classical association rule algorithms take into account neither the source of an annotation nor its importance, which leads to the generation of candidate rules with low IC. This paper presents HPO-Miner (Human Phenotype Ontology-based Weighted Association Rules), a methodology for extracting Weighted Association Rules. HPO-Miner can extract rules that are relevant from a biological point of view. A case study applying HPO-Miner to publicly available HPO annotation datasets demonstrates the effectiveness of our methodology.
Submitted 31 December, 2016;
originally announced January 2017.
-
The impact of Gene Ontology evolution on GO-Term Information Content
Authors:
Pietro Hiram Guzzi,
Giuseppe Agapito,
Marianna Milano,
Mario Cannataro
Abstract:
The Gene Ontology (GO) is a major bioinformatics ontology that provides structured controlled vocabularies to classify gene and protein functions and roles. The GO and its annotations to gene products are now an integral part of functional analysis. Recently, the evaluation of similarity among gene products starting from their annotations (also referred to as semantic similarities) has become a growing area in bioinformatics. While much research has been done on updates to the structure of GO and on the annotation corpora, the impact of GO evolution on semantic similarities has gone largely unobserved. Here we extensively analyze how GO changes, an aspect that should be carefully considered by all users of semantic similarities. GO changes in particular have a big impact on the information content (IC) of GO terms. Since many semantic similarities rely on the calculation of IC, these changes deserve thorough investigation. Here we consider GO versions from 2005 to 2014 and we calculate the IC of all GO Terms considering five different formulations. Then we compare these results. The analysis confirms that there exists a statistically significant difference among different calculations on the same version of the ontology (which is to be expected) and a statistically significant difference among the results obtained with different GO versions using the same IC formula. The results show that there exists a remarkable bias due to GO evolution that has not been considered so far. Future work should take this into account.
Submitted 30 December, 2016;
originally announced December 2016.
-
A web-based tool to Analyze Semantic Similarity Networks
Authors:
Mario Cannataro,
Pietro Hiram Guzzi,
Marianna Milano,
Pierangelo Veltri
Abstract:
In computational biology, biological entities such as genes or proteins are usually annotated with terms extracted from Gene Ontology (GO). The functional similarity among terms of an ontology is evaluated by using Semantic Similarity Measures (SSMs). More recently, the extensive application of SSMs has given rise to Semantic Similarity Networks (SSNs). SSNs are edge-weighted graphs where the nodes are concepts (e.g. proteins) and each edge has an associated weight that represents the semantic similarity between the related pair of nodes. The analysis of SSNs may reveal biologically meaningful knowledge. For these aims, the need arises for a tool able to manage and analyze SSNs. Consequently, we developed SSN-Analyzer, a web-based tool able to build and preprocess SSNs. As a proof of concept, we demonstrate that community detection algorithms applied to filtered (thresholded) networks perform better, in terms of the biological relevance of the results, than when applied to raw unfiltered networks.
Submitted 21 December, 2014;
originally announced December 2014.
-
Multi-Criteria Optimal Planning for Energy Policies in CLP
Authors:
Marco Gavanelli,
Stefano Bragaglia,
Michela Milano,
Federico Chesani,
Elisa Marengo,
Paolo Cagnoli
Abstract:
In the policy-making process, a number of disparate and diverse issues, such as economic development, environmental aspects, and the social acceptance of the policy, need to be considered. A single person might not have all the required expertise, and decision support systems featuring optimization components can help to assess policies. Leveraging previous work on Strategic Environmental Assessment, we developed a fully-fledged system that is able to provide optimal plans with respect to a given objective, to perform multi-objective optimization and provide sets of Pareto optimal plans, and to visually compare them. Each plan is environmentally assessed and its footprint is evaluated. The heart of the system is an application developed in a popular Constraint Logic Programming system on the Reals sort. It has been equipped with a web service module that can be queried through standard interfaces, and an intuitive graphical user interface.
Submitted 15 May, 2014;
originally announced May 2014.
-
Solving the Satisfiability Problem Through Boolean Networks
Authors:
Andrea Roli,
Michela Milano
Abstract:
In this paper we present a new approach to solve the satisfiability problem (SAT), based on boolean networks (BN). We define a mapping between a SAT instance and a BN, and we solve the SAT problem by simulating the BN dynamics. We prove that BN fixed points correspond to the SAT solutions. The mapping presented allows the development of a new class of algorithms to solve SAT. Moreover, this new approach suggests new ways to combine symbolic and connectionist computation and provides a general framework for local search algorithms.
Submitted 31 January, 2011;
originally announced January 2011.
-
Logic-Based Decision Support for Strategic Environmental Assessment
Authors:
Marco Gavanelli,
Fabrizio Riguzzi,
Michela Milano,
Paolo Cagnoli
Abstract:
Strategic Environmental Assessment is a procedure aimed at introducing systematic assessment of the environmental effects of plans and programs. This procedure is based on the so-called coaxial matrices that define dependencies between plan activities (infrastructures, plants, resource extractions, buildings, etc.) and positive and negative environmental impacts, and dependencies between these impacts and environmental receptors. Up to now, this procedure has been implemented manually by environmental experts to check the environmental effects of a given plan or program, but it is never applied during the plan/program construction. A decision support system, based on a clear logic semantics, would be an invaluable tool not only in assessing a single, already defined plan, but also during the planning process in order to produce an optimized, environmentally assessed plan and to study possible alternative scenarios. We propose two logic-based approaches to the problem, one based on Constraint Logic Programming and one on Probabilistic Logic Programming, which could in the future be conveniently merged to exploit the advantages of both. We test the proposed approaches on a real energy plan and we discuss their limitations and advantages.
Submitted 19 July, 2010;
originally announced July 2010.
-
A CHR-based Implementation of Known Arc-Consistency
Authors:
Marco Alberti,
Marco Gavanelli,
Evelina Lamma,
Paola Mello,
Michela Milano
Abstract:
In classical CLP(FD) systems, domains of variables are completely known at the beginning of the constraint propagation process. However, in systems interacting with an external environment, acquiring the whole domains of variables before the beginning of constraint propagation may cause waste of computation time, or even obsolescence of the acquired data at the time of use.
For such cases, the Interactive Constraint Satisfaction Problem (ICSP) model has been proposed as an extension of the CSP model, to make it possible to start constraint propagation even when domains are not fully known, performing acquisition of domain elements only when necessary, and without the need for restarting the propagation after every acquisition.
In this paper, we show how a solver for the two-sorted CLP language defined in previous work to express ICSPs has been implemented in the Constraint Handling Rules (CHR) language, a declarative language particularly suitable for the high-level implementation of constraint solvers.
Submitted 24 August, 2004;
originally announced August 2004.
-
Reduced cost-based ranking for generating promising subproblems
Authors:
M. Milano,
W. J. van Hoeve
Abstract:
In this paper, we propose an effective search procedure that interleaves two steps: subproblem generation and subproblem solution. We mainly focus on the first part. It consists of a variable domain value ranking based on reduced costs. Exploiting the ranking, we generate, in a Limited Discrepancy Search tree, the most promising subproblems first. An interesting result is that reduced costs provide a very precise ranking that almost always allows the optimal solution to be found in the first generated subproblem, even if its dimension is significantly smaller than that of the original problem. Concerning the proof of optimality, we exploit a way to increase the lower bound for subproblems at higher discrepancies. We present experimental results on the TSP and its time-constrained variant to show the effectiveness of the proposed approach, but the technique could be generalized to other problems.
Submitted 16 July, 2004;
originally announced July 2004.
-
Postponing Branching Decisions
Authors:
Willem Jan van Hoeve,
Michela Milano
Abstract:
Solution techniques for Constraint Satisfaction and Optimisation Problems often make use of backtrack search methods, exploiting variable and value ordering heuristics. In this paper, we propose and analyse a very simple method to apply in case the value ordering heuristic produces ties: postponing the branching decision. To this end, we group together values in a tie, branch on this sub-domain, and defer the decision among them to lower levels of the search tree. We show theoretically and experimentally that this simple modification can dramatically improve the efficiency of the search strategy. Although in practice similar methods may have been applied already, to our knowledge no empirical or theoretical study has been proposed in the literature to identify when and to what extent this strategy should be used.
Submitted 16 July, 2004;
originally announced July 2004.
-
Decomposition Based Search - A theoretical and experimental evaluation
Authors:
W. J. van Hoeve,
M. Milano
Abstract:
In this paper we present and evaluate a search strategy called Decomposition Based Search (DBS) which is based on two steps: subproblem generation and subproblem solution. The generation of subproblems is done through value ranking and domain splitting. Subdomains are explored so as to generate, according to the heuristic chosen, promising subproblems first.
We show that two well-known search strategies, Limited Discrepancy Search (LDS) and Iterative Broadening (IB), can be seen as special cases of DBS. First we present a tuning of DBS that visits the same search nodes as IB, but avoids restarts. Then we compare DBS and LDS, both theoretically and computationally, using the same heuristic. We prove that DBS has a higher probability of being successful than LDS on a comparable number of nodes, under realistic assumptions. Experiments on a constraint satisfaction problem and an optimization problem show that DBS is indeed very effective compared to LDS.
Submitted 16 July, 2004;
originally announced July 2004.