-
Learned Clause Minimization in Parallel SAT Solvers
Authors:
Marc Hartung,
Florian Schintke
Abstract:
Learned clause minimization (LCM) has led to performance improvements in modern SAT solvers, especially on hard SAT instances. Despite the success of LCM approaches in sequential solvers, they are not widely incorporated in parallel SAT solvers. In this paper we explore the potential of LCM for parallel SAT solvers by defining multiple LCM approaches based on clause vivification, comparing their runtime in different SAT solvers, and discussing reasons for performance gains and losses. Results show that LCM boosts the performance of parallel SAT solvers on only a fraction of SAT instances. More commonly, applying LCM decreases performance. Only certain LCM approaches are able to improve the overall performance of parallel SAT solvers.
Submitted 5 August, 2019;
originally announced August 2019.
-
Predicting Disease-Gene Associations using Cross-Document Graph-based Features
Authors:
Hendrik ter Horst,
Matthias Hartung,
Roman Klinger,
Matthias Zwick,
Philipp Cimiano
Abstract:
In the context of personalized medicine, text mining methods offer an interesting option for identifying disease-gene associations, as they can be used to generate novel links between diseases and genes which may complement knowledge from structured databases. The most straightforward approach to extracting such links from text is to rely on a simple assumption postulating an association between all genes and diseases that co-occur within the same document. However, this approach (i) tends to yield a number of spurious associations, (ii) does not capture different relevant types of associations, and (iii) is incapable of aggregating knowledge that is spread across documents. Thus, we propose an approach in which disease-gene co-occurrences and gene-gene interactions are represented in an RDF graph. A machine learning-based classifier is trained that incorporates features extracted from the graph to separate disease-gene pairs into valid disease-gene associations and spurious ones. On the manually curated Genetic Testing Registry, our approach yields a 30-point increase in F1 score over a plain co-occurrence baseline.
Submitted 26 September, 2017;
originally announced September 2017.
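The plain co-occurrence baseline mentioned in the abstract above can be illustrated with a minimal sketch. The document representation (a dict with hypothetical `diseases` and `genes` sets) and the function name are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import product


def cooccurrence_associations(documents):
    """Return every (disease, gene) pair mentioned together in at least one document.

    Each document is assumed to be a dict with two sets of already-recognized
    entity identifiers: "diseases" and "genes" (hypothetical field names).
    """
    pairs = set()
    for doc in documents:
        # The baseline assumption: any disease and gene in the same document are associated.
        pairs.update(product(doc["diseases"], doc["genes"]))
    return pairs


docs = [
    {"diseases": {"D1"}, "genes": {"G1", "G2"}},
    {"diseases": {"D2"}, "genes": set()},
    {"diseases": {"D1", "D2"}, "genes": {"G3"}},
]
associations = cooccurrence_associations(docs)
```

Because every co-mention counts, a single document listing many genes produces many pairs at once, which is one way the spurious associations noted in point (i) can arise.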
-
How do Ontology Mappings Change in the Life Sciences?
Authors:
Anika Gross,
Michael Hartung,
Andreas Thor,
Erhard Rahm
Abstract:
Mappings between related ontologies are increasingly used to support data integration and analysis tasks. Changes in the ontologies also require the adaptation of ontology mappings. So far the evolution of ontology mappings has received little attention, although ontologies change continuously, especially in the life sciences. We therefore analyze how mappings between popular life science ontologies evolve for different match algorithms. We also evaluate which semantic ontology changes primarily affect the mappings. We further investigate alternatives to predict or estimate the degree of future mapping changes based on previous ontology and mapping transitions.
Submitted 12 April, 2012;
originally announced April 2012.
-
Rule-based Generation of Diff Evolution Mappings between Ontology Versions
Authors:
Michael Hartung,
Anika Groß,
Erhard Rahm
Abstract:
Ontologies such as taxonomies, product catalogs or web directories are heavily used and hence evolve frequently to meet new requirements or to better reflect the current instance data of a domain. To effectively manage the evolution of ontologies it is essential to identify the difference (Diff) between two ontology versions. We propose a novel approach to determine an expressive and invertible diff evolution mapping between given versions of an ontology. Our approach utilizes the result of a match operation to determine an evolution mapping consisting of a set of basic change operations (insert/update/delete). To semantically enrich the evolution mapping we adopt a rule-based approach to transform the basic change operations into a smaller set of more complex change operations, such as merge, split, or changes of entire subgraphs. The proposed algorithm is customizable in different ways to meet the requirements of diverse ontologies and application scenarios. We evaluate the proposed approach by determining and analyzing evolution mappings for real-world life science ontologies and web directories.
Submitted 1 October, 2010;
originally announced October 2010.
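As a rough illustration of the first step described in the abstract above, the sketch below derives basic insert/update/delete operations from a match result. It assumes, purely for illustration, that each ontology version is a dict from concept ID to label and that the match result is a dict from old IDs to new IDs; the paper's actual data model and rules are not reproduced here.

```python
def basic_changes(old, new, match):
    """Derive basic change operations between two ontology versions.

    old, new: dicts mapping concept ID -> label (assumed representation).
    match:    dict mapping old concept ID -> new concept ID (assumed match result).
    """
    ops = []
    for old_id, label in old.items():
        new_id = match.get(old_id)
        if new_id is None:
            # No counterpart in the new version: the concept was removed.
            ops.append(("delete", old_id))
        elif new[new_id] != label:
            # Matched, but the label differs: the concept was modified.
            ops.append(("update", old_id, new_id))
    matched_new = set(match.values())
    for new_id in new:
        if new_id not in matched_new:
            # No counterpart in the old version: the concept was added.
            ops.append(("insert", new_id))
    return ops


old_version = {"a": "Heart", "b": "Lung", "c": "Liver"}
new_version = {"a": "Heart", "b": "Lungs", "d": "Kidney"}
match_result = {"a": "a", "b": "b"}
changes = basic_changes(old_version, new_version, match_result)
```

A rule-based enrichment step, as the abstract describes, would then take such a list and combine groups of basic operations into richer ones such as merge or split.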
-
Data Partitioning for Parallel Entity Matching
Authors:
Toralf Kirsten,
Lars Kolb,
Michael Hartung,
Anika Groß,
Hanna Köpcke,
Erhard Rahm
Abstract:
Entity matching is an important and difficult step for integrating web data. To reduce the typically high execution time for matching we investigate how we can perform entity matching in parallel on a distributed infrastructure. We propose different strategies to partition the input data and generate multiple match tasks that can be independently executed. One of our strategies supports both blocking, to reduce the search space for matching, and parallel matching, to improve efficiency. Special attention is given to the number and size of data partitions as they impact the overall communication overhead and memory requirements of individual match tasks. We have developed a service-based distributed infrastructure for the parallel execution of match workflows. We evaluate our approach in detail for different match strategies for matching real-world product data of different web shops. We also consider caching of input entities and affinity-based scheduling of match tasks.
Submitted 28 June, 2010;
originally announced June 2010.
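The idea of partitioning input data into independently executable match tasks, as described in the abstract above, can be sketched as follows. This is one simple size-based scheme chosen for illustration, with hypothetical function names and an assumed `max_size` parameter; it is not necessarily any of the strategies the paper proposes.

```python
from itertools import combinations_with_replacement


def partition(entities, max_size):
    """Split the entity list into consecutive partitions of at most max_size items."""
    return [entities[i:i + max_size] for i in range(0, len(entities), max_size)]


def match_tasks(partitions):
    """Generate one task per unordered pair of partitions, including each partition with itself.

    Each task (i, j) compares the entities of partition i against those of
    partition j, so tasks share no state and can run in parallel.
    """
    return list(combinations_with_replacement(range(len(partitions)), 2))


entities = [f"e{i}" for i in range(10)]
parts = partition(entities, 4)
tasks = match_tasks(parts)
```

With p partitions this scheme yields p(p+1)/2 tasks. Smaller partitions lower the memory needed per task but increase the number of tasks and the data shipped to them, which is the trade-off between partition number and size that the abstract highlights.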