Showing 1–2 of 2 results for author: Petermann, A
-
DIMSpan - Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems
Authors:
André Petermann,
Martin Junghanns,
Erhard Rahm
Abstract:
Transactional frequent subgraph mining identifies frequent subgraphs in a collection of graphs. This research problem has wide applicability and increasingly requires higher scalability over single machine solutions to address the needs of Big Data use cases. We introduce DIMSpan, an advanced approach to frequent subgraph mining that utilizes the features provided by distributed in-memory dataflow…
▽ More
Transactional frequent subgraph mining identifies frequent subgraphs in a collection of graphs. This research problem has wide applicability and increasingly requires higher scalability over single machine solutions to address the needs of Big Data use cases. We introduce DIMSpan, an advanced approach to frequent subgraph mining that utilizes the features provided by distributed in-memory dataflow systems such as Apache Spark or Apache Flink. It determines the complete set of frequent subgraphs from arbitrary string-labeled directed multigraphs as they occur in social, business and knowledge networks. DIMSpan is optimized to runtime and minimal network traffic but memory-aware. An extensive performance evaluation on large graph collections shows the scalability of DIMSpan and the effectiveness of its pruning and optimization techniques.
△ Less
Submitted 6 March, 2017;
originally announced March 2017.
-
GRADOOP: Scalable Graph Data Management and Analytics with Hadoop
Authors:
Martin Junghanns,
André Petermann,
Kevin Gómez,
Erhard Rahm
Abstract:
Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore develo** a new end-to-end approach for graph data management and analysis ba…
▽ More
Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore develo** a new end-to-end approach for graph data management and analysis based on the Hadoop ecosystem, called Gradoop (Graph analytics on Hadoop). Gradoop is designed around the so-called Extended Property Graph Data Model (EPGM) supporting semantically rich, schema-free graph data within many distinct graphs. A set of high-level operators is provided for analyzing both single graphs and collections of graphs. Based on these operators, we propose a domain-specific language to define analytical workflows. The Gradoop graph store is currently utilizing HBase for distributed storage of graph data in Hadoop clusters. An initial version of Gradoop has been used to analyze graph data for business intelligence and social network analysis.
△ Less
Submitted 2 June, 2015; v1 submitted 1 June, 2015;
originally announced June 2015.