Showing 1–2 of 2 results for author: Petermann, A


arXiv:1703.01910

cs.DB

DIMSpan - Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems

Authors: André Petermann, Martin Junghanns, Erhard Rahm

Abstract: Transactional frequent subgraph mining identifies frequent subgraphs in a collection of graphs. This research problem has wide applicability and increasingly requires higher scalability over single machine solutions to address the needs of Big Data use cases. We introduce DIMSpan, an advanced approach to frequent subgraph mining that utilizes the features provided by distributed in-memory dataflow systems such as Apache Spark or Apache Flink. It determines the complete set of frequent subgraphs from arbitrary string-labeled directed multigraphs as they occur in social, business and knowledge networks. DIMSpan is optimized to runtime and minimal network traffic but memory-aware. An extensive performance evaluation on large graph collections shows the scalability of DIMSpan and the effectiveness of its pruning and optimization techniques.

Submitted 6 March, 2017; originally announced March 2017.

arXiv:1506.00548

cs.DB

GRADOOP: Scalable Graph Data Management and Analytics with Hadoop

Authors: Martin Junghanns, André Petermann, Kevin Gómez, Erhard Rahm

Abstract: Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore developing a new end-to-end approach for graph data management and analysis based on the Hadoop ecosystem, called Gradoop (Graph analytics on Hadoop). Gradoop is designed around the so-called Extended Property Graph Data Model (EPGM) supporting semantically rich, schema-free graph data within many distinct graphs. A set of high-level operators is provided for analyzing both single graphs and collections of graphs. Based on these operators, we propose a domain-specific language to define analytical workflows. The Gradoop graph store is currently utilizing HBase for distributed storage of graph data in Hadoop clusters. An initial version of Gradoop has been used to analyze graph data for business intelligence and social network analysis.

Submitted 2 June, 2015; v1 submitted 1 June, 2015; originally announced June 2015.

Comments: Technical Report
