-
Path-based Algebraic Foundations of Graph Query Languages
Authors:
Renzo Angles,
Angela Bonifati,
Roberto García,
Domagoj Vrgoč
Abstract:
Graph databases are gaining momentum thanks to the flexibility and expressiveness of their data model and query languages. A standardization activity driven by the ISO/IEC standardization body is also ongoing and has already led to the specification of the first versions of two standard graph query languages, namely SQL/PGQ and GQL, in 2023 and 2024 respectively. Apart from the standards, there exists a panoply of concrete graph query languages in commercial and open-source graph databases, each of which exhibits different features and modes. In this paper, we tackle the heterogeneity problem of graph query languages by laying the foundations of a unifying path-oriented algebraic framework. Such a theoretical framework is currently missing in the graph databases landscape, thus impeding a lingua franca in which different graph query language implementations can be expressed and cross-compared. Our framework gives a blueprint for correct implementation of graph queries of different expressiveness. It makes it possible to overcome the boundaries of current versions of standard query languages, thus paving the way to future extensions including query composability. When the path-based semantics is stripped off, it also makes it possible to express Codd's classical relational algebra enhanced with a recursive operator, thus proving its utility for a wide range of queries in database management systems.
Submitted 5 July, 2024;
originally announced July 2024.
-
SparqLog: A System for Efficient Evaluation of SPARQL 1.1 Queries via Datalog [Experiment, Analysis and Benchmark]
Authors:
Renzo Angles,
Georg Gottlob,
Aleksandar Pavlovic,
Reinhard Pichler,
Emanuel Sallinger
Abstract:
Over the past decade, Knowledge Graphs have received enormous interest both from industry and from academia. Research in this area has been driven, above all, by the Database (DB) community and the Semantic Web (SW) community. However, there still remains a certain divide between approaches coming from these two communities. For instance, while languages such as SQL or Datalog are widely used in the DB area, a different set of languages such as SPARQL and OWL is used in the SW area. Interoperability between such technologies is still a challenge. The goal of this work is to present a uniform and consistent framework meeting important requirements from both the SW and DB fields.
Submitted 11 July, 2023;
originally announced July 2023.
-
PG-Schema: Schemas for Property Graphs
Authors:
Renzo Angles,
Angela Bonifati,
Stefania Dumbrava,
George Fletcher,
Alastair Green,
Jan Hidders,
Bei Li,
Leonid Libkin,
Victor Marsault,
Wim Martens,
Filip Murlak,
Stefan Plantikow,
Ognjen Savković,
Michael Schmidt,
Juan Sequeda,
Sławek Staworko,
Dominik Tomaszuk,
Hannes Voigt,
Domagoj Vrgoč,
Mingxi Wu,
Dušan Živković
Abstract:
Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, schema support is limited both in existing systems and in the first version of the GQL Standard. It is anticipated that the second version of the GQL Standard will include a rich DDL. Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. It features PG-Types with flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism. We provide the formal syntax and semantics of PG-Schema, which meet principled design requirements grounded in contemporary property graph management scenarios, and offer a detailed comparison of its features with those of existing schema languages and graph database systems.
Submitted 8 July, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
MillenniumDB: A Persistent, Open-Source, Graph Database
Authors:
Domagoj Vrgoc,
Carlos Rojas,
Renzo Angles,
Marcelo Arenas,
Diego Arroyuelo,
Carlos Buil Aranda,
Aidan Hogan,
Gonzalo Navarro,
Cristian Riveros,
Juan Romero
Abstract:
In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
Submitted 2 November, 2021;
originally announced November 2021.
-
The Future is Big Graphs! A Community View on Graph Processing Systems
Authors:
Sherif Sakr,
Angela Bonifati,
Hannes Voigt,
Alexandru Iosup,
Khaled Ammar,
Renzo Angles,
Walid Aref,
Marcelo Arenas,
Maciej Besta,
Peter A. Boncz,
Khuzaima Daudjee,
Emanuele Della Valle,
Stefania Dumbrava,
Olaf Hartig,
Bernhard Haslhofer,
Tim Hegeman,
Jan Hidders,
Katja Hose,
Adriana Iamnitchi,
Vasiliki Kalavri,
Hugo Kapp,
Wim Martens,
M. Tamer Özsu,
Eric Peukert,
Stefan Plantikow
, et al. (16 additional authors not shown)
Abstract:
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?
Submitted 11 December, 2020;
originally announced December 2020.
-
The LDBC Social Network Benchmark
Authors:
Renzo Angles,
János Benjamin Antal,
Alex Averbuch,
Altan Birler,
Peter Boncz,
Márton Búr,
Orri Erling,
Andrey Gubichev,
Vlad Haprian,
Moritz Kaufmann,
Josep Lluís Larriba Pey,
Norbert Martínez,
József Marton,
Marcus Paradies,
Minh-Duc Pham,
Arnau Prat-Pérez,
David Püroja,
Mirko Spasić,
Benjamin A. Steer,
Dávid Szakállas,
Gábor Szárnyas,
Jack Waudby,
Mingxi Wu,
Yuchen Zhang
Abstract:
The Linked Data Benchmark Council's Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management. For this, LDBC SNB uses the recognizable scenario of operating a social network, characterized by its graph-shaped data. LDBC SNB consists of two workloads that focus on different functionalities: the Interactive workload (interactive transactional queries) and the Business Intelligence workload (analytical queries). This document contains the definition of both workloads. This includes a detailed explanation of the data used in the LDBC SNB, a detailed description for all queries, and instructions on how to generate the data and run the benchmark with the provided software.
Submitted 14 January, 2024; v1 submitted 7 January, 2020;
originally announced January 2020.
-
Directly Mapping RDF Databases to Property Graph Databases
Authors:
Renzo Angles,
Harsh Thakkar,
Dominik Tomaszuk
Abstract:
RDF triplestores and property graph databases are two approaches for data management which are based on modeling, storing, and querying graph-like data. In spite of such common principles, they present special features that complicate the task of database interoperability. While there exist some methods to transform RDF graphs into property graphs, and vice versa, they lack compatibility and a solid formal foundation. This paper presents three direct mappings (schema-dependent and schema-independent) for transforming an RDF database into a property graph database, including data and schema. We show that two of the proposed mappings satisfy the properties of semantics preservation and information preservation. The existence of both mappings allows us to conclude that the property graph data model subsumes the information capacity of the RDF data model.
Submitted 3 June, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
The Problem of Correlation and Substitution in SPARQL -- Extended Version
Authors:
Daniel Hernández,
Claudio Gutierrez,
Renzo Angles
Abstract:
Implementations of a standard language are expected to give the same outputs for identical queries. In this paper we study why different implementations of SPARQL (Fuseki, Virtuoso, Blazegraph and rdf4j) behave differently when evaluating queries with correlated variables. We show that at the core of this problem lies the historically troubling notion of logical substitution. We present a formal framework based on Datalog to study this issue; besides clarifying the problem, it gives a solid base to define and implement nesting.
Submitted 22 February, 2018; v1 submitted 13 January, 2018;
originally announced January 2018.
-
An introduction to Graph Data Management
Authors:
Renzo Angles,
Claudio Gutierrez
Abstract:
A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled)(directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main development, and study the main current systems that implement them.
Submitted 29 December, 2017;
originally announced January 2018.
-
G-CORE: A Core for Future Graph Query Languages
Authors:
Renzo Angles,
Marcelo Arenas,
Pablo Barceló,
Peter Boncz,
George H. L. Fletcher,
Claudio Gutierrez,
Tobias Lindaaker,
Marcus Paradies,
Stefan Plantikow,
Juan Sequeda,
Oskar van Rest,
Hannes Voigt
Abstract:
We report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning that graphs are the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals, and strikes a careful balance between path query expressivity and evaluation complexity.
Submitted 6 December, 2017; v1 submitted 5 December, 2017;
originally announced December 2017.
-
Foundations of Modern Query Languages for Graph Databases
Authors:
Renzo Angles,
Marcelo Arenas,
Pablo Barcelo,
Aidan Hogan,
Juan Reutter,
Domagoj Vrgoc
Abstract:
We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges; and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed, and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.
Submitted 15 June, 2017; v1 submitted 19 October, 2016;
originally announced October 2016.
-
The multiset semantics of SPARQL patterns
Authors:
Renzo Angles,
Claudio Gutierrez
Abstract:
The paper determines the algebraic and logic structure of the multiset semantics of the core patterns of SPARQL. We prove that the fragment formed by AND, UNION, OPTIONAL, FILTER, MINUS and SELECT corresponds precisely to both the intuitive multiset relational algebra (projection, selection, natural join, arithmetic union and except) and the multiset non-recursive Datalog with safe negation.
Submitted 13 October, 2016;
originally announced October 2016.
-
Correlation and Substitution in SPARQL
Authors:
Daniel Hernández,
Claudio Gutierrez,
Renzo Angles
Abstract:
In the current SPARQL specification the notions of correlation and substitution are not well defined. This problem triggers several ambiguities in the semantics. In fact, implementations such as Fuseki and Virtuoso assume different semantics.
In this technical report, we provide a semantics of correlation and substitution following the classic philosophy of substitution and correlation in logic, programming languages and SQL. We think this proposal not only fixes the current ambiguities and problems, but also helps to set a safe formal base for further extensions of the language.
This work is part of ongoing work by Daniel Hernandez. These anomalies in the W3C Specification of SPARQL 1.1 were detected early and reported no later than 2014, when two errata were registered (cf. https://www.w3.org/2013/sparql-errata#errata-query-8 and https://www.w3.org/2013/sparql-errata#errata-query-10).
Submitted 11 July, 2016; v1 submitted 4 June, 2016;
originally announced June 2016.
-
Negation in SPARQL
Authors:
Renzo Angles,
Claudio Gutierrez
Abstract:
This paper presents a thorough study of negation in SPARQL. The types of negation supported in SPARQL are identified and their main features discussed. Then, we study the expressive power of the corresponding negation operators. At this point, we identify a core SPARQL algebra which could be used instead of the W3C SPARQL algebra. Finally, we analyze the negation operators in terms of their compliance with elementary axioms of set theory.
Submitted 6 June, 2016; v1 submitted 19 March, 2016;
originally announced March 2016.