Path-based Algebraic Foundations of Graph Query Languages

Authors: Renzo Angles, Angela Bonifati, Roberto García, Domagoj Vrgoč

Abstract: Graph databases are gaining momentum thanks to the flexibility and expressiveness of their data model and query languages. A standardization activity driven by the ISO/IEC standardization body is also ongoing and has already conducted to the specification of the first versions of two standard graph query languages, namely SQL/PGQ and GQL, respectively in 2023 and 2024. Apart from the standards, there exists a panoply of concrete graph query languages in commercial and open-source graph databases, each of which exhibits different features and modes. In this paper, we tackle the heterogeneity problem of graph query languages by laying the foundations of a unifying path-oriented algebraic framework. Such a theoretical framework is currently missing in the graph databases landscape, thus impeding a lingua franca in which different graph query language implementations can be expressed and cross-compared. Our framework gives a blueprint for correct implementation of graph queries of different expressiveness. It allows to overcome the boundaries of current versions of standard query languages, thus paving the way to future extensions including query composability. It also allows, when the path-based semantics is stripped off, to express classical Codd's relational algebra enhanced with a recursive operator, thus proving its utility for a wide range of queries in database management systems.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Under review

arXiv:2307.06119 [pdf, other]

SparqLog: A System for Efficient Evaluation of SPARQL 1.1 Queries via Datalog [Experiment, Analysis and Benchmark]

Authors: Renzo Angles, Georg Gottlob, Aleksandar Pavlovic, Reinhard Pichler, Emanuel Sallinger

Abstract: Over the past decade, Knowledge Graphs have received enormous interest both from industry and from academia. Research in this area has been driven, above all, by the Database (DB) community and the Semantic Web (SW) community. However, there still remains a certain divide between approaches coming from these two communities. For instance, while languages such as SQL or Datalog are widely used in the DB area, a different set of languages such as SPARQL and OWL is used in the SW area. Interoperability between such technologies is still a challenge. The goal of this work is to present a uniform and consistent framework meeting important requirements from both, the SW and DB field.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2211.10962 [pdf, ps, other]

doi 10.1145/3589778

PG-Schema: Schemas for Property Graphs

Authors: Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Alastair Green, Jan Hidders, Bei Li, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Stefan Plantikow, Ognjen Savković, Michael Schmidt, Juan Sequeda, Sławek Staworko, Dominik Tomaszuk, Hannes Voigt, Domagoj Vrgoč, Mingxi Wu, Dušan Živković

Abstract: Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, schema support is limited both in existing systems and in the first version of the GQL Standard. It is anticipated that the second version of the GQL Standard will include a rich DDL. Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. It features PG-Types with flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism. We provide the formal syntax and semantics of PG-Schema, which meet principled design requirements grounded in contemporary property graph management scenarios, and offer a detailed comparison of its features with those of existing schema languages and graph database systems.

Submitted 8 July, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: 26 pages

Journal ref: Proc. ACM Manag. Data (2023)

arXiv:2111.01540 [pdf, other]

MillenniumDB: A Persistent, Open-Source, Graph Database

Authors: Domagoj Vrgoc, Carlos Rojas, Renzo Angles, Marcelo Arenas, Diego Arroyuelo, Carlos Buil Aranda, Aidan Hogan, Gonzalo Navarro, Cristian Riveros, Juan Romero

Abstract: In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2012.06171 [pdf, other]

doi 10.1145/3434642

The Future is Big Graphs! A Community View on Graph Processing Systems

Authors: Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bernhard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasiliki Kalavri, Hugo Kapp, Wim Martens, M. Tamer Özsu, Eric Peukert, Stefan Plantikow, et al. (16 additional authors not shown)

Abstract: Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed?

Submitted 11 December, 2020; originally announced December 2020.

Comments: 12 pages, 3 figures, collaboration between the large-scale systems and data management communities, work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems, to be published in the Communications of the ACM

ACM Class: C.3; E.0; H.2; J.0

arXiv:2001.02299 [pdf, other]

The LDBC Social Network Benchmark

Authors: Renzo Angles, János Benjamin Antal, Alex Averbuch, Altan Birler, Peter Boncz, Márton Búr, Orri Erling, Andrey Gubichev, Vlad Haprian, Moritz Kaufmann, Josep Lluís Larriba Pey, Norbert Martínez, József Marton, Marcus Paradies, Minh-Duc Pham, Arnau Prat-Pérez, David Püroja, Mirko Spasić, Benjamin A. Steer, Dávid Szakállas, Gábor Szárnyas, Jack Waudby, Mingxi Wu, Yuchen Zhang

Abstract: The Linked Data Benchmark Council's Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management. For this, LDBC SNB uses the recognizable scenario of operating a social network, characterized by its graph-shaped data. LDBC SNB consists of two workloads that focus on different functionalities: the Interactive workload (interactive transactional queries) and the Business Intelligence workload (analytical queries). This document contains the definition of both workloads. This includes a detailed explanation of the data used in the LDBC SNB, a detailed description for all queries, and instructions on how to generate the data and run the benchmark with the provided software.

Submitted 14 January, 2024; v1 submitted 7 January, 2020; originally announced January 2020.

Comments: For the repository containing the source code of this technical report, see https://github.com/ldbc/ldbc_snb_docs

ACM Class: H.2.4

arXiv:1912.02127 [pdf, other]

doi 10.1109/ACCESS.2020.2993117

Directly Mapping RDF Databases to Property Graph Databases

Authors: Renzo Angles, Harsh Thakkar, Dominik Tomaszuk

Abstract: RDF triplestores and property graph databases are two approaches for data management which are based on modeling, storing, and querying graph-like data. In spite of such common principles, they present special features that complicate the task of database interoperability. While there exist some methods to transform RDF graphs into property graphs, and vice versa, they lack compatibility and a solid formal foundation. This paper presents three direct mappings (schema-dependent and schema-independent) for transforming an RDF database into a property graph database, including data and schema. We show that two of the proposed mappings satisfy the properties of semantics preservation and information preservation. The existence of both mappings allows us to conclude that the property graph data model subsumes the information capacity of the RDF data model.

Submitted 3 June, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

Comments: This work has been accepted and published at the IEEE Access Journal DOI: 10.1109/ACCESS.2020.2993117

Journal ref: IEEE Access Volume 8, 2020

arXiv:1801.04387 [pdf, ps, other]

The Problem of Correlation and Substitution in SPARQL -- Extended Version

Authors: Daniel Hernández, Claudio Gutierrez, Renzo Angles

Abstract: Implementations of a standard language are expected to give same outputs to identical queries. In this paper we study why different implementations of SPARQL (Fuseki, Virtuoso, Blazegraph and rdf4j) behave differently when evaluating queries with correlated variables. We show that at the core of this problem lies the historically troubling notion of logical substitution. We present a formal framework to study this issue based on Datalog that besides clarifying the problem, gives a solid base to define and implement nesting.

Submitted 22 February, 2018; v1 submitted 13 January, 2018; originally announced January 2018.

arXiv:1801.00036 [pdf, other]

doi 10.1007/978-3-319-96193-4_1

An introduction to Graph Data Management

Authors: Renzo Angles, Claudio Gutierrez

Abstract: A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled)(directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give an historical overview of its main development, and study the main current systems that implement them.

Submitted 29 December, 2017; originally announced January 2018.

arXiv:1712.01550 [pdf, other]

G-CORE: A Core for Future Graph Query Languages

Authors: Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter Boncz, George H. L. Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan Sequeda, Oskar van Rest, Hannes Voigt

Abstract: We report on a community effort between industry and academia to shape the future of graph query languages. We argue that existing graph database management systems should consider supporting a query language with two key characteristics. First, it should be composable, meaning, that graphs are the input and the output of queries. Second, the graph query language should treat paths as first-class citizens. Our result is G-CORE, a powerful graph query language design that fulfills these goals, and strikes a careful balance between path query expressivity and evaluation complexity.

Submitted 6 December, 2017; v1 submitted 5 December, 2017; originally announced December 2017.

arXiv:1610.06264 [pdf, ps, other]

Foundations of Modern Query Languages for Graph Databases

Authors: Renzo Angles, Marcelo Arenas, Pablo Barcelo, Aidan Hogan, Juan Reutter, Domagoj Vrgoc

Abstract: We survey foundational features underlying modern graph query languages. We first discuss two popular graph data models: edge-labelled graphs, where nodes are connected by directed, labelled edges; and property graphs, where nodes and edges can further have attributes. Next we discuss the two most fundamental graph querying functionalities: graph patterns and navigational expressions. We start with graph patterns, in which a graph-structured query is matched against the data. Thereafter we discuss navigational expressions, in which patterns can be matched recursively against the graph to navigate paths of arbitrary length; we give an overview of what kinds of expressions have been proposed, and how they can be combined with graph patterns. We also discuss several semantics under which queries using the previous features can be evaluated, what effects the selection of features and semantics has on complexity, and offer examples of such features in three modern languages that are used to query graphs: SPARQL, Cypher and Gremlin. We conclude by discussing the importance of formalisation for graph query languages; a summary of what is known about SPARQL, Cypher and Gremlin in terms of expressivity and complexity; and an outline of possible future directions for the area.

Submitted 15 June, 2017; v1 submitted 19 October, 2016; originally announced October 2016.

arXiv:1610.04315 [pdf, ps, other]

The multiset semantics of SPARQL patterns

Authors: Renzo Angles, Claudio Gutierrez

Abstract: The paper determines the algebraic and logic structure of the multiset semantics of the core patterns of SPARQL. We prove that the fragment formed by AND, UNION, OPTIONAL, FILTER, MINUS and SELECT corresponds precisely to both, the intuitive multiset relational algebra (projection, selection, natural join, arithmetic union and except), and the multiset non-recursive Datalog with safe negation.

Submitted 13 October, 2016; originally announced October 2016.

Comments: This is an extended and updated version of the paper accepted at the International Semantic Web Conference 2016

arXiv:1606.01441 [pdf, ps, other]

Correlation and Substitution in SPARQL

Authors: Daniel Hernández, Claudio Gutierrez, Renzo Angles

Abstract: In the current SPARQL specification the notion of correlation and substitution are not well defined. This problem triggers several ambiguities in the semantics. In fact, implementations as Fuseki and Virtuoso assume different semantics. In this technical report, we provide a semantics of correlation and substitution following the classic philosophy of substitution and correlation in logic, programming languages and SQL. We think this proposal not only fix the current ambiguities and problems, but helps to set a safe formal base to further extensions of the language. This work is part of an ongoing work of Daniel Hernandez. These anomalies in the W3C Specification of SPARQL 1.1 were detected early and reported no later than 2014, when two erratas were registered (cf. https://www.w3.org/2013/sparql-errata#errata-query-8 and https://www.w3.org/2013/sparql-errata#errata-query-10).

Submitted 11 July, 2016; v1 submitted 4 June, 2016; originally announced June 2016.

arXiv:1603.06053 [pdf, ps, other]

Negation in SPARQL

Authors: Renzo Angles, Claudio Gutierrez

Abstract: This paper presents a thorough study of negation in SPARQL. The types of negation supported in SPARQL are identified and their main features discussed. Then, we study the expressive power of the corresponding negation operators. At this point, we identify a core SPARQL algebra which could be used instead of the W3C SPARQL algebra. Finally, we analyze the negation operators in terms of their compliance with elementary axioms of set theory.

Submitted 6 June, 2016; v1 submitted 19 March, 2016; originally announced March 2016.

Comments: Proc. of the Alberto Mendelzon International Workshop on Foundations of Data Management (AMW'2016)

Showing 1–14 of 14 results for author: Angles, R