Opportunities for Shape-based Optimization of Link Traversal Queries

Authors: Bryan-Elliott Tam, Ruben Taelman, Pieter Colpaert, Ruben Verborgh

Abstract: Data on the web is naturally unindexed and decentralized. Centralizing web data, especially personal data, raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) are significantly less performant and understood. The two main difficulties of LTQP are the lack of a priori information about data sources and the high number of HTTP requests. Exploring decentralization-friendly ways to document unindexed networks of data sources could lead to solutions that alleviate those difficulties. RDF data shapes are widely used to validate linked data documents; therefore, it is worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP using RDF data shape mappings with linked data documents and measured its performance in a realistic setup. In this article, we present our algorithm and early results, thus opening opportunities for further research on shape-based optimization of link traversal queries. Our initial experiments show that, with little maintenance and work from the server, our method can reduce the execution time by up to 80% and the number of links traversed by up to 97% during realistic queries. Given our early results and the descriptive power of RDF data shapes, it would be worthwhile to investigate non-heuristic-based query planning using RDF shapes.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 6 pages, 2 figures

arXiv:2406.10659 [pdf, other]

RDF Surfaces: Enabling Classical Negation on the Semantic Web

Authors: Patrick Hochstenbach, Mathijs van Noort, Dörthe Arndt, Rebekka Martens, Jos De Roo, Ruben Verborgh, Pieter Bonte, Femke Ongenae

Abstract: The Resource Description Framework (RDF) is a fundamental technology in the Semantic Web, enabling the representation and interchange of structured data. However, RDF lacks the capability to express negated statements in a generic way. As a result, exchanging negative information on a Web scale is thus far restricted to specific cases and predefined statements. The ability to negate (virtually) any RDF statement allows for a comprehensive way to refute, deny or otherwise invalidate claims on a Web scale. Via an intermediate step of a diagrammatic approach to logical expressions called Peirce graphs, we introduce RDF Surfaces, an extension of RDF that incorporates the concept of classic negation, known from first-order logic. Overall, RDF Surfaces provides an abstract, visual approach to negation within the Semantic Web, offering a more general and widely applicable approach than previous attempts at incorporating negation. Aside from a (traditional) programmatic syntax, RDF Surfaces can also be represented visually by means of diagrams inspired by Peirce graphs. We demonstrate negation via RDF Surfaces and how to reason upon it in illustrative use cases drawn from the domains of academic publishing and eHealth. We hope this vision paper attracts new implementers and opens the discussion to its formal specification.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2309.16365 [pdf, other]

Libertas: Privacy-Preserving Computation for Decentralised Personal Data Stores

Authors: Rui Zhao, Naman Goel, Nitin Agrawal, Jun Zhao, Jake Stein, Ruben Verborgh, Reuben Binns, Tim Berners-Lee, Nigel Shadbolt

Abstract: Data-driven decision-making and AI applications present exciting new opportunities delivering widespread benefits. The rapid adoption of such applications triggers legitimate concerns about loss of privacy and misuse of personal data. This leads to a growing and pervasive tension between harvesting ubiquitous data on the Web and the need to protect individuals. Decentralised personal data stores (PDS) such as Solid are frameworks designed to give individuals ultimate control over their personal data. But current PDS approaches have limited support for ensuring privacy when computations combine data spread across users. Secure Multi-Party Computation (MPC) is a well-known subfield of cryptography, enabling multiple autonomous parties to collaboratively compute a function while ensuring the secrecy of inputs (input privacy). These two technologies complement each other, but existing practices fall short in addressing the requirements and challenges of introducing MPC in a PDS environment. For the first time, we propose a modular design for integrating MPC with Solid while respecting the requirements of decentralisation in this context. Our architecture, Libertas, requires no protocol level changes in the underlying design of Solid, and can be adapted to other PDS. We further show how this can be combined with existing differential privacy techniques to also ensure output privacy. We use empirical benchmarks to inform and evaluate our implementation and design choices. We show the technical feasibility and scalability pattern of the proposed system in two novel scenarios: 1) empowering gig workers with aggregate computations on their earnings data; and 2) generating high-quality differentially-private synthetic data without requiring a trusted centre. With this, we demonstrate the linear scalability of Libertas, and gain insights about compute optimisations under such an architecture.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2305.08476 [pdf, other]

RDF Surfaces: Computer Says No

Authors: Patrick Hochstenbach, Jos De Roo, Ruben Verborgh

Abstract: Logic can define how agents are provided or denied access to resources, how to interlink resources using mining processes and provide users with choices for possible next steps in a workflow. These decisions are for the most part hidden, internal to machines processing data. In order to exchange this internal logic, a portable Web logic is required, which the Semantic Web could provide. Combining logic and data provides insights into the reasoning process and creates a new level of trust on the Semantic Web. Current Web logics carry only a fragment of first-order logic (FOL) to keep exchange languages decidable or easily processable. But this comes at a cost: the portability of logic. Machines require implicit agreements to know which fragment of logic is being exchanged and need a strategy for how to cope with the different fragments. These choices could obscure insights into the reasoning process. We created RDF Surfaces in order to express the full expressivity of FOL, including saying explicitly 'no'. This vision paper provides basic principles and compares existing work. Even though support for FOL is semi-decidable, we argue these problems are surmountable. RDF Surfaces span many use cases, including describing misuse of information, adding explainability and trust to reasoning, and providing scope for reasoning over streams of data and queries. RDF Surfaces provide the direct translation of FOL for the Semantic Web. We hope this vision paper attracts new implementers and opens the discussion to its formal specification.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 5 pages, position paper for the ESWC2023 TrusDeKW workshop

ACM Class: D.3; F.3; H.4

arXiv:2302.14411 [pdf, other]

Distributed Subweb Specifications for Traversing the Web

Authors: Bart Bogaerts, Bas Ketsman, Younes Zeboudj, Heba Aamer, Ruben Taelman, Ruben Verborgh

Abstract: Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, at a time when the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretic example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves. Under consideration in Theory and Practice of Logic Programming (TPLP).

Submitted 27 March, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)

arXiv:2302.06933 [pdf]

Evaluation of Link Traversal Query Execution over Decentralized Environments with Structural Assumptions

Authors: Ruben Taelman, Ruben Verborgh

Abstract: To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, and provide the SolidBench benchmark to simulate Solid environments representatively. We introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds for non-complex queries, which existing algorithms cannot achieve. Furthermore, we discuss limitations with respect to more complex queries. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.

Submitted 14 February, 2023; originally announced February 2023.

Comments: Not peer-reviewed

arXiv:2210.04631 [pdf]

A Prospective Analysis of Security Vulnerabilities within Link Traversal-Based Query Processing (Extended Version)

Authors: Ruben Taelman, Ruben Verborgh

Abstract: The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine's host machine or the query initiator's personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. Each threat is explained, together with an example, and one or more avenues for mitigations are proposed. We conclude with several concrete recommendations for LTQP query engine developers and data publishers as a first step to mitigate some of these issues. With this work, we start filling the unknowns for enabling querying over decentralized environments. Aside from future work on security, wider research is needed to uncover missing building blocks for enabling true decentralization.

Submitted 10 October, 2022; originally announced October 2022.

Comments: This is an extended version of an article with the same title published in the proceedings of the QuWeDa workshop at ISWC 2022. Next to more details in the related work and conclusions sections, this extension introduces concrete mitigations of each vulnerability

arXiv:2208.00665 [pdf, other]

Event Notifications in Value-Adding Networks

Authors: Patrick Hochstenbach, Herbert Van de Sompel, Miel Vander Sande, Ruben Dedecker, Ruben Verborgh

Abstract: Linkages between research outputs are crucial in the scholarly knowledge graph. They include online citations, but also links between versions that differ according to various dimensions and links to resources that were used to arrive at research results. In current scholarly communication systems this information is only made available post factum and is obtained via elaborate batch processing. In this paper we report on work aimed at making linkages available in real-time, in which an alternative, decentralised scholarly communication network is considered that consists of interacting data nodes that host artifacts and service nodes that add value to artifacts. The first result of this work, the "Event Notifications in Value-Adding Networks" specification, details interoperability requirements for the exchange of real-time life-cycle information pertaining to artifacts using Linked Data Notifications. In an experiment, we applied our specification to one particular use-case: distributing Scholix data-literature links to a network of Belgian institutional repositories by a national service node. The results of our experiment confirm the potential of our approach and provide a framework to create a network of interacting nodes implementing the core scholarly functions (registration, certification, awareness and archiving) in a decentralized and decoupled way.

Submitted 3 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: 12 pages, 2 figures, Accepted at the 26th International Conference on Theory and Practice of Digital Libraries, Padua, Italy

arXiv:2005.02239 [pdf, ps, other]

Guided Link-Traversal-Based Query Processing

Authors: Ruben Verborgh, Ruben Taelman

Abstract: Link-Traversal-Based Query Processing (LTBQP) is a technique for evaluating queries over a web of data by starting with a set of seed documents that is dynamically expanded through following hyperlinks. Compared to query evaluation over a static set of sources, LTBQP is significantly slower because of the number of needed network requests. Furthermore, there are concerns regarding relevance and trustworthiness of results, given that sources are selected dynamically. To address both issues, we propose guided LTBQP, a technique in which information about document linking structure and content policies is passed to a query processor. Thereby, the processor can prune the search tree of documents by only following relevant links, and restrict the result set to desired results by limiting which documents are considered for what kinds of content. In this exploratory paper, we describe the technique at a high level and sketch some of its applications. We argue that such guidance can make LTBQP a valuable query strategy in decentralized environments, where data is spread across documents with varying levels of user trust.

Submitted 3 May, 2020; originally announced May 2020.

Comments: 4 pages

arXiv:1609.07108 [pdf, other]

A Web API ecosystem through feature-based reuse

Authors: Ruben Verborgh, Michel Dumontier

Abstract: The fast-growing Web API landscape brings clients more options than ever before, in theory. In practice, they cannot easily switch between different providers offering similar functionality. We discuss a vision for developing Web APIs based on reuse of interface parts called features. Through the introduction of 5 design principles, we investigate the impact of feature-based reuse on Web APIs. Applying these principles enables a granular reuse of client and server code, documentation, and tools. Together, they can foster a measurable ecosystem with cross-API compatibility, opening the door to a more flexible generation of Web clients.

Submitted 12 March, 2018; v1 submitted 22 September, 2016; originally announced September 2016.

arXiv:1512.07780 [pdf, other]

doi 10.1017/S1471068416000016

The Pragmatic Proof: Hypermedia API Composition and Execution

Authors: Ruben Verborgh, Dörthe Arndt, Sofie Van Hoecke, Jos De Roo, Giovanni Mels, Thomas Steiner, Joaquim Gabarro

Abstract: Machine clients are increasingly making use of the Web to perform tasks. While Web services traditionally mimic remote procedure calling interfaces, a new generation of so-called hypermedia APIs works through hyperlinks and forms, in a way similar to how people browse the Web. This means that existing composition techniques, which determine a procedural plan upfront, are not sufficient to consume hypermedia APIs, which need to be navigated at runtime. Clients instead need a more dynamic plan that allows them to follow hyperlinks and use forms with a preset goal. Therefore, in this article, we show how compositions of hypermedia APIs can be created by generic Semantic Web reasoners. This is achieved through the generation of a proof based on semantic descriptions of the APIs' functionality. To pragmatically verify the applicability of compositions, we introduce the notion of pre-execution and post-execution proofs. The runtime interaction between a client and a server is guided by proofs but driven by hypermedia, allowing the client to react to the application's actual state indicated by the server's response. We describe how to generate compositions from descriptions, discuss a computer-assisted process to generate descriptions, and verify reasoner performance on various composition tasks using a benchmark suite. The experimental results lead to the conclusion that proof-based consumption of hypermedia APIs is a feasible strategy at Web scale.

Submitted 24 December, 2015; originally announced December 2015.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)

arXiv:1501.06329 [pdf, other]

Disaster Monitoring with Wikipedia and Online Social Networking Sites: Structured Data and Linked Data Fragments to the Rescue?

Authors: Thomas Steiner, Ruben Verborgh

Abstract: In this paper, we present the first results of our ongoing early-stage research on a realtime disaster detection and monitoring tool. Based on Wikipedia, it is language-agnostic and leverages user-generated multimedia content shared on online social networking sites to help disaster responders prioritize their efforts. We make the tool and its source code publicly available as we make progress on it. Furthermore, we strive to publish detected disasters and accompanying multimedia content following the Linked Data principles to facilitate its wide consumption, redistribution, and evaluation of its usefulness.

Submitted 26 January, 2015; originally announced January 2015.

Comments: Accepted for publication at the AAAI Spring Symposium 2015: Structured Data for Humanitarian Technologies: Perfect fit or Overkill? #SD4HumTech15

Showing 1–12 of 12 results for author: Verborgh, R