
LiveData -- A Worldwide Data Mesh for Stratified Data

Authors: Simone Bocca, Amarsanaa Ganbold, Tsolmon Zundui

Abstract: Data reuse is fundamental for reducing the data integration effort required to build data supporting new applications, especially in data scarcity contexts. However, data reuse requires dealing with data heterogeneity, which is always present in data coming from different sources. Such heterogeneity appears at different levels, like the language used by the data, the structure of the information it represents, and the data types and formats adopted by the datasets. Despite the valuable insights gained by reusing data across contexts, dealing with data heterogeneity is still a high price to pay. Additionally, data reuse is hampered by the lack of data distribution infrastructures supporting the production and distribution of quality and interoperable data. These issues affecting data reuse are amplified when considering cross-country data reuse, where geographical and cultural differences are more pronounced. In this paper, we propose LiveData, a cross-country data distribution network handling high quality and diversity-aware data. LiveData is composed of different nodes having an architecture providing components for the generation and distribution of a new type of data, where heterogeneity is transformed into information diversity and considered as a feature, explicitly defined and used to satisfy the data users' purposes. This paper presents the specification of the LiveData network, by defining the architecture and the type of data handled by its nodes. This specification is currently being used to implement a concrete use case for data reuse and integration between the University of Trento (Italy) and the National University of Mongolia.

Submitted 27 May, 2024; originally announced July 2024.

Comments: Accepted to MMT-2024 Mongolian conference and ICTfocus journal (https://ictfocus.org/)

arXiv:2305.06088 [pdf, other]

Building Interoperable Electronic Health Records as Purpose-Driven Knowledge Graphs

Authors: Simone Bocca, Alessio Zamboni, Gabor Bella, Yamini Chandrashekar, Mayukh Bagchi, Gabriel Kuper, Paolo Bouquet, Fausto Giunchiglia

Abstract: When building a new application we are increasingly confronted with the need to reuse and integrate pre-existing knowledge. Nevertheless, it is a fact that this prior knowledge is virtually impossible to reuse as-is. This is true also in domains, e.g., eHealth, where a lot of effort has been put into developing high-quality standards and reference ontologies, e.g., FHIR. In this paper, we propose an integrated methodology, called iTelos, which enables data and knowledge reuse towards the construction of Interoperable Electronic Health Records (iEHR). The key intuition is that the data level and the schema level of an application should be developed independently, thus allowing for maximum flexibility in the reuse of the prior knowledge, but under the overall guidance of the needs to be satisfied, formalized as competence queries. This intuition is implemented by codifying all the requirements, including those concerning reuse, as part of a purpose defined a priori, which is then used to drive a middle-out development process where the application schema and data are continuously aligned. The proposed methodology is validated through its application to a large-scale case study.

Submitted 10 May, 2023; originally announced May 2023.

Comments: DSAI SPRINGER BOOK. arXiv admin note: text overlap with arXiv:2105.09418

Report number: DISIDSAI2023

Journal ref: DSAI SPRINGER BOOK, 2023

arXiv:2209.14049 [pdf, other]

Popularity Driven Data Integration

Authors: Fausto Giunchiglia, Simone Bocca, Mattia Fumagalli, Mayukh Bagchi, Alessio Zamboni

Abstract: More and more, with the growing focus on large scale analytics, we are confronted with the need to integrate data from multiple sources. The problem is that these data are impossible to reuse as-is. The net result is high cost, with the further drawback that the resulting integrated data will again be hardly reusable as-is. iTelos is a general purpose methodology aiming at minimizing the effects of this process. The intuition is that data will be treated differently based on their popularity: the more a certain set of data have been reused, the more they will be reused and the less they will be changed across reuses, thus decreasing the overall data preprocessing costs, while increasing backward compatibility and future sharing.

Submitted 28 September, 2022; originally announced September 2022.

Comments: KGSWC 2022. Fourth Ibero-American Knowledge Graph and Semantic Web Conference joint with Third Indo-American Knowledge Graph and Semantic Web Conference 21-23 November 2022, Universidad Camilo José Cela, Madrid, Spain. arXiv admin note: substantial text overlap with arXiv:2105.09418

arXiv:2105.09432 [pdf, other]

Stratified Data Integration

Authors: Fausto Giunchiglia, Alessio Zamboni, Mayukh Bagchi, Simone Bocca

Abstract: We propose a novel approach to the problem of semantic heterogeneity where data are organized into a set of stratified and independent representation layers, namely: conceptual (where a set of unique alinguistic identifiers are connected inside a graph codifying their meaning), language (where sets of synonyms, possibly from multiple languages, annotate concepts), knowledge (in the form of a graph where nodes are entity types and links are properties), and data (in the form of a graph of entities populating the previous knowledge graph). This allows us to state the problem of semantic heterogeneity as a problem of Representation Diversity where the different types of heterogeneity, viz. Conceptual, Language, Knowledge, and Data, are uniformly dealt with within each single layer, independently from the others. In this paper we describe the proposed stratified representation of data and the process by which data are first transformed into the target representation, then suitably integrated and then, finally, presented to the user in her preferred format. The proposed framework has been evaluated in various pilot case studies and in a number of industrial data integration problems.

Submitted 19 May, 2021; originally announced May 2021.

arXiv:2105.09418 [pdf, other]

iTelos -- Purpose Driven Knowledge Graph Generation

Authors: Fausto Giunchiglia, Simone Bocca, Mattia Fumagalli, Mayukh Bagchi, Alessio Zamboni

Abstract: When building a new application we are more and more confronted with the need to reuse and integrate pre-existing knowledge, e.g., ontologies, schemas, data of any kind, from multiple sources. Nevertheless, it is a fact that this prior knowledge is virtually impossible to reuse as-is. This difficulty is the cause of high costs, with the further drawback that the resulting application will again be hardly reusable. It is a negative loop which consistently reinforces itself. iTelos is a general purpose methodology aiming at minimizing as much as possible the effects of this loop. iTelos is based on the intuition that the data level and the schema level of an application should be developed independently, thus allowing for maximum flexibility in the reuse of the prior knowledge, but under the overall guidance of the needs to be satisfied, formalized as competence queries. This intuition is implemented by codifying all the requirements, including those concerning reuse, as part of an a-priori defined purpose, which is then used to drive a middle-out development process where the application schema and data are continuously aligned.

Submitted 15 December, 2021; v1 submitted 19 May, 2021; originally announced May 2021.

arXiv:2103.07703 [pdf]

Is your Schema Good Enough to Answer my Query?

Authors: Yuanwei Zhao, Lan Huang, Bo Wang, Dongxu Zhang, Simone Bocca, Fausto Giunchiglia, Rui Zhang

Abstract: Ontology-based data integration has been one of the practical methodologies for building integrated services over heterogeneous legacy databases. However, building a cross-domain ontology on top of the schemas of each legacy database for a specific integration application is less efficient and less economical than reusing existing ontologies. The question then is whether an existing ontology is compatible with the cross-domain queries and with all the legacy systems. Effective criteria for evaluating this compatibility are highly needed, as it limits the upper bound on the quality of the integrated services. This paper studies the semantic similarity of schemas from the aspect of properties. It provides a set of in-depth criteria, namely coverage and flexibility, to evaluate the compatibility among the queries, the schemas and the existing ontology. The weights of classes are extended to make the compatibility computation precise. The use of these criteria in a practical project verifies the applicability of our method.

Submitted 13 March, 2021; originally announced March 2021.

Showing 1–6 of 6 results for author: Bocca, S