Search | arXiv e-print repository

From Knowledge Organization to Knowledge Representation and Back

Authors: Fausto Giunchiglia, Mayukh Bagchi, Subhashis Das

Abstract: Knowledge Organization (KO) and Knowledge Representation (KR) have been the two mainstream methodologies of knowledge modelling in the Information Science community and the Artificial Intelligence community, respectively. The facet-analytical tradition of KO has developed an exhaustive set of guiding canons for ensuring quality in organising and managing knowledge but has remained limited in terms of technology-driven activities to expand its scope and services beyond the bibliographic universe of knowledge. KR, on the other hand, boasts of a robust ecosystem of technologies and technology-driven service design which can be tailored to model any entity or scale to any service in the entire universe of knowledge. This paper elucidates both the facet-analytical KO and KR methodologies in detail and provides a functional mapping between them. Out of the mapping, the paper proposes an integrated KR-enriched KO methodology with all the standard components of a KO methodology plus the advanced technologies provided by the KR approach. The practical benefits of the methodological integration has been exemplified through the flagship application of the Digital University at the University of Trento, Italy.

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted @ Annals of Library and Information Studies (ALIS) Journal - Ranganathan Commemorative Issue (2024)

Report number: DISI22012024

arXiv:2312.07302 [pdf, other]

From Knowledge Representation to Knowledge Organization and Back

Authors: Fausto Giunchiglia, Mayukh Bagchi

Abstract: Knowledge Representation (KR) and facet-analytical Knowledge Organization (KO) have been the two most prominent methodologies of data and knowledge modelling in the Artificial Intelligence community and the Information Science community, respectively. KR boasts of a robust and scalable ecosystem of technologies to support knowledge modelling while, often, underemphasizing the quality of its models (and model-based data). KO, on the other hand, is less technology-driven but has developed a robust framework of guiding principles (canons) for ensuring modelling (and model-based data) quality. This paper elucidates both the KR and facet-analytical KO methodologies in detail and provides a functional mapping between them. Out of the mapping, the paper proposes an integrated KO-enriched KR methodology with all the standard components of a KR methodology plus the guiding canons of modelling quality provided by KO. The practical benefits of the methodological integration has been exemplified through a prominent case study of KR-based image annotation exercise.

Submitted 8 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: International Conference on Information (iConference) 2024 - Wisdom, Well-being, Win-win - Springer LNCS, Springer Cham Switzerland

Report number: KNOWDIVEDISI012024

arXiv:2311.12465 [pdf, other]

Towards a Gateway for Knowledge Graph Schemas Collection, Analysis, and Embedding

Authors: Mattia Fumagalli, Marco Boffo, Daqian Shi, Mayukh Bagchi, Fausto Giunchiglia

Abstract: One of the significant barriers to the training of statistical models on knowledge graphs is the difficulty that scientists have in finding the best input data to address their prediction goal. In addition to this, a key challenge is to determine how to manipulate these relational data, which are often in the form of particular triples (i.e., subject, predicate, object), to enable the learning process. Currently, many high-quality catalogs of knowledge graphs are available. However, their primary goal is the re-usability of these resources, and their interconnection, in the context of the Semantic Web. This paper describes the LiveSchema initiative, namely, a first version of a gateway that has the main scope of leveraging the gold mine of data collected by many existing catalogs collecting relational data like ontologies and knowledge graphs. At the current state, LiveSchema contains ~1000 datasets from 4 main sources and offers some key facilities, which allow to: i) evolving LiveSchema, by aggregating other source catalogs and repositories as input sources; ii) querying all the collected resources; iii) transforming each given dataset into formal concept analysis matrices that enable analysis and visualization services; iv) generating models and tensors from each given dataset.

Submitted 21 November, 2023; originally announced November 2023.

Comments: Ontology Showcase and Demonstrations Track, 9th Joint Ontology Workshops (JOWO 2023), Co-located with FOIS 2023, 19-20 July, 2023, Sherbrooke, Québec, Canada. arXiv admin note: substantial text overlap with arXiv:2207.06112

Report number: DISIKNOWDIVE21112023

arXiv:2307.14119 [pdf, other]

A semantics-driven methodology for high-quality image annotation

Authors: Fausto Giunchiglia, Mayukh Bagchi, Xiaolei Diao

Abstract: Recent work in Machine Learning and Computer Vision has highlighted the presence of various types of systematic flaws inside ground truth object recognition benchmark datasets. Our basic tenet is that these flaws are rooted in the many-to-many mappings which exist between the visual information encoded in images and the intended semantics of the labels annotating them. The net consequence is that the current annotation process is largely under-specified, thus leaving too much freedom to the subjective judgment of annotators. In this paper, we propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology whose main goal is to make explicit the (otherwise implicit) intended annotation semantics, thus minimizing the number and role of subjective choices. A key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means for providing the meaning of natural language labels and, as a consequence, for driving the annotation of images based on the objects and the visual properties they depict. The methodology is validated on images populating a subset of the ImageNet hierarchy.

Submitted 26 July, 2023; originally announced July 2023.

Comments: Accepted @ 26th European Conference on Artificial Intelligence (ECAI) 2023, Kraków, Poland

Report number: KDECAI23

arXiv:2305.06088 [pdf, other]

Building Interoperable Electronic Health Records as Purpose-Driven Knowledge Graphs

Authors: Simone Bocca, Alessio Zamboni, Gabor Bella, Yamini Chandrashekar, Mayukh Bagchi, Gabriel Kuper, Paolo Bouquet, Fausto Giunchiglia

Abstract: When building a new application we are increasingly confronted with the need of reusing and integrating pre-existing knowledge. Nevertheless, it is a fact that this prior knowledge is virtually impossible to reuse as-is. This is true also in domains, e.g., eHealth, where a lot of effort has been put into developing high-quality standards and reference ontologies, e.g. FHIR. In this paper, we propose an integrated methodology, called iTelos, which enables data and knowledge reuse towards the construction of Interoperable Electronic Health Records (iEHR). The key intuition is that the data level and the schema level of an application should be developed independently, thus allowing for maximum flexibility in the reuse of the prior knowledge, but under the overall guidance of the needs to be satisfied, formalized as competence queries. This intuition is implemented by codifying all the requirements, including those concerning reuse, as part of a purpose defined a priori, which is then used to drive a middle-out development process where the application schema and data are continuously aligned. The proposed methodology is validated through its application to a large-scale case study.

Submitted 10 May, 2023; originally announced May 2023.

Comments: DSAI SPRINGER BOOK. arXiv admin note: text overlap with arXiv:2105.09418

Report number: DISIDSAI2023

Journal ref: DSAI SPRINGER BOOK, 2023

arXiv:2304.08989 [pdf, other]

Incremental Image Labeling via Iterative Refinement

Authors: Fausto Giunchiglia, Xiaolei Diao, Mayukh Bagchi

Abstract: Data quality is critical for multimedia tasks, while various types of systematic flaws are found in image benchmark datasets, as discussed in recent work. In particular, the existence of the semantic gap problem leads to a many-to-many mapping between the information extracted from an image and its linguistic description. This unavoidable bias further leads to poor performance on current computer vision tasks. To address this issue, we introduce a Knowledge Representation (KR)-based methodology to provide guidelines driving the labeling process, thereby indirectly introducing intended semantics in ML models. Specifically, an iterative refinement-based annotation method is proposed to optimize data labeling by organizing objects in a classification hierarchy according to their visual properties, ensuring that they are aligned with their linguistic descriptions. Preliminary results verify the effectiveness of the proposed method.

Submitted 18 April, 2023; originally announced April 2023.

Journal ref: IWCIM@ICASSP 2023

arXiv:2304.00004 [pdf, other]

Disentangling Domain Ontologies

Authors: Mayukh Bagchi, Subhashis Das

Abstract: In this paper, we introduce and illustrate the novel phenomenon of Conceptual Entanglement which emerges due to the representational manifoldness immanent while incrementally modelling domain ontologies step-by-step across the following five levels: perception, labelling, semantic alignment, hierarchical modelling and intensional definition. In turn, we propose Conceptual Disentanglement, a multi-level conceptual modelling strategy which enforces and explicates, via guiding principles, semantic bijections with respect to each level of conceptual entanglement (across all the above five levels) paving the way for engineering conceptually disentangled domain ontologies. We also briefly argue why state-of-the-art ontology development methodologies and approaches are insufficient with respect to our characterization.

Submitted 21 March, 2023; originally announced April 2023.

Comments: In: Proceedings of the 19th Italian Research Conference on Digital Libraries (IRCDL), February 23-24, 2023, Bari, Italy

Report number: IRCDL2023

arXiv:2212.06629 [pdf, other]

Aligning Visual and Lexical Semantics

Authors: Fausto Giunchiglia, Mayukh Bagchi, Xiaolei Diao

Abstract: We discuss two kinds of semantics relevant to Computer Vision (CV) systems - Visual Semantics and Lexical Semantics. While visual semantics focus on how humans build concepts when using vision to perceive a target reality, lexical semantics focus on how humans build concepts of the same target reality through the use of language. The lack of coincidence between visual and lexical semantics, in turn, has a major impact on CV systems in the form of the Semantic Gap Problem (SGP). The paper, while extensively exemplifying the lack of coincidence as above, introduces a general, domain-agnostic methodology to enforce alignment between visual and lexical semantics.

Submitted 13 December, 2022; originally announced December 2022.

Comments: iConference 2023, Barcelona, March 27 - 29, 2023

arXiv:2209.14049 [pdf, other]

Popularity Driven Data Integration

Authors: Fausto Giunchiglia, Simone Bocca, Mattia Fumagalli, Mayukh Bagchi, Alessio Zamboni

Abstract: More and more, with the growing focus on large scale analytics, we are confronted with the need of integrating data from multiple sources. The problem is that these data are impossible to reuse as-is. The net result is high cost, with the further drawback that the resulting integrated data will again be hardly reusable as-is. iTelos is a general purpose methodology aiming at minimizing the effects of this process. The intuition is that data will be treated differently based on their popularity: the more a certain set of data have been reused, the more they will be reused and the less they will be changed across reuses, thus decreasing the overall data preprocessing costs, while increasing backward compatibility and future sharing.

Submitted 28 September, 2022; originally announced September 2022.

Comments: KGSWC 2022. Fourth Ibero-American Knowledge Graph and Semantic Web Conference joint with Third Indo-American Knowledge Graph and Semantic Web Conference 21-23 November 2022, Universidad Camilo José Cela, Madrid, Spain. arXiv admin note: substantial text overlap with arXiv:2105.09418

arXiv:2208.13064 [pdf]

A Diversity-Aware Domain Development Methodology

Authors: Mayukh Bagchi

Abstract: The development of domain ontological models, though being a mature research arena backed by well-established methodologies, still suffer from two key shortcomings. Firstly, the issues concerning the semantic persistency of ontology concepts and their flexible reuse in domain development employing existing approaches. Secondly, due to the difficulty in understanding and reusing top-level concepts in existing foundational ontologies, the obfuscation regarding the semantic nature of domain representations. The paper grounds the aforementioned shortcomings in representation diversity and proposes a three-fold solution - (i) a pipeline for rendering concepts reuse-ready, (ii) a first characterization of a minimalistic foundational knowledge model, named foundational teleology, semantically explicating foundational distinctions enforcing the static as well as dynamic nature of domain representations, and (iii) a flexible, reuse-native methodology for diversity-aware domain development exploiting solutions (i) and (ii). The preliminary work reported validates the potentiality of the solution components.

Submitted 27 August, 2022; originally announced August 2022.

Comments: 41st International Conference on Conceptual Modeling (ER 2022), Online (Virtual)

arXiv:2207.06112 [pdf, other]

LiveSchema: A Gateway Towards Learning on Knowledge Graph Schemas

Authors: Mattia Fumagalli, Marco Boffo, Daqian Shi, Mayukh Bagchi, Fausto Giunchiglia

Abstract: One of the major barriers to the training of algorithms on knowledge graph schemas, such as vocabularies or ontologies, is the difficulty that scientists have in finding the best input resource to address the target prediction tasks. In addition to this, a key challenge is to determine how to manipulate (and embed) these data, which are often in the form of particular triples (i.e., subject, predicate, object), to enable the learning process. In this paper, we describe the LiveSchema initiative, namely a gateway that offers a family of services to easily access, analyze, transform and exploit knowledge graph schemas, with the main goal of facilitating the reuse of these resources in machine learning use cases. As an early implementation of the initiative, we also advance an online catalog, which relies on more than 800 resources, with the first set of example services.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2207.01091 [pdf]

Representation Heterogeneity

Authors: Fausto Giunchiglia, Mayukh Bagchi

Abstract: Semantic Heterogeneity is conventionally understood as the existence of variance in the representation of a target reality when modelled, by independent parties, in different databases, schemas and/or data. We argue that the mere encoding of variance, while being necessary, is not sufficient enough to deal with the problem of representational heterogeneity, given that it is also necessary to encode the unifying basis on which such variance is manifested. To that end, this paper introduces a notion of Representation Heterogeneity in terms of the co-occurrent notions of Representation Unity and Representation Diversity. We have representation unity when two heterogeneous representations model the same target reality, representation diversity otherwise. In turn, this paper also highlights how these two notions get instantiated across the two layers of any representation, i.e., Language and Knowledge.

Submitted 3 July, 2022; originally announced July 2022.

arXiv:2202.13751 [pdf]

doi 10.1633/JISTaP.2023.11.1.5

GENOME: A GENeric methodology for Ontological Modelling of Epics

Authors: Udaya Varadarajan, Mayukh Bagchi, Amit Tiwari, M. P. Satija

Abstract: Ontological knowledge modelling of epics, though being an established research arena backed by concrete multilingual and multicultural works, still suffer from two key shortcomings. Firstly, all epic ontological models developed till date have been designed following ad-hoc methodologies, most often, combining existing general purpose ontology development methodologies. Secondly, none of the ad-hoc methodologies consider the potential reuse of existing epic ontological models for enrichment, if available. The paper presents, as a unified solution to the above shortcomings, the design and development of GENOME - the first dedicated methodology for iterative ontological modelling of epics, potentially extensible to works in different research arenas of digital humanities in general. GENOME is grounded in transdisciplinary foundations of canonical norms for epics, knowledge modelling best practices, application satisfiability norms and cognitive generative questions. It is also the first methodology (in epic modelling but also in general) to be flexible enough to integrate, in practice, the options of knowledge modelling via reuse or from scratch. The feasibility of GENOME is validated via a first brief implementation of ontological modelling of the Indian epic - Mahabharata by reusing an existing ontology. The preliminary results are promising, with the GENOME-produced model being both ontologically thorough and performance-wise competent.

Submitted 13 February, 2022; originally announced February 2022.

arXiv:2202.08512 [pdf, other]

Visual Ground Truth Construction as Faceted Classification

Authors: Fausto Giunchiglia, Mayukh Bagchi, Xiaolei Diao

Abstract: Recent work in Machine Learning and Computer Vision has provided evidence of systematic design flaws in the development of major object recognition benchmark datasets. One such example is ImageNet, wherein, for several categories of images, there are incongruences between the objects they represent and the labels used to annotate them. The consequences of this problem are major, in particular considering the large number of machine learning applications, not least those based on Deep Neural Networks, that have been trained on these datasets. In this paper we posit the problem to be the lack of a knowledge representation (KR) methodology providing the foundations for the construction of these ground truth benchmark datasets. Accordingly, we propose a solution articulated in three main steps: (i) deconstructing the object recognition process in four ordered stages grounded in the philosophical theory of teleosemantics; (ii) based on such stratification, proposing a novel four-phased methodology for organizing objects in classification hierarchies according to their visual properties; and (iii) performing such classification according to the faceted classification paradigm. The key novelty of our approach lies in the fact that we construct the classification hierarchies from visual properties exploiting visual genus-differentiae, and not from linguistically grounded properties. The proposed approach is validated by a set of experiments on the ImageNet hierarchy of musical experiments.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2202.08118 [pdf]

Smart Cities, Smart Libraries and Smart Knowledge Managers: Ushering in the neo-Knowledge Society

Authors: Mayukh Bagchi

Abstract: The emergence of smart cities as a specific concept is not very old. In simple terms, it refers to cities which are sustainable and driven predominantly by their Information and Communication Technology (ICT) infrastructure. Smart libraries and smart knowledge managers, alongside its other smart component-entities, are vital for their emergence, sustenance and progress. The paper attempts at deducing a symbiosis amongst smart cities, smart libraries and smart knowledge managers. It further elaborates on how these will usher in the neo-knowledge society, and the opportunities it'll offer vis-à-vis Library and Information Science (LIS). Finally, it concludes on an optimistic note, mentioning possible future research activities in this regard.

Submitted 16 February, 2022; originally announced February 2022.

arXiv:2112.10531 [pdf]

Object Recognition as Classification via Visual Properties

Authors: Fausto Giunchiglia, Mayukh Bagchi

Abstract: We base our work on the teleosemantic modelling of concepts as abilities implementing the distinct functions of recognition and classification. Accordingly, we model two types of concepts - substance concepts suited for object recognition exploiting visual properties, and classification concepts suited for classification of substance concepts exploiting linguistically grounded properties. The goal in this paper is to demonstrate that object recognition can be construed as classification via visual properties, as distinct from work in mainstream computer vision. Towards that, we present an object recognition process based on Ranganathan's four-phased faceted knowledge organization process, grounded in the teleosemantic distinctions of substance concept and classification concept. We also briefly introduce the ongoing project MultiMedia UKC, whose aim is to build an object recognition resource following our proposed process.

Submitted 28 December, 2021; v1 submitted 20 December, 2021; originally announced December 2021.

arXiv:2105.10923 [pdf, ps, other]

Towards Knowledge Organization Ecosystems

Authors: Mayukh Bagchi

Abstract: It is needless to mention the (already established) overarching importance of knowledge organization and its tried-and-tested high-quality schemes in knowledge-based Artificial Intelligence (AI) systems. But equally, it is also hard to ignore that, increasingly, standalone KOSs are becoming functionally ineffective components for such systems, given their inability to capture the continuous facetization and drift of domains. The paper proposes a radical re-conceptualization of KOSs as a first step to solve such an inability, and, accordingly, contributes in the form of the following dimensions: (i) an explicit characterization of Knowledge Organization Ecosystems (KOEs) (possibly for the first time) and their positioning as pivotal components in realizing sustainable knowledge-based AI solutions, (ii) as a consequence of such a novel characterization, a first examination and characterization of KOEs as Socio-Technical Systems (STSs), thus opening up an entirely new stream of research in knowledge-based AI, and (iii) motivating KOEs not to be mere STSs but STSs which are grounded in Ethics and Responsible Artificial Intelligence cardinals from their very genesis. The paper grounds the above contributions in relevant research literature in a distributed fashion throughout the paper, and finally concludes by outlining the future research possibilities.

Submitted 23 May, 2021; originally announced May 2021.

arXiv:2105.09432 [pdf, other]

Stratified Data Integration

Authors: Fausto Giunchiglia, Alessio Zamboni, Mayukh Bagchi, Simone Bocca

Abstract: We propose a novel approach to the problem of semantic heterogeneity where data are organized into a set of stratified and independent representation layers, namely: conceptual (where a set of unique alinguistic identifiers are connected inside a graph codifying their meaning), language (where sets of synonyms, possibly from multiple languages, annotate concepts), knowledge (in the form of a graph where nodes are entity types and links are properties), and data (in the form of a graph of entities populating the previous knowledge graph). This allows us to state the problem of semantic heterogeneity as a problem of Representation Diversity where the different types of heterogeneity, viz. Conceptual, Language, Knowledge, and Data, are uniformly dealt within each single layer, independently from the others. In this paper we describe the proposed stratified representation of data and the process by which data are first transformed into the target representation, then suitably integrated and then, finally, presented to the user in her preferred format. The proposed framework has been evaluated in various pilot case studies and in a number of industrial data integration problems.

Submitted 19 May, 2021; originally announced May 2021.

arXiv:2105.09422 [pdf, other]

Classifying concepts via visual properties

Authors: Fausto Giunchiglia, Mayukh Bagchi

Abstract: We assume that substances in the world are represented by two types of concepts, namely substance concepts and classification concepts, the former instrumental to (visual) perception, the latter to (language based) classification. Based on this distinction, we introduce a general methodology for building lexico-semantic hierarchies of substance concepts, where nodes are annotated with the media, e.g., videos or photos, from which substance concepts are extracted, and are associated with the corresponding classification concepts. The methodology is based on Ranganathan's original faceted approach, contextualized to the problem of classifying substance concepts. The key novelty is that the hierarchy is built exploiting the visual properties of substance concepts, while the linguistically defined properties of classification concepts are only used to describe substance concepts. The validity of the approach is exemplified by providing some highlights of an ongoing project whose goal is to build a large scale multimedia multilingual concept hierarchy.

Submitted 19 May, 2021; originally announced May 2021.

arXiv:2105.09418 [pdf, other]

iTelos -- Purpose Driven Knowledge Graph Generation

Authors: Fausto Giunchiglia, Simone Bocca, Mattia Fumagalli, Mayukh Bagchi, Alessio Zamboni

Abstract: When building a new application we are more and more confronted with the need of reusing and integrating pre-existing knowledge, e.g., ontologies, schemas, data of any kind, from multiple sources. Nevertheless, it is a fact that this prior knowledge is virtually impossible to reuse as-is. This difficulty is the cause of high costs, with the further drawback that the resulting application will again be hardly reusable. It is a negative loop which consistently reinforces itself. iTelos is a general purpose methodology aiming at minimizing as much as possible the effects of this loop. iTelos is based on the intuition that the data level and the schema level of an application should be developed independently, thus allowing for maximum flexibility in the reuse of the prior knowledge, but under the overall guidance of the needs to be satisfied, formalized as competence queries. This intuition is implemented by codifying all the requirements, including those concerning reuse, as part of an a-priori defined purpose, which is then used to drive a middle-out development process where the application schema and data are continuously aligned.

Submitted 15 December, 2021; v1 submitted 19 May, 2021; originally announced May 2021.

Showing 1–20 of 20 results for author: Bagchi, M