-
The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake
Authors:
Riccardo Albertoni,
David Browning,
Simon Cox,
Alejandra N. Gonzalez-Beltran,
Andrea Perego,
Peter Winstanley
Abstract:
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen a wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.
Submitted 15 March, 2023;
originally announced March 2023.
-
Reproducibility of Machine Learning: Terminology, Recommendations and Open Issues
Authors:
Riccardo Albertoni,
Sara Colantonio,
Piotr Skrzypczyński,
Jerzy Stefanowski
Abstract:
Reproducibility is one of the core dimensions that contribute to delivering Trustworthy Artificial Intelligence. Broadly speaking, reproducibility can be defined as the possibility to reproduce the same or a similar experiment or method, thereby obtaining the same or similar results as the original scientists. It is an essential ingredient of the scientific method and crucial for gaining trust in relevant claims. A reproducibility crisis has recently been acknowledged by scientists, and it seems to affect Artificial Intelligence and Machine Learning even more, due to the complexity of the models at the core of their recent successes. Notwithstanding the recent debate on Artificial Intelligence reproducibility, its practical implementation is still insufficient, partly because many technical issues are overlooked. In this survey, we critically review the current literature on the topic and highlight the open issues. Our contribution is three-fold. We propose a concise terminological review of the terms coming into play. We collect and systematize existing recommendations for achieving reproducibility, putting forth the means to comply with them. We identify key elements often overlooked in modern Machine Learning and provide novel recommendations for them. We further specialize these for two critical application domains, namely the biomedical and physical artificial intelligence fields.
Submitted 24 February, 2023;
originally announced February 2023.
-
The CHRONIOUS Ontology-Driven Search Tool: Enabling Access to Focused and Up-to-Date Healthcare Literature
Authors:
Stephan Kiefer,
Jochen Rauch,
Riccardo Albertoni,
Marco Attene,
Franca Giannini,
Simone Marini,
Luc Schneider,
Carlos Mesquita,
Xin Xing,
Michael Lawo
Abstract:
This paper presents an advanced search engine prototype for bibliography retrieval developed within the CHRONIOUS European IP project of the Seventh Framework Programme (FP7). This search engine is specifically targeted at clinicians and healthcare practitioners searching for documents related to Chronic Obstructive Pulmonary Disease (COPD) and Chronic Kidney Disease (CKD). To this end, the presented tool exploits two pathology-specific ontologies that allow focused document indexing and retrieval. These ontologies have been developed on top of the Middle Layer Ontology for Clinical Care (MLOCC), which provides a link with the Basic Formal Ontology, a foundational ontology used in the Open Biological and Biomedical Ontologies (OBO) Foundry. In addition, a link to the terms of the MeSH (Medical Subject Headings) thesaurus has been provided to guarantee coverage of the general certified medical terms and multilingual capabilities.
Submitted 11 October, 2011;
originally announced October 2011.
-
Semantic Technology to Exploit Digital Content Exposed as Linked Data
Authors:
Riccardo Albertoni,
Monica De Martino
Abstract:
The paper illustrates the research result of the application of semantic technology to ease the use and reuse of digital contents exposed as Linked Data on the web. It focuses on the specific issue of explorative research for the resource selection: a context dependent semantic similarity assessment is proposed in order to compare datasets annotated through terminologies exposed as Linked Data (e.g. habitats, species). Semantic similarity is shown as a building block technology to sift linked data resources. From semantic similarity application, we derived a set of recommendations underlying open issues in scaling the similarity assessment up to the Web of Data.
Submitted 11 October, 2011;
originally announced October 2011.
-
A multilingual/multicultural semantic-based approach to improve Data Sharing in an SDI for Nature Conservation
Authors:
Monica De Martino,
Riccardo Albertoni
Abstract:
The paper proposes an approach to transcend multicultural and multilingual barriers in the use and reuse of geographical data at the European level. The approach aims at sharing scientific terms in the field of nature conservation with the goal of assisting different user communities with metadata compilation and information discovery. A multi-thesauri solution is proposed, based on a Common Thesaurus Framework for Nature Conservation, where different well-known Knowledge Organization Systems are assembled and shared. It has been designed according to semantic web and W3C recommendations employing SKOS standard models and Linked Data to publish the thesauri as a whole in machine-understandable format. The outcome is a powerful framework satisfying the requirements of modularity and openness for further thesaurus extension and updating, interlinking among thesauri, and exploitability from other systems. The paper supports the employment of Linked Data to deal with terminologies in complex domains such as nature conservation and it proposes a hands-on recipe to publish thesauri in the framework.
Submitted 8 July, 2011;
originally announced July 2011.
-
A Joint Initiative to Support the Semantic Interoperability within the GIIDA Project
Authors:
Paolo Plini,
Sabin Di Franco,
Valentina De Santis,
Vito F. Uricchio,
Dario De Carlo,
Stefania D'Arpa,
Monica De Martino,
Riccardo Albertoni
Abstract:
The GIIDA project aims to develop a digital infrastructure for spatial information within CNR. It plans to use semantic-oriented technologies to ease information modeling and linking, in accordance with international standards such as ISO/IEC 11179. Complex information management systems like GIIDA will benefit from terminological tools such as thesauri, which provide a reference lexicon for the indexing and retrieval of information. Within GIIDA, the goal is to make available the EARTh thesaurus (Environmental Applications Reference Thesaurus), developed by the CNR-IIA-EKOLab. Web-based software developed by the CNR-Water Research Institute (IRSA) was implemented to allow consultation and use of the thesaurus through the web. This service is a useful tool for ensuring interoperability between the thesaurus and other indexing systems, with the idea of cooperating to develop a comprehensive knowledge organization system that is integrated, open, multi-functional and multilingual. Currently the system is available in multiple-language mode (Italian and English) and can be navigated alphabetically, hierarchically and by theme. A full search finds any term by matching the whole term or part of it, and allows the results to be filtered by theme. Within a collaborative initiative with the CNR-Institute of Applied Mathematics and Information Technology (IMATI), a SKOS (Simple Knowledge Organization System) version of EARTh was developed. This will support the use of the thesaurus within the framework of the Semantic Web, so that it can be used in decentralized metadata applications.
Submitted 1 December, 2010;
originally announced December 2010.
-
Using Context Dependent Semantic Similarity to Browse Information Resources: an Application for the Industrial Design
Authors:
Riccardo Albertoni,
Monica De Martino
Abstract:
This paper deals with the semantic interpretation of information resources (e.g., images, videos, 3D models). We present a case study of an approach based on semantic and context dependent similarity applied to the industrial design. Different application contexts are considered and modelled to browse a repository of 3D digital objects according to different perspectives. The paper briefly summarises the basic concepts behind the semantic similarity approach and illustrates its application and results.
Submitted 12 October, 2010;
originally announced October 2010.