The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake

Authors: Riccardo Albertoni, David Browning, Simon Cox, Alejandra N. Gonzalez-Beltran, Andrea Perego, Peter Winstanley

Abstract: DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2110.07117 [pdf, other]

doi 10.1098/rsta.2021.0300

FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows

Authors: Sonia Natalie Mitchell, Andrew Lahiff, Nathan Cummings, Jonathan Hollocombe, Bram Boskamp, Ryan Field, Dennis Reddyhoff, Kristian Zarebski, Antony Wilson, Bruno Viola, Martin Burke, Blair Archibald, Paul Bessell, Richard Blackwell, Lisa A Boden, Alys Brett, Sam Brett, Ruth Dundas, Jessica Enright, Alejandra N. Gonzalez-Beltran, Claire Harris, Ian Hinder, Christopher David Hughes, Martin Knight, Vino Mano, et al. (13 additional authors not shown)

Abstract: Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of "following the science" are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline developed during the COVID-19 pandemic that allows easy annotation of data as they are consumed by analyses, while tracing the provenance of scientific outputs back through the analytical source code to data sources. Such a tool provides a mechanism for the public, and fellow scientists, to better assess the trust that should be placed in scientific evidence, while allowing scientists to support policy-makers in openly justifying their decisions. We believe that tools such as this should be promoted for use across all areas of policy-facing research.

Submitted 4 May, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

arXiv:2012.13117 [pdf, other]

Nine Best Practices for Research Software Registries and Repositories: A Concise Guide

Authors: Task Force on Best Practices for Software Registries: Alain Monteil, Alejandra Gonzalez-Beltran, Alexandros Ioannidis, Alice Allen, Allen Lee, Anita Bandrowski, Bruce E. Wilson, Bryce Mecum, Cai Fan Du, Carly Robinson, Daniel Garijo, Daniel S. Katz, David Long, Genevieve Milliken, Hervé Ménager, Jessica Hausman, Jurriaan H. Spaaks, Katrina Fenlon, Kristin Vanderbilt, Lorraine Hwang, Lynn Davis, Martin Fenner, Michael R. Crusoe, et al. (8 additional authors not shown)

Abstract: Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these resources takes effort, and few guidelines are available to help prospective creators of registries and repositories. To address this need, we present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020. We believe that putting in place specific policies such as those presented here will help scientific software registries and repositories better serve their users and their disciplines.

Submitted 24 December, 2020; originally announced December 2020.

Comments: 18 pages

arXiv:2012.02325 [pdf]

doi 10.1371/journal.pcbi.1009041

Ten Simple Rules for making a vocabulary FAIR

Authors: Simon J D Cox, Alejandra N Gonzalez-Beltran, Barbara Magagna, Maria-Cristina Marinescu

Abstract: We present ten simple rules that support converting a legacy vocabulary -- a list of terms available in a print-based glossary or table not accessible using web standards -- into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a distinct IRI for each term or concept. A standard representation of the concept should be returned when the individual IRI is de-referenced, using SKOS or OWL serialised in an RDF-based representation for machine-interchange, or in a web-page for human consumption. Guidelines for vocabulary and item metadata are provided, as well as development and maintenance considerations. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.

Submitted 3 December, 2020; originally announced December 2020.

Comments: 13 pages

Journal ref: PLoS Comput Biol 17(6): e1009041 (2021)

arXiv:1905.08674 [pdf]

Software Citation Implementation Challenges

Authors: Daniel S. Katz, Daina Bouquin, Neil P. Chue Hong, Jessica Hausman, Catherine Jones, Daniel Chivvis, Tim Clark, Mercè Crosas, Stephan Druskat, Martin Fenner, Tom Gillespie, Alejandra Gonzalez-Beltran, Morane Gruenpeter, Ted Habermann, Robert Haines, Melissa Harrison, Edwin Henneken, Lorraine Hwang, Matthew B. Jones, Alastair A. Kelly, David N. Kennedy, Katrin Leinweber, Fernando Rios, Carly B. Robinson, Ilian Todorov, et al. (2 additional authors not shown)

Abstract: The main output of the FORCE11 Software Citation working group (https://www.force11.org/group/software-citation-working-group) was a paper on software citation principles (https://doi.org/10.7717/peerj-cs.86) published in September 2016. This paper laid out a set of six high-level principles for software citation (importance, credit and attribution, unique identification, persistence, accessibility, and specificity) and discussed how they could be used to implement software citation in the scholarly community. In a series of talks and other activities, we have promoted software citation using these increasingly accepted principles. At the time the initial paper was published, we also provided guidance and examples on how to make software citable, though we now realize there are unresolved problems with that guidance. The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed.

Submitted 21 May, 2019; originally announced May 2019.

arXiv:1902.11162 [pdf]

The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data

Authors: P. Wittenburg, H. Pergl Sustkova, A. Montesanti, S. M. Bloemers, S. H. de Waard, M. A. Musen, J. B. Graybeal, K. M. Hettne, A. Jacobsen, R. Pergl, R. W. W. Hooft, C. Staiger, C. W. G. van Gelder, S. L. Knijnenburg, A. C. van Arkel, B. Meerman, M. D. Wilkinson, S-A Sansone, P. Rocca-Serra, P. McQuilton, A. N. Gonzalez-Beltran, G. J. C. Aben, P. Henning, S. Alencar, C. Ribeiro, et al. (35 additional authors not shown)

Abstract: There is a growing acknowledgement in the scientific community of the importance of making experimental data machine findable, accessible, interoperable, and reusable (FAIR). Recognizing that high quality metadata are essential to make datasets FAIR, members of the GO FAIR Initiative and the Research Data Alliance (RDA) have initiated a series of workshops to encourage the creation of Metadata for Machines (M4M), enabling any self-identified stakeholder to define and promote the reuse of standardized, comprehensive machine-actionable metadata. The funders of scientific research recognize that they have an important role to play in ensuring that experimental results are FAIR, and that high quality metadata and careful planning for FAIR data stewardship are central to these goals. We describe the outcome of a recent M4M workshop that has led to a pilot programme involving two national science funders, the Health Research Board of Ireland (HRB) and the Netherlands Organisation for Health Research and Development (ZonMW). These funding organizations will explore new technologies to define at the time that a request for proposals is issued the minimal set of machine-actionable metadata that they would like investigators to use to annotate their datasets, to enable investigators to create such metadata to help make their data FAIR, and to develop data-stewardship plans that ensure that experimental data will be managed appropriately abiding by the FAIR principles.
The FAIR Funders design envisions a data-management workflow having seven essential stages, where solution providers are openly invited to participate. The initial pilot programme will launch using existing computer-based tools of those who attended the M4M Workshop.

Submitted 6 March, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: This is a pre-print of the FAIR Funders pilot, an outcome of the first Metadata for Machines workshop, see: https://www.go-fair.org/resources/go-fair-workshop-series/metadata-for-machines-workshops/. Corresponding author: E. A Schultes, ORCID 0000-0001-8888-635X

arXiv:1012.5506 [pdf, other]

Ontology-based Queries over Cancer Data

Authors: Alejandra Gonzalez-Beltran, Ben Tagger, Anthony Finkelstein

Abstract: The ever-increasing amount of data in biomedical research, and in cancer research in particular, needs to be managed to support efficient data access, exchange and integration. Existing software infrastructures, such as caGrid, support access to distributed information annotated with a domain ontology. However, caGrid's current querying functionality depends on the structure of individual data resources without exploiting the semantic annotations. In this paper, we present the design and development of an ontology-based querying functionality that consists of: the generation of OWL2 ontologies from the underlying data resources' metadata and a query rewriting and translation process based on reasoning, which converts a query at the domain ontology level into queries at the software infrastructure level. We present a detailed analysis of our approach as well as an extensive performance evaluation. While the implementation and evaluation were performed for the caGrid infrastructure, the approach could be applicable to other model and metadata-driven environments for data sharing.

Submitted 26 December, 2010; originally announced December 2010.

Comments: in Adrian Paschke, Albert Burger, Andrea Splendiani, M. Scott Marshall, Paolo Romano: Proceedings of the 3rd International Workshop on Semantic Web Applications and Tools for the Life Sciences, Berlin, Germany, December 8-10, 2010

Report number: SWAT4LS 2010

ACM Class: J.3

Showing 1–7 of 7 results for author: Gonzalez-Beltran, A