-
The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake
Authors:
Riccardo Albertoni,
David Browning,
Simon Cox,
Alejandra N. Gonzalez-Beltran,
Andrea Perego,
Peter Winstanley
Abstract:
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.
Submitted 15 March, 2023;
originally announced March 2023.
-
FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows
Authors:
Sonia Natalie Mitchell,
Andrew Lahiff,
Nathan Cummings,
Jonathan Hollocombe,
Bram Boskamp,
Ryan Field,
Dennis Reddyhoff,
Kristian Zarebski,
Antony Wilson,
Bruno Viola,
Martin Burke,
Blair Archibald,
Paul Bessell,
Richard Blackwell,
Lisa A Boden,
Alys Brett,
Sam Brett,
Ruth Dundas,
Jessica Enright,
Alejandra N. Gonzalez-Beltran,
Claire Harris,
Ian Hinder,
Christopher David Hughes,
Martin Knight,
Vino Mano
, et al. (13 additional authors not shown)
Abstract:
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of "following the science" are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline developed during the COVID-19 pandemic that allows easy annotation of data as they are consumed by analyses, while tracing the provenance of scientific outputs back through the analytical source code to data sources. Such a tool provides a mechanism for the public, and fellow scientists, to better assess the trust that should be placed in scientific evidence, while allowing scientists to support policy-makers in openly justifying their decisions. We believe that tools such as this should be promoted for use across all areas of policy-facing research.
Submitted 4 May, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Nine Best Practices for Research Software Registries and Repositories: A Concise Guide
Authors:
Task Force on Best Practices for Software Registries,
Alain Monteil,
Alejandra Gonzalez-Beltran,
Alexandros Ioannidis,
Alice Allen,
Allen Lee,
Anita Bandrowski,
Bruce E. Wilson,
Bryce Mecum,
Cai Fan Du,
Carly Robinson,
Daniel Garijo,
Daniel S. Katz,
David Long,
Genevieve Milliken,
Hervé Ménager,
Jessica Hausman,
Jurriaan H. Spaaks,
Katrina Fenlon,
Kristin Vanderbilt,
Lorraine Hwang,
Lynn Davis,
Martin Fenner,
Michael R. Crusoe
, et al. (8 additional authors not shown)
Abstract:
Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these resources takes effort, and few guidelines are available to help prospective creators of registries and repositories. To address this need, we present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020. We believe that putting in place specific policies such as those presented here will help scientific software registries and repositories better serve their users and their disciplines.
Submitted 24 December, 2020;
originally announced December 2020.
-
Ten Simple Rules for making a vocabulary FAIR
Authors:
Simon J D Cox,
Alejandra N Gonzalez-Beltran,
Barbara Magagna,
Maria-Cristina Marinescu
Abstract:
We present ten simple rules that support converting a legacy vocabulary -- a list of terms available in a print-based glossary or table not accessible using web standards -- into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a distinct IRI for each term or concept. A standard representation of the concept should be returned when the individual IRI is de-referenced, using SKOS or OWL serialised in an RDF-based representation for machine-interchange, or in a web-page for human consumption. Guidelines for vocabulary and item metadata are provided, as well as development and maintenance considerations. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.
Submitted 3 December, 2020;
originally announced December 2020.
-
Software Citation Implementation Challenges
Authors:
Daniel S. Katz,
Daina Bouquin,
Neil P. Chue Hong,
Jessica Hausman,
Catherine Jones,
Daniel Chivvis,
Tim Clark,
Mercè Crosas,
Stephan Druskat,
Martin Fenner,
Tom Gillespie,
Alejandra Gonzalez-Beltran,
Morane Gruenpeter,
Ted Habermann,
Robert Haines,
Melissa Harrison,
Edwin Henneken,
Lorraine Hwang,
Matthew B. Jones,
Alastair A. Kelly,
David N. Kennedy,
Katrin Leinweber,
Fernando Rios,
Carly B. Robinson,
Ilian Todorov
, et al. (2 additional authors not shown)
Abstract:
The main output of the FORCE11 Software Citation working group (https://www.force11.org/group/software-citation-working-group) was a paper on software citation principles (https://doi.org/10.7717/peerj-cs.86) published in September 2016. This paper laid out a set of six high-level principles for software citation (importance, credit and attribution, unique identification, persistence, accessibility, and specificity) and discussed how they could be used to implement software citation in the scholarly community. In a series of talks and other activities, we have promoted software citation using these increasingly accepted principles. At the time the initial paper was published, we also provided guidance and examples on how to make software citable, though we now realize there are unresolved problems with that guidance. The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed.
Submitted 21 May, 2019;
originally announced May 2019.
-
The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data
Authors:
P. Wittenburg,
H. Pergl Sustkova,
A. Montesanti,
S. M. Bloemers,
S. H. de Waard,
M. A. Musen,
J. B. Graybeal,
K. M. Hettne,
A. Jacobsen,
R. Pergl,
R. W. W. Hooft,
C. Staiger,
C. W. G. van Gelder,
S. L. Knijnenburg,
A. C. van Arkel,
B. Meerman,
M. D. Wilkinson,
S-A Sansone,
P. Rocca-Serra,
P. McQuilton,
A. N. Gonzalez-Beltran,
G. J. C. Aben,
P. Henning,
S. Alencar,
C. Ribeiro
, et al. (35 additional authors not shown)
Abstract:
There is a growing acknowledgement in the scientific community of the importance of making experimental data machine findable, accessible, interoperable, and reusable (FAIR). Recognizing that high quality metadata are essential to make datasets FAIR, members of the GO FAIR Initiative and the Research Data Alliance (RDA) have initiated a series of workshops to encourage the creation of Metadata for Machines (M4M), enabling any self-identified stakeholder to define and promote the reuse of standardized, comprehensive machine-actionable metadata. The funders of scientific research recognize that they have an important role to play in ensuring that experimental results are FAIR, and that high quality metadata and careful planning for FAIR data stewardship are central to these goals. We describe the outcome of a recent M4M workshop that has led to a pilot programme involving two national science funders, the Health Research Board of Ireland (HRB) and the Netherlands Organisation for Health Research and Development (ZonMW). These funding organizations will explore new technologies to define at the time that a request for proposals is issued the minimal set of machine-actionable metadata that they would like investigators to use to annotate their datasets, to enable investigators to create such metadata to help make their data FAIR, and to develop data-stewardship plans that ensure that experimental data will be managed appropriately abiding by the FAIR principles. The FAIR Funders design envisions a data-management workflow having seven essential stages, where solution providers are openly invited to participate. The initial pilot programme will launch using existing computer-based tools of those who attended the M4M Workshop.
Submitted 6 March, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Ontology-based Queries over Cancer Data
Authors:
Alejandra Gonzalez-Beltran,
Ben Tagger,
Anthony Finkelstein
Abstract:
The ever-increasing amount of data in biomedical research, and in cancer research in particular, needs to be managed to support efficient data access, exchange and integration. Existing software infrastructures, such as caGrid, support access to distributed information annotated with a domain ontology. However, caGrid's current querying functionality depends on the structure of individual data resources without exploiting the semantic annotations. In this paper, we present the design and development of an ontology-based querying functionality that consists of: the generation of OWL2 ontologies from the underlying data resources' metadata and a query rewriting and translation process based on reasoning, which converts a query at the domain ontology level into queries at the software infrastructure level. We present a detailed analysis of our approach as well as an extensive performance evaluation. While the implementation and evaluation were performed for the caGrid infrastructure, the approach could be applicable to other model and metadata-driven environments for data sharing.
Submitted 26 December, 2010;
originally announced December 2010.