Recording provenance of workflow runs with RO-Crate
Authors:
Simone Leo,
Michael R. Crusoe,
Laura Rodríguez-Navas,
Raül Sirvent,
Alexander Kanitz,
Paul De Geest,
Rudolf Wittner,
Luca Pireddu,
Daniel Garijo,
José M. Fernández,
Iacopo Colonnelli,
Matej Gallo,
Tazro Ohta,
Hirotaka Suetake,
Salvador Capella-Gutierrez,
Renske de Wit,
Bruno P. Kinoshita,
Stian Soiland-Reyes
Abstract:
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated products (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
A corresponding RO-Crate for this article is at https://w3id.org/ro/doi/10.5281/zenodo.10368989
Submitted 12 December, 2023;
originally announced December 2023.
EOSC-LIFE WP4 TOOLBOX: Toolbox for sharing of sensitive data -- a concept description
Authors:
Jan-Willem Boiten,
Christian Ohmann,
Ayodeji Adeniran,
Steve Canham,
Monica Cano Abadia,
Gauthier Chassang,
Maria Luisa Chiusano,
Romain David,
Maddalena Fratelli,
Phil Gribbon,
Petr Holub,
Rebecca Ludwig,
Michaela Th. Mayrhofer,
Mihaela Matei,
Arshiya Merchant,
Maria Panagiotopoulou,
Luca Pireddu,
Alex Sanchez Pla,
Irene Schlünder,
George Tsamis,
Harald Wagener
Abstract:
The Horizon 2020 project EOSC-Life brings together the 13 Life Science 'ESFRI' research infrastructures to create an open, digital and collaborative space for biological and medical research. Sharing sensitive data is a specific challenge within EOSC-Life. For that reason, a toolbox is being developed, providing information to researchers who wish to share and/or use sensitive data in a cloud environment in general, and the European Open Science Cloud in particular. The sensitivity of the data may arise from its personal nature but can also be caused by intellectual property considerations, biohazard concerns, or the Nagoya Protocol. The toolbox will not create new content; instead, it will allow researchers to find existing resources that are relevant for sharing sensitive data across all participating research infrastructures (the F in FAIR). The toolbox will provide links to recommendations, procedures, and best practices, as well as to software (tools) to support data sharing and reuse. It will be based upon a tagging (categorisation) system, allowing consistent labelling and categorisation of resources. The current design document provides an outline for the anticipated toolbox, as well as its basic principles regarding content and sustainability.
Submitted 28 July, 2022;
originally announced July 2022.