-
Recording provenance of workflow runs with RO-Crate
Authors:
Simone Leo,
Michael R. Crusoe,
Laura Rodríguez-Navas,
Raül Sirvent,
Alexander Kanitz,
Paul De Geest,
Rudolf Wittner,
Luca Pireddu,
Daniel Garijo,
José M. Fernández,
Iacopo Colonnelli,
Matej Gallo,
Tazro Ohta,
Hirotaka Suetake,
Salvador Capella-Gutierrez,
Renske de Wit,
Bruno P. Kinoshita,
Stian Soiland-Reyes
Abstract:
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated products (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
A corresponding RO-Crate for this article is at https://w3id.org/ro/doi/10.5281/zenodo.10368989
Submitted 12 December, 2023;
originally announced December 2023.
-
Workflows Community Summit 2022: A Roadmap Revolution
Authors:
Rafael Ferreira da Silva,
Rosa M. Badia,
Venkat Bala,
Debbie Bard,
Peer-Timo Bremer,
Ian Buckley,
Silvina Caino-Lores,
Kyle Chard,
Carole Goble,
Shantenu Jha,
Daniel S. Katz,
Daniel Laney,
Manish Parashar,
Frederic Suter,
Nick Tyler,
Thomas Uram,
Ilkay Altintas,
Stefan Andersson,
William Arndt,
Juan Aznar,
Jonathan Bader,
Bartosz Balis,
Chris Blanton,
Kelly Rosa Braghetto,
Aharon Brodutch
, et al. (80 additional authors not shown)
Abstract:
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large-scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC, enable the management of many small-sized files, allow data reduction while ensuring high accuracy, and orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.
Submitted 31 March, 2023;
originally announced April 2023.
-
MPI+OpenMP Tasking Scalability for Multi-Morphology Simulations of the Human Brain
Authors:
Pedro Valero-Lara,
Raül Sirvent,
Antonio J. Peña,
Jesús Labarta
Abstract:
The simulation of the behavior of the human brain is one of the most ambitious challenges today, with no end of important applications. We can find many different initiatives in the USA, Europe and Japan which attempt to achieve such a challenging target. In this work, we focus on the most important European initiative (the Human Brain Project) and on one of the models developed in this project. This tool simulates the spikes triggered in a neural network by computing the voltage capacitance on the neurons' morphology, being one of the most precise simulators today. In the present work, we have evaluated the use of MPI+OpenMP tasking on top of this framework. We prove that this approach is able to achieve a good scaling even when computing a relatively low workload (number of neurons) per node. One of our targets consists of not only achieving a highly scalable implementation, but also developing a tool with a high degree of abstraction without losing control and performance by using MPI+OpenMP tasking. The main motivation of this work is the evaluation of this cutting-edge simulation on multi-morphology neural networks. The simulation of a high number of neurons, which differ completely from one another, is an important challenge. In fact, in the multi-morphology simulations, we find an important imbalance between the nodes, mainly due to the differences in the neurons, which causes an important under-utilization of the available resources. In this work, the authors present and evaluate mechanisms to deal with this and reduce the time of this kind of simulations considerably.
Submitted 13 May, 2020;
originally announced May 2020.
-
TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation
Authors:
Karim Djemame,
Django Armstrong,
Richard Kavanagh,
Jean-Christophe Deprez,
Ana Juan Ferrer,
David Garcia Perez,
Rosa Badia,
Raul Sirvent,
Jorge Ejarque,
Yiannis Georgiou
Abstract:
The paper is concerned with the issue of how software systems actually use Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power consumption on these resources. It argues the need for novel methods and tools to support software developers aiming to optimise power consumption resulting from designing, developing, deploying and running software on HPAs, while maintaining other quality aspects of software to adequate and agreed levels. To do so, a reference architecture to support energy efficiency at application construction, deployment, and operation is discussed, as well as its implementation and evaluation plans.
Submitted 4 March, 2016;
originally announced March 2016.