The GA4GH Task Execution API: Enabling Easy Multi Cloud Task Execution
Authors:
Alexander Kanitz,
Matthew H. McLoughlin,
Liam Beckman,
Venkat S. Malladi,
Kyle P. Ellrott
Abstract:
The Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES) API is a standardized schema and API for describing and executing batch execution tasks. It provides a common way to submit and manage tasks to a variety of compute environments, including on premise High Performance Compute and High Throughput Computing (HPC/HTC) systems, Cloud computing platforms, and hybrid environ…
▽ More
The Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES) API is a standardized schema and API for describing and executing batch execution tasks. It provides a common way to submit and manage tasks to a variety of compute environments, including on premise High Performance Compute and High Throughput Computing (HPC/HTC) systems, Cloud computing platforms, and hybrid environments. The TES API is designed to be flexible and extensible, allowing it to be adapted to a wide range of use cases, such as "bringing compute to the data" solutions for federated and distributed data analysis or load balancing across multi cloud infrastructures. This API has been adopted by a number of different service providers and utilized by several workflow engines. Using its capabilities, genomes research institutes are building hybrid compute systems to study life science.
△ Less
Submitted 8 February, 2024;
originally announced May 2024.
Recording provenance of workflow runs with RO-Crate
Authors:
Simone Leo,
Michael R. Crusoe,
Laura Rodríguez-Navas,
Raül Sirvent,
Alexander Kanitz,
Paul De Geest,
Rudolf Wittner,
Luca Pireddu,
Daniel Garijo,
José M. Fernández,
Iacopo Colonnelli,
Matej Gallo,
Tazro Ohta,
Hirotaka Suetake,
Salvador Capella-Gutierrez,
Renske de Wit,
Bruno P. Kinoshita,
Stian Soiland-Reyes
Abstract:
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to…
▽ More
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated products (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
A corresponding RO-Crate for this article is at https://w3id.org/ro/doi/10.5281/zenodo.10368989
△ Less
Submitted 12 December, 2023;
originally announced December 2023.