-
Microservices a Definition Analyzed by ßMACH
Authors:
Marcus Hilbrich,
Ninon De Mecquenem
Abstract:
Managing software artifacts is one of the most essential aspects of computer science. It enables to develop, operate, and maintain software in an engineer-like manner. Therefore, numerous concrete strategies, methods, best practices, and concepts are available. A combination of such methods must be adequate, efficient, applicable, and effective for a concrete project. Eelsewise, the developers, ma…
▽ More
Managing software artifacts is one of the most essential aspects of computer science. It enables to develop, operate, and maintain software in an engineer-like manner. Therefore, numerous concrete strategies, methods, best practices, and concepts are available. A combination of such methods must be adequate, efficient, applicable, and effective for a concrete project. Eelsewise, the developers, managers, and testers should understand it to avoid chaos. Therefore, we exemplify the ßMACH method that provides software guidance. The method can point out missing management aspects (e.g., the V-model is not usable for software operation), identify problems of knowledge transfer (e.g., how is responsible for requirements), provide an understandable management description (e.g., the developers describe what they do), and some more. The method provides a unified, knowledge-based description strategy applicable to all software management strategies. It provides a method to create a minimal but complete description. In this paper, we apply ßMACH to the microservice concept to explain both and to test the applicability and the advantages of ßMACH.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Validity Constraints for Data Analysis Workflows
Authors:
Florian Schintke,
Ninon De Mecquenem,
David Frantz,
Vanessa Emanuela Guarino,
Marcus Hilbrich,
Fabian Lehmann,
Rebecca Sattler,
Jan Arne Sparka,
Daniel Speckhard,
Hermann Stolte,
Anh Duc Vu,
Ulf Leser
Abstract:
Porting a scientific data analysis workflow (DAW) to a cluster infrastructure, a new software stack, or even only a new dataset with some notably different properties is often challenging. Despite the structured definition of the steps (tasks) and their interdependencies during a complex data analysis in the DAW specification, relevant assumptions may remain unspecified and implicit. Such hidden a…
▽ More
Porting a scientific data analysis workflow (DAW) to a cluster infrastructure, a new software stack, or even only a new dataset with some notably different properties is often challenging. Despite the structured definition of the steps (tasks) and their interdependencies during a complex data analysis in the DAW specification, relevant assumptions may remain unspecified and implicit. Such hidden assumptions often lead to crashing tasks without a reasonable error message, poor performance in general, non-terminating executions, or silent wrong results of the DAW, to name only a few possible consequences. Searching for the causes of such errors and drawbacks in a distributed compute cluster managed by a complex infrastructure stack, where DAWs for large datasets typically are executed, can be tedious and time-consuming.
We propose validity constraints (VCs) as a new concept for DAW languages to alleviate this situation. A VC is a constraint specifying some logical conditions that must be fulfilled at certain times for DAW executions to be valid. When defined together with a DAW, VCs help to improve the portability, adaptability, and reusability of DAWs by making implicit assumptions explicit. Once specified, VC can be controlled automatically by the DAW infrastructure, and violations can lead to meaningful error messages and graceful behaviour (e.g., termination or invocation of repair mechanisms). We provide a broad list of possible VCs, classify them along multiple dimensions, and compare them to similar concepts one can find in related fields. We also provide a first sketch for VCs' implementation into existing DAW infrastructures.
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
Workflows Community Summit 2022: A Roadmap Revolution
Authors:
Rafael Ferreira da Silva,
Rosa M. Badia,
Venkat Bala,
Debbie Bard,
Peer-Timo Bremer,
Ian Buckley,
Silvina Caino-Lores,
Kyle Chard,
Carole Goble,
Shantenu Jha,
Daniel S. Katz,
Daniel Laney,
Manish Parashar,
Frederic Suter,
Nick Tyler,
Thomas Uram,
Ilkay Altintas,
Stefan Andersson,
William Arndt,
Juan Aznar,
Jonathan Bader,
Bartosz Balis,
Chris Blanton,
Kelly Rosa Braghetto,
Aharon Brodutch
, et al. (80 additional authors not shown)
Abstract:
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and t…
▽ More
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC enable the management of many small-sized files, allow data reduction while ensuring high accuracy, orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others. Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.