-
Foundational Competencies and Responsibilities of a Research Software Engineer
Authors:
Florian Goth,
Renato Alves,
Matthias Braun,
Leyla Jael Castro,
Gerasimos Chourdakis,
Simon Christ,
Jeremy Cohen,
Fredo Erxleben,
Jean-Noël Grad,
Magnus Hagdorn,
Toby Hodges,
Guido Juckeland,
Dominic Kempf,
Anna-Lena Lamprecht,
Jan Linxweiler,
Frank Löffler,
Michele Martone,
Moritz Schwarzmeier,
Heidi Seibold,
Jan Philipp Thiele,
Harald von Waldow,
Samantha Wittke
Abstract:
The term Research Software Engineer, or RSE, emerged a little over 10 years ago as a way to represent individuals working in the research community but focusing on software development. The term has been widely adopted and there are a number of high-level definitions of what an RSE is. However, the roles of RSEs vary depending on the institutional context they work in. At one end of the spectrum, RSE roles may look similar to a traditional research role. At the other extreme, they resemble that of a software engineer in industry. Most RSE roles inhabit the space between these two extremes. Therefore, providing a straightforward, comprehensive definition of what an RSE does and what experience, skills and competencies are required to become one is challenging. In this community paper we define the broad notion of what an RSE is, explore the different types of work they undertake, and define a list of fundamental competencies as well as values that define the general profile of an RSE. On this basis, we elaborate on the progression of these skills along different dimensions, looking at specific types of RSE roles, proposing recommendations for organisations, and giving examples of future specialisations. An appendix details how existing curricula fit into this framework.
Submitted 12 April, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Need for Design Patterns: Interoperability Issues and Modelling Challenges for Observational Data
Authors:
Trupti Padiya,
Frank Löffler,
Friederike Klan
Abstract:
Interoperability issues concerning observational data have gained attention in recent times. Automated data integration is important when it comes to the scientific analysis of observational data from different sources. However, it is hampered by various data interoperability issues. We focus exclusively on semantic interoperability issues for observational characteristics. We propose a use-case-driven approach to identify general classes of interoperability issues. In this paper, we apply this approach, as an example, to the use case of citizen science fireball observations. We derive key concepts for the identified interoperability issues that are generalizable to observational data in other fields of science. These key concepts contain several modeling challenges, and we broadly describe each modeling challenge associated with its interoperability issue. We believe that addressing these challenges with a set of ontology design patterns will be an effective means for unified semantic modeling, paving the way for a unified approach for resolving interoperability issues in observational data. We demonstrate this with one design pattern, highlighting the importance and need for ontology design patterns for observational data, and leave the remaining patterns to future work. Our paper thus describes interoperability issues along with modeling challenges as a starting point for developing a set of extensible and reusable design patterns.
Submitted 26 August, 2022;
originally announced August 2022.
-
Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles
Authors:
Sheeba Samuel,
Frank Löffler,
Birgitta König-Ries
Abstract:
Machine learning (ML) is an increasingly important scientific tool supporting decision making and knowledge generation in numerous fields. With this, it also becomes more and more important that the results of ML experiments are reproducible. Unfortunately, that often is not the case. Rather, ML, similar to many other disciplines, faces a reproducibility crisis. In this paper, we describe our goals and initial steps in supporting the end-to-end reproducibility of ML pipelines. We investigate which factors beyond the availability of source code and datasets influence reproducibility of ML experiments. We propose ways to apply FAIR data practices to ML workflows. We present our preliminary results on the role of our tool, ProvBook, in capturing and comparing provenance of ML experiments and their reproducibility using Jupyter Notebooks.
Submitted 22 June, 2020;
originally announced June 2020.
-
An Environment for Sustainable Research Software in Germany and Beyond: Current State, Open Challenges, and Call for Action
Authors:
Hartwig Anzt,
Felix Bach,
Stephan Druskat,
Frank Löffler,
Axel Loewe,
Bernhard Y. Renard,
Gunnar Seemann,
Alexander Struck,
Elke Achhammer,
Piush Aggarwal,
Franziska Appel,
Michael Bader,
Lutz Brusch,
Christian Busse,
Gerasimos Chourdakis,
Piotr W. Dabrowski,
Peter Ebert,
Bernd Flemisch,
Sven Friedl,
Bernadette Fritzsch,
Maximilian D. Funk,
Volker Gast,
Florian Goth,
Jean-Noël Grad,
Sibylle Hermann
, et al. (18 additional authors not shown)
Abstract:
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal to ensure that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.
Submitted 5 May, 2020; v1 submitted 27 April, 2020;
originally announced May 2020.
-
Dataset Search In Biodiversity Research: Do Metadata In Data Repositories Reflect Scholarly Information Needs?
Authors:
Felicitas Löffler,
Valentin Wesp,
Birgitta König-Ries,
Friederike Klan
Abstract:
The increasing amount of research data provides the opportunity to link and integrate data to create novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. We analyze the primary source in dataset search - metadata - and determine if they reflect scholarly search interests. We examine if metadata standards provide elements corresponding to search interests, we inspect if selected data repositories use metadata standards representing scholarly interests, and we determine how many fields of the metadata standards used are filled. To determine search interests in biodiversity research, we gathered 169 questions that researchers aimed to answer with the help of retrieved data, identified biological entities and grouped them into 13 categories. Our findings indicate that environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important search interests in biodiversity research. The comparison with existing metadata standards shows that domain-specific standards cover search interests quite well, whereas general standards do not explicitly contain elements that reflect search interests. We inspect metadata from five large data repositories. Our results confirm that metadata currently poorly reflect search interests in biodiversity research. From these findings, we derive recommendations for researchers and data repositories on how to bridge the gap between search interests and the metadata provided.
Submitted 27 February, 2020;
originally announced February 2020.
-
A Survey of High Level Frameworks in Block-Structured Adaptive Mesh Refinement Packages
Authors:
Anshu Dubey,
Ann Almgren,
John Bell,
Martin Berzins,
Steve Brandt,
Greg Bryan,
Phillip Colella,
Daniel Graves,
Michael Lijewski,
Frank Löffler,
Brian O'Shea,
Erik Schnetter,
Brian Van Straalen,
Klaus Weide
Abstract:
Over the last decade block-structured adaptive mesh refinement (SAMR) has found increasing use in large, publicly available codes and frameworks. SAMR frameworks have evolved along different paths. Some have stayed focused on specific domain areas, others have pursued a more general functionality, providing the building blocks for a larger variety of applications. In this survey paper we examine a representative set of SAMR packages and SAMR-based codes that have been in existence for half a decade or more, have a reasonably sized and active user base outside of their home institutions, and are publicly available. The set consists of a mix of SAMR packages and application codes that cover a broad range of scientific domains. We look at their high-level frameworks, and their approach to dealing with the advent of radical changes in hardware architecture. The codes included in this survey are BoxLib, Cactus, Chombo, Enzo, FLASH, and Uintah.
Submitted 27 October, 2016;
originally announced October 2016.
-
Report on the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3)
Authors:
Daniel S. Katz,
Sou-Cheng T. Choi,
Kyle E. Niemeyer,
James Hetherington,
Frank Löffler,
Dan Gunter,
Ray Idaszak,
Steven R. Brandt,
Mark A. Miller,
Sandra Gesing,
Nick D. Jones,
Nic Weber,
Suresh Marru,
Gabrielle Allen,
Birgit Penzenstadler,
Colin C. Venters,
Ethan Davis,
Lorraine Hwang,
Ilian Todorov,
Abani Patra,
Miguel de Val-Borro
Abstract:
This report records and discusses the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3). The report includes a description of the keynote presentation of the workshop, which served as an overview of sustainable scientific software. It also summarizes a set of lightning talks in which speakers highlighted to-the-point lessons and challenges pertaining to sustaining scientific software. The final and main contribution of the report is a summary of the discussions, future steps, and future organization for a set of self-organized working groups on topics including developing pathways to funding scientific software; constructing useful common metrics for crediting software stakeholders; identifying principles for sustainable software engineering design; reaching out to research software organizations around the world; and building communities for software sustainability. For each group, we include a point of contact and a landing page that can be used by those who want to join that group's future activities. The main challenge left by the workshop is to see if the groups will execute these activities that they have scheduled, and how the WSSSPE community can encourage this to happen.
Submitted 6 February, 2016;
originally announced February 2016.
-
Chemora: A PDE Solving Framework for Modern HPC Architectures
Authors:
Erik Schnetter,
Marek Blazewicz,
Steven R. Brandt,
David M. Koppelman,
Frank Löffler
Abstract:
Modern HPC architectures consist of heterogeneous multi-core, many-node systems with deep memory hierarchies. Modern applications employ ever more advanced discretisation methods to study multi-physics problems. Developing such applications that explore cutting-edge physics on cutting-edge HPC systems has become a complex task that requires significant HPC knowledge and experience. Unfortunately, this combined knowledge is currently out of reach for all but a few groups of application developers.
Chemora is a framework for solving systems of Partial Differential Equations (PDEs) that targets modern HPC architectures. Chemora is based on Cactus, which sees prominent usage in the computational relativistic astrophysics community. In Chemora, PDEs are expressed either in a high-level LaTeX-like language or in Mathematica. Discretisation stencils are defined separately from equations, and can include Finite Differences, Discontinuous Galerkin Finite Elements (DGFE), Adaptive Mesh Refinement (AMR), and multi-block systems.
We use Chemora in the Einstein Toolkit to implement the Einstein Equations on CPUs and on accelerators, and study astrophysical systems such as black hole binaries, neutron stars, and core-collapse supernovae.
Submitted 3 October, 2014;
originally announced October 2014.
-
Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)
Authors:
Daniel S. Katz,
Sou-Cheng T. Choi,
Hilmar Lapp,
Ketan Maheshwari,
Frank Löffler,
Matthew Turk,
Marcus D. Hanwell,
Nancy Wilkins-Diehr,
James Hetherington,
James Howison,
Shel Swenson,
Gabrielle D. Allen,
Anne C. Elster,
Bruce Berriman,
Colin Venters
Abstract:
Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop.
Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of "software papers", and the use of online systems, for example source code repositories like GitHub.
This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long-term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.
Submitted 12 June, 2014; v1 submitted 29 April, 2014;
originally announced April 2014.
-
Cactus: Issues for Sustainable Simulation Software
Authors:
Frank Löffler,
Steven R. Brandt,
Gabrielle Allen,
Erik Schnetter
Abstract:
The Cactus Framework is an open-source, modular, portable programming environment for the collaborative development and deployment of scientific applications using high-performance computing. Its roots reach back to 1996 at the National Center for Supercomputing Applications and the Albert Einstein Institute in Germany, where its development jumpstarted. Since then, the Cactus framework has witnessed major changes in hardware infrastructure as well as its own community. This paper describes its endurance through these past changes and, drawing upon lessons from its past, also discusses future
Submitted 15 September, 2013; v1 submitted 6 September, 2013;
originally announced September 2013.
-
Software Abstractions and Methodologies for HPC Simulation Codes on Future Architectures
Authors:
A. Dubey,
S. Brandt,
R. Brower,
M. Giles,
P. Hovland,
D. Q. Lamb,
F. Loffler,
B. Norris,
B. OShea,
C. Rebbi,
M. Snir,
R. Thakur
Abstract:
Large, complex, multi-scale, multi-physics simulation codes, running on high performance computing (HPC) platforms, have become essential to advancing science and engineering. These codes simulate multi-scale, multi-physics phenomena with unprecedented fidelity on petascale platforms, and are used by large communities. Continued ability of these codes to run on future platforms is as crucial to their communities as continued improvements in instruments and facilities are to experimental scientists. However, the ability of code developers to do these things faces a serious challenge with the paradigm shift underway in platform architecture. The complexity and uncertainty of the future platforms makes it essential to approach this challenge cooperatively as a community. We need to develop common abstractions, frameworks, programming models and software development methodologies that can be applied across a broad range of complex simulation codes, and common software infrastructure to support them. In this position paper we express and discuss our belief that such an infrastructure is critical to the deployment of existing and new large, multi-scale, multi-physics codes on future HPC platforms.
Submitted 6 September, 2013;
originally announced September 2013.
-
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Authors:
Marek Blazewicz,
Ian Hinder,
David M. Koppelman,
Steven R. Brandt,
Milosz Ciznicki,
Michal Kierzynka,
Frank Löffler,
Erik Schnetter,
Jian Tao
Abstract:
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
Submitted 24 July, 2013;
originally announced July 2013.
-
A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems
Authors:
Marek Blazewicz,
Steven R. Brandt,
Peter Diener,
David M. Koppelman,
Krzysztof Kurowski,
Frank Löffler,
Erik Schnetter,
Jian Tao
Abstract:
Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA and OpenCL it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include Merge (a library based framework for heterogeneous multi-core systems), Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a new programming language for general purpose computation on the GPU) and CUDA-lite (an enhancement to CUDA that transforms code based on annotations). In addition, efforts are underway to improve compiler tools for automatic parallelization and optimization of affine loop nests for GPUs and for automatic translation of OpenMP parallelized codes to CUDA.
In this paper we present an alternative approach: a new computational framework, built upon the highly scalable Cactus framework, for developing massively data parallel scientific applications suitable for use on such petascale/exascale hybrid systems. As the first non-trivial demonstration of its usefulness, we successfully developed a new 3D CFD code that achieves improved performance.
Submitted 10 January, 2012;
originally announced January 2012.
-
Simplifying Complex Software Assembly: The Component Retrieval Language and Implementation
Authors:
Eric L. Seidel,
Gabrielle Allen,
Steven Brandt,
Frank Löffler,
Erik Schnetter
Abstract:
Assembling simulation software along with the associated tools and utilities is a challenging endeavor, particularly when the components are distributed across multiple source code versioning systems. It is problematic for researchers compiling and running the software across many different supercomputers, as well as for novices in a field who are often presented with a bewildering list of software to collect and install. In this paper, we describe a language (CRL) for specifying software components with the details needed to obtain them from source code repositories. The language supports public and private access. We describe a tool called GetComponents which implements CRL and can be used to assemble software. We demonstrate the tool for application scenarios with the Cactus Framework on the NSF TeraGrid resources. The tool itself is distributed with an open source license and freely available from our web page.
Submitted 7 September, 2010;
originally announced September 2010.
-
Component Specification in the Cactus Framework: The Cactus Configuration Language
Authors:
Gabrielle Allen,
Tom Goodale,
Frank Löffler,
David Rideout,
Erik Schnetter,
Eric L. Seidel
Abstract:
Component frameworks are complex systems that rely on many layers of abstraction to function properly. One essential requirement is a consistent means of describing each individual component and how it relates to both other components and the whole framework. As component frameworks are designed to be flexible by nature, the description method should be simultaneously powerful, lead to efficient code, and be easy to use, so that new users can quickly adapt their own code to work with the framework. In this paper, we discuss the Cactus Configuration Language (CCL) which is used to describe components ("thorns") in the Cactus Framework. The CCL provides a description language for the variables, parameters, functions, scheduling and compilation of a component and includes concepts such as interface and implementation which allow thorns providing the same capabilities to be easily interchanged. We include several application examples which illustrate how community toolkits use the CCL and Cactus and identify needed additions to the language.
Submitted 7 September, 2010;
originally announced September 2010.