-
Toward Research Software Categories
Authors:
Wilhelm Hasselbring,
Stephan Druskat,
Jan Bernoth,
Philine Betker,
Michael Felderer,
Stephan Ferenz,
Anna-Lena Lamprecht,
Jan Linxweiler,
Bernhard Rumpe
Abstract:
Research software has been categorized in different contexts to serve different goals. We start with a look at what research software is, before we discuss the purpose of research software categories. We propose a multi-dimensional categorization of research software. We present a template for characterizing such categories. As selected dimensions, we present our proposed role-based, developer-based, and maturity-based categories. Since our work has been inspired by various previous efforts to categorize research software, we discuss them as related works. We characterize all these categories via the previously introduced template, to enable a systematic comparison.
Submitted 22 April, 2024;
originally announced April 2024.
-
Biomedical Open Source Software: Crucial Packages and Hidden Heroes
Authors:
Andrew Nesbitt,
Boris Veytsman,
Daniel Mietchen,
Eva Maxfield Brown,
James Howison,
João Felipe Pimentel,
Laurent Hèbert-Dufresne,
Stephan Druskat
Abstract:
Despite the importance of scientific software for research, it is often not formally recognized and rewarded. This is especially true for foundation libraries, which are used by the software packages visible to users while remaining "hidden" themselves. Funders and other organizations need to understand the complex network of computer programs that modern research relies upon.
In this work, we used the CZ Software Mentions Dataset to map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems. We propose centrality metrics for the network of software dependencies, analyze three ecosystems (PyPI, CRAN, Bioconductor), and determine the packages with the highest centrality.
Submitted 9 April, 2024;
originally announced April 2024.
-
Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research
Authors:
Stephan Druskat,
Neil P. Chue Hong,
Sammie Buzzard,
Olexandr Konovalov,
Patrick Kornek
Abstract:
Datasets collecting software mentions from scholarly publications can potentially be used for research into the software that has been used in the published research, as well as into the practice of software citation. Recently, new software mention datasets with different characteristics have been published. We present an approach to assess the usability of such datasets for research on research software. Our approach includes sampling and data preparation, manual annotation for quality and mention characteristics, and annotation analysis. We applied it to two software mention datasets for evaluation based on qualitative observation. Doing this, we were able to find challenges to working with the selected datasets to do research. Main issues refer to the structure of the dataset, the quality of the extracted mentions (54% and 23% of mentions respectively are not to software), and software accessibility. While one dataset does not provide links to mentioned software at all, the other does so in a way that can impede quantitative research endeavors: (1) Links may come from different sources and each point to different software for the same mention. (2) The quality of the automatically retrieved links is generally poor (in our sample, 65.4% link the wrong software). (3) Links exist only for a small subset (in our sample, 20.5%) of mentions, which may lead to skewed or disproportionate samples. However, the greatest challenge and underlying issue in working with software mention datasets is the still suboptimal practice of software citation: Software should not be mentioned, it should be cited following the software citation principles.
Submitted 22 February, 2024;
originally announced February 2024.
-
Software publications with rich metadata: state of the art, automated workflows and HERMES concept
Authors:
Stephan Druskat,
Oliver Bertuch,
Guido Juckeland,
Oliver Knodel,
Tobias Schlauch
Abstract:
To satisfy the principles of FAIR software, software sustainability and software citation, research software must be formally published. Publication repositories make this possible and provide published software versions with unique and persistent identifiers. However, software publication is still a tedious, mostly manual process.
To streamline software publication, HERMES, a project funded by the Helmholtz Metadata Collaboration, develops automated workflows to publish research software with rich metadata.
The tooling developed by the project utilizes continuous integration solutions to retrieve, collate, and process existing metadata in source repositories, and publish them on publication repositories, including checks against existing metadata requirements. To accompany the tooling and enable researchers to easily reuse it, the project also provides comprehensive documentation and templates for widely used CI solutions. In this paper, we outline the concept for these workflows, and describe how our solution advances the state of the art in research software publication.
Submitted 22 January, 2022;
originally announced January 2022.
-
Research Software Sustainability and Citation
Authors:
Stephan Druskat,
Daniel S. Katz,
Ilian T. Todorov
Abstract:
Software citation contributes to achieving software sustainability in two ways: It provides an impact metric to incentivize stakeholders to make software sustainable. It also provides references to software used in research, which can be reused and adapted to become sustainable. While software citation faces a host of technical and social challenges, community initiatives have defined the principles of software citation and are working on implementing solutions.
Submitted 11 March, 2021;
originally announced March 2021.
-
Nine Best Practices for Research Software Registries and Repositories: A Concise Guide
Authors:
Task Force on Best Practices for Software Registries,
Alain Monteil,
Alejandra Gonzalez-Beltran,
Alexandros Ioannidis,
Alice Allen,
Allen Lee,
Anita Bandrowski,
Bruce E. Wilson,
Bryce Mecum,
Cai Fan Du,
Carly Robinson,
Daniel Garijo,
Daniel S. Katz,
David Long,
Genevieve Milliken,
Hervé Ménager,
Jessica Hausman,
Jurriaan H. Spaaks,
Katrina Fenlon,
Kristin Vanderbilt,
Lorraine Hwang,
Lynn Davis,
Martin Fenner,
Michael R. Crusoe
, et al. (8 additional authors not shown)
Abstract:
Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these resources takes effort, and few guidelines are available to help prospective creators of registries and repositories. To address this need, we present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020. We believe that putting in place specific policies such as those presented here will help scientific software registries and repositories better serve their users and their disciplines.
Submitted 24 December, 2020;
originally announced December 2020.
-
An Environment for Sustainable Research Software in Germany and Beyond: Current State, Open Challenges, and Call for Action
Authors:
Hartwig Anzt,
Felix Bach,
Stephan Druskat,
Frank Löffler,
Axel Loewe,
Bernhard Y. Renard,
Gunnar Seemann,
Alexander Struck,
Elke Achhammer,
Piush Aggarwal,
Franziska Appel,
Michael Bader,
Lutz Brusch,
Christian Busse,
Gerasimos Chourdakis,
Piotr W. Dabrowski,
Peter Ebert,
Bernd Flemisch,
Sven Friedl,
Bernadette Fritzsch,
Maximilian D. Funk,
Volker Gast,
Florian Goth,
Jean-Noël Grad,
Sibylle Hermann
, et al. (18 additional authors not shown)
Abstract:
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal to ensure that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.
Submitted 5 May, 2020; v1 submitted 27 April, 2020;
originally announced May 2020.
-
Challenges for Verifying and Validating Scientific Software in Computational Materials Science
Authors:
Thomas Vogel,
Stephan Druskat,
Markus Scheidgen,
Claudia Draxl,
Lars Grunske
Abstract:
Many fields of science rely on software systems to answer different research questions. For valid results, researchers need to trust the results scientific software produces, and consequently quality assurance is of utmost importance. In this paper, we investigate the impact of quality assurance in the domain of computational materials science (CMS). Based on our experience in this domain, we formulate challenges for the validation and verification of scientific software and its results. Furthermore, we describe directions for future research that can potentially help in dealing with these challenges.
Submitted 21 June, 2019;
originally announced June 2019.
-
Software and Dependencies in Research Citation Graphs
Authors:
Stephan Druskat
Abstract:
Following the widespread digitalization of scholarship, software has become essential for research, but the current sociotechnical system of citation does not reflect this sufficiently. Citation provides context for research, but the current model for the respective research citation graphs does not integrate software. In this paper, I develop a directed graph model to alleviate this, describe challenges for its instantiation, and give an outlook of useful applications of research citation graphs, including transitive credit.
Submitted 19 December, 2019; v1 submitted 14 June, 2019;
originally announced June 2019.
-
Software Citation Implementation Challenges
Authors:
Daniel S. Katz,
Daina Bouquin,
Neil P. Chue Hong,
Jessica Hausman,
Catherine Jones,
Daniel Chivvis,
Tim Clark,
Mercè Crosas,
Stephan Druskat,
Martin Fenner,
Tom Gillespie,
Alejandra Gonzalez-Beltran,
Morane Gruenpeter,
Ted Habermann,
Robert Haines,
Melissa Harrison,
Edwin Henneken,
Lorraine Hwang,
Matthew B. Jones,
Alastair A. Kelly,
David N. Kennedy,
Katrin Leinweber,
Fernando Rios,
Carly B. Robinson,
Ilian Todorov
, et al. (2 additional authors not shown)
Abstract:
The main output of the FORCE11 Software Citation working group (https://www.force11.org/group/software-citation-working-group) was a paper on software citation principles (https://doi.org/10.7717/peerj-cs.86) published in September 2016. This paper laid out a set of six high-level principles for software citation (importance, credit and attribution, unique identification, persistence, accessibility, and specificity) and discussed how they could be used to implement software citation in the scholarly community. In a series of talks and other activities, we have promoted software citation using these increasingly accepted principles. At the time the initial paper was published, we also provided guidance and examples on how to make software citable, though we now realize there are unresolved problems with that guidance. The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed.
Submitted 21 May, 2019;
originally announced May 2019.
-
The State of Sustainable Research Software: Results from the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1)
Authors:
Daniel S. Katz,
Stephan Druskat,
Robert Haines,
Caroline Jay,
Alexander Struck
Abstract:
This article summarizes motivations, organization, and activities of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1) held in Manchester, UK in September 2017. The WSSSPE series promotes sustainable research software by positively impacting principles and best practices, careers, learning, and credit. This article discusses the Code of Conduct, idea papers, position papers, experience papers, demos, and lightning talks presented during the workshop. The main part of the article discusses the speed-blogging groups that formed during the meeting, along with the outputs of those sessions.
Submitted 19 July, 2018;
originally announced July 2018.
-
Mapping the research software sustainability space
Authors:
Stephan Druskat,
Daniel S. Katz
Abstract:
A growing number of largely uncoordinated initiatives focus on research software sustainability. A comprehensive mapping of the research software sustainability space can help identify gaps in their efforts, track results, and avoid duplication of work. To this end, this paper suggests enhancing an existing schematic of activities in research software sustainability, and formalizing it in a directed graph model. Such a model can be further used to define a classification schema which, applied to research results in the field, can drive the identification of past activities and the planning of future efforts.
Submitted 26 October, 2018; v1 submitted 4 July, 2018;
originally announced July 2018.
-
A Proposal for the Measurement and Documentation of Research Software Sustainability in Interactive Metadata Repositories
Authors:
Stephan Druskat
Abstract:
This paper proposes an interactive repository type for research software metadata which measures and documents software sustainability by accumulating metadata, and computing sustainability metrics over them. Such a repository would help to overcome technical barriers to software sustainability by furthering the discovery and identification of sustainable software, thereby also facilitating documentation of research software within the framework of software management plans.
Submitted 17 August, 2016; v1 submitted 16 August, 2016;
originally announced August 2016.