Don't mention it: An approach to assess challenges to using software mentions for citation and discoverability research
Authors:
Stephan Druskat,
Neil P. Chue Hong,
Sammie Buzzard,
Olexandr Konovalov,
Patrick Kornek
Abstract:
Datasets collecting software mentions from scholarly publications can potentially be used for research into the software that has been used in the published research, as well as into the practice of software citation. Recently, new software mention datasets with different characteristics have been published. We present an approach to assess the usability of such datasets for research on research s…
▽ More
Datasets collecting software mentions from scholarly publications can potentially be used for research into the software that has been used in the published research, as well as into the practice of software citation. Recently, new software mention datasets with different characteristics have been published. We present an approach to assess the usability of such datasets for research on research software. Our approach includes sampling and data preparation, manual annotation for quality and mention characteristics, and annotation analysis. We applied it to two software mention datasets for evaluation based on qualitative observation. Doing this, we were able to find challenges to working with the selected datasets to do research. Main issues refer to the structure of the dataset, the quality of the extracted mentions (54% and 23% of mentions respectively are not to software), and software accessibility. While one dataset does not provide links to mentioned software at all, the other does so in a way that can impede quantitative research endeavors: (1) Links may come from different sources and each point to different software for the same mention. (2) The quality of the automatically retrieved links is generally poor (in our sample, 65.4% link the wrong software). (3) Links exist only for a small subset (in our sample, 20.5%) of mentions, which may lead to skewed or disproportionate samples. However, the greatest challenge and underlying issue in working with software mention datasets is the still suboptimal practice of software citation: Software should not be mentioned, it should be cited following the software citation principles.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
Closing the loop: Autonomous experiments enabled by machine-learning-based online data analysis in synchrotron beamline environments
Authors:
Linus Pithan,
Vladimir Starostin,
David Mareček,
Lukas Petersdorf,
Constantin Völter,
Valentin Munteanu,
Maciej Jankowski,
Oleg Konovalov,
Alexander Gerlach,
Alexander Hinderhofer,
Bridget Murphy,
Stefan Kowarik,
Frank Schreiber
Abstract:
Recently, there has been significant interest in applying machine learning (ML) techniques to X-ray scattering experiments, which proves to be a valuable tool for enhancing research that involves large or rapidly generated datasets. ML allows for the automated interpretation of experimental results, particularly those obtained from synchrotron or neutron facilities. The speed at which ML models ca…
▽ More
Recently, there has been significant interest in applying machine learning (ML) techniques to X-ray scattering experiments, which proves to be a valuable tool for enhancing research that involves large or rapidly generated datasets. ML allows for the automated interpretation of experimental results, particularly those obtained from synchrotron or neutron facilities. The speed at which ML models can process data presents an important opportunity to establish a closed-loop feedback system, enabling real-time decision-making based on online data analysis. In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR), using the growth of organic thin films as an example. Our focus lies on the beamline integration of ML-based online data analysis and closed-loop feedback. We present solutions that provide an elementary data analysis in real time during the experiment without introducing the additional software dependencies in the beamline control software environment. Our data demonstrates the accuracy and robustness of ML methods for analyzing XRR curves and Bragg reflections and its autonomous control over a vacuum deposition setup.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.