-
A sustainable infrastructure concept for improved accessibility, reusability, and archival of research software
Authors:
Timo Koch,
Dennis Gläser,
Anett Seeland,
Sarbani Roy,
Katharina Schulze,
Kilian Weishaupt,
David Boehringer,
Sibylle Hermann,
Bernd Flemisch
Abstract:
Research software is an integral part of most research today, and it is widely accepted that research software artifacts should be accessible and reproducible. However, the sustainable archival of research software artifacts is an ongoing effort. We identify research software artifacts as snapshots of the current state of research and an integral part of a sustainable cycle of software development, research, and publication. We develop requirements and recommendations to improve the archival, access, and reuse of research software artifacts based on installable, configurable, extensible research software, and sustainable public open-access infrastructure. The described goal is to enable the reuse and exploration of research software beyond published research results, in parallel with reproducibility efforts, and in line with the FAIR principles for data and software. Research software artifacts can be reused in varying scenarios. To this end, we design a multi-modal representation concept supporting multiple reuse scenarios. We identify types of research software artifacts that can be viewed as different modes of the same software-based research result, ranging from installation-free configurable browser-based apps to containerized environments, descriptions in journal publications and software documentation, or source code with installation instructions. We discuss how the sustainability and reuse of research software are enhanced or enabled by a suitable archive infrastructure. Finally, using the example of a pilot project at the University of Stuttgart, Germany (a collaborative effort between research software developers and infrastructure providers), we outline practical challenges and experiences.
Submitted 26 January, 2023;
originally announced January 2023.
-
Classifier Transfer with Data Selection Strategies for Online Support Vector Machine Classification with Class Imbalance
Authors:
Mario Michael Krell,
Nils Wilshusen,
Anett Seeland,
Su Kyoung Kim
Abstract:
Objective: Classifier transfers usually come with dataset shifts. To overcome them, online strategies have to be applied. For practical applications, limitations in the computational resources for the adaptation of batch learning algorithms, like the SVM, have to be considered.
Approach: We review and compare several strategies for online learning with SVMs. We focus on data selection strategies which limit the size of the stored training data [...]
Main Results: For different data shifts, different criteria are appropriate. For the synthetic data, adding all samples to the pool of considered samples often performs significantly worse than other criteria. In particular, adding only misclassified samples performed astoundingly well. Here, balancing criteria were very important when the other criteria were not well chosen. For the transfer setups, the results show that the best strategy depends on the intensity of the drift during the transfer. For stronger drifts, adding all samples and removing the oldest ones results in the best performance, whereas for smaller drifts, it can be sufficient to add only potential new support vectors of the SVM, which reduces processing resources.
Significance: For BCIs based on EEG, models trained on data from a calibration session, a previous recording session, or even a recording session with one or several other subjects are used. This transfer of the learned model usually decreases the performance and can therefore benefit from online learning, which adapts a classifier such as the established SVM. We show that by using the right combination of data selection criteria, it is possible to adapt the classifier and largely increase the performance. Furthermore, in some cases it is possible to speed up the processing and save computational resources by updating with a subset of special samples and keeping a small subset of samples for training the classifier.
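Illustration: the data selection criteria named in this abstract (adding all samples, adding only misclassified samples, removing the oldest samples) can be sketched as a bounded sample pool. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `SamplePool`, its parameters, and the toy `threshold_classifier` are hypothetical names introduced here, and retraining an SVM on the pool is left out.

```python
from collections import deque


class SamplePool:
    """Bounded pool of training samples for online retraining (hypothetical helper)."""

    def __init__(self, max_size, add_criterion="all"):
        if add_criterion not in ("all", "misclassified"):
            raise ValueError("unknown add criterion: " + add_criterion)
        # deque(maxlen=...) drops the oldest entry once the pool is full,
        # which stands in for a "remove oldest" criterion.
        self.samples = deque(maxlen=max_size)
        self.add_criterion = add_criterion

    def update(self, x, true_label, predicted_label):
        """Offer one labelled sample; return True if it was stored."""
        if self.add_criterion == "misclassified" and predicted_label == true_label:
            return False
        self.samples.append((x, true_label))
        return True


def threshold_classifier(x, threshold=0.0):
    """Toy stand-in for a trained classifier (assumption, not an SVM)."""
    return 1 if x > threshold else -1


# Toy stream of (feature, true label) pairs.
stream = [(-1.0, -1), (0.5, 1), (0.2, -1), (-0.3, 1), (2.0, 1)]
pool_all = SamplePool(max_size=3, add_criterion="all")
pool_mis = SamplePool(max_size=3, add_criterion="misclassified")
for x, y in stream:
    prediction = threshold_classifier(x)
    pool_all.update(x, y, prediction)
    pool_mis.update(x, y, prediction)
```

With the "all" criterion the pool keeps only the three most recent samples, while the "misclassified" criterion stores just the two samples the toy classifier got wrong, so a retraining step would see far fewer samples.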
Submitted 9 August, 2022;
originally announced August 2022.
-
Data Augmentation for Brain-Computer Interfaces: Analysis on Event-Related Potentials Data
Authors:
Mario Michael Krell,
Anett Seeland,
Su Kyoung Kim
Abstract:
On image data, data augmentation is becoming less relevant due to the large amount of available training data and regularization techniques. Common approaches are moving windows (cropping), scaling, affine distortions, random noise, and elastic deformations. For electroencephalographic data, the lack of sufficient training data is still a major issue. We suggest and evaluate different approaches to generate augmented data using temporal and spatial/rotational distortions. Our results on the perception of rare stimuli (P300 data) and movement prediction (MRCP data) show that these approaches are feasible and can significantly increase the performance of signal processing chains for brain-computer interfaces by 1% to 6%.
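Illustration: one plausible reading of a temporal distortion is to cut additional windows that are shifted by a few samples around the stimulus onset. The abstract does not specify the distortions used, so this is a minimal sketch under that assumption and not the authors' method; the helpers `extract_window` and `temporally_shifted_windows` and the toy recording are hypothetical.

```python
def extract_window(signal, start, length):
    """Return `length` samples per channel starting at index `start`."""
    return [channel[start:start + length] for channel in signal]


def temporally_shifted_windows(signal, onset, length, shifts):
    """Cut one window per shift around a stimulus onset (hypothetical helper).

    signal: list of channels, each a list of samples.
    shifts: sample offsets relative to the onset; 0 gives the original window.
    Shifts whose window would leave the recording are skipped.
    """
    n_samples = len(signal[0])
    windows = []
    for shift in shifts:
        start = onset + shift
        if start < 0 or start + length > n_samples:
            continue
        windows.append(extract_window(signal, start, length))
    return windows


# Two-channel toy recording with ten samples per channel.
recording = [list(range(10)), [10 * v for v in range(10)]]
augmented = temporally_shifted_windows(
    recording, onset=4, length=4, shifts=[-2, 0, 2, 4]
)
```

Each returned window keeps the label of the original event, so a single labelled event yields several training examples; the shift of 4 is skipped here because its window would run past the end of the recording.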
Submitted 8 January, 2018;
originally announced January 2018.