Requirements Engineering for Research Software: A Vision

Authors: Adrian Bajraktari, Michelle Binder, Andreas Vogelsang

Abstract: Modern science is relying on software more than ever. The behavior and outcomes of this software shape the scientific and public discourse on important topics like climate change, economic growth, or the spread of infections. Most researchers creating software for scientific purposes are not trained in Software Engineering. As a consequence, research software is often developed ad hoc without following stringent processes. With this paper, we want to characterize research software as a new application domain that needs attention from the Requirements Engineering community. We conducted an exploratory study based on 8 interviews with 12 researchers who develop software. We describe how researchers elicit, document, and analyze requirements for research software and what processes they follow. From this, we derive specific challenges and describe a vision of Requirements Engineering for research software.

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted at the 32nd IEEE International Requirements Engineering 2024 (RE) conference

arXiv:2304.06569 [pdf, other]

counterfactuals: An R Package for Counterfactual Explanation Methods

Authors: Susanne Dandl, Andreas Hofheinz, Martin Binder, Bernd Bischl, Giuseppe Casalicchio

Abstract: Counterfactual explanation methods provide information on how feature values of individual observations must be changed to obtain a desired prediction. Despite the increasing amount of proposed methods in research, only a few implementations exist whose interfaces and requirements vary widely. In this work, we introduce the counterfactuals R package, which provides a modular and unified R6-based interface for counterfactual explanation methods. We implemented three existing counterfactual explanation methods and propose some optional methodological extensions to generalize these methods to different scenarios and to make them more comparable. We explain the structure and workflow of the package using real use cases and show how to integrate additional counterfactual explanation methods into the package. In addition, we compared the implemented methods for a variety of models and datasets with regard to the quality of their counterfactual explanations and their runtime behavior.

Submitted 15 September, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: 49 pages LaTeX, updated benchmark results

arXiv:2303.03798 [pdf, other]

Automatically Classifying Kano Model Factors in App Reviews

Authors: Michelle Binder, Annika Vogt, Adrian Bajraktari, Andreas Vogelsang

Abstract: [Context and motivation] Requirements assessment by means of the Kano model is common practice. As suggested by the original authors, these assessments are done by interviewing stakeholders and asking them about the level of satisfaction if a certain feature is well implemented and the level of dissatisfaction if a feature is not or not well implemented. [Question/problem] Assessments via interviews are time-consuming, expensive, and can only capture the opinion of a limited set of stakeholders. [Principal ideas/results] We investigate the possibility to extract Kano model factors (basic needs, performance factors, and delighters) from a large set of user feedback (i.e., app reviews). We implemented, trained, and tested several classifiers on a set of 2,592 reviews. In a 10-fold cross-validation, a BERT-based classifier performed best with an accuracy of 0.928. To assess the classifiers' generalization, we additionally tested them on another independent set of 1,622 app reviews. The accuracy of the best classifier dropped to 0.725. We also show that misclassifications correlate with human disagreement on the labels. [Contribution] Our approach is a lightweight and automated alternative for identifying Kano model factors from a large set of user feedback. The limited accuracy of the approach is an inherent problem of missing information about the context in app reviews compared to comprehensive interviews, which also makes it hard for humans to extract the factors correctly.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2211.00708 [pdf]

Inferring school district learning modalities during the COVID-19 pandemic with a hidden Markov model

Authors: Mark J. Panaggio, Mike Fang, Hyunseung Bang, Paige A. Armstrong, Alison M. Binder, Julian E. Grass, Jake Magid, Marc Papazian, Carrie K Shapiro-Mendoza, Sharyn E. Parks

Abstract: In this study, learning modalities offered by public schools across the United States were investigated to track changes in the proportion of schools offering fully in-person, hybrid and fully remote learning over time. Learning modalities from 14,688 unique school districts from September 2020 to June 2021 were reported by Burbio, MCH Strategic Data, the American Enterprise Institute's Return to Learn Tracker and individual state dashboards. A model was needed to combine and deconflict these data to provide a more complete description of modalities nationwide. A hidden Markov model (HMM) was used to infer the most likely learning modality for each district on a weekly basis. This method yielded higher spatiotemporal coverage than any individual data source and higher agreement with three of the four data sources than any other single source. The model output revealed that the percentage of districts offering fully in-person learning rose from 40.3% in September 2020 to 54.7% in June of 2021 with increases across 45 states and in both urban and rural districts. This type of probabilistic model can serve as a tool for fusion of incomplete and contradictory data sources in support of public health surveillance and research efforts.

Submitted 1 November, 2022; originally announced November 2022.

Comments: 25 pages, 4 figures

arXiv:2206.07438 [pdf, other]

doi 10.1145/3610536

Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview

Authors: Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang, Eduardo C. Garrido-Merchán, Juergen Branke, Bernd Bischl

Abstract: Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies, both from the domain of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability and robustness.

Submitted 6 June, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Published at ACM TELO

Journal ref: ACM Transactions on Evolutionary Learning and Optimization 3.4 (2023): 1-50

arXiv:2111.14756 [pdf, other]

Automated Benchmark-Driven Design and Explanation of Hyperparameter Optimizers

Authors: Julia Moosbauer, Martin Binder, Lennart Schneider, Florian Pfisterer, Marc Becker, Michel Lang, Lars Kotthoff, Bernd Bischl

Abstract: Automated hyperparameter optimization (HPO) has gained great popularity and is an important ingredient of most automated machine learning frameworks. The process of designing HPO algorithms, however, is still an unsystematic and manual process: Limitations of prior work are identified and the improvements proposed are -- even though guided by expert knowledge -- still somewhat arbitrary. This rarely allows for gaining a holistic understanding of which algorithmic components are driving performance, and carries the risk of overlooking good algorithmic design choices. We present a principled approach to automated benchmark-driven algorithm design applied to multifidelity HPO (MF-HPO): First, we formalize a rich space of MF-HPO candidates that includes, but is not limited to common HPO algorithms, and then present a configurable framework covering this space. To find the best candidate automatically and systematically, we follow a programming-by-optimization approach and search over the space of algorithm candidates via Bayesian optimization. We challenge whether the found design choices are necessary or could be replaced by more naive and simpler ones by performing an ablation analysis. We observe that using a relatively simple configuration, in some ways simpler than established methods, performs very well as long as some critical configuration parameters have the right value.

Submitted 29 November, 2021; originally announced November 2021.

Comments: * Equal Contributions

arXiv:2109.03670 [pdf, other]

YAHPO Gym -- An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization

Authors: Florian Pfisterer, Lennart Schneider, Julia Moosbauer, Martin Binder, Bernd Bischl

Abstract: When developing and analyzing new hyperparameter optimization methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we propose a new set of challenging and relevant benchmark problems motivated by desirable properties and requirements for such benchmarks. Our new surrogate-based benchmark collection consists of 14 scenarios that in total constitute over 700 multi-fidelity hyperparameter optimization problems, which all enable multi-objective hyperparameter optimization. Furthermore, we empirically compare surrogate-based benchmarks to the more widely-used tabular benchmarks, and demonstrate that the latter may produce unfaithful results regarding the performance ranking of HPO methods. We examine and compare our benchmark collection with respect to defined requirements and propose a single-objective as well as a multi-objective benchmark suite on which we compare 7 single-objective and 7 multi-objective optimizers in a benchmark experiment. Our software is available at [https://github.com/slds-lmu/yahpo_gym].

Submitted 30 July, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: Accepted at the First Conference on Automated Machine Learning (Main Track). 39 pages, 12 tables, 10 figures, 1 listing

arXiv:2107.07343 [pdf, other]

Mutation is all you need

Authors: Lennart Schneider, Florian Pfisterer, Martin Binder, Bernd Bischl

Abstract: Neural architecture search (NAS) promises to make deep learning accessible to non-experts by automating architecture engineering of deep neural networks. BANANAS is one state-of-the-art NAS method that is embedded within the Bayesian optimization framework. Recent experimental findings have demonstrated the strong performance of BANANAS on the NAS-Bench-101 benchmark being determined by its path encoding and not its choice of surrogate model. We present experimental results suggesting that the performance of BANANAS on the NAS-Bench-301 benchmark is determined by its acquisition function optimizer, which minimally mutates the incumbent.

Submitted 4 July, 2021; originally announced July 2021.

Comments: Accepted for the 8th ICML Workshop on Automated Machine Learning (2021). 10 pages, 1 table, 3 figures

arXiv:2107.05847 [pdf, other]

Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Authors: Bernd Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix, Difan Deng, Marius Lindauer

Abstract: Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods such as grid or random search, evolutionary algorithms, Bayesian optimization, Hyperband and racing. It gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization. This work is accompanied by an appendix that contains information on specific software packages in R and Python, as well as information and recommended hyperparameter search spaces for specific learning algorithms. We also provide notebooks that demonstrate concepts from this work as supplementary files.

Submitted 24 November, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

arXiv:2004.11165 [pdf, other]

doi 10.1007/978-3-030-58112-1_31

Multi-Objective Counterfactual Explanations

Authors: Susanne Dandl, Christoph Molnar, Martin Binder, Bernd Bischl

Abstract: Counterfactual explanations are one of the most popular methods to make predictions of black box machine learning models interpretable by providing explanations in the form of 'what-if scenarios'. Most current approaches optimize a collapsed, weighted sum of multiple objectives, which are naturally difficult to balance a-priori. We propose the Multi-Objective Counterfactuals (MOC) method, which translates the counterfactual search into a multi-objective optimization problem. Our approach not only returns a diverse set of counterfactuals with different trade-offs between the proposed objectives, but also maintains diversity in feature space. This enables a more detailed post-hoc analysis to facilitate better understanding and also more options for actionable user responses to change the predicted outcome. Our approach is also model-agnostic and works for numerical and categorical input features. We show the usefulness of MOC in concrete cases and compare our approach with state-of-the-art methods for counterfactual explanations.

Submitted 24 June, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

Journal ref: Parallel Problem Solving from Nature - PPSN XVI. PPSN 2020. Lecture Notes in Computer Science, vol 12269

arXiv:1912.12912 [pdf, other]

Multi-Objective Hyperparameter Tuning and Feature Selection using Filter Ensembles

Authors: Martin Binder, Julia Moosbauer, Janek Thomas, Bernd Bischl

Abstract: Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield better model interpretability and lower cost of data acquisition, data handling and model inference. While sparsity may have a beneficial or detrimental effect on predictive performance, a small drop in performance may be acceptable in return for a substantial gain in sparseness. We therefore treat feature selection as a multi-objective optimization task. We perform hyperparameter tuning and feature selection simultaneously because the choice of features of a model may influence what hyperparameters perform well. We present, benchmark, and compare two different approaches for multi-objective joint hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization. The second is an evolutionary NSGA-II-based wrapper approach to feature selection which incorporates specialized sampling, mutation and recombination operators. Both methods make use of parameterized filter ensembles. While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs computational overhead compared to the NSGA-II, so the preferred choice depends on the cost of evaluating a model on given data.

Submitted 13 February, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

Showing 1–11 of 11 results for author: Binder, M