Showing 1–23 of 23 results for author: Pfisterer, F

  1. One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions

    Authors: Jan Simson, Florian Pfisterer, Christoph Kern

    Abstract: A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. The downstream effects of ADM systems critically depend on the decisions made during a system's design, implementation, and evaluation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these decisions are ma…

    Submitted 18 June, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Journal ref: FAccT '24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (2024) 1305-1320

  2. Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML

    Authors: Hilde Weerts, Florian Pfisterer, Matthias Feurer, Katharina Eggensperger, Edward Bergman, Noor Awad, Joaquin Vanschoren, Mykola Pechenizkiy, Bernd Bischl, Frank Hutter

    Abstract: The field of automated machine learning (AutoML) introduces techniques that automate parts of the development of machine learning (ML) systems, accelerating the process and reducing barriers for novices. However, decisions derived from ML models can reproduce, amplify, or even introduce unfairness in our societies, causing harm to (groups of) individuals. In response, researchers have started to p…

    Submitted 20 February, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 79 (2024) 639-677

  3. Mind the Gap: Measuring Generalization Performance Across Multiple Objectives

    Authors: Matthias Feurer, Katharina Eggensperger, Edward Bergman, Florian Pfisterer, Bernd Bischl, Frank Hutter

    Abstract: Modern machine learning models are often constructed taking into account multiple objectives, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. In practice, we also want to measure generalization when moving from th…

    Submitted 9 February, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  4. arXiv:2211.09875

    stat.CO

    Mixture of Experts Distributional Regression: Implementation Using Robust Estimation with Adaptive First-order Methods

    Authors: David Rügamer, Florian Pfisterer, Bernd Bischl, Bettina Grün

    Abstract: In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We take advantage of the flexibility and scalability of neural network software and implement the proposed framework in mixdistreg, an R software package that all…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2010.06889

  5. arXiv:2208.00204

    cs.LG cs.NE stat.ML

    Tackling Neural Architecture Search With Quality Diversity Optimization

    Authors: Lennart Schneider, Florian Pfisterer, Paul Kent, Juergen Branke, Bernd Bischl, Janek Thomas

    Abstract: Neural architecture search (NAS) has been studied extensively and has grown to become a research field with substantial impact. While classical single-objective NAS searches for the architecture with the best performance, multi-objective NAS considers multiple objectives that should be optimized simultaneously, e.g., minimizing resource usage along with the validation error. Although considerable progr…

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: Accepted at the First Conference on Automated Machine Learning (Main Track). 30 pages, 8 tables, 13 figures

  6. arXiv:2207.00367

    stat.ML cs.LG

    A geometric framework for outlier detection in high-dimensional data

    Authors: Moritz Herrmann, Florian Pfisterer, Fabian Scheipl

    Abstract: Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework that exploits the metric structure of a data set. Our approach rests on the manifold assumption, i.e., that the observed, nominally high-dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with mani…

    Submitted 29 July, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: 24 pages, 6 figures, extended introduction, contribution, and discussion sections, additional experiments added

  7. arXiv:2206.07438

    cs.LG stat.ML

    Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview

    Authors: Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang, Eduardo C. Garrido-Merchán, Juergen Branke, Bernd Bischl

    Abstract: Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metric…

    Submitted 6 June, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Published at ACM TELO

    Journal ref: ACM Transactions on Evolutionary Learning and Optimization 3.4 (2023): 1-50

  8. arXiv:2206.03256

    cs.CY cs.LG stat.AP stat.ME

    Flexible Group Fairness Metrics for Survival Analysis

    Authors: Raphael Sonabend, Florian Pfisterer, Alan Mishler, Moritz Schauer, Lukas Burk, Sumantrak Mukherjee, Sebastian Vollmer

    Abstract: Algorithmic fairness is an increasingly important field concerned with detecting and mitigating biases in machine learning models. There has been a wealth of literature for algorithmic fairness in regression and classification; however, there has been little exploration of the field for survival analysis. Survival analysis is the prediction task in which one attempts to predict the probability of an…

    Submitted 22 July, 2022; v1 submitted 26 May, 2022; originally announced June 2022.

    Comments: Accepted in DSHealth 2022 (Workshop on Applied Data Science for Healthcare)

  9. arXiv:2204.14061

    cs.LG

    A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models

    Authors: Lennart Schneider, Florian Pfisterer, Janek Thomas, Bernd Bischl

    Abstract: The goal of Quality Diversity Optimization is to generate a collection of diverse yet high-performing solutions to a given problem at hand. Typical benchmark problems are, for example, finding a repertoire of robot arm configurations or a collection of game playing strategies. In this paper, we propose a set of Quality Diversity Optimization problems that tackle hyperparameter optimization of mach…

    Submitted 30 July, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Accepted at the GECCO'22 Workshop on Quality Diversity Algorithm Benchmarks. 7 pages, 6 tables, 7 figures

  10. arXiv:2111.14756

    cs.LG stat.ML

    Automated Benchmark-Driven Design and Explanation of Hyperparameter Optimizers

    Authors: Julia Moosbauer, Martin Binder, Lennart Schneider, Florian Pfisterer, Marc Becker, Michel Lang, Lars Kotthoff, Bernd Bischl

    Abstract: Automated hyperparameter optimization (HPO) has gained great popularity and is an important ingredient of most automated machine learning frameworks. The process of designing HPO algorithms, however, is still an unsystematic and manual process: Limitations of prior work are identified and the improvements proposed are -- even though guided by expert knowledge -- still somewhat arbitrary. This rare…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: * Equal Contributions

  11. arXiv:2109.03670

    cs.LG stat.ML

    YAHPO Gym -- An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization

    Authors: Florian Pfisterer, Lennart Schneider, Julia Moosbauer, Martin Binder, Bernd Bischl

    Abstract: When developing and analyzing new hyperparameter optimization methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we propose a new set of challenging and relevant benchmark problems motivated by desirable properties and requirements for such benchmarks. Our new surrogate-based benchmark collection consists of 14 scenarios that in total cons…

    Submitted 30 July, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted at the First Conference on Automated Machine Learning (Main Track). 39 pages, 12 tables, 10 figures, 1 listing

  12. arXiv:2107.07343

    cs.LG cs.NE

    Mutation is all you need

    Authors: Lennart Schneider, Florian Pfisterer, Martin Binder, Bernd Bischl

    Abstract: Neural architecture search (NAS) promises to make deep learning accessible to non-experts by automating architecture engineering of deep neural networks. BANANAS is one state-of-the-art NAS method that is embedded within the Bayesian optimization framework. Recent experimental findings have demonstrated the strong performance of BANANAS on the NAS-Bench-101 benchmark being determined by its path e…

    Submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted for the 8th ICML Workshop on Automated Machine Learning (2021). 10 pages, 1 table, 3 figures

  13. Meta-Learning for Symbolic Hyperparameter Defaults

    Authors: Pieter Gijsbers, Florian Pfisterer, Jan N. van Rijn, Bernd Bischl, Joaquin Vanschoren

    Abstract: Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset. This enables a much faster, but…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Pieter Gijsbers and Florian Pfisterer contributed equally to the paper. V1: Two page GECCO poster paper accepted at GECCO 2021. V2: The original full length paper (8 pages) with appendix

  14. arXiv:2104.02705

    stat.ML cs.LG stat.CO

    deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

    Authors: David Rügamer, Chris Kolb, Cornelius Fritz, Florian Pfisterer, Philipp Kopper, Bernd Bischl, Ruolin Shen, Christina Bukas, Lisa Barros de Andrade e Sousa, Dominik Thalmeier, Philipp Baumann, Lucas Kook, Nadja Klein, Christian L. Müller

    Abstract: In this paper we describe the implementation of semi-structured deep distributional regression, a flexible framework to learn conditional distributions based on the combination of additive regression models and deep networks. Our implementation encompasses (1) a modular neural network building system based on the deep learning library TensorFlow for the fusion of various statistical and deep…

    Submitted 10 March, 2022; v1 submitted 6 April, 2021; originally announced April 2021.

  15. Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features

    Authors: Florian Pargent, Florian Pfisterer, Janek Thomas, Bernd Bischl

    Abstract: Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect in data analysis. A common problem is high cardinality features, i.e. unordered categorical predictor variables with a high number of levels. We study techniques that yield numeric representations of categorical variables which can then be used in subsequen…

    Submitted 4 March, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Comput Stat (2022)

  16. arXiv:2011.02407

    cs.LG cs.CY econ.EM

    Debiasing classifiers: is reality at variance with expectation?

    Authors: Ashrya Agrawal, Florian Pfisterer, Bernd Bischl, Francois Buet-Golfouse, Srijan Sood, Jiahao Chen, Sameena Shah, Sebastian Vollmer

    Abstract: We present an empirical study of debiasing methods for classifiers, showing that debiasers often fail in practice to generalize out-of-sample, and can in fact make fairness worse rather than better. A rigorous evaluation of the debiasing treatment effect requires extensive cross-validation beyond what is usually done. We demonstrate that this phenomenon can be explained as a consequence of bias-va…

    Submitted 30 May, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: 13 pages, under review

    MSC Class: 68T01; 68Q32; 68T05 ACM Class: G.4; I.2.0; J.4

  17. arXiv:2010.06889

    stat.CO cs.LG stat.ML

    Neural Mixture Distributional Regression

    Authors: David Rügamer, Florian Pfisterer, Bernd Bischl

    Abstract: We present neural mixture distributional regression (NMDR), a holistic framework to estimate complex finite mixtures of distributional regressions defined by flexible additive predictors. Our framework is able to handle a large number of mixtures of potentially different distributions in high-dimensional settings, allows for efficient and scalable optimization and can be applied to recent concepts…

    Submitted 14 October, 2020; originally announced October 2020.

  18. arXiv:1911.07511

    stat.ML cs.LG

    Benchmarking time series classification -- Functional data vs machine learning approaches

    Authors: Florian Pfisterer, Laura Beggel, Xudong Sun, Fabian Scheipl, Bernd Bischl

    Abstract: Time series classification problems have drawn increasing attention in the machine learning and statistical community. Closely related is the field of functional data analysis (FDA): it refers to the range of problems that deal with the analysis of data that is continuously indexed over some domain. While often employing different methods, both fields strive to answer similar questions, a common e…

    Submitted 24 February, 2021; v1 submitted 18 November, 2019; originally announced November 2019.

  19. arXiv:1911.02391

    cs.HC cs.AI

    Towards Human Centered AutoML

    Authors: Florian Pfisterer, Janek Thomas, Bernd Bischl

    Abstract: Building models from data is an integral part of the majority of data science workflows. While data scientists are often forced to spend the majority of the time available for a given project on data cleaning and exploratory analysis, the time available to practitioners to build actual models from data is often rather short due to time constraints for a given project. AutoML systems are currently…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: 4 pages

  20. arXiv:1908.10796

    stat.ML cs.LG

    Multi-Objective Automatic Machine Learning with AutoxgboostMC

    Authors: Florian Pfisterer, Stefan Coors, Janek Thomas, Bernd Bischl

    Abstract: AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. They often combine techniques from many different sub-fields of machine learning in order to find a model or set of models that optimize a user-supplied criterion, such as predictive performance. The ultimate goal of such systems is to reduce the amount of time spent on menial tasks, or ta…

    Submitted 30 April, 2021; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: Accepted at ECML PKDD Workshop on Automating Data Science 2019

  21. arXiv:1902.08999

    cs.LG stat.ML

    High Dimensional Restrictive Federated Model Selection with multi-objective Bayesian Optimization over shifted distributions

    Authors: Xudong Sun, Andrea Bommert, Florian Pfisterer, Jörg Rahnenführer, Michel Lang, Bernd Bischl

    Abstract: A novel machine learning optimization process, coined Restrictive Federated Model Selection (RFMS), is proposed for scenarios in which, for example, data from healthcare units cannot leave the site it is situated on and it is forbidden to carry out training algorithms on remote data sites due to either technical or privacy and trust concerns. To carry out clinical research under this scenario, a…

    Submitted 8 August, 2019; v1 submitted 24 February, 2019; originally announced February 2019.

  22. arXiv:1811.09409

    stat.ML cs.LG

    Learning Multiple Defaults for Machine Learning Algorithms

    Authors: Florian Pfisterer, Jan N. van Rijn, Philipp Probst, Andreas Müller, Bernd Bischl

    Abstract: The performance of modern machine learning methods highly depends on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, often proposed along with the publication and implementation of a new algorithm. Those default values are usually chosen in an ad-hoc manner to work well enough on a wide variety of datasets. To address this problem, diffe…

    Submitted 30 April, 2021; v1 submitted 23 November, 2018; originally announced November 2018.

  23. arXiv:1609.06146

    cs.LG

    mlr Tutorial

    Authors: Julia Schiffner, Bernd Bischl, Michel Lang, Jakob Richter, Zachary M. Jones, Philipp Probst, Florian Pfisterer, Mason Gallo, Dominik Kirchhoff, Tobias Kühn, Janek Thomas, Lars Kotthoff

    Abstract: This document provides an in-depth introduction to the mlr framework for machine learning experiments in R.

    Submitted 17 September, 2016; originally announced September 2016.