Showing 1–50 of 83 results for author: Bischl, B

Searching in archive stat.
  1. arXiv:2406.18334  [pdf, other]

    cs.LG stat.ML

    Efficient and Accurate Explanation Estimation with Distribution Compression

    Authors: Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl, Przemyslaw Biecek

    Abstract: Exact computation of various machine learning explanations requires numerous model evaluations and in extreme cases becomes impractical. The computational cost of approximation increases with an ever-increasing size of data and model parameters. Many heuristics have been proposed to approximate post-hoc explanations efficiently. This paper shows that the standard i.i.d. sampling used in a broad sp…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: To be presented at the ICML 2024 Workshop on DMLR

  2. arXiv:2406.09069  [pdf, other]

    cs.LG stat.ML

    On the Robustness of Global Feature Effect Explanations

    Authors: Hubert Baniecki, Giuseppe Casalicchio, Bernd Bischl, Przemyslaw Biecek

    Abstract: We study the robustness of global post-hoc explanations for predictive models trained on tabular data. Effects of predictor features in black-box supervised learning are an essential diagnostic tool for model debugging and scientific discovery in applied sciences. However, how vulnerable they are to data and model perturbations remains an open research question. We introduce several theoretical bo…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at ECML PKDD 2024

  3. arXiv:2406.04098  [pdf, other]

    stat.ML cs.LG

    A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

    Authors: Lukas Burk, John Zobolas, Bernd Bischl, Andreas Bender, Marvin N. Wright, Raphael Sonabend

    Abstract: This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on h…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 42 pages, 28 figures

  4. arXiv:2405.15393  [pdf, other]

    stat.ML cs.LG

    Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization

    Authors: Thomas Nagler, Lennart Schneider, Bernd Bischl, Matthias Feurer

    Abstract: Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed c…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 39 pages, 4 tables, 29 figures

  5. arXiv:2405.02200  [pdf, other]

    cs.LG stat.ML

    Position: Why We Must Rethink Empirical Research in Machine Learning

    Authors: Moritz Herrmann, F. Julian D. Lange, Katharina Eggensperger, Giuseppe Casalicchio, Marcel Wever, Matthias Feurer, David Rügamer, Eyke Hüllermeier, Anne-Laure Boulesteix, Bernd Bischl

    Abstract: We warn against a common but incomplete understanding of empirical research in machine learning that leads to non-replicable results, makes findings unreliable, and threatens to undermine progress in the field. To overcome this alarming situation, we call for more awareness of the plurality of ways of gaining knowledge experimentally but also of some epistemic limitations. In particular, we argue…

    Submitted 25 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 20 pages, accepted for publication at ICML 2024, camera-ready version

  6. arXiv:2404.12862  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    A Guide to Feature Importance Methods for Scientific Inference

    Authors: Fiona Katharina Ewald, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, Gunnar König

    Abstract: While machine learning (ML) models are increasingly used due to their high predictive power, their use in understanding the data-generating process (DGP) is limited. Understanding the DGP requires insights into feature-target associations, which many ML models cannot directly provide, due to their opaque internal mechanisms. Feature importance (FI) methods provide useful insights into the DGP unde…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted at the 2nd World Conference on eXplainable Artificial Intelligence, xAI-2024

  7. arXiv:2404.03506  [pdf, other]

    stat.ML cs.LG

    CountARFactuals -- Generating plausible model-agnostic counterfactual explanations with adversarial random forests

    Authors: Susanne Dandl, Kristin Blesch, Timo Freiesleben, Gunnar König, Jan Kapar, Bernd Bischl, Marvin Wright

    Abstract: Counterfactual explanations elucidate algorithmic decisions by pointing to scenarios that would have led to an alternative, desired outcome. Giving insight into the model's behavior, they hint users towards possible actions and give grounds for contesting decisions. As a crucial factor in achieving these goals, counterfactuals must be plausible, i.e., describing realistic alternative scenarios wit…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: SD, KB, TB, and GK contributed equally as first authors

  8. arXiv:2403.13150  [pdf, other]

    cs.LG cs.AI stat.CO stat.ML

    Training Survival Models using Scoring Rules

    Authors: Philipp Kopper, David Rügamer, Raphael Sonabend, Bernd Bischl, Andreas Bender

    Abstract: Survival Analysis provides critical insights for partially incomplete time-to-event data in various domains. It is also an important example of probabilistic machine learning. The probabilistic nature of the predictions can be exploited by using (proper) scoring rules in the model fitting process instead of likelihood-based optimization. Our proposal does so in a generic manner and can be used for…

    Submitted 19 March, 2024; originally announced March 2024.

  9. arXiv:2403.04629  [pdf, other]

    cs.LG cs.AI cs.HC cs.RO stat.ML

    Explaining Bayesian Optimization by Shapley Values Facilitates Human-AI Collaboration

    Authors: Julian Rodemann, Federico Croppi, Philipp Arens, Yusuf Sale, Julia Herbinger, Bernd Bischl, Eyke Hüllermeier, Thomas Augustin, Conor J. Walsh, Giuseppe Casalicchio

    Abstract: Bayesian optimization (BO) with Gaussian processes (GP) has become an indispensable algorithm for black box optimization problems. Not without a dash of irony, BO is often considered a black box itself, lacking ways to provide reasons as to why certain parameters are proposed to be evaluated. This is particularly relevant in human-in-the-loop applications of BO, such as in robotics. We address thi…

    Submitted 8 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Preprint. Copyright by the authors. 19 pages, 24 figures

    ACM Class: I.2.6; I.2.9; F.2.2; J.6

  10. arXiv:2402.01484  [pdf, other]

    cs.LG stat.CO stat.ML

    Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?

    Authors: Emanuel Sommer, Lisa Wimmer, Theodore Papamarkou, Ludwig Bothmann, Bernd Bischl, David Rügamer

    Abstract: A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks' parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, w…

    Submitted 27 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  11. arXiv:2311.15395  [pdf, other]

    cs.LG cs.CV stat.ML

    ConstraintMatch for Semi-constrained Clustering

    Authors: Jann Goschenhofer, Bernd Bischl, Zsolt Kira

    Abstract: Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine, while still yielding full-supervision-level model performance. While they perform well even in the absence of the true underlying class labels, constrained clustering models still require large amounts of binary constraint annotations for training. In thi…

    Submitted 26 November, 2023; originally announced November 2023.

    Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN)

  12. arXiv:2311.01349  [pdf, other]

    cs.LG cs.CY stat.ML

    Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR Embeddings

    Authors: Tobias Weber, Michael Ingrisch, Bernd Bischl, David Rügamer

    Abstract: Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trai…

    Submitted 11 June, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  13. arXiv:2310.15108  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO stat.ME

    Evaluating machine learning models in non-standard settings: An overview and new findings

    Authors: Roman Hornung, Malte Nalenz, Lennart Schneider, Andreas Bender, Ludwig Bothmann, Bernd Bischl, Thomas Augustin, Anne-Laure Boulesteix

    Abstract: Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines fo…

    Submitted 23 October, 2023; originally announced October 2023.

  14. arXiv:2310.02008  [pdf, other]

    cs.LG econ.EM stat.ML

    fmeffects: An R Package for Forward Marginal Effects

    Authors: Holger Löwe, Christian A. Scholbeck, Christian Heumann, Bernd Bischl, Giuseppe Casalicchio

    Abstract: Forward marginal effects (FMEs) have recently been introduced as a versatile and effective model-agnostic interpretation method. They provide comprehensible and actionable model explanations in the form of: If we change $x$ by an amount $h$, what is the change in predicted outcome $\widehat{y}$? We present the R package fmeffects, the first software implementation of FMEs. The relevant theoretical…

    Submitted 3 October, 2023; originally announced October 2023.

  15. arXiv:2309.02048  [pdf, other]

    cs.LG stat.ML

    Probabilistic Self-supervised Learning via Scoring Rules Minimization

    Authors: Amirhossein Vahidi, Simon Schoßer, Lisa Wimmer, Yawei Li, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei

    Abstract: In this paper, we propose a novel probabilistic self-supervised learning via Scoring Rule Minimization (ProSMIN), which leverages the power of probabilistic models to enhance representation quality and mitigate collapsing representations. Our proposed approach involves two neural networks; the online network and the target network, which collaborate and learn the diverse distribution of representa…

    Submitted 5 September, 2023; originally announced September 2023.

  16. arXiv:2308.14705  [pdf, other]

    stat.ML cs.LG

    Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning

    Authors: Amirhossein Vahidi, Lisa Wimmer, Hüseyin Anil Gündüz, Bernd Bischl, Eyke Hüllermeier, Mina Rezaei

    Abstract: Ensembling a neural network is a widely recognized approach to enhance model performance, estimate uncertainty, and improve robustness in deep supervised learning. However, deep ensembles often come with high computational costs and memory demands. In addition, the efficiency of a deep ensemble is related to diversity among the ensemble members which is challenging for large, over-parameterized de…

    Submitted 1 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

  17. arXiv:2307.08175  [pdf, other]

    cs.LG cs.NE stat.ML

    Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models

    Authors: Lennart Schneider, Bernd Bischl, Janek Thomas

    Abstract: We present a model-agnostic framework for jointly optimizing the predictive performance and interpretability of supervised machine learning models for tabular data. Interpretability is quantified via three measures: feature sparsity, interaction sparsity of features, and sparsity of non-monotone feature effects. By treating hyperparameter optimization of a machine learning algorithm as a multi-obj…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: Extended version of the paper accepted at GECCO 2023. 16 pages, 7 tables, 7 figures

  18. arXiv:2307.07331  [pdf, other]

    cs.CL cs.CY cs.LG stat.ML

    How Different Is Stereotypical Bias Across Languages?

    Authors: Ibrahim Tolga Öztürk, Rostislav Nedelchev, Christian Heumann, Esteban Garces Arias, Marius Roger, Bernd Bischl, Matthias Aßenmacher

    Abstract: Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the Engli…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted @ "3rd Workshop on Bias and Fairness in AI" (co-located with ECML PKDD 2023). This is the author's version of the work. The definite version of record will be published in the proceedings

  19. arXiv:2307.03571  [pdf, other]

    cs.LG math.OC stat.ML

    Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization

    Authors: Chris Kolb, Christian L. Müller, Bernd Bischl, David Rügamer

    Abstract: We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity. These non-smooth and possibly non-convex problems typically rely on solvers tailored to specific models and regularizers. In contrast, our method enables fully differentiable and approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep l…

    Submitted 26 April, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

  20. arXiv:2306.00541  [pdf, other]

    stat.ML cs.LG

    Decomposing Global Feature Effects Based on Feature Interactions

    Authors: Julia Herbinger, Marvin N. Wright, Thomas Nagler, Bernd Bischl, Giuseppe Casalicchio

    Abstract: Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GAD…

    Submitted 1 July, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  21. Deep Learning for Survival Analysis: A Review

    Authors: Simon Wiegrebe, Philipp Kopper, Raphael Sonabend, Bernd Bischl, Andreas Bender

    Abstract: The influx of deep learning (DL) techniques into the field of survival analysis in recent years has led to substantial methodological progress; for instance, learning from unstructured or high-dimensional data such as images, text or omics data. In this work, we conduct a comprehensive systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival-…

    Submitted 22 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 29 pages, 7 figures, 2 tables, 1 interactive table

    Journal ref: Artif Intell Rev 57, 65 (2024)

  22. Interpretable Regional Descriptors: Hyperbox-Based Local Explanations

    Authors: Susanne Dandl, Giuseppe Casalicchio, Bernd Bischl, Ludwig Bothmann

    Abstract: This work introduces interpretable regional descriptors, or IRDs, for local, model-agnostic interpretations. IRDs are hyperboxes that describe how an observation's feature values can be changed without affecting its prediction. They justify a prediction by providing a set of "even if" arguments (semi-factual explanations), and they indicate which features affect a prediction and whether pointwise…

    Submitted 4 May, 2023; originally announced May 2023.

    Journal ref: Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14171, p. 479-495

  23. arXiv:2304.06569  [pdf, other]

    stat.ML cs.LG stat.CO

    counterfactuals: An R Package for Counterfactual Explanation Methods

    Authors: Susanne Dandl, Andreas Hofheinz, Martin Binder, Bernd Bischl, Giuseppe Casalicchio

    Abstract: Counterfactual explanation methods provide information on how feature values of individual observations must be changed to obtain a desired prediction. Despite the increasing amount of proposed methods in research, only a few implementations exist whose interfaces and requirements vary widely. In this work, we introduce the counterfactuals R package, which provides a modular and unified R6-based i…

    Submitted 15 September, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 49 pages LaTeX, updated benchmark results

  24. arXiv:2304.02902  [pdf, other]

    stat.ML cs.LG

    Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry

    Authors: Jonas Gregor Wiese, Lisa Wimmer, Theodore Papamarkou, Bernd Bischl, Stephan Günnemann, David Rügamer

    Abstract: Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that…

    Submitted 6 April, 2023; originally announced April 2023.

  25. arXiv:2211.09875  [pdf, other]

    stat.CO

    Mixture of Experts Distributional Regression: Implementation Using Robust Estimation with Adaptive First-order Methods

    Authors: David Rügamer, Florian Pfisterer, Bernd Bischl, Bettina Grün

    Abstract: In this work, we propose an efficient implementation of mixtures of experts distributional regression models which exploits robust estimation by using stochastic first-order optimization techniques with adaptive learning rate schedulers. We take advantage of the flexibility and scalability of neural network software and implement the proposed framework in mixdistreg, an R software package that all…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2010.06889

  26. arXiv:2210.07723  [pdf, other]

    stat.ML cs.CR cs.LG

    Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models

    Authors: Daniel Schalk, Bernd Bischl, David Rügamer

    Abstract: Various privacy-preserving frameworks that respect the individual's privacy in the analysis of data have been developed in recent years. However, available model classes such as simple statistics or generalized linear models lack the flexibility required for a good approximation of the underlying data-generating process in practice. In this paper, we propose an algorithm for a distributed, privacy…

    Submitted 10 March, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

  27. arXiv:2208.00204  [pdf, other]

    cs.LG cs.NE stat.ML

    Tackling Neural Architecture Search With Quality Diversity Optimization

    Authors: Lennart Schneider, Florian Pfisterer, Paul Kent, Juergen Branke, Bernd Bischl, Janek Thomas

    Abstract: Neural architecture search (NAS) has been studied extensively and has grown to become a research field with substantial impact. While classical single-objective NAS searches for the architecture with the best performance, multi-objective NAS considers multiple objectives that should be optimized simultaneously, e.g., minimizing resource usage along the validation error. Although considerable progr…

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: Accepted at the First Conference on Automated Machine Learning (Main Track). 30 pages, 8 tables, 13 figures

  28. arXiv:2207.12560  [pdf, other]

    cs.LG stat.ML

    AMLB: an AutoML Benchmark

    Authors: Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren

    Abstract: Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explo…

    Submitted 16 November, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: UNDER REVIEW: Revised submission to JMLR, with updated results from June 2023

  29. arXiv:2206.07438  [pdf, other]

    cs.LG stat.ML

    Multi-Objective Hyperparameter Optimization in Machine Learning -- An Overview

    Authors: Florian Karl, Tobias Pielok, Julia Moosbauer, Florian Pfisterer, Stefan Coors, Martin Binder, Lennart Schneider, Janek Thomas, Jakob Richter, Michel Lang, Eduardo C. Garrido-Merchán, Juergen Branke, Bernd Bischl

    Abstract: Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metric…

    Submitted 6 June, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Published at ACM TELO

    Journal ref: ACM Transactions on Evolutionary Learning and Optimization 3.4 (2023): 1-50

  30. arXiv:2206.05447  [pdf, other]

    cs.LG stat.ML

    Improving Accuracy of Interpretability Measures in Hyperparameter Optimization via Bayesian Algorithm Execution

    Authors: Julia Moosbauer, Giuseppe Casalicchio, Marius Lindauer, Bernd Bischl

    Abstract: Despite all the benefits of automated hyperparameter optimization (HPO), most modern HPO algorithms are black-boxes themselves. This makes it difficult to understand the decision process which leads to the selected configuration, reduces trust in HPO, and thus hinders its broad adoption. Here, we study the combination of HPO with interpretable machine learning (IML) methods such as partial depende…

    Submitted 12 February, 2023; v1 submitted 11 June, 2022; originally announced June 2022.

  31. arXiv:2205.13080  [pdf, other]

    stat.ML cs.LG stat.CO

    Factorized Structured Regression for Large-Scale Varying Coefficient Models

    Authors: David Rügamer, Andreas Bender, Simon Wiegrebe, Daniel Racek, Bernd Bischl, Christian L. Müller, Clemens Stachl

    Abstract: Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structur…

    Submitted 25 May, 2022; originally announced May 2022.

  32. arXiv:2203.10828  [pdf, other]

    stat.CO cs.DC stat.AP

    Distributed non-disclosive validation of predictive models by a modified ROC-GLM

    Authors: Daniel Schalk, Verena S. Hoffmann, Bernd Bischl, Ulrich Mansmann

    Abstract: Distributed statistical analyses provide a promising approach for privacy protection when analysing data distributed over several databases. It brings the analysis to the data and not the data to the analysis. The analyst receives anonymous summary statistics which are combined to an aggregated result. We are interested to calculate the AUC of a prediction score based on a distributed approach with…

    Submitted 14 March, 2023; v1 submitted 21 March, 2022; originally announced March 2022.

  33. arXiv:2202.07423  [pdf, other]

    stat.ML cs.LG

    DeepPAMM: Deep Piecewise Exponential Additive Mixed Models for Complex Hazard Structures in Survival Analysis

    Authors: Philipp Kopper, Simon Wiegrebe, Bernd Bischl, Andreas Bender, David Rügamer

    Abstract: Survival analysis (SA) is an active field of research that is concerned with time-to-event outcomes and is prevalent in many domains, particularly biomedical applications. Despite its importance, SA remains challenging due to small-scale data sets and complex outcome distributions, concealed by truncation and censoring processes. The piecewise exponential additive mixed model (PAMM) is a model cla…

    Submitted 12 February, 2022; originally announced February 2022.

    Comments: 13 pages, 2 figures, This work has been accepted by the 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2022)

  34. arXiv:2202.07254  [pdf, other]

    stat.ML cs.LG

    REPID: Regional Effect Plots with implicit Interaction Detection

    Authors: Julia Herbinger, Bernd Bischl, Giuseppe Casalicchio

    Abstract: Machine learning models can automatically learn complex relationships, such as non-linear and interaction effects. Interpretable machine learning methods such as partial dependence plots visualize marginal feature effects but may lead to misleading interpretations when feature interactions are present. Hence, employing additional methods that can detect and measure the strength of interactions is…

    Submitted 15 February, 2022; originally announced February 2022.

  35. arXiv:2201.13192  [pdf, other]

    stat.ML cs.LG

    Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning

    Authors: Emilio Dorigatti, Jann Goschenhofer, Benjamin Schubert, Mina Rezaei, Bernd Bischl

    Abstract: Positive-unlabeled learning (PUL) aims at learning a binary classifier from only positive and unlabeled training data. Even though real-world applications often involve imbalanced datasets where the majority of examples belong to one class, most contemporary approaches to PUL do not investigate performance in this setting, thus severely limiting their applicability in practice. In this work, we th…

    Submitted 10 March, 2024; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: 25 pages, 4 figures

  36. arXiv:2201.08837  [pdf, other]

    cs.LG econ.EM stat.AP stat.ME stat.ML

    Marginal Effects for Non-Linear Prediction Functions

    Authors: Christian A. Scholbeck, Giuseppe Casalicchio, Christoph Molnar, Bernd Bischl, Christian Heumann

    Abstract: Beta coefficients for linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations for feature effects, either in the shape of derivatives…

    Submitted 21 January, 2022; originally announced January 2022.

  37. arXiv:2111.14756  [pdf, other]

    cs.LG stat.ML

    Automated Benchmark-Driven Design and Explanation of Hyperparameter Optimizers

    Authors: Julia Moosbauer, Martin Binder, Lennart Schneider, Florian Pfisterer, Marc Becker, Michel Lang, Lars Kotthoff, Bernd Bischl

    Abstract: Automated hyperparameter optimization (HPO) has gained great popularity and is an important ingredient of most automated machine learning frameworks. The process of designing HPO algorithms, however, is still an unsystematic and manual process: Limitations of prior work are identified and the improvements proposed are -- even though guided by expert knowledge -- still somewhat arbitrary. This rare…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: * Equal Contributions

  38. arXiv:2111.04820  [pdf, other]

    cs.LG stat.ML

    Explaining Hyperparameter Optimization via Partial Dependence Plots

    Authors: Julia Moosbauer, Julia Herbinger, Giuseppe Casalicchio, Marius Lindauer, Bernd Bischl

    Abstract: Automated hyperparameter optimization (HPO) can support practitioners to obtain peak performance in machine learning models. However, there is often a lack of valuable insights into the effects of different hyperparameters on the final model performance. This lack of explainability makes it difficult to trust and understand the automated HPO process and its results. We suggest using interpretable…

    Submitted 26 January, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: to be published in proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021); typos corrected, replaced N by N' in formula (6)

  39. arXiv:2110.03513  [pdf, other]

    stat.CO cs.LG

    Accelerated Componentwise Gradient Boosting using Efficient Data Representation and Momentum-based Optimization

    Authors: Daniel Schalk, Bernd Bischl, David Rügamer

    Abstract: Componentwise boosting (CWB), also known as model-based boosting, is a variant of gradient boosting that builds on additive models as base learners to ensure interpretability. CWB is thus often used in research areas where models are employed as tools to explain relationships in data. One downside of CWB is its computational complexity in terms of memory and runtime. In this paper, we propose two…

    Submitted 29 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  40. arXiv:2109.05583  [pdf, ps, other]

    stat.ML cs.LG

    Automatic Componentwise Boosting: An Interpretable AutoML System

    Authors: Stefan Coors, Daniel Schalk, Bernd Bischl, David Rügamer

    Abstract: In practice, machine learning (ML) workflows require various different steps, from data preprocessing, missing value imputation, model selection, to model tuning as well as model evaluation. Many of these steps rely on human ML experts. AutoML - the field of automating these ML pipelines - tries to help practitioners to apply ML off-the-shelf without any expert knowledge. Most modern AutoML system…

    Submitted 16 October, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: 6 pages, 4 figures, ECML-PKDD Workshop on Automating Data Science 2021

  41. arXiv:2109.03670  [pdf, other]

    cs.LG stat.ML

    YAHPO Gym -- An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization

    Authors: Florian Pfisterer, Lennart Schneider, Julia Moosbauer, Martin Binder, Bernd Bischl

    Abstract: When developing and analyzing new hyperparameter optimization methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we propose a new set of challenging and relevant benchmark problems motivated by desirable properties and requirements for such benchmarks. Our new surrogate-based benchmark collection consists of 14 scenarios that in total cons…

    Submitted 30 July, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted at the First Conference on Automated Machine Learning (Main Track). 39 pages, 12 tables, 10 figures, 1 listing

  42. Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process

    Authors: Christoph Molnar, Timo Freiesleben, Gunnar König, Giuseppe Casalicchio, Marvin N. Wright, Bernd Bischl

    Abstract: Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to statistical modeling approaches, machine learning makes fewer explicit assumptions about data structures, such as linearity. However, their model parameters usually cannot be easily related to the data generating process. To learn about the modeled relationships, partial dependence (P…

    Submitted 3 September, 2021; originally announced September 2021.

    Journal ref: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1901

  43. arXiv:2107.05847  [pdf, other]

    stat.ML cs.LG

    Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

    Authors: Bernd Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix, Difan Deng, Marius Lindauer

    Abstract: Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time-consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for superv…

    Submitted 24 November, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

  44. arXiv:2106.08086  [pdf, other]

    stat.ML cs.LG

    Decomposition of Global Feature Importance into Direct and Associative Components (DEDACT)

    Authors: Gunnar König, Timo Freiesleben, Bernd Bischl, Giuseppe Casalicchio, Moritz Grosse-Wentrup

    Abstract: Global model-agnostic feature importance measures either quantify whether features are directly used for a model's predictions (direct importance) or whether they contain prediction-relevant information (associative importance). Direct importance provides causal insight into the model's mechanism, yet it fails to expose the leakage of information from associated but not directly used variables. In…

    Submitted 15 June, 2021; originally announced June 2021.

  45. Meta-Learning for Symbolic Hyperparameter Defaults

    Authors: Pieter Gijsbers, Florian Pfisterer, Jan N. van Rijn, Bernd Bischl, Joaquin Vanschoren

    Abstract: Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset. This enables a much faster, but…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Pieter Gijsbers and Florian Pfisterer contributed equally to the paper. V1: Two page GECCO poster paper accepted at GECCO 2021. V2: The original full length paper (8 pages) with appendix

  46. Grouped Feature Importance and Combined Features Effect Plot

    Authors: Quay Au, Julia Herbinger, Clemens Stachl, Bernd Bischl, Giuseppe Casalicchio

    Abstract: Interpretable machine learning has become a very active area of research due to the rising popularity of machine learning algorithms and their inherently challenging interpretability. Most work in this area has been focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effec…

    Submitted 23 April, 2021; originally announced April 2021.

    Journal ref: Data Mining and Knowledge Discovery 36, 1401--1450 (2022)

  47. arXiv:2104.02705  [pdf, other]

    stat.ML cs.LG stat.CO

    deepregression: a Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

    Authors: David Rügamer, Chris Kolb, Cornelius Fritz, Florian Pfisterer, Philipp Kopper, Bernd Bischl, Ruolin Shen, Christina Bukas, Lisa Barros de Andrade e Sousa, Dominik Thalmeier, Philipp Baumann, Lucas Kook, Nadja Klein, Christian L. Müller

    Abstract: In this paper we describe the implementation of semi-structured deep distributional regression, a flexible framework to learn conditional distributions based on the combination of additive regression models and deep networks. Our implementation encompasses (1) a modular neural network building system based on the deep learning library TensorFlow for the fusion of various statistical and deep…

    Submitted 10 March, 2022; v1 submitted 6 April, 2021; originally announced April 2021.

  48. Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features

    Authors: Florian Pargent, Florian Pfisterer, Janek Thomas, Bernd Bischl

    Abstract: Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect of data analysis. A common problem is high-cardinality features, i.e., unordered categorical predictor variables with a high number of levels. We study techniques that yield numeric representations of categorical variables which can then be used in subsequen…

    Submitted 4 March, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Comput Stat (2022)

  49. arXiv:2011.05824  [pdf, other]

    cs.LG cs.AI stat.ML

    Semi-Structured Deep Piecewise Exponential Models

    Authors: Philipp Kopper, Sebastian Pölsterl, Christian Wachinger, Bernd Bischl, Andreas Bender, David Rügamer

    Abstract: We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning. The presented framework is based on piecewise exponential models and thereby supports various survival tasks, such as competing risks and multi-state modeling, and further allows for estimation of time-varying effects and time-varying features. To also include multiple data so…

    Submitted 1 March, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: 8 pages, 3 figures, Accepted at the AAAI spring symposium: Survival Prediction

  50. Interpretable Machine Learning -- A Brief History, State-of-the-Art and Challenges

    Authors: Christoph Molnar, Giuseppe Casalicchio, Bernd Bischl

    Abstract: We present a brief history of the field of interpretable machine learning (IML), give an overview of state-of-the-art interpretation methods, and discuss challenges. Research in IML has boomed in recent years. As young as the field is, its roots go back over 200 years in regression modeling and to the 1960s in rule-based machine learning. Recently, many new IML methods have been proposed, ma…

    Submitted 19 October, 2020; originally announced October 2020.

    Journal ref: Koprinska I. et al. (eds) ECML PKDD 2020 Workshops. ECML PKDD 2020. Communications in Computer and Information Science, vol 1323. Springer, Cham