Showing 1–17 of 17 results for author: Wright, M N

  1. arXiv:2406.04098

    stat.ML cs.LG

    A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

    Authors: Lukas Burk, John Zobolas, Bernd Bischl, Andreas Bender, Marvin N. Wright, Raphael Sonabend

    Abstract: This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are often narrow in scope, focusing, for example, on h…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 42 pages, 28 figures

  2. arXiv:2404.12862

    stat.ML cs.LG math.ST stat.ME

    A Guide to Feature Importance Methods for Scientific Inference

    Authors: Fiona Katharina Ewald, Ludwig Bothmann, Marvin N. Wright, Bernd Bischl, Giuseppe Casalicchio, Gunnar König

    Abstract: While machine learning (ML) models are increasingly used due to their high predictive power, their use in understanding the data-generating process (DGP) is limited. Understanding the DGP requires insights into feature-target associations, which many ML models cannot directly provide, due to their opaque internal mechanisms. Feature importance (FI) methods provide useful insights into the DGP unde…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted at the 2nd World Conference on eXplainable Artificial Intelligence, xAI-2024

  3. arXiv:2404.11330

    stat.ML cs.LG

    Toward Understanding the Disagreement Problem in Neural Network Feature Attribution

    Authors: Niklas Koenen, Marvin N. Wright

    Abstract: In recent years, neural networks have demonstrated their remarkable ability to discern intricate patterns and relationships from raw data. However, understanding the inner workings of these black box models remains challenging, yet crucial for high-stake decisions. Among the prominent approaches for explaining these black boxes are feature attribution methods, which assign relevance or contributio…

    Submitted 17 April, 2024; originally announced April 2024.

  4. arXiv:2403.10250

    stat.ML cs.LG stat.ME

    Interpretable Machine Learning for Survival Analysis

    Authors: Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright

    Abstract: With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clin…

    Submitted 15 March, 2024; originally announced March 2024.

  5. arXiv:2311.07366

    stat.ML cs.LG

    arfpy: A python package for density estimation and generative modeling with adversarial random forests

    Authors: Kristin Blesch, Marvin N. Wright

    Abstract: This paper introduces $\textit{arfpy}$, a python implementation of Adversarial Random Forests (ARF) (Watson et al., 2023), which is a lightweight procedure for synthesizing new data that resembles some given data. The software $\textit{arfpy}$ equips practitioners with straightforward functionalities for both density estimation and generative modeling. The method is particularly useful for tabular…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: The software is available at https://github.com/bips-hb/arfpy

  6. arXiv:2308.16113

    cs.LG cs.AI stat.ML

    survex: an R package for explaining machine learning survival models

    Authors: Mikołaj Spytek, Mateusz Krzyziński, Sophie Hanna Langbein, Hubert Baniecki, Marvin N. Wright, Przemysław Biecek

    Abstract: Due to their flexibility and superior performance, machine learning models frequently complement and outperform traditional statistical survival models. However, their widespread adoption is hindered by a lack of user-friendly tools to explain their internal operations and prediction rationales. To tackle this issue, we introduce the survex R package, which provides a cohesive framework for explai…

    Submitted 21 November, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  7. arXiv:2306.10822

    stat.ML cs.LG

    Interpreting Deep Neural Networks with the Package innsight

    Authors: Niklas Koenen, Marvin N. Wright

    Abstract: The R package innsight offers a general toolbox for revealing variable-wise interpretations of deep neural networks' predictions with so-called feature attribution methods. Aside from the unified and user-friendly framework, the package stands out in three ways: It is generally the first R package implementing feature attribution methods for neural networks. Secondly, it operates independently of…

    Submitted 18 January, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

  8. arXiv:2306.00541

    stat.ML cs.LG

    Decomposing Global Feature Effects Based on Feature Interactions

    Authors: Julia Herbinger, Marvin N. Wright, Thomas Nagler, Bernd Bischl, Giuseppe Casalicchio

    Abstract: Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GAD…

    Submitted 1 July, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

  9. Conditional Feature Importance for Mixed Data

    Authors: Kristin Blesch, David S. Watson, Marvin N. Wright

    Abstract: Despite the popularity of feature importance (FI) measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analyzing a variable's importance before and after adjusting for covariates - i.e., between $\textit{marginal}$ and $\textit{conditional}$ measures. Our work draws attention to thi…

    Submitted 2 May, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Journal ref: AStA Advances in Statistical Analysis (2023)

  10. arXiv:2208.06151

    cs.LG math.ST stat.ML

    Unifying local and global model explanations by functional decomposition of low dimensional structures

    Authors: Munir Hiabu, Joseph T. Meyer, Marvin N. Wright

    Abstract: We consider a global representation of a regression or classification function by decomposing it into the sum of main and interaction components of arbitrary order. We propose a new identification constraint that allows for the extraction of interventional SHAP values and partial dependence plots, thereby unifying local and global explanations. With our proposed identification, a feature's partial…

    Submitted 23 February, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

  11. arXiv:2205.09435

    stat.ML cs.AI cs.LG stat.CO

    Adversarial random forests for density estimation and generative modeling

    Authors: David S. Watson, Kristin Blesch, Jan Kapar, Marvin N. Wright

    Abstract: We propose methods for density estimation and data synthesis using a novel form of unsupervised random forests. Inspired by generative adversarial networks, we implement a recursive procedure in which trees gradually learn structural properties of the data through alternating rounds of generation and discrimination. The method is provably consistent under minimal assumptions. Unlike classic tree-b…

    Submitted 13 March, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

    Comments: Camera ready version (AISTATS 2023)

    Journal ref: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

  12. Relating the Partial Dependence Plot and Permutation Feature Importance to the Data Generating Process

    Authors: Christoph Molnar, Timo Freiesleben, Gunnar König, Giuseppe Casalicchio, Marvin N. Wright, Bernd Bischl

    Abstract: Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to statistical modeling approaches, machine learning makes fewer explicit assumptions about data structures, such as linearity. However, their model parameters usually cannot be easily related to the data generating process. To learn about the modeled relationships, partial dependence (P…

    Submitted 3 September, 2021; originally announced September 2021.

    Journal ref: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1901

  13. arXiv:2107.04346

    stat.ML cs.LG math.DS

    Generalization of the Change of Variables Formula with Applications to Residual Flows

    Authors: Niklas Koenen, Marvin N. Wright, Peter Maaß, Jens Behrmann

    Abstract: Normalizing flows leverage the Change of Variables Formula (CVF) to define flexible density models. Yet, the requirement of smooth transformations (diffeomorphisms) in the CVF poses a significant challenge in the construction of these models. To enlarge the design space of flows, we introduce $\mathcal{L}$-diffeomorphisms as generalized transformations which may violate these requirements on zero…

    Submitted 9 July, 2021; originally announced July 2021.

  14. arXiv:1901.09917

    stat.ME cs.LG stat.ML

    Testing Conditional Independence in Supervised Learning Algorithms

    Authors: David S. Watson, Marvin N. Wright

    Abstract: We propose the conditional predictive impact (CPI), a consistent and unbiased estimator of the association between one or several features and a given outcome, conditional on a reduced feature set. Building on the knockoff framework of Candès et al. (2018), we develop a novel testing procedure that works in conjunction with any valid knockoff sampler, supervised learning algorithm, and loss functi…

    Submitted 13 May, 2021; v1 submitted 28 January, 2019; originally announced January 2019.

  15. arXiv:1901.06211

    stat.ME

    A Random Forest Approach for Modeling Bounded Outcomes

    Authors: Leonie Weinhold, Matthias Schmid, Marvin N. Wright, Moritz Berger

    Abstract: Random forests have become an established tool for classification and regression, in particular in high-dimensional settings and in the presence of complex predictor-response relationships. For bounded outcome variables restricted to the unit interval, however, classical random forest approaches may severely suffer as they do not account for the heteroscedasticity in the data. A random forest appr…

    Submitted 18 January, 2019; originally announced January 2019.

    Comments: 19 pages, 5 figures

  16. arXiv:1605.03391

    stat.ML cs.LG

    Unbiased split variable selection for random survival forests using maximally selected rank statistics

    Authors: Marvin N. Wright, Theresa Dankowski, Andreas Ziegler

    Abstract: The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistics, which favors splitting variab…

    Submitted 16 May, 2018; v1 submitted 11 May, 2016; originally announced May 2016.

    Journal ref: Wright, M. N., Dankowski, T. & Ziegler, A. (2017). Unbiased split variable selection for random survival forests using maximally selected rank statistics. Statistics in Medicine 36:1272-1284

  17. arXiv:1508.04409

    stat.ML stat.CO

    ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

    Authors: Marvin N. Wright, Andreas Ziegler

    Abstract: We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software pr…

    Submitted 17 May, 2018; v1 submitted 18 August, 2015; originally announced August 2015.

    Journal ref: Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software 77:1-17