Search | arXiv e-print repository

Error Analysis of Shapley Value-Based Model Explanations: An Informative Perspective

Authors: Ningsheng Zhao, Jia Yuan Yu, Krzysztof Dzieciolowski, Trang Bui

Abstract: Shapley value attribution (SVA) is an increasingly popular explainable AI (XAI) method, which quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods to implement SVAs have some drawbacks, resulting in biased or unreliable explanations that fail to correctly capture the true intrinsic relationships between features and model outp… ▽ More Shapley value attribution (SVA) is an increasingly popular explainable AI (XAI) method, which quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods to implement SVAs have some drawbacks, resulting in biased or unreliable explanations that fail to correctly capture the true intrinsic relationships between features and model outputs. Moreover, the mechanism and consequences of these drawbacks have not been discussed systematically. In this paper, we propose a novel error theoretical analysis framework, in which the explanation errors of SVAs are decomposed into two components: observation bias and structural bias. We further clarify the underlying causes of these two biases and demonstrate that there is a trade-off between them. Based on this error analysis framework, we develop two novel concepts: over-informative and underinformative explanations. We demonstrate how these concepts can be effectively used to understand potential errors of existing SVA methods. In particular, for the widely deployed assumption-based SVAs, we find that they can easily be under-informative due to the distribution drift caused by distributional assumptions. We propose a measurement tool to quantify such a distribution drift. Finally, our experiments illustrate how different existing SVA methods can be over- or under-informative. Our work sheds light on how errors incur in the estimation of SVAs and encourages new less error-prone methods. △ Less

Submitted 29 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2103.05059 [pdf, other]

Bias-Corrected Peaks-Over-Threshold Estimation of the CVaR

Authors: Dylan Troop, Frédéric Godin, Jia Yuan Yu

Abstract: The conditional value-at-risk (CVaR) is a useful risk measure in fields such as machine learning, finance, insurance, energy, etc. When measuring very extreme risk, the commonly used CVaR estimation method of sample averaging does not work well due to limited data above the value-at-risk (VaR), the quantile corresponding to the CVaR level. To mitigate this problem, the CVaR can be estimated by ext… ▽ More The conditional value-at-risk (CVaR) is a useful risk measure in fields such as machine learning, finance, insurance, energy, etc. When measuring very extreme risk, the commonly used CVaR estimation method of sample averaging does not work well due to limited data above the value-at-risk (VaR), the quantile corresponding to the CVaR level. To mitigate this problem, the CVaR can be estimated by extrapolating above a lower threshold than the VaR using a generalized Pareto distribution (GPD), which is often referred to as the peaks-over-threshold (POT) approach. This method often requires a very high threshold to fit well, leading to high variance in estimation, and can induce significant bias if the threshold is chosen too low. In this paper, we derive a new expression for the GPD approximation error of the CVaR, a bias term induced by the choice of threshold, as well as a bias correction method for the estimated GPD parameters. This leads to the derivation of a new estimator for the CVaR that we prove to be asymptotically unbiased. In a practical setting, we show through experiments that our estimator provides a significant performance improvement compared with competing CVaR estimators in finite samples. As a consequence of our bias correction method, it is also shown that a much lower threshold can be selected without introducing significant bias. This allows a larger portion of data to be be used in CVaR estimation compared with the typical POT approach, leading to more stable estimates. As secondary results, a new estimator for a second-order parameter of heavy-tailed distributions is derived, as well as a confidence interval for the CVaR which enables quantifying the level of variability in our estimator. △ Less

Submitted 8 March, 2021; originally announced March 2021.

arXiv:1912.01718 [pdf, other]

Risk-Averse Action Selection Using Extreme Value Theory Estimates of the CVaR

Authors: Dylan Troop, Frédéric Godin, Jia Yuan Yu

Abstract: In a wide variety of sequential decision making problems, it can be important to estimate the impact of rare events in order to minimize risk exposure. A popular risk measure is the conditional value-at-risk (CVaR), which is commonly estimated by averaging observations that occur beyond a quantile at a given confidence level. When this confidence level is very high, this estimation method can exhi… ▽ More In a wide variety of sequential decision making problems, it can be important to estimate the impact of rare events in order to minimize risk exposure. A popular risk measure is the conditional value-at-risk (CVaR), which is commonly estimated by averaging observations that occur beyond a quantile at a given confidence level. When this confidence level is very high, this estimation method can exhibit high variance due to the limited number of samples above the corresponding quantile. To mitigate this problem, extreme value theory can be used to derive an estimator for the CVaR that uses extrapolation beyond available samples. This estimator requires the selection of a threshold parameter to work well, which is a difficult challenge that has been widely studied in the extreme value theory literature. In this paper, we present an estimation procedure for the CVaR that combines extreme value theory and a recently introduced method of automated threshold selection by \cite{bader2018automated}. Under appropriate conditions, we estimate the tail risk using a generalized Pareto distribution. We compare empirically this estimation procedure with the commonly used method of sample averaging, and show an improvement in performance for some distributions. We finally show how the estimation procedure can be used in reinforcement learning by applying our method to the multi-arm bandit problem where the goal is to avoid catastrophic risk. △ Less

Submitted 10 December, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

arXiv:1907.05231 [pdf, other]

Variance-Based Risk Estimations in Markov Processes via Transformation with State Lum**

Authors: Shuai Ma, Jia Yuan Yu

Abstract: Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that, the two risks can be estimated in Markov decision processes (MDPs) with a stochastic tr… ▽ More Variance plays a crucial role in risk-sensitive reinforcement learning, and most risk measures can be analyzed via variance. In this paper, we consider two law-invariant risks as examples: mean-variance risk and exponential utility risk. With the aid of the state-augmentation transformation (SAT), we show that, the two risks can be estimated in Markov decision processes (MDPs) with a stochastic transition-based reward and a randomized policy. To relieve the enlarged state space, a novel definition of isotopic states is proposed for state lum**, considering the special structure of the transformed transition probability. In the numerical experiment, we illustrate state lum** in the SAT, errors from a naive reward simplification, and the validity of the SAT for the two risk estimations. △ Less

Submitted 9 July, 2019; originally announced July 2019.

Comments: 7 pages, 7 figures, SMC 2019 accepted. arXiv admin note: text overlap with arXiv:1907.04269

arXiv:1405.2432 [pdf, ps, other]

Functional Bandits

Authors: Long Tran-Thanh, Jia Yuan Yu

Abstract: We introduce the functional bandit problem, where the objective is to find an arm that optimises a known functional of the unknown arm-reward distributions. These problems arise in many settings such as maximum entropy methods in natural language processing, and risk-averse decision-making, but current best-arm identification techniques fail in these domains. We propose a new approach, that combin… ▽ More We introduce the functional bandit problem, where the objective is to find an arm that optimises a known functional of the unknown arm-reward distributions. These problems arise in many settings such as maximum entropy methods in natural language processing, and risk-averse decision-making, but current best-arm identification techniques fail in these domains. We propose a new approach, that combines functional estimation and arm elimination, to tackle this problem. This method achieves provably efficient performance guarantees. In addition, we illustrate this method on a number of important functionals in risk management and information theory, and refine our generic theoretical results in those cases. △ Less

Submitted 10 May, 2014; originally announced May 2014.

arXiv:1105.4042 [pdf, other]

Adaptive and optimal online linear regression on $\ell^1$-balls

Authors: Sébastien Gerchinovitz, Jia Yuan Yu

Abstract: We consider the problem of online linear regression on individual sequences. The goal in this paper is for the forecaster to output sequential predictions which are, after $T$ time rounds, almost as good as the ones output by the best linear predictor in a given $\ell^1$-ball in $\\R^d$. We consider both the cases where the dimension~$d$ is small and large relative to the time horizon $T$. We firs… ▽ More We consider the problem of online linear regression on individual sequences. The goal in this paper is for the forecaster to output sequential predictions which are, after $T$ time rounds, almost as good as the ones output by the best linear predictor in a given $\ell^1$-ball in $\\R^d$. We consider both the cases where the dimension~$d$ is small and large relative to the time horizon $T$. We first present regret bounds with optimal dependencies on $d$, $T$, and on the sizes $U$, $X$ and $Y$ of the $\ell^1$-ball, the input data and the observations. The minimax regret is shown to exhibit a regime transition around the point $d = \sqrt{T} U X / (2 Y)$. Furthermore, we present efficient algorithms that are adaptive, \ie, that do not require the knowledge of $U$, $X$, $Y$, and $T$, but still achieve nearly optimal regret bounds. △ Less

Submitted 16 January, 2019; v1 submitted 20 May, 2011; originally announced May 2011.

Journal ref: Theoretical Computer Science, Elsevier, 2014, 519, pp.4-28

Showing 1–6 of 6 results for author: Yu, J Y