Search | arXiv e-print repository

Sequentially valid tests for forecast calibration

Authors: Sebastian Arnold, Alexander Henzi, Johanna F. Ziegel

Abstract: Forecasting and forecast evaluation are inherently sequential tasks. Predictions are often issued on a regular basis, such as every hour, day, or month, and their quality is monitored continuously. However, the classical statistical tools for forecast evaluation are static, in the sense that statistical tests for forecast calibration are only valid if the evaluation period is fixed in advance. Recently, e-values have been introduced as a new, dynamic method for assessing statistical significance. An e-value is a non-negative random variable with expected value at most one under a null hypothesis. Large e-values give evidence against the null hypothesis, and the multiplicative inverse of an e-value is a conservative p-value. E-values are particularly suitable for sequential forecast evaluation, since they naturally lead to statistical tests which are valid under optional stopping. This article proposes e-values for testing probabilistic calibration of forecasts, which is one of the most important notions of calibration. The proposed methods are also more generally applicable for sequential goodness-of-fit testing. We demonstrate that the e-values are competitive in terms of power when compared to extant methods, which do not allow sequential testing. Furthermore, they provide important and useful insights in the evaluation of probabilistic weather forecasts.

Submitted 1 July, 2022; v1 submitted 24 September, 2021; originally announced September 2021.
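
As a rough illustration of the e-value idea described in the abstract above (not the authors' calibration test), the following Python sketch builds a likelihood-ratio e-process for a simple coin-flip null hypothesis. The function names, the null probability `p0`, and the alternative `p1` are assumptions chosen purely for illustration.

```python
def e_process(outcomes, p0=0.5, p1=0.7):
    """Running product of likelihood ratios for binary outcomes.

    Under the null (success probability p0) each factor has expectation
    p0 * (p1 / p0) + (1 - p0) * ((1 - p1) / (1 - p0)) = 1, so every
    running product is an e-value.
    """
    values = []
    e = 1.0
    for y in outcomes:
        e *= (p1 / p0) if y == 1 else ((1 - p1) / (1 - p0))
        values.append(e)
    return values


def conservative_p_value(e):
    """Multiplicative inverse of an e-value, capped at 1."""
    return min(1.0, 1.0 / e)


def first_rejection(e_values, alpha=0.05):
    """Index of the first time the e-process reaches 1/alpha, or None."""
    for t, e in enumerate(e_values):
        if e >= 1.0 / alpha:
            return t
    return None
```

Because this process is a non-negative martingale under the null, Ville's inequality bounds the probability that it ever reaches 1/alpha by alpha, which is why stopping at a data-dependent time remains valid in this toy setting.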

arXiv:2106.15369 [pdf, other]

Isotonic regression for functionals of elicitation complexity greater than one

Authors: Anja Mühlemann, Johanna F. Ziegel

Abstract: We study the non-parametric isotonic regression problem for bivariate elicitable functionals that are given as an elicitable univariate functional and its Bayes risk. Prominent examples for functionals of this type are (mean, variance) and (Value-at-Risk, Expected Shortfall), where the latter pair consists of important risk measures in finance. We present our results for totally ordered covariates, but extensions to partial orders are given in the appendix.

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2103.08402 [pdf, other]

Valid sequential inference on probability forecast performance

Authors: Alexander Henzi, Johanna F. Ziegel

Abstract: Probability forecasts for binary events play a central role in many applications. Their quality is commonly assessed with proper scoring rules, which assign forecasts a numerical score such that a correct forecast achieves a minimal expected score. In this paper, we construct e-values for testing the statistical significance of score differences of competing forecasts in sequential settings. E-values have been proposed as an alternative to p-values for hypothesis testing, and they can easily be transformed into conservative p-values by taking the multiplicative inverse. The e-values proposed in this article are valid in finite samples without any assumptions on the data generating processes. They also allow optional stopping, so a forecast user may decide to interrupt evaluation taking into account the available data at any time and still draw statistically valid inference, which is generally not true for classical p-value based tests. In a case study on postprocessing of precipitation forecasts, state-of-the-art forecast dominance tests and e-values lead to the same conclusions.

Submitted 1 July, 2022; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2006.09219 [pdf, other]

Distributional (Single) Index Models

Authors: Alexander Henzi, Gian-Reto Kleger, Johanna F. Ziegel

Abstract: A Distributional (Single) Index Model (DIM) is a semi-parametric model for distributional regression, that is, estimation of conditional distributions given covariates. The method is a combination of classical single index models for the estimation of the conditional mean of a response given covariates, and isotonic distributional regression. The model for the index is parametric, whereas the conditional distributions are estimated non-parametrically under a stochastic ordering constraint. We show consistency of our estimators and apply them to a highly challenging data set on the length of stay (LoS) of patients in intensive care units. We use the model to provide skillful and calibrated probabilistic predictions for the LoS of individual patients that outperform the available methods in the literature.

Submitted 3 August, 2022; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:1909.03725 [pdf, other]

doi 10.1111/rssb.12450

Isotonic Distributional Regression

Authors: Alexander Henzi, Johanna F. Ziegel, Tilmann Gneiting

Abstract: Isotonic distributional regression (IDR) is a powerful nonparametric technique for the estimation of conditional distributions under order restrictions. In a nutshell, IDR learns conditional distributions that are calibrated, and simultaneously optimal relative to comprehensive classes of relevant loss functions, subject to isotonicity constraints in terms of a partial order on the covariate space. Nonparametric isotonic quantile regression and nonparametric isotonic binary regression emerge as special cases. For prediction, we propose an interpolation method that generalizes extant specifications under the pool adjacent violators algorithm. We recommend the use of IDR as a generic benchmark technique in probabilistic forecast problems, as it does not involve any parameter tuning nor implementation choices, except for the selection of a partial order on the covariate space. The method can be combined with subsample aggregation, with the benefits of smoother regression functions and gains in computational efficiency. In a simulation study, we compare methods for distributional regression in terms of the continuous ranked probability score (CRPS) and $L_2$ estimation error, which are closely linked. In a case study on raw and postprocessed quantitative precipitation forecasts from a leading numerical weather prediction system, IDR is competitive with state-of-the-art techniques.

Submitted 28 September, 2021; v1 submitted 9 September, 2019; originally announced September 2019.

arXiv:1904.04761 [pdf, other]

Optimal solutions to the isotonic regression problem

Authors: Alexander I. Jordan, Anja Mühlemann, Johanna F. Ziegel

Abstract: In general, the solution to a regression problem is the minimizer of a given loss criterion, and depends on the specified loss function. The nonparametric isotonic regression problem is special, in that optimal solutions can be found by solely specifying a functional. These solutions will then be minimizers under all loss functions simultaneously as long as the loss functions have the requested functional as the Bayes act. For the functional, the only requirement is that it can be defined via an identification function, with examples including the expectation, quantile, and expectile functionals. Generalizing classical results, we characterize the optimal solutions to the isotonic regression problem for such functionals, and extend the results from the case of totally ordered explanatory variables to partial orders. For total orders, we show that any solution resulting from the pool-adjacent-violators algorithm is optimal. It is noteworthy that simultaneous optimality is unattainable in the unimodal regression problem, despite its close connection.

Submitted 5 May, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

MSC Class: 62G08
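
The pool-adjacent-violators algorithm mentioned in the abstract above can be sketched for the simplest case, the mean functional on a totally ordered covariate. This is the classical least-squares version only, not the paper's general treatment of functionals defined via identification functions; the function name `pava` is an assumption.

```python
def pava(y):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge while the previous block's mean exceeds the last block's mean
        # (compared by cross-multiplication to avoid division).
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each pooled block back to one fitted value per observation.
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit
```

For the input `[1, 3, 2, 4]` the violating pair (3, 2) is pooled to its mean, giving `[1.0, 2.5, 2.5, 4.0]`.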

arXiv:1902.04489 [pdf, ps, other]

doi 10.1515/strm-2020-0037

Evaluating Range Value at Risk Forecasts

Authors: Tobias Fissler, Johanna F. Ziegel

Abstract: The debate of what quantitative risk measure to choose in practice has mainly focused on the dichotomy between Value at Risk (VaR) -- a quantile -- and Expected Shortfall (ES) -- a tail expectation. Range Value at Risk (RVaR) is a natural interpolation between these two prominent risk measures, which constitutes a tradeoff between the sensitivity of the latter and the robustness of the former, turning it into a practically relevant risk measure on its own. As such, there is a need to statistically validate RVaR forecasts and to compare and rank the performance of different RVaR models, tasks subsumed under the term 'backtesting' in finance. The predictive performance is best evaluated and compared in terms of strictly consistent loss or scoring functions, that is, functions which are minimised in expectation by the correct RVaR forecast. Much like ES, it has been shown recently that RVaR does not admit strictly consistent scoring functions, i.e., it is not elicitable. Mitigating this negative result, this paper shows that a triplet of RVaR with two VaR components at different levels is elicitable. We characterise the class of strictly consistent scoring functions for this triplet. Additional properties of these scoring functions are examined, including the diagnostic tool of Murphy diagrams. The results are illustrated with a simulation study, and we put our approach in perspective with respect to the classical approach of trimmed least squares in robust regression.

Submitted 28 November, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

Comments: 25 pages, 2 figures. An earlier version of this paper was circulated under the name 'Elicitability of Range Value at Risk'. The presentation has been made more concise and minor errors have been corrected. Statistics & Risk Modeling, 2021

MSC Class: 62C99; 62G35; 62P05; 91G70

Journal ref: Statistics & Risk Modeling, vol. 38, no. 1-2, 2021, pp. 25-46

arXiv:1902.04299 [pdf, ps, other]

Bivariate distributions with ordered marginals

Authors: Sebastian Arnold, Ilya Molchanov, Johanna F. Ziegel

Abstract: This paper provides a characterization of all possible dependency structures between two stochastically ordered random variables. The answer is given in terms of copulas that are compatible with the stochastic order and the marginal distributions. The extremal values for Kendall's $τ$ and Spearman's $ρ$ for all these copulas are given in closed form. We also find an explicit form for the joint distribution with the maximal entropy. A multivariate extension and a generalization to random elements in partially ordered spaces are also provided.

Submitted 13 December, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

Comments: 14 pages, 4 figures

arXiv:1901.08826 [pdf]

doi 10.1214/20-AOS2014

Supplement to "Erratum: Higher Order Elicitability and Osband's Principle"

Authors: Tobias Fissler, Johanna F. Ziegel

Abstract: This note corrects conditions in Proposition 3.4 and Theorem 5.2(ii) and comments on imprecisions in Propositions 4.2 and 4.4 in Fissler and Ziegel (2016).

Submitted 6 October, 2020; v1 submitted 25 January, 2019; originally announced January 2019.

Comments: 12 pages, 1 figure, to appear as a supplement in the Annals of Statistics

MSC Class: 62C99; 91B06

Journal ref: Ann. Statist., Volume 49, Number 1 (2021), 614

arXiv:1812.09322 [pdf, ps, other]

Local Estimation of a Multivariate Density and its Derivatives

Authors: Christof Strähl, Johanna F. Ziegel, Lutz Duembgen

Abstract: We analyze four different approaches to estimate a multivariate probability density (or the log-density) and its first and second order derivatives. Two methods, local log-likelihood and local Hyvärinen score estimation, are in terms of weighted scoring rules with local quadratic models. The other two approaches are matching of local moments and kernel density estimation. All estimators depend on a general kernel, and we use the Gaussian kernel to provide explicit examples. Asymptotic properties of the estimators are derived and compared. In terms of rates of convergence, a refined local moment matching estimator is the best.

Submitted 10 August, 2020; v1 submitted 21 December, 2018; originally announced December 2018.

Comments: 41 pages

MSC Class: 62G07; 62F12

arXiv:1808.07339 [pdf, ps, other]

Scenario-based Risk Evaluation

Authors: Ruodu Wang, Johanna F. Ziegel

Abstract: Risk measures such as Expected Shortfall (ES) and Value-at-Risk (VaR) have been prominent in banking regulation and financial risk management. Motivated by practical considerations in the assessment and management of risks, including tractability, scenario relevance and robustness, we consider theoretical properties of scenario-based risk evaluation. We propose several novel scenario-based risk measures, including various versions of Max-ES and Max-VaR, and study their properties. We establish axiomatic characterizations of scenario-based risk measures that are comonotonic-additive or coherent, and an ES-based representation result is obtained. These results provide a theoretical foundation for the recent Basel III & IV market risk calculation formulas. We illustrate the theory with financial data examples.

Submitted 4 May, 2021; v1 submitted 22 August, 2018; originally announced August 2018.

arXiv:1805.09902 [pdf, other]

Generic Conditions for Forecast Dominance

Authors: Fabian Krüger, Johanna F. Ziegel

Abstract: Recent studies have analyzed whether one forecast method dominates another under a class of consistent scoring functions. While the existing literature focuses on empirical tests of forecast dominance, little is known about the theoretical conditions under which one forecast dominates another. To address this question, we derive a new characterization of dominance among forecasts of the mean functional. We present various scenarios under which dominance occurs. Unlike existing results, our results allow for the case that the forecasts' underlying information sets are not nested, and allow for uncalibrated forecasts that suffer, e.g., from model misspecification or parameter estimation error. We illustrate the empirical relevance of our results via data examples from finance and economics.

Submitted 18 December, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

arXiv:1712.05279 [pdf, ps, other]

Strictly proper kernel scores and characteristic kernels on compact spaces

Authors: Ingo Steinwart, Johanna F. Ziegel

Abstract: Strictly proper kernel scores are a well-known tool in probabilistic forecasting, while characteristic kernels have been extensively investigated in the machine learning literature. We first show that both notions coincide, so that insights from one part of the literature can be used in the other. We then show that the metric induced by a characteristic kernel cannot reliably distinguish between distributions that are far apart in the total variation norm as soon as the underlying space of measures is infinite dimensional. In addition, we provide a characterization of characteristic kernels in terms of eigenvalues and -functions and apply this characterization to the case of continuous kernels on (locally) compact spaces. In the compact case we further show that characteristic kernels exist if and only if the space is metrizable. As special cases of our general theory we investigate translation-invariant kernels on compact Abelian groups and isotropic kernels on spheres. The latter are of particular interest for forecast evaluation of probabilistic predictions on spherical domains as frequently encountered in meteorology and climatology.

Submitted 14 December, 2017; originally announced December 2017.

arXiv:1711.09628 [pdf, ps, other]

doi 10.1214/19-EJS1552

Order-Sensitivity and Equivariance of Scoring Functions

Authors: Tobias Fissler, Johanna F. Ziegel

Abstract: The relative performance of competing point forecasts is usually measured in terms of loss or scoring functions. It is widely accepted that these scoring functions should be strictly consistent in the sense that the expected score is minimized by the correctly specified forecast for a certain statistical functional such as the mean, median, or a certain risk measure. Thus, strict consistency opens the way to meaningful forecast comparison, but is also important in regression and M-estimation. Usually strictly consistent scoring functions for an elicitable functional are not unique. To give guidance on the choice of a scoring function, this paper introduces two additional quality criteria. Order-sensitivity opens the possibility to compare two deliberately misspecified forecasts given that the forecasts are ordered in a certain sense. On the other hand, equivariant scoring functions obey similar equivariance properties as the functional at hand - such as translation invariance or positive homogeneity. In our study, we consider scoring functions for popular functionals, putting special emphasis on vector-valued functionals, e.g. the pair (mean, variance) or (Value at Risk, Expected Shortfall).

Submitted 27 November, 2017; originally announced November 2017.

Comments: 45 pages

MSC Class: 62C99; 62F07; 62G99; 91B06

Journal ref: Electronic Journal of Statistics, Volume 13, Number 1 (2019), 1166-1211

arXiv:1707.05108 [pdf, ps, other]

Dynamic Semiparametric Models for Expected Shortfall (and Value-at-Risk)

Authors: Andrew J. Patton, Johanna F. Ziegel, Rui Chen

Abstract: Expected Shortfall (ES) is the average return on a risky asset conditional on the return being below some quantile of its distribution, namely its Value-at-Risk (VaR). The Basel III Accord, which will be implemented in the years leading up to 2019, places new attention on ES, but unlike VaR, there is little existing work on modeling ES. We use recent results from statistical decision theory to overcome the problem of "elicitability" for ES by jointly modelling ES and VaR, and propose new dynamic models for these risk measures. We provide estimation and inference methods for the proposed models, and confirm via simulation studies that the methods have good finite-sample properties. We apply these models to daily returns on four international equity indices, and find the proposed new ES-VaR models outperform forecasts based on GARCH or rolling window models.

Submitted 17 July, 2017; originally announced July 2017.
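
The definition of ES given in the abstract above (the average return below the VaR quantile) can be made concrete with a small empirical sketch. This is not the dynamic semiparametric model of the paper; the function name and the empirical quantile convention (the k-th smallest return with k = ceil(alpha * n)) are assumptions made for illustration.

```python
import math


def empirical_var_es(returns, alpha):
    """Empirical VaR (lower alpha-quantile) and ES (mean of the k smallest returns)."""
    ordered = sorted(returns)
    n = len(ordered)
    # Number of observations in the lower tail; the small offset guards
    # against floating-point rounding in alpha * n.
    k = max(1, math.ceil(alpha * n - 1e-9))
    var = ordered[k - 1]       # k-th smallest return
    es = sum(ordered[:k]) / k  # average of the returns in the lower tail
    return var, es
```

With ten returns and `alpha = 0.2`, the two smallest returns form the tail, so the VaR is the second-smallest return and the ES is the average of the two.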

arXiv:1705.04537 [pdf, other]

Murphy Diagrams: Forecast Evaluation of Expected Shortfall

Authors: Johanna F. Ziegel, Fabian Krüger, Alexander Jordan, Fernando Fasciati

Abstract: Motivated by the Basel 3 regulations, recent studies have considered joint forecasts of Value-at-Risk and Expected Shortfall. A large family of scoring functions can be used to evaluate forecast performance in this context. However, little intuitive or empirical guidance is currently available, which renders the choice of scoring function awkward in practice. We therefore develop graphical checks (Murphy diagrams) of whether one forecast method dominates another under a relevant class of scoring functions, and propose an associated hypothesis test. We illustrate these tools with simulation examples and an empirical analysis of S&P 500 and DAX returns.

Submitted 12 May, 2017; originally announced May 2017.

Report number: Discussion paper nr. 632, AWI, Heidelberg University

arXiv:1608.05498 [pdf, ps, other]

Elicitability and backtesting: Perspectives for banking regulation

Authors: Natalia Nolde, Johanna F. Ziegel

Abstract: Conditional forecasts of risk measures play an important role in internal risk management of financial institutions as well as in regulatory capital calculations. In order to assess forecasting performance of a risk measurement procedure, risk measure forecasts are compared to the realized financial losses over a period of time and a statistical test of correctness of the procedure is conducted. This process is known as backtesting. Such traditional backtests are concerned with assessing some optimality property of a set of risk measure estimates. However, they are not suited to compare different risk estimation procedures. We investigate the proposal of comparative backtests, which are better suited for method comparisons on the basis of forecasting accuracy, but necessitate an elicitable risk measure. We argue that supplementing traditional backtests with comparative backtests will enhance the existing trading book regulatory framework for banks by providing the correct incentive for accuracy of risk measure forecasts. In addition, the comparative backtesting framework could be used by banks internally as well as by researchers to guide selection of forecasting methods. The discussion focuses on three risk measures, Value-at-Risk, expected shortfall and expectiles, and is supported by a simulation study and data analysis.

Submitted 21 February, 2017; v1 submitted 19 August, 2016; originally announced August 2016.

arXiv:1603.06727 [pdf, ps, other]

Derivatives of isotropic positive definite functions on spheres

Authors: Mara Trübner, Johanna F. Ziegel

Abstract: We show that isotropic positive definite functions on the $d$-dimensional sphere which are $2k$ times differentiable at zero have $2k+[(d-1)/2]$ continuous derivatives on $(0,π)$. This result is analogous to the result for radial positive definite functions on Euclidean spaces. We prove optimality of the result for all odd dimensions. The proof relies on montée, descente and turning bands operators on spheres which parallel the corresponding operators originating in the work of Matheron for radial positive definite functions on Euclidean spaces.

Submitted 22 March, 2016; originally announced March 2016.

arXiv:1507.00244 [pdf, ps, other]

Expected Shortfall is jointly elicitable with Value at Risk - Implications for backtesting

Authors: Tobias Fissler, Johanna F. Ziegel, Tilmann Gneiting

Abstract: In this note, we comment on the relevance of elicitability for backtesting risk measure estimates. In particular, we propose the use of Diebold-Mariano tests, and show how they can be implemented for Expected Shortfall (ES), based on the recent result of Fissler and Ziegel (2015) that ES is jointly elicitable with Value at Risk.

Submitted 12 July, 2015; v1 submitted 1 July, 2015; originally announced July 2015.

Journal ref: Risk, January 2016, 58-61

arXiv:1505.05314 [pdf, other]

Cross-calibration of probabilistic forecasts

Authors: Christof Strähl, Johanna F. Ziegel

Abstract: When providing probabilistic forecasts for uncertain future events, it is common to strive for calibrated forecasts, that is, the predictive distribution should be compatible with the observed outcomes. Several notions of calibration are available in the case of a single forecaster, along with diagnostic tools and statistical tests to assess calibration in practice. Often, there is more than one forecaster providing predictions, and these forecasters may use information of the others and therefore influence one another. We extend common notions of calibration, where each forecaster is analysed individually, to notions of cross-calibration where each forecaster is analysed with respect to the other forecasters in a natural way. It is shown theoretically and in simulation studies that cross-calibration is a stronger requirement on a forecaster than calibration. Analogously to calibration for individual forecasters, we provide diagnostic tools and statistical tests to assess forecasters in terms of cross-calibration. The methods are illustrated in simulation examples and applied to probabilistic forecasts for inflation rates by the Bank of England.

Submitted 20 May, 2015; originally announced May 2015.

arXiv:1503.08123 [pdf, ps, other]

doi 10.1214/16-AOS1439

Higher order elicitability and Osband's principle

Authors: Tobias Fissler, Johanna F. Ziegel

Abstract: A statistical functional, such as the mean or the median, is called elicitable if there is a scoring function or loss function such that the correct forecast of the functional is the unique minimizer of the expected score. Such scoring functions are called strictly consistent for the functional. The elicitability of a functional opens the possibility to compare competing forecasts and to rank them in terms of their realized scores. In this paper, we explore the notion of elicitability for multi-dimensional functionals and give both necessary and sufficient conditions for strictly consistent scoring functions. We cover the case of functionals with elicitable components, but we also show that one-dimensional functionals that are not elicitable can be a component of a higher order elicitable functional. In the case of the variance this is a known result. However, an important result of this paper is that spectral risk measures with a spectral measure with finite support are jointly elicitable if one adds the 'correct' quantiles. A direct consequence of applied interest is that the pair (Value at Risk, Expected Shortfall) is jointly elicitable under mild conditions that are usually fulfilled in risk management applications.

Submitted 30 September, 2015; v1 submitted 27 March, 2015; originally announced March 2015.

Comments: 32 pages

MSC Class: 62C99; 91B06

Journal ref: The Annals of Statistics 2016, Vol. 44, No. 4, 1680-1707

arXiv:1411.0426

Risk measures with the CxLS property

Authors: Freddy Delbaen, Fabio Bellini, Valeria Bignozzi, Johanna F. Ziegel

Abstract: In the present contribution we characterize law determined convex risk measures that have convex level sets at the level of distributions. By relaxing the assumptions in Weber (2006), we show that these risk measures can be identified with a class of generalized shortfall risk measures. As a direct consequence, we are able to extend the results in Ziegel (2014) and Bellini and Bignozzi (2014) on convex elicitable risk measures and confirm that expectiles are the only elicitable coherent risk measures. Further, we provide a simple characterization of robustness for convex risk measures in terms of a weak notion of mixture continuity.

Submitted 3 November, 2014; originally announced November 2014.

arXiv:1405.3769

Distortion Risk Measures and Elicitability

Authors: Ruodu Wang, Johanna F. Ziegel

Abstract: We discuss equivalent axiomatic characterizations of distortion risk measures, and give a novel and concise proof of the characterization of elicitable distortion risk measures. Elicitability has recently been discussed as a desirable criterion for risk measures, motivated by statistical considerations of forecasting. We reveal the mathematical conflict between the requirements of elicitability and comonotonic additivity which intuitively explains why only Value-at-Risk and the mean are elicitable distortion risk measures in a general sense.

Submitted 24 May, 2014; v1 submitted 15 May, 2014; originally announced May 2014.

Comments: The paper has been withdrawn by the authors because it is not properly written to be cited by the research community

arXiv:1307.7650

Copula Calibration

Authors: Johanna F. Ziegel, Tilmann Gneiting

Abstract: We propose notions of calibration for probabilistic forecasts of general multivariate quantities. Probabilistic copula calibration is a natural analogue of probabilistic calibration in the univariate setting. It can be assessed empirically by checking for the uniformity of the copula probability integral transform (CopPIT), which is invariant under coordinate permutations and coordinatewise strictly monotone transformations of the predictive distribution and the outcome. The CopPIT histogram can be interpreted as a generalization and variant of the multivariate rank histogram, which has been used to check the calibration of ensemble forecasts. Climatological copula calibration is an analogue of marginal calibration in the univariate setting. Methods and tools are illustrated in a simulation study and applied to compare raw numerical model and statistically postprocessed ensemble forecasts of bivariate wind vectors.

Submitted 29 July, 2013; originally announced July 2013.

arXiv:1303.1690

Coherence and elicitability

Authors: Johanna F. Ziegel

Abstract: The risk of a financial position is usually summarized by a risk measure. As this risk measure has to be estimated from historical data, it is important to be able to verify and compare competing estimation procedures. In statistical decision theory, risk measures for which such verification and comparison is possible, are called elicitable. It is known that quantile based risk measures such as value at risk are elicitable. In this paper we show that law-invariant spectral risk measures such as expected shortfall are not elicitable unless they reduce to minus the expected value. Hence, it is unclear how to perform forecast verification or comparison. However, the class of elicitable law-invariant coherent risk measures does not reduce to minus the expected value. We show that it consists of certain expectiles.

Submitted 31 March, 2014; v1 submitted 7 March, 2013; originally announced March 2013.

arXiv:1210.0358

doi 10.1214/13-AAP983

Limit theorems for nondegenerate U-statistics of continuous semimartingales

Authors: Mark Podolskij, Christian Schmidt, Johanna F. Ziegel

Abstract: This paper presents the asymptotic theory for nondegenerate $U$-statistics of high frequency observations of continuous Itô semimartingales. We prove uniform convergence in probability and show a functional stable central limit theorem for the standardized version of the $U$-statistic. The limiting process in the central limit theorem turns out to be conditionally Gaussian with mean zero. Finally, we indicate potential statistical applications of our probabilistic results.

Submitted 9 September, 2014; v1 submitted 1 October, 2012; originally announced October 2012.

Comments: Published at http://dx.doi.org/10.1214/13-AAP983 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP983

Journal ref: Annals of Applied Probability 2014, Vol. 24, No. 6, 2491-2526

Showing 1–26 of 26 results for author: Ziegel, J F