-
On new tests for the Poisson distribution based on empirical weight functions
Authors:
Winnie Kirui,
Elzanie Bothma,
Marius Smuts,
Anke Steyn,
Jaco Visagie
Abstract:
We propose new goodness-of-fit tests for the Poisson distribution. The testing procedure entails fitting a weighted Poisson distribution, which has the Poisson as a special case, to observed data. Based on sample data, we calculate an empirical weight function which is compared to its theoretical counterpart under the Poisson assumption. Weighted Lp distances between these empirical and theoretical functions are proposed as test statistics and closed form expressions are derived for L1, L2 and L∞ distances. A Monte Carlo study is included in which the newly proposed tests are shown to be powerful when compared to existing tests, especially in the case of overdispersed alternatives. We demonstrate the use of the tests with two practical examples.
Submitted 20 February, 2024;
originally announced February 2024.
-
Revisiting the memoryless property -- testing for the Pareto type I distribution
Authors:
Lethani Ndwandwe,
James Allison,
Leonard Santana,
Jaco Visagie
Abstract:
We propose new goodness-of-fit tests for the Pareto type I distribution. These tests are based on a multiplicative version of the memoryless property which characterises this distribution. We present the results of a Monte Carlo power study demonstrating that the proposed tests are powerful compared to existing tests. As a result of independent interest, we demonstrate that tests specifically developed for the Pareto type I distribution substantially outperform tests for exponentiality applied to log-transformed data (since Pareto type I distributed values can be transformed to exponentiality via a simple log-transformation). Specifically, the newly proposed tests based on the multiplicative memoryless property of the Pareto distribution substantially outperform a test based on the memoryless property of the exponential distribution. The practical use of the tests is illustrated by testing the hypothesis that two sets of observed golfers' earnings (those of the PGA and LIV tours) are realised from Pareto distributions.
Submitted 24 January, 2024;
originally announced January 2024.
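The two facts this abstract relies on can be checked numerically. The following is a minimal sketch, not the authors' test: it assumes a Pareto type I law with survival function S(x) = (x / sigma)^(-beta) for x >= sigma, and the helper names `pareto_survival` and `sample_pareto` are hypothetical. It verifies the multiplicative memoryless identity S(xy) = S(x)S(y) for sigma = 1 and shows that log-transformed values behave like exponential data with mean 1 / beta.

```python
import math
import random


def pareto_survival(x, beta, sigma=1.0):
    """Survival function P(X > x) of a Pareto type I law with shape beta and scale sigma."""
    return (x / sigma) ** (-beta) if x >= sigma else 1.0


def sample_pareto(n, beta, sigma=1.0, seed=0):
    """Draw n Pareto type I values by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below is always finite.
    return [sigma * (1.0 - rng.random()) ** (-1.0 / beta) for _ in range(n)]


beta = 2.0  # illustrative shape parameter

# Multiplicative memoryless property (sigma = 1): S(x * y) == S(x) * S(y) for x, y >= 1.
x, y = 1.5, 3.0
lhs = pareto_survival(x * y, beta)
rhs = pareto_survival(x, beta) * pareto_survival(y, beta)

# Log-transform: log(X / sigma) is exponential with rate beta, so its mean is 1 / beta.
sample = sample_pareto(20000, beta)
log_mean = sum(math.log(v) for v in sample) / len(sample)

print(lhs, rhs, log_mean)
```

The sample size, seed and shape value are arbitrary choices for illustration.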
-
A critical review of existing and new population stability testing procedures in credit risk scoring
Authors:
Johan du Pisanie,
James Allison,
Christian Budde,
Jaco Visagie
Abstract:
Credit scorecards are models used for the modelling of the probability of default of clients. The decision to extend credit to an applicant, as well as the price of the credit, is often based on these models. In order to ensure that scorecards remain accurate over time, the hypothesis of population stability is tested periodically; that is, the hypothesis that the distributions of the attributes of clients at the time when the scorecard was developed are still representative of these distributions at review is tested. A number of measures of population stability are used in practice, with several being proposed in the recent literature. This paper provides a critical review of several testing procedures for the mentioned hypothesis. The widely used population stability index is discussed alongside two recently proposed techniques. Additionally, the use of classical goodness-of-fit techniques is considered and the problems associated with large samples are investigated. In addition to the existing testing procedures, we propose two new techniques which can be used to test population stability. The first is based on the calculation of effect sizes, which do not suffer from the same problems as classical goodness-of-fit techniques when faced with large samples. The second proposed procedure is the so-called overlapping statistic. We argue that this simple measure can be useful due to its intuitive interpretation. In order to demonstrate the use of the various measures, as well as to highlight their strengths and weaknesses, several numerical examples are included.
Submitted 2 March, 2023;
originally announced March 2023.
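For readers unfamiliar with the population stability index mentioned in this abstract, the sketch below computes it in its conventional form, PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the proportions of clients in bin i at development and at review. The abstract does not state the formula, so this conventional definition is an assumption, as are the function name `population_stability_index` and the made-up proportions.

```python
import math


def population_stability_index(expected, actual):
    """PSI between two vectors of bin proportions (each summing to 1, all entries > 0)."""
    if len(expected) != len(actual):
        raise ValueError("both vectors need the same number of bins")
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


development = [0.25, 0.25, 0.25, 0.25]  # hypothetical proportions at development
review = [0.20, 0.25, 0.25, 0.30]       # hypothetical proportions at review

psi = population_stability_index(development, review)
print(round(psi, 4))
```

The index is zero when the two sets of proportions coincide and grows as they diverge; the thresholds used to interpret it in practice are not covered here.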
-
On a new class of tests for the Pareto distribution using Fourier methods
Authors:
L. Ndwandwe,
J. S. Allison,
M. Smuts,
I. J. H. Visagie
Abstract:
We propose new classes of tests for the Pareto type I distribution using the empirical characteristic function. These tests are $U$ and $V$ statistics based on a characterisation of the Pareto distribution involving the distribution of the sample minimum. In addition to deriving simple computational forms for the proposed test statistics, we prove consistency against a wide range of fixed alternatives. A Monte Carlo study is included in which the newly proposed tests are shown to produce high powers. These powers include results relating to fixed alternatives as well as local powers against mixture distributions. The use of the proposed tests is illustrated using an observed data set.
Submitted 23 January, 2023;
originally announced January 2023.
-
Testing for the Pareto type I distribution: A comparative study
Authors:
L. Ndwandwe,
J. S. Allison,
L. Santana,
I. J. H. Visagie
Abstract:
Pareto distributions are widely used models in economics, finance and actuarial sciences. As a result, a number of goodness-of-fit tests have been proposed for these distributions in the literature. We provide an overview of the existing tests for the Pareto distribution, focussing specifically on the Pareto type I distribution. To date, only a single overview paper on goodness-of-fit testing for Pareto distributions has been published. However, the mentioned paper has a much wider scope than is the case for the current paper as it covers multiple types of Pareto distributions. The current paper differs in a number of respects. First, the narrower focus on the Pareto type I distribution allows a larger number of tests to be included. Second, the current paper is concerned with composite hypotheses compared to the simple hypotheses (specifying the parameters of the Pareto distribution in question) considered in the mentioned overview. Third, the sample sizes considered in the two papers differ substantially.
In addition, we consider two different methods of fitting the Pareto type I distribution: the method of maximum likelihood and a method closely related to moment matching. It is demonstrated that the method of estimation has a profound effect, not only on the powers achieved by the various tests, but also on the way in which numerical critical values are calculated. We show that, when using maximum likelihood, the resulting critical values are shape invariant and can be obtained using a Monte Carlo procedure. This is not the case when moment matching is employed.
The paper includes an extensive Monte Carlo power study. Based on the results obtained, we recommend the use of a test based on the phi divergence together with maximum likelihood estimation.
Submitted 18 November, 2022;
originally announced November 2022.
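The shape invariance claimed in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a Kolmogorov-Smirnov statistic as a stand-in (the paper recommends a phi-divergence test, which is not reproduced here), it estimates the scale by the sample minimum and the shape by the usual maximum likelihood formula, and all function names are hypothetical. With the same random seed, the Monte Carlo critical value obtained under shape 1 agrees with the one obtained under shape 5 up to rounding.

```python
import math
import random


def pareto_sample(n, beta, rng):
    """Pareto type I sample with scale 1, drawn by inverse-transform sampling."""
    return [(1.0 - rng.random()) ** (-1.0 / beta) for _ in range(n)]


def fit_pareto_mle(xs):
    """Maximum likelihood estimates (shape, scale) for the Pareto type I law."""
    sigma_hat = min(xs)
    beta_hat = len(xs) / sum(math.log(x / sigma_hat) for x in xs)
    return beta_hat, sigma_hat


def ks_statistic(xs):
    """Kolmogorov-Smirnov distance between the data and the fitted Pareto CDF."""
    beta_hat, sigma_hat = fit_pareto_mle(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(sorted(xs), start=1):
        f = 1.0 - (x / sigma_hat) ** (-beta_hat)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def mc_critical_value(n, beta, reps=500, level=0.05, seed=1):
    """Empirical (1 - level) quantile of the statistic under the fitted null model."""
    rng = random.Random(seed)
    stats = sorted(ks_statistic(pareto_sample(n, beta, rng)) for _ in range(reps))
    return stats[math.ceil((1.0 - level) * reps) - 1]


# Same seed, different true shape: the critical values coincide up to rounding.
cv_beta1 = mc_critical_value(n=20, beta=1.0)
cv_beta5 = mc_critical_value(n=20, beta=5.0)
print(cv_beta1, cv_beta5)
```

The agreement follows because the fitted CDF evaluated at the data depends only on ratios of log-observations, in which the true shape cancels. In practice this means critical values can be simulated once, for any convenient shape value.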
-
On fitting the Lomax distribution: a comparison between minimum distance estimators and other estimation techniques
Authors:
Thobeka Nombebe,
James Allison,
Leonard Santana,
Jaco Visagie
Abstract:
In this paper we investigate the performance of a variety of estimation techniques for the scale and shape parameters of the Lomax distribution. These methods include traditional methods such as the maximum likelihood estimator and the method of moments estimator. A version of the maximum likelihood estimator adjusted for bias is also included. Furthermore, alternative moment-based estimation techniques such as the $L$-moment estimator and the probability weighted moments estimator are included along with three different minimum distance estimators. The finite sample performance of each of these estimators is compared via an extensive Monte Carlo study. We find that no single estimator outperforms its competitors uniformly. We recommend one of the minimum distance estimators for use with smaller samples, while a bias-reduced version of maximum likelihood estimation is recommended for use with larger samples. In addition, the desirable asymptotic properties of traditional maximum likelihood estimators make them appealing for larger samples. We also include a practical application demonstrating the use of the techniques on observed data.
Submitted 13 July, 2022;
originally announced July 2022.
-
A proposed simulation technique for population stability testing in credit risk scorecards
Authors:
J. du Pisanie,
J. S. Allison,
I. J. H. Visagie
Abstract:
Credit risk scorecards are logistic regression models, fitted to large and complex data sets, employed by the financial industry to model the probability of default of a potential customer. In order to ensure that a scorecard remains a representative model of the population, one tests the hypothesis of population stability, which specifies that the distribution of clients' attributes remains constant over time. Simulating realistic data sets for this purpose is nontrivial as these data sets are multivariate and contain intricate dependencies. The simulation of these data sets is of practical interest to both practitioners and researchers; practitioners may wish to consider the effect that a specified change in the properties of the data has on the scorecard and its usefulness from a business perspective, while researchers may wish to test a newly developed technique in credit scoring.
We propose a simulation technique based on the specification of bad ratios; this concept is explained below. Practitioners can generally not be expected to provide realistic parameter values for a scorecard; these models are simply too complex and contain too many parameters to make such a specification viable. However, practitioners can often confidently specify the bad ratio associated with two different levels of a specific attribute. That is, practitioners are often comfortable with making statements such as "on average a new customer is 1.5 times as likely to default as an existing customer with similar attributes". We propose a method which can be used to obtain parameter values for a scorecard based on specified bad ratios. The proposed technique is demonstrated using a realistic example and we show that the simulated data sets adhere closely to the specified bad ratios. The paper provides a link to a GitHub project containing the R code used to generate the results shown.
Submitted 22 June, 2022;
originally announced June 2022.
-
A new omnibus test of fit based on a characterisation of the uniform distribution
Authors:
Bruno Ebner,
Shawn Liebenberg,
Jaco Visagie
Abstract:
In this paper, we revisit the classical goodness-of-fit problems for univariate distributions; we propose a new testing procedure based on a characterisation of the uniform distribution. Asymptotic theory for the simple hypothesis case is provided in a Hilbert space setting, including the asymptotic null distribution as well as values for the first four cumulants of this distribution, which are used to fit a Pearson system of distributions as an approximation to the limit distribution. Numerical results indicate that the null distribution of the test converges quickly to its asymptotic distribution, making the critical values obtained using the Pearson system particularly useful. Consistency of the test is shown against any fixed alternative distribution and we derive the limiting behaviour under fixed alternatives with an application to power approximation. We demonstrate the applicability of the newly proposed test when testing composite hypotheses. A Monte Carlo power study compares the finite sample power performance of the newly proposed test to existing omnibus tests in both the simple and composite hypothesis settings. This power study includes results related to testing for the uniform, normal and Pareto distributions. The empirical results obtained indicate that the test is competitive. An application of the newly proposed test in financial modelling is also included.
Submitted 13 August, 2021;
originally announced August 2021.
-
Kaplan-Meier based tests for exponentiality in the presence of censoring
Authors:
E. Bothma,
J. S. Allison,
M. Cockeran,
I. J. H. Visagie
Abstract:
In this paper we test the composite hypothesis that lifetimes follow an exponential distribution based on observed randomly right censored data. Testing this hypothesis is complicated by the presence of this censoring, due to the fact that not all lifetimes are observed. To account for this complication, we propose modifications to tests based on the empirical characteristic function and Laplace transform. In the full sample case these empirical functions can be expressed as integrals with respect to the empirical distribution function of the lifetimes. We propose replacing this estimate of the distribution function by the Kaplan-Meier estimate. The resulting test statistics can be expressed in easily calculable forms in terms of summations of functionals of the observed data. Additionally, a general framework for goodness-of-fit testing, in the presence of random right censoring, is outlined. A Monte Carlo study is performed, the results of which indicate that the newly modified tests generally outperform the existing tests. A practical application, concerning initial remission times of leukemia patients, is discussed along with some concluding remarks and avenues for future research.
Submitted 9 November, 2020;
originally announced November 2020.
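The substitution described in this abstract, integrating against the Kaplan-Meier estimate instead of the empirical distribution function, can be sketched for the empirical Laplace transform. This is an illustrative outline rather than the authors' statistic: the function names are hypothetical, and the treatment of the case where the largest observation is censored (its remaining mass is simply left unassigned) is an assumption made here for brevity.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events[i] is 1 if observed, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = tied = 0
        # Group tied observations so that they share one risk set.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            tied += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= tied
    return steps


def km_laplace_transform(s, times, events):
    """Integrate exp(-s * x) against the jumps of 1 - S, the Kaplan-Meier distribution estimate."""
    total, previous = 0.0, 1.0
    for t, surv in kaplan_meier(times, events):
        total += math.exp(-s * t) * (previous - surv)
        previous = surv
    return total


times = [1.0, 2.0, 3.0, 4.0]   # hypothetical lifetimes
events = [1, 0, 1, 1]          # the second lifetime is right censored
print(kaplan_meier(times, events))
print(km_laplace_transform(0.5, times, events))
```

When no observation is censored, every jump equals 1 / n and the function reduces to the ordinary empirical Laplace transform, which is the full-sample case the abstract refers to.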
-
New weighted $L^2$-type tests for the inverse Gaussian distribution
Authors:
J. S. Allison,
S. Betsch,
B. Ebner,
I. J. H. Visagie
Abstract:
We propose a new class of goodness-of-fit tests for the inverse Gaussian distribution. The proposed tests are weighted $L^2$-type tests depending on a tuning parameter. We develop the asymptotic theory under the null hypothesis and under a broad class of alternative distributions. These results are used to show that the parametric bootstrap procedure, which we employ to implement the test, is asymptotically valid and that the whole test procedure is consistent. A comparative simulation study for finite sample sizes shows that the new procedure is competitive with classical and recent tests, outperforming these other methods almost uniformly over a large set of alternative distributions. The use of the newly proposed test is illustrated with two observed data sets.
Submitted 30 October, 2019;
originally announced October 2019.
-
On the conditional distribution of the mean of the two closest among a set of three observations
Authors:
I. J. H. Visagie,
F. Lombard
Abstract:
Chemical analyses of raw materials are often repeated in duplicate or triplicate. The assay values obtained are then combined using a predetermined formula to obtain an estimate of the true value of the material of interest. When duplicate observations are obtained, their average typically serves as an estimate of the true value. On the other hand, the "best of three" method involves taking three measurements and using the average of the two closest ones as an estimate of the true value.
In this paper, we consider another method which potentially involves three measurements. Initially two measurements are obtained and if their difference is sufficiently small, their average is taken as an estimate of the true value. However, if the difference is too large then a third independent measurement is obtained. The estimator is then defined as the average of the third observation and the one among the first two which is closest to it.
Our focus in the paper is the conditional distribution of the estimate in cases where the initial difference is too large. We find that the conditional distributions are markedly different under the assumption of a normal distribution and a Laplace distribution.
Submitted 28 June, 2019;
originally announced June 2019.
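The measurement rule described in this abstract is easy to state in code. The sketch below is an illustration under assumptions: the cut-off `threshold` for a "sufficiently small" difference is not given in the abstract and is a hypothetical parameter, ties are broken in favour of the first measurement, and standard normal errors around a true value of zero are used for the simulation.

```python
import random


def sequential_estimate(x1, x2, threshold, draw_third):
    """Return (estimate, used_third) for the two-or-three measurement rule."""
    if abs(x1 - x2) <= threshold:
        return 0.5 * (x1 + x2), False
    x3 = draw_third()
    # Pair the third measurement with whichever of the first two lies closest to it.
    nearest = x1 if abs(x1 - x3) <= abs(x2 - x3) else x2
    return 0.5 * (x3 + nearest), True


def conditional_estimates(n, threshold, seed=0):
    """Simulate n estimates from cases in which a third measurement was required."""
    rng = random.Random(seed)
    draw = lambda: rng.gauss(0.0, 1.0)
    kept = []
    while len(kept) < n:
        value, used_third = sequential_estimate(draw(), draw(), threshold, draw)
        if used_third:
            kept.append(value)
    return kept


values = conditional_estimates(1000, threshold=1.0)
print(sum(values) / len(values))
```

Replacing `rng.gauss` with a Laplace sampler would allow the two conditional distributions contrasted in the abstract to be compared empirically, for example through histograms of `values`.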
-
Testing for normality in any dimension based on a partial differential equation involving the moment generating function
Authors:
Norbert Henze,
Jaco Visagie
Abstract:
We use a system of first-order partial differential equations that characterize the moment generating function of the $d$-variate standard normal distribution to construct a class of affine invariant tests for normality in any dimension. We derive the limit null distribution of the resulting test statistics, and we prove consistency of the tests against general alternatives. In the case $d > 1$, a certain limit of these tests is connected with two measures of multivariate skewness. The new tests show strong power performance when compared to well-known competitors, especially against heavy-tailed distributions, and they are illustrated by means of a real data set.
Submitted 13 January, 2019;
originally announced January 2019.