-
On a new class of tests for the Pareto distribution using Fourier methods
Authors:
L. Ndwandwe,
J. S. Allison,
M. Smuts,
I. J. H. Visagie
Abstract:
We propose new classes of tests for the Pareto type I distribution using the empirical characteristic function. These tests are $U$ and $V$ statistics based on a characterisation of the Pareto distribution involving the distribution of the sample minimum. In addition to deriving simple computational forms for the proposed test statistics, we prove consistency against a wide range of fixed alternatives. A Monte Carlo study is included in which the newly proposed tests are shown to produce high powers. These powers include results relating to fixed alternatives as well as local powers against mixture distributions. The use of the proposed tests is illustrated using an observed data set.
Submitted 23 January, 2023;
originally announced January 2023.
-
Testing for the Pareto type I distribution: A comparative study
Authors:
L. Ndwandwe,
J. S. Allison,
L. Santana,
I. J. H. Visagie
Abstract:
Pareto distributions are widely used models in economics, finance and actuarial sciences. As a result, a number of goodness-of-fit tests have been proposed for these distributions in the literature. We provide an overview of the existing tests for the Pareto distribution, focussing specifically on the Pareto type I distribution. To date, only a single overview paper on goodness-of-fit testing for Pareto distributions has been published. However, the mentioned paper has a much wider scope than is the case for the current paper as it covers multiple types of Pareto distributions. The current paper differs in a number of respects. First, the narrower focus on the Pareto type I distribution allows a larger number of tests to be included. Second, the current paper is concerned with composite hypotheses compared to the simple hypotheses (specifying the parameters of the Pareto distribution in question) considered in the mentioned overview. Third, the sample sizes considered in the two papers differ substantially.
In addition, we consider two different methods of fitting the Pareto type I distribution: the method of maximum likelihood and a method closely related to moment matching. It is demonstrated that the method of estimation has a profound effect, not only on the powers achieved by the various tests, but also on the way in which numerical critical values are calculated. We show that, when using maximum likelihood, the resulting critical values are shape invariant and can be obtained using a Monte Carlo procedure. This is not the case when moment matching is employed.
The paper includes an extensive Monte Carlo power study. Based on the results obtained, we recommend the use of a test based on the phi divergence together with maximum likelihood estimation.
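The shape invariance under maximum likelihood mentioned above can be illustrated with a minimal sketch. The assumptions here are not taken from the paper: the Pareto type I distribution is taken to have unit scale, $F(x) = 1 - x^{-\beta}$ for $x \ge 1$, and a Kolmogorov-Smirnov statistic stands in for whichever test is of interest (it is not the phi-divergence test the paper recommends). The function names are hypothetical.

```python
import numpy as np

def pareto_mle_shape(x):
    """MLE of the shape parameter, assuming Pareto type I with unit scale."""
    x = np.asarray(x, dtype=float)
    return x.size / np.sum(np.log(x))

def ks_statistic(x):
    """Kolmogorov-Smirnov distance between the EDF and the fitted Pareto CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    beta_hat = pareto_mle_shape(x)
    fitted = 1.0 - x ** (-beta_hat)          # fitted CDF at the order statistics
    i = np.arange(1, n + 1)
    return max(np.max(i / n - fitted), np.max(fitted - (i - 1) / n))

def mc_critical_value(n, alpha=0.05, reps=2000, beta=1.0, seed=0):
    """Monte Carlo estimate of the (1 - alpha) critical value for sample size n."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        u = rng.random(n)                    # uniform on [0, 1)
        x = (1.0 - u) ** (-1.0 / beta)       # inverse-CDF sampling, x >= 1
        stats[r] = ks_statistic(x)
    return np.quantile(stats, 1.0 - alpha)
```

Because $\hat{\beta} \log X_j$ does not depend on the true shape parameter, calling `mc_critical_value` with different values of `beta` and the same seed should return the same critical value up to floating-point error, so in this sketch it suffices to simulate with `beta=1.0`.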
Submitted 18 November, 2022;
originally announced November 2022.
-
A proposed simulation technique for population stability testing in credit risk scorecards
Authors:
J. du Pisanie,
J. S. Allison,
I. J. H. Visagie
Abstract:
Credit risk scorecards are logistic regression models, fitted to large and complex data sets, employed by the financial industry to model the probability of default of a potential customer. In order to ensure that a scorecard remains a representative model of the population, one tests the hypothesis of population stability, which specifies that the distribution of clients' attributes remains constant over time. Simulating realistic data sets for this purpose is nontrivial as these data sets are multivariate and contain intricate dependencies. The simulation of these data sets is of practical interest to both practitioners and researchers: practitioners may wish to consider the effect that a specified change in the properties of the data has on the scorecard and its usefulness from a business perspective, while researchers may wish to test a newly developed technique in credit scoring.
We propose a simulation technique based on the specification of bad ratios, a concept explained below. Practitioners generally cannot be expected to provide realistic parameter values for a scorecard; these models are simply too complex and contain too many parameters to make such a specification viable. However, practitioners can often confidently specify the bad ratio associated with two different levels of a specific attribute. That is, practitioners are often comfortable with making statements such as "on average a new customer is 1.5 times as likely to default as an existing customer with similar attributes". We propose a method which can be used to obtain parameter values for a scorecard based on specified bad ratios. The proposed technique is demonstrated using a realistic example and we show that the simulated data sets adhere closely to the specified bad ratios. The paper provides a link to a GitHub project containing the R code used to generate the results shown.
Submitted 22 June, 2022;
originally announced June 2022.
-
New classes of tests for the Weibull distribution using Stein's method in the presence of random right censoring
Authors:
E Bothma,
JS Allison,
IJH Visagie
Abstract:
We develop two new classes of tests for the Weibull distribution based on Stein's method. The proposed tests are applied in the full sample case as well as in the framework of random right censoring. We investigate the finite sample performance of the new tests using a comprehensive Monte Carlo study. In both the absence and presence of censoring, it is found that the newly proposed classes of tests outperform competing tests against the majority of the distributions considered. In the cases where censoring is present we consider various censoring distributions. Some remarks on the asymptotic properties of the proposed tests are included. The paper presents another result of independent interest: the test initially proposed in Krit (2014) for use with full samples is amended to allow for testing for the Weibull distribution in the presence of censoring. The techniques developed in the paper are illustrated using two practical examples. In the first, we consider the survival times of patients with a certain type of leukemia. The second example is concerned with the initial remission times of leukemia patients, where the observed remission times are subject to random right censoring. We further include some concluding remarks along with avenues for future research.
Submitted 19 May, 2021;
originally announced May 2021.
-
Kaplan-Meier based tests for exponentiality in the presence of censoring
Authors:
E. Bothma,
J. S. Allison,
M. Cockeran,
I. J. H. Visagie
Abstract:
In this paper we test the composite hypothesis that lifetimes follow an exponential distribution based on observed randomly right censored data. Testing this hypothesis is complicated by the presence of this censoring, due to the fact that not all lifetimes are observed. To account for this complication, we propose modifications to tests based on the empirical characteristic function and Laplace transform. In the full sample case these empirical functions can be expressed as integrals with respect to the empirical distribution function of the lifetimes. We propose replacing this estimate of the distribution function by the Kaplan-Meier estimate. The resulting test statistics can be expressed in easily calculable forms in terms of summations of functionals of the observed data. Additionally, a general framework for goodness-of-fit testing, in the presence of random right censoring, is outlined. A Monte Carlo study is performed, the results of which indicate that the newly modified tests generally outperform the existing tests. A practical application, concerning initial remission times of leukemia patients, is discussed along with some concluding remarks and avenues for future research.
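The idea of replacing the empirical distribution function by the Kaplan-Meier estimate can be sketched as follows. This is only an illustration under stated assumptions, not the paper's test statistics: the function names are hypothetical, ties between an event and a censoring time are assumed to be ordered with the event first, and the weights are the standard Kaplan-Meier jump sizes at the ordered observations.

```python
import numpy as np

def km_weights(times, events):
    """Kaplan-Meier jump sizes at the ordered observations.

    times  : observed (possibly censored) lifetimes
    events : 1 if the lifetime was observed, 0 if it was right censored
    Returns the sorted times and the mass the KM estimate places on each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    # Sort by time; at tied times, observed events come before censored ones.
    order = np.lexsort((1 - events, times))
    t, d = times[order], events[order]
    n = t.size
    i = np.arange(1, n + 1)
    factors = ((n - i) / (n - i + 1.0)) ** d
    # KM survival just before each ordered observation.
    surv_before = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    return t, d / (n - i + 1.0) * surv_before

def km_laplace_transform(s, times, events):
    """Integral of exp(-s x) with respect to the Kaplan-Meier estimate."""
    t, w = km_weights(times, events)
    return float(np.sum(w * np.exp(-s * t)))

def km_characteristic_function(s, times, events):
    """Integral of exp(i s x) with respect to the Kaplan-Meier estimate."""
    t, w = km_weights(times, events)
    return complex(np.sum(w * np.exp(1j * s * t)))
```

Without censoring every weight equals $1/n$, so both functions reduce to the usual empirical Laplace transform and empirical characteristic function. If the largest observation is censored, the weights sum to less than one; how that case is handled is a design choice that this sketch leaves open.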
Submitted 9 November, 2020;
originally announced November 2020.
-
New weighted $L^2$-type tests for the inverse Gaussian distribution
Authors:
J. S. Allison,
S. Betsch,
B. Ebner,
I. J. H. Visagie
Abstract:
We propose a new class of goodness-of-fit tests for the inverse Gaussian distribution. The proposed tests are weighted $L^2$-type tests depending on a tuning parameter. We develop the asymptotic theory under the null hypothesis and under a broad class of alternative distributions. These results are used to show that the parametric bootstrap procedure, which we employ to implement the test, is asymptotically valid and that the whole test procedure is consistent. A comparative simulation study for finite sample sizes shows that the new procedure is competitive with classical and recent tests, outperforming these other methods almost uniformly over a large set of alternative distributions. The use of the newly proposed test is illustrated with two observed data sets.
Submitted 30 October, 2019;
originally announced October 2019.
-
On the conditional distribution of the mean of the two closest among a set of three observations
Authors:
I. J. H. Visagie,
F. Lombard
Abstract:
Chemical analyses of raw materials are often repeated in duplicate or triplicate. The assay values obtained are then combined using a predetermined formula to obtain an estimate of the true value of the material of interest. When duplicate observations are obtained, their average typically serves as an estimate of the true value. On the other hand, the "best of three" method involves taking three measurements and using the average of the two closest ones as an estimate of the true value.
In this paper, we consider another method which potentially involves three measurements. Initially two measurements are obtained and, if their difference is sufficiently small, their average is taken as the estimate of the true value. However, if the difference is too large, a third independent measurement is obtained. The estimator is then defined as the average of the third observation and whichever of the first two is closest to it.
Our focus in the paper is the conditional distribution of the estimate in cases where the initial difference is too large. We find that this conditional distribution differs markedly depending on whether a normal or a Laplace distribution is assumed.
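The two estimators described above can be written down in a few lines. This is a minimal sketch: the `threshold` parameter, the use of `<=` for "sufficiently small", the tie-breaking rule and the `draw_third` callback are all assumptions made for illustration and are not specified in the abstract.

```python
def best_of_three(x1, x2, x3):
    """Average of the two closest among three measurements."""
    pairs = [(x1, x2), (x1, x3), (x2, x3)]
    a, b = min(pairs, key=lambda p: abs(p[0] - p[1]))
    return (a + b) / 2.0

def two_stage_estimate(x1, x2, threshold, draw_third):
    """Average x1 and x2 if they agree; otherwise obtain a third measurement.

    threshold  : hypothetical cut-off for a 'sufficiently small' difference
    draw_third : zero-argument callable returning the third measurement,
                 called only when the first two differ by more than threshold
    """
    if abs(x1 - x2) <= threshold:
        return (x1 + x2) / 2.0
    x3 = draw_third()
    # Pair the third value with whichever earlier value is closer to it
    # (ties are resolved in favour of the first measurement).
    closest = x1 if abs(x1 - x3) <= abs(x2 - x3) else x2
    return (x3 + closest) / 2.0
```

Passing the third measurement as a callable reflects the sequential nature of the procedure: the extra assay is only carried out when the first two results disagree.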
Submitted 28 June, 2019;
originally announced June 2019.