-
A maximum penalised likelihood approach for semiparametric accelerated failure time models with time-varying covariates and partly interval censoring
Authors:
Aishwarya Bhaskaran,
Ding Ma,
Benoit Liquet,
Angela Hong,
Serigne N Lo,
Stephane Heritier,
Jun Ma
Abstract:
Accelerated failure time (AFT) models are frequently used for modelling survival data. This approach is attractive as it quantifies the direct relationship between the time until an event occurs and various covariates. It asserts that the failure times experience either acceleration or deceleration through a multiplicative factor when these covariates are present. While existing literature provides numerous methods for fitting AFT models with time-fixed covariates, adapting these approaches to scenarios involving both time-varying covariates and partly interval-censored data remains challenging. In this paper, we introduce a maximum penalised likelihood approach to fit a semiparametric AFT model. This method, designed for survival data with partly interval-censored failure times, accommodates both time-fixed and time-varying covariates. We utilise Gaussian basis functions to construct a smooth approximation of the nonparametric baseline hazard and fit the model via a constrained optimisation approach. To illustrate the effectiveness of our proposed method, we conduct a comprehensive simulation study. We also present an implementation of our approach on a randomised clinical trial dataset on advanced melanoma patients.
Submitted 18 March, 2024;
originally announced March 2024.
-
Optimal Low-Rank Matrix Completion: Semidefinite Relaxations and Eigenvector Disjunctions
Authors:
Dimitris Bertsimas,
Ryan Cory-Wright,
Sean Lo,
Jean Pauphilet
Abstract:
Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We reformulate these low-rank problems as convex problems over the non-convex set of projection matrices and implement a disjunctive branch-and-bound scheme that solves them to certifiable optimality. Further, we derive a novel and often tight class of convex relaxations by decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing that two-by-two minors in each rank-one matrix have determinant zero. In numerical experiments, our new convex relaxations decrease the optimality gap by two orders of magnitude compared to existing attempts, and our disjunctive branch-and-bound scheme solves nxn rank-r matrix completion problems to certifiable optimality in hours for n<=150 and r<=5.
Submitted 26 January, 2024; v1 submitted 20 May, 2023;
originally announced May 2023.
-
Applying of the Extreme Value Theory for determining extreme claims in the automobile insurance sector: Case of a China car insurance
Authors:
Daouda Diawara,
Ladji Kane,
Soumaila Dembele,
Gane Samb Lo
Abstract:
According to the Chinese Health Statistics Yearbook, in 2005, the number of traffic accidents was 187781 with total direct property losses of 103691.7 (10000 Yuan). This research aims to fill the gap in the literature by investigating the extreme claim sizes not only for the entire portfolio. This empirical study investigates the behavior of the upper tail of the claim size by class of policyholders.
Submitted 25 September, 2022; v1 submitted 21 September, 2022;
originally announced September 2022.
-
The exact probability law for the approximated similarity from the Minhashing method
Authors:
Soumaila Dembele,
Gane Samb Lo
Abstract:
We propose a probabilistic setting in which we study the probability law of the Rajaraman and Ullman \textit{RU} algorithm and a modified version of it denoted by \textit{RUM}. These algorithms aim at estimating the similarity index between huge texts in the context of the web. We give a foundation for this method by showing that, in the ideal case of carefully chosen probability laws, the exact similarity is the mathematical expectation of the random similarity provided by the \textit{RUM} algorithm. Some extensions are given.
Submitted 25 September, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
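The claim in the abstract above, that the exact similarity is the expectation of the random similarity, can be illustrated with the textbook minhash estimator of the Jaccard index. This is a minimal sketch under stated assumptions, not the authors' \textit{RU} or \textit{RUM} algorithm: the function names `jaccard` and `minhash_estimate`, the use of uniformly random permutations, and the example sets are all hypothetical choices made for illustration.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two non-empty sets."""
    return len(a & b) / len(a | b)


def minhash_estimate(a, b, num_perm=200, seed=0):
    """Estimate the Jaccard similarity by comparing minimum ranks.

    Assumption for this sketch: each "hash" is a uniformly random
    permutation of the union of the two sets. Under that assumption the
    minimum ranks of a and b coincide with probability exactly
    jaccard(a, b), so the average below is an unbiased estimate.
    """
    rng = random.Random(seed)
    universe = sorted(a | b)
    matches = 0
    for _ in range(num_perm):
        ranks = list(range(len(universe)))
        rng.shuffle(ranks)  # one random permutation of the universe
        rank = dict(zip(universe, ranks))
        if min(rank[x] for x in a) == min(rank[x] for x in b):
            matches += 1
    return matches / num_perm


# Hypothetical example data: two overlapping sets of integer "shingles".
doc_a = set(range(0, 60))
doc_b = set(range(30, 90))
exact = jaccard(doc_a, doc_b)
approx = minhash_estimate(doc_a, doc_b, num_perm=2000, seed=1)
print(exact, approx)
```

Averaging over more permutations tightens the estimate around the exact value; practical implementations usually replace true permutations with hash functions, which is where the choice of probability law matters.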
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
A single risk approach to the semiparametric copula competing risks model
Authors:
Simon M. S. Lo,
Ralf A. Wilke
Abstract:
A typical situation in competing risks analysis is that the researcher is only interested in a subset of risks. This paper considers a dependent competing risks model in which the distribution of one risk is a parametric or semi-parametric model, while the model for the other risks is unknown. Identifiability is shown for popular classes of parametric models and the semiparametric proportional hazards model. The identifiability of the parametric models does not require a covariate, while the semiparametric model requires at least one. Estimation approaches are suggested which are shown to be $\sqrt{n}$-consistent. Applicability and attractive finite sample performance are demonstrated with the help of simulations and data examples.
Submitted 12 May, 2022;
originally announced May 2022.
-
Moments estimators and omnibus chi-square tests for some usual probability laws
Authors:
Gorgui Gning,
Aladji Babacar Niang,
Modou Ngom,
Gane Samb Lo
Abstract:
For many probability laws, in parametric models, the estimation of the parameters can be done in the frame of the maximum likelihood method, or in the frame of moment estimation methods, or by using the plug-in method, etc. Usually, for estimating more than one parameter, the same frame is used. We focus on the moment estimation method in this paper. We use the instrumental tool of the functional empirical process (fep) in Lo (2016) to show how it is practical to derive, almost algebraically, the joint Gaussian law and to derive omnibus chi-square asymptotic laws from it. We choose four distributions to illustrate the method (Gamma law, beta law, Uniform law and Fisher law) and completely describe the asymptotic laws of the moment estimators whenever possible. Simulation studies are performed to investigate, for each case, the smallest sample sizes for which the obtained statistical tests are recommendable. Generally, the omnibus chi-square tests proposed here work well with sample sizes around fifty.
Submitted 8 December, 2021;
originally announced December 2021.
-
A Novel Interaction-based Methodology Towards Explainable AI with Better Understanding of Pneumonia Chest X-ray Images
Authors:
Shaw-Hwa Lo,
Yiqiao Yin
Abstract:
In the field of eXplainable AI (XAI), robust "blackbox" algorithms such as Convolutional Neural Networks (CNNs) are known for high prediction performance. However, the ability to explain and interpret these algorithms still requires innovation in the understanding of influential and, more importantly, explainable features that directly or indirectly impact predictive performance. A number of methods in the literature focus on visualization techniques, but the concepts of explainability and interpretability still require rigorous definition. In view of the above needs, this paper proposes an interaction-based methodology -- Influence Score (I-score) -- to screen out the noisy and non-informative variables in the images, thereby fostering an environment with explainable and interpretable features that are directly associated with feature predictivity. We apply the proposed method to a real-world application on a Pneumonia Chest X-ray Image data set and produce state-of-the-art results. We demonstrate how to apply the proposed approach to more general big data problems by improving the explainability and interpretability without sacrificing the prediction performance. The contribution of this paper opens a novel angle that moves the community closer to the future pipelines of XAI problems.
Submitted 15 June, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
High-resolution Probabilistic Precipitation Prediction for use in Climate Simulations
Authors:
Sherman Lo,
Peter Watson,
Peter Dueben,
Ritabrata Dutta
Abstract:
The accurate prediction of precipitation is important to allow for reliable warnings of flood or drought risk in a changing climate. However, to make trust-worthy predictions of precipitation, at a local scale, is one of the most difficult challenges for today's weather and climate models. This is because important features, such as individual clouds and high-resolution topography, cannot be resolved explicitly within simulations due to the significant computational cost of high-resolution simulations. Climate models are typically run at $\sim$50-100 km resolution which is insufficient to represent local precipitation events in satisfying detail. Here, we develop a method to make probabilistic precipitation predictions based on features that climate models can resolve well and that is not highly sensitive to the approximations used in individual models. To predict, we will use a temporal compound Poisson distribution dependent on the output of climate models at a location. We use the output of Earth System models at coarse resolution $\sim$50 km as input and train the statistical models towards precipitation observations over Wales at $\sim$10 km resolution. A Bayesian inferential scheme is provided so that the compound-Poisson model can be inferred using a Gibbs-within-Metropolis-Elliptic-Slice sampling scheme which enables us to quantify the uncertainty of our predictions. In addition, we use a Gaussian process regressor on the posterior samples of the model parameters, to infer a spatially coherent model and hence to produce spatially coherent rainfall prediction. We illustrate the prediction performance of our model by training over 5 years of the data up to 31st December 1999 and predicting precipitation for 20 years afterwards for Cardiff and Wales.
Submitted 25 February, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
Overview description of the Gambian GABECE Educational Data and associated algorithms and unsupervized learning process
Authors:
Ousmane Saine,
Soumaila Dembélé,
Gane Samb Lo,
Mohamed Cheikh Haidara
Abstract:
As the first paper of a series of exploratory analysis and statistical investigation works on the Gambian \textit{GABECE} data based on a variety of statistical tools, we wish to begin with a thorough unsupervised learning process through descriptive and exploratory methods. This will lead to a variety of discoveries and hypotheses that will direct future research works related to this data.
Submitted 15 December, 2020;
originally announced December 2020.
-
A Supervised Hybrid Statistical Catch-up System Built on Gabece Gambian Data
Authors:
Tagbo Innocent Aroh,
Ousman Saine,
Soumaila Dembélé,
Gane Samb Lo
Abstract:
In this paper we want to find a statistical rule that assigns a passing or failing grade to students who undertook at least three exams out of four in a national exam, instead of completely dismissing these students. While it is cruel to declare them as failing, especially if the reason for their absence is not intentional, they should have demonstrated enough merit in the three exams taken to deserve a chance to be declared passing. We use a special classification method and nearest neighbors methods based on the average grade and on the most modal grade to build a statistical rule in a supervised learning process. The study is built on the national GABECE educational data, a considerable dataset covering seven years and all six regions of the Gambia.
Submitted 15 December, 2020;
originally announced December 2020.
-
MultAV: Multiplicative Adversarial Videos
Authors:
Shao-Yuan Lo,
Vishal M. Patel
Abstract:
The majority of adversarial machine learning research focuses on additive attacks, which add adversarial perturbation to input data. On the other hand, unlike image recognition problems, only a handful of attack approaches have been explored in the video domain. In this paper, we propose a novel attack method against video recognition models, Multiplicative Adversarial Videos (MultAV), which imposes perturbation on video data by multiplication. MultAV has different noise distributions to the additive counterparts and thus challenges the defense methods tailored to resisting additive adversarial attacks. Moreover, it can be generalized to not only Lp-norm attacks with a new adversary constraint called ratio bound, but also different types of physically realizable attacks. Experimental results show that the model adversarially trained against additive attack is less robust to MultAV.
Submitted 10 October, 2021; v1 submitted 17 September, 2020;
originally announced September 2020.
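The idea in the abstract above, perturbing by multiplication under a ratio bound rather than by addition under a norm bound, can be sketched as follows. This is one possible reading of a ratio-type constraint and not the paper's exact formulation: the function name `multiplicative_perturb`, the `factors` argument (which a real attack would derive from gradients), the assumption that pixel values lie in [0, 1], and the rule that each perturbed pixel stays between `x / ratio_bound` and `x * ratio_bound` are all assumptions made for illustration.

```python
def multiplicative_perturb(pixels, factors, ratio_bound):
    """Scale each pixel by a factor, then enforce a ratio constraint.

    Assumptions for this sketch: pixel values lie in [0, 1], and the
    perturbed value y of a pixel x must satisfy
    x / ratio_bound <= y <= x * ratio_bound.
    """
    if ratio_bound < 1.0:
        raise ValueError("ratio_bound must be at least 1")
    adversarial = []
    for x, f in zip(pixels, factors):
        lower, upper = x / ratio_bound, x * ratio_bound
        y = min(max(x * f, lower), upper)  # project onto the ratio interval
        adversarial.append(min(max(y, 0.0), 1.0))  # keep a valid pixel value
    return adversarial


# Hypothetical frame of four pixels and hand-picked factors.
frame = [0.0, 0.2, 0.5, 0.9]
factors = [3.0, 0.1, 1.05, 2.0]
print(multiplicative_perturb(frame, factors, ratio_bound=1.2))
```

Under this reading a pixel with value zero is never changed, and brighter pixels admit larger absolute changes, which is one way a multiplicative perturbation can have a different noise distribution from an additive one.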
-
Defending Against Multiple and Unforeseen Adversarial Videos
Authors:
Shao-Yuan Lo,
Vishal M. Patel
Abstract:
Adversarial robustness of deep neural networks has been actively investigated. However, most existing defense approaches are limited to a specific type of adversarial perturbations. Specifically, they often fail to offer resistance to multiple attack types simultaneously, i.e., they lack multi-perturbation robustness. Furthermore, compared to image recognition problems, the adversarial robustness of video recognition models is relatively unexplored. While several studies have proposed how to generate adversarial videos, only a handful of approaches about defense strategies have been published in the literature. In this paper, we propose one of the first defense strategies against multiple types of adversarial videos for video recognition. The proposed method, referred to as MultiBN, performs adversarial training on multiple adversarial video types using multiple independent batch normalization (BN) layers with a learning-based BN selection module. With a multiple BN structure, each BN branch is responsible for learning the distribution of a single perturbation type and thus provides more precise distribution estimations. This mechanism helps in dealing with multiple perturbation types. The BN selection module detects the attack type of an input video and sends it to the corresponding BN branch, making MultiBN fully automatic and allowing end-to-end training. Compared to present adversarial training approaches, the proposed MultiBN exhibits stronger multi-perturbation robustness against different and even unforeseen adversarial video types, ranging from Lp-bounded attacks to physically realizable attacks. This holds true on different datasets and target models. Moreover, we conduct an extensive analysis to study the properties of the multiple BN structure.
Submitted 14 December, 2021; v1 submitted 11 September, 2020;
originally announced September 2020.
-
Extremes, extremal index estimation, records, moment problem for the Pseudo-Lindley distribution and applications
Authors:
Gane Samb Lo,
Modou Ngom,
Moumouni Diallo
Abstract:
The pseudo-Lindley distribution, which was introduced in Zeghdoudi and Nedjar (2016), is studied with regard to its upper tail. In that regard, when the underlying distribution function follows the pseudo-Lindley law, we investigate the behavior of its values, the asymptotic normality of the Hill estimator and the double-indexed generalized Hill statistic process (Ngom and Lo), the asymptotic normality of the record values and the moment problem.
Submitted 14 December, 2019;
originally announced December 2019.
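For readers unfamiliar with the Hill estimator mentioned in the abstract above, here is a minimal sketch of its classical form: the average log-excess of the k largest observations over the (k+1)-th largest. The function name `hill_estimator` is hypothetical, and the demonstration uses a standard Pareto sample as a stand-in, not the pseudo-Lindley law studied in the paper.

```python
import math
import random


def hill_estimator(sample, k):
    """Classical Hill estimator based on the k largest order statistics.

    Returns (1/k) * sum_{i=1..k} log(X_(n-i+1) / X_(n-k)), where
    X_(1) <= ... <= X_(n) are the sorted (positive) observations.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    threshold = xs[n - k - 1]  # the (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


# Assumption: a Pareto sample with tail index 2, whose extreme value
# index is 1/2. This is illustrative data, not data from the paper.
rng = random.Random(0)
pareto_sample = [rng.paretovariate(2.0) for _ in range(20000)]
print(hill_estimator(pareto_sample, k=2000))
```

For an exact Pareto sample the estimate should land close to 0.5; for laws with lighter tails the choice of k and the limiting behaviour differ, which is the kind of question the asymptotic results address.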
-
Statistical tests for the Pseudo-Lindley distribution and applications
Authors:
Gane Samb Lo,
Tchilabalo Abozou Kpanzou,
Cheikh Mohamed Haidara
Abstract:
The pseudo-Lindley distribution was introduced as a useful generalization of the Lindley distribution in Zeghdoudi and Nedjar (2016), who showed interesting properties of their new laws and their efficiency in modeling data in Reliability and Survival Analysis. In this paper we study the estimators of the pair of parameters and determine their asymptotic law, from which a chi-square law is derived. From both asymptotic laws, statistical tests are built. Simulation studies on the tests confirm their efficiency for data sizes generally used in Reliability. R codes related to statistical analysis of that law are given in an appropriate archive repository code paper on arXiv.
Submitted 21 October, 2019;
originally announced October 2019.
-
Non parametric estimation of joint, Renyi-Stallis entropies and mutual information and asymptotic limits
Authors:
Amadou Diadie Ba,
Gane Samb Lo,
Cheikh Tidiane Seck
Abstract:
This paper proposes a new method for estimating the joint probability mass function of a pair of discrete random variables. This estimator is used to construct joint Shannon Rényi-Tsallis entropies, and the mutual information estimates of a pair of discrete random variables. Almost sure consistency and central limit theorems are established. Our theoretical results are validated by simulations.
Submitted 10 January, 2020; v1 submitted 15 June, 2019;
originally announced June 2019.
-
Asymptotic laws for upper and strong record values in the extreme domain of attraction and beyond
Authors:
Gane Samb Lo,
Mohammad Ahsanullah
Abstract:
Asymptotic laws of record values have usually been investigated as limits in type. In this paper, we use functional representations of the tail of cumulative distribution functions in the extreme value domain of attraction to directly establish asymptotic laws of record values, not necessarily as limits in type. Results beyond the extreme value domain are provided. Explicit asymptotic laws concerning very usual laws are listed as well. Some of these laws are expected to be used in fitting distributions.
Submitted 9 May, 2019; v1 submitted 8 May, 2019;
originally announced May 2019.
-
A multinomial Asymptotic Representation of Zenga's Discrete Index, its Influence Function and Data-driven Applications
Authors:
Tchilabalo Abozou Kpanzou,
Diam Ba,
Cherif Moctar Mamadou Traoré,
Gane Samb Lo
Abstract:
In this paper, we consider the Zenga index, one of the most recent inequality indices. We keep the finite-valued original form and address the asymptotic theory. The asymptotic normality is established through a multinomial representation. The influence function is also given. The results are simulated and applied to Senegalese data.
Submitted 4 March, 2019;
originally announced March 2019.
-
On some properties of the new Sine-skewed Cardioid Distribution
Authors:
Cherif Mamadou Moctar Traoré,
Moumouni Diallo,
Gane Samb Lo,
Mouhamad Ahsanullah,
Okereke Lois Chinwendu
Abstract:
The new Sine Skewed Cardioid (ssc) distribution has just been introduced and characterized by Ahsanullah (2018). Here, we study the asymptotic properties of its tails by determining its extreme value domain, the characteristic function, the moments and likelihood estimators of the two parameters, the asymptotic normality of the moments estimators and the random generation of data from the \textit{ssc} distribution. Finally, we proceed to a simulation study to show the performance of the random generation method and the quality of the moments estimation of the parameters.
Submitted 25 November, 2018;
originally announced November 2018.
-
Weak Convergence (IIA) - Functional and Random Aspects of the Univariate Extreme Value Theory
Authors:
Gane Samb Lo,
Modou Ngom,
Tchilabola Abozou Kpanzou,
Mouminou Diallo
Abstract:
The univariate extreme value theory deals with the convergence in type of powers of elements of sequences of cumulative distribution functions on the real line when the power index tends to infinity. In terms of convergence of random variables, this amounts to the weak convergence, in the sense of weak convergence of probability measures, of the partial maxima of a sequence of independent and identically distributed random variables. In this monograph, this theory is comprehensively studied in the broad frame of weak convergence of random vectors as exposed in Lo et al. (2016). It has two main parts. The first is devoted to its nice mathematical foundation. Most of the material of this part is taken from the most essential Loève(1936,177) and Haan (1970), based on the stunning theory of regular, pi or gamma variation. To prepare the statistical applications, a number of contributions I made in my PhD and my Doctorate of Sciences are added in the last chapter of that part. Our real concern is to put these materials together with others, among them those of the author from his PhD dissertation and Doctorate of Sciences thesis, in a way to have an almost full coverage of the theory on the real line that may serve as a one-semester master course in our universities. As well, it will help the second part of the monograph. This second part will deal with statistical estimation problems related to extreme values. It addresses various estimation questions and should be considered as the beginning of a survey study to be updated progressively. Research questions are tackled therein. Many results of the author, either unpublished or not sufficiently known, are stated and/or updated therein.
Submitted 3 October, 2018;
originally announced October 2018.
-
Geared Rotationally Identical and Invariant Convolutional Neural Network Systems
Authors:
ShihChung B. Lo, Ph.D.,
Matthew T. Freedman, M.D.,
Seong K. Mun, Ph.D.,
Heang-Ping Chan, Ph.D.
Abstract:
Theorems and techniques for forming different types of transformationally invariant processing, and for producing quantitatively identical output based on either transformationally invariant operators or symmetric operations, have recently been introduced by the authors. In this study, we further propose composing a geared rotationally identical CNN system (GRI-CNN) with a small step angle by connecting the networks of the participating processes at the first flatten layer. Using an ordinary CNN structure as a base, the requirements for constructing a GRI-CNN include the use of either symmetric input vectors or kernels with an angle increment that can form a complete cycle as a "gearwheel". Four basic GRI-CNN structures were studied. Each of them produces quantitatively identical output when the rotation angle of the input vector is evenly divisible by the step angle of the gear. Our study showed that when the input vector is rotated by an angle that does not match a step angle, the GRI-CNN can still produce a highly consistent result. With a design using an ultra-fine gear-tooth step angle (e.g., 1 degree or 0.1 degree), all four GRI-CNN systems can be constructed to be virtually isotropic.
Submitted 10 August, 2018; v1 submitted 2 August, 2018;
originally announced August 2018.
-
Transformationally Identical and Invariant Convolutional Neural Networks by Combining Symmetric Operations or Input Vectors
Authors:
ShihChung B. Lo,
Matthew T. Freedman,
Seong K. Mun
Abstract:
Transformationally invariant processors constructed from transformed input vectors or operators have been suggested and applied in many applications. In this study, the transformationally identical processes obtained by combining the results of all sub-processes with corresponding transformations at one of the processing steps, or at the beginning step, were found to be equivalent under a given condition. This property can be applied to most convolutional neural network (CNN) systems. Specifically, a transformationally identical CNN can be constructed by arranging internally symmetric operations in parallel with the same transformation family, including a flatten layer with weight sharing among the corresponding transformation elements. Other transformationally identical CNNs can be constructed by averaging the transformed input vectors of the family at the input layer, followed by an ordinary CNN process or by a set of symmetric operations. Interestingly, we found that both types of transformationally identical CNN systems are mathematically equivalent, either by applying an averaging operation to the corresponding elements of all sub-channels before the activation function or by not using a non-linear activation function.
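The input-averaging construction mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transformation family is assumed to be the four 90-degree rotations of a square input, and `toy_network` with its position-dependent weights is a hypothetical stand-in for an ordinary CNN.

```python
def rot90(grid):
    """Rotate a square grid (list of rows) by 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def average_over_rotations(grid):
    """Average the input over the assumed family of four 90-degree rotations."""
    n = len(grid)
    total = [[0.0] * n for _ in range(n)]
    current = grid
    for _ in range(4):
        for r in range(n):
            for c in range(n):
                total[r][c] += current[r][c]
        current = rot90(current)
    return [[value / 4.0 for value in row] for row in total]


def toy_network(grid):
    """Hypothetical stand-in for an ordinary CNN: asymmetric weights, then a ReLU."""
    n = len(grid)
    weighted = sum((r * n + c + 1) * grid[r][c] for r in range(n) for c in range(n))
    return max(0.0, weighted - 10.0)


def identical_network(grid):
    """Average the rotated inputs first, then apply the ordinary network."""
    return toy_network(average_over_rotations(grid))


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
plain_outputs = (toy_network(x), toy_network(rot90(x)))
averaged_outputs = (identical_network(x), identical_network(rot90(x)))
```

Because the averaged input is the same for `x` and for any rotated copy of `x`, the two entries of `averaged_outputs` coincide, whereas the two entries of `plain_outputs` differ.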
Submitted 20 August, 2018; v1 submitted 29 July, 2018;
originally announced July 2018.
-
On the influence function for the Theil-like class of inequality measures
Authors:
Tchilabalo Abozou Kpanzou,
Diam Ba,
Pape Djiby Mergane,
Gane Samb Lo
Abstract:
On the one hand, a large class of inequality measures, which includes the generalized entropy, Atkinson and Gini measures, among others, was introduced in Mergane and Lo (2013). On the other hand, the influence function (IF) of a statistic is an important tool in the asymptotics of nonparametric statistics. This function has been, and is still being, determined and analysed in various respects for a large number of statistics. We carry out a unifying study of the IFs of all the members of the so-called Theil-like family and gather those IFs into one formula. Comparative studies thereby become easier.
Submitted 22 July, 2018;
originally announced July 2018.
-
A Theil-like Class of Inequality Measures, its Asymptotic Normality Theory and Applications
Authors:
Pape Djiby Mergane,
Tchilabalo Abozou Kpanzou,
Diam Ba,
Gane Samb Lo
Abstract:
In this paper, we consider a coherent theory of asymptotic representations for a family of inequality indices called Theil-Like Inequality Measures (TLIM), within a Gaussian field. The theory uses the functional empirical process approach. We provide the finite-distribution and uniform asymptotic normality of the elements of the TLIM class in a unified approach rather than case by case. The results are then applied to databases from some UEMOA countries.
Submitted 20 July, 2018;
originally announced July 2018.
-
Asymptotic Representations of Statistics in the Functional Empirical process : A portal and some applications
Authors:
Gane Samb Lo,
Pape Djiby Mergane,
Thilabola Atozou Kpanzou,
Mohamed Cheikh Haidara
Abstract:
In this research monograph, we deal with a very general asymptotic representation for statistics, named GRI, expressed in terms of the functional empirical process, both one-dimensional and multidimensional, and of another process called the residual empirical process. Most statistics that are combinations of L-statistics are covered by the asymptotic theory dealt with here. This treatise is conceived as a kind of spaceship on which modules are hung. The spaceship is a functional Gaussian process, and each module is the asymptotic representation of one statistic in terms of that Gaussian process. In that way, it is possible to navigate from one module to another, that is, to find the joint distribution of any pair of statistics and to compare them across areas and times. To be able to do so, we need a broad conception from the beginning. Within the constructed frame, the asymptotic joint law of any finite number of other statistics is automatically given, as well as the joint distribution of their spatial or temporal variations, in absolute or relative values. We also deal with the general problem of decomposability of statistics by comparing statistical decomposability, a new view we introduce, with functional decomposability. A general result based only on the GRI is provided. This monograph is also the portal of a handbook of GRI that will cover the largest possible number of statistics. In anticipation of that, we treat three important examples as showcases. It is expected that this portal and the handbook will attract the attention of researchers working in asymptotics and will furnish useful tools, complemented by computer packages, to scientists who are interested in applications of asymptotic tests.
Submitted 24 March, 2018;
originally announced March 2018.
-
Measuring inequality: application of semi-parametric methods to real life data
Authors:
Tchilabalo Abozou Kpanzou,
Tertius de Wet,
Gane Samb Lo
Abstract:
A number of methods have been introduced to measure inequality in various settings such as income and expenditure. In order to carry out statistical inference, one often needs to estimate the available measures of inequality. Many estimators are available in the literature, the most used being the nonparametric estimators. Kpanzou (2011) developed semi-parametric estimators for measures of inequality and showed that these are very appropriate, especially for heavy-tailed distributions. In this paper we apply such semi-parametric methods to a practical data set and show how they compare to the nonparametric estimators. Guidance is also given on the choice of parametric distributions to fit in the tails of the data.
Submitted 25 December, 2017;
originally announced December 2017.
-
Uniform Rates of Convergence of Some Representations of Extremes : a first approach
Authors:
Tchilabalo Atozou Kpanzou,
Modou Ngom,
Cherif Mamadou Moctar Traoré,
Moumouni Diallo,
Gane Samb Lo
Abstract:
Uniform convergence rates are provided for asymptotic representations of sample extremes. These bounds, which are universal in the sense that they do not depend on the extreme value index, are meant to be extended to arbitrary sample extremes in forthcoming papers.
Submitted 28 February, 2020; v1 submitted 25 December, 2017;
originally announced December 2017.
-
Uniform weak convergence of poverty measures with relative poverty lines
Authors:
Cheikh Tidiane Seck,
Gane Samb Lo
Abstract:
This paper introduces a general continuous form of poverty index that encompasses most of the existing formulas in the literature. We then propose a consistent estimator for this index in the case where the poverty line is a functional of the distribution. We also establish a uniform functional Central Limit Theorem for the proposed estimator over a suitable product class of real-valued functions. As a consequence, testing procedures based on either a single poverty index or several poverty indices simultaneously can be developed. A simulation study showing the asymptotic normality of the estimator is given, as well as an application to real data for estimating the effect of relative poverty lines on the variance of the poverty estimates.
Submitted 16 November, 2017;
originally announced November 2017.
-
Estimating the theoretical error rate for prediction
Authors:
Herman Chernoff,
Shaw-Hwa Lo,
Tian Zheng,
Adeline Lo
Abstract:
Prediction for very large data sets is typically carried out in two stages: variable selection and pattern recognition. Ordinarily, variable selection involves seeing how well individual explanatory variables are correlated with the dependent variable. This practice neglects the possible interactions among the variables. Simulations have shown that a statistic I, which we used for variable selection, is much better correlated with predictivity than significance levels are. We explain this by defining theoretical predictivity and showing how I is related to predictivity. We calculate the biases of the overoptimistic training estimate of predictivity and of the pessimistic out-of-sample estimate. Corrections for the bias lead to improved estimates of the potential predictivity using small groups of possibly interacting variables. These results support the use of I in the variable selection phase of prediction for data sets such as those in GWAS (genome-wide association studies), where there are very many explanatory variables and modest sample sizes. Reference is made to another publication using I, which led to a reduction in the error rate of prediction from 30% to 8% for a data set with 4,918 variables and 97 subjects. This data set had previously been studied by scientists for over 10 years.
Submitted 8 September, 2017;
originally announced September 2017.
-
Divergence Measures Estimation and Its Asymptotic Normality Theory Using Wavelets Empirical Processes
Authors:
Gane Samb Lo,
Amadou Diadié Ba,
Diam Ba
Abstract:
In this paper we provide the asymptotic theory of the general class of $\phi$-divergence measures, which includes the most common divergence measures: the Renyi and Tsallis families and the Kullback-Leibler measure. Instead of using the Parzen nonparametric estimators of the probability density functions whose discrepancy is estimated, we use the wavelet approach and the geometry of Besov spaces. One-sided and two-sided statistical tests are derived, as well as symmetrized estimators. Almost sure rates of convergence and an asymptotic normality theorem are obtained in the general case, and then particularized for the Renyi and Tsallis families and for the Kullback-Leibler measure. The applicability of the results to usual distribution functions is addressed.
Submitted 14 April, 2017;
originally announced April 2017.
-
Asymptotic Theory and Statistical Decomposability gap Estimation for Takayama's Index
Authors:
Pape Djiby Mergane,
Cheikh Mohamed Haidara,
Cheikh Tidiane Seck,
Gane Samb Lo
Abstract:
In the spirit of recent asymptotic works on the General Poverty Index (GPI) in the field of welfare analysis, the asymptotic representation of the non-decomposable Takayama index, which has so far not been incorporated into the unified GPI approach, is addressed and established here. This representation also allows recent results on the estimation of statistical decomposability gaps to be extended to it. The theoretical results are applied to real databases. The conclusions of these applications recommend using Takayama's index as a practically decomposable one, by virtue of the low decomposability gaps relative to the large values of the index.
Submitted 17 January, 2017;
originally announced January 2017.
-
On the joint distribution of variations of the Gini index and Welfare indices
Authors:
Gane Samb Lo,
Pape Djiby Mergane,
Tchilabalo Abozou Kpanzou
Abstract:
The aim of this paper is to establish the asymptotic behavior of the mutual influence of the Gini index and the poverty measures by using the Gaussian fields described in Mergane and Lo (2013). The results are given as representation theorems using the Gaussian fields of the unidimensional or the bidimensional functional Brownian bridges. Such representations, when combined with those already available, lead to joint asymptotic distributions with other statistics of interest, such as growth, welfare and inequality indices, and thereby unveil interesting results on the mutual influence between them. The results are also appropriate for studying whether growth is fair or not, depending on the variation of the inequality measure. Data-driven applications are also available. Although the variances may seem complicated at first sight, their computation, which is needed to obtain confidence intervals for the indices, is possible with the help of the R codes we provide. Beyond the current results, the provided representations are useful in connection with different representations of other statistics.
Submitted 25 May, 2017; v1 submitted 30 October, 2016;
originally announced October 2016.
-
Asymptotic confidence bands for copulas based on the transformation kernel estimator
Authors:
Diam Ba,
Cheikh Tidiane Seck,
Gane Samb Lo
Abstract:
In this paper we establish asymptotic simultaneous confidence bands for the transformation kernel estimator of copulas introduced in Omelka et al. (2009). To this end, we prove a uniform-in-bandwidth law of the iterated logarithm for the maximal deviation of this estimator from its expectation, under smoothness conditions on the copula function. We also study the bias, which tends asymptotically and uniformly to zero with the same precise rate. Some simulation experiments are finally provided to support our results.
Submitted 18 August, 2016;
originally announced August 2016.
-
A note on the asymptotic normality of sums of extreme values
Authors:
Gane Samb Lo
Abstract:
Let $X_1$, $X_2$, ... be a sequence of independent random variables with common distribution function $F$ in the domain of attraction of a Gumbel extreme value distribution and, for each integer $n\geq 1$, let $X_{1,n} \leq \dots \leq X_{n,n}$ denote the order statistics based on the first $n$ of these random variables. Along with related results, it is shown that for any sequence of positive integers $k_n$ with $k_n \rightarrow +\infty$ and $k_{n}/n \rightarrow 0$ as $n \rightarrow +\infty$, the sum of the upper $k_n$ extreme values $X_{n-k_{n},n}+\dots+X_{n,n}$, when properly centered and normalized, converges in distribution to a standard normal random variable $N(0, 1)$. These results constitute an extension of results by S. Csörgő and D.M. Mason (1985).
Submitted 17 July, 2016;
originally announced July 2016.
-
How to use the functional empirical process for deriving asymptotic laws for functions of the sample
Authors:
Gane Samb Lo
Abstract:
The functional empirical process is a very powerful tool for deriving asymptotic laws for almost any kind of statistic whenever we know how to express it as a function of the sample. Since this method is likely to be applied more and more in the near future, this paper is intended to provide a complete but short description and justification of the method and to illustrate it with a non-trivial example using bivariate data. It may also serve as a citable reference, so that the arguments need not be repeated.
Submitted 5 September, 2021; v1 submitted 10 July, 2016;
originally announced July 2016.
-
A double-indexed functional Hill process and applications
Authors:
Modou Ngom,
Gane Samb Lo
Abstract:
Let $X_{1,n} \leq \dots \leq X_{n,n}$ be the order statistics associated with a sample $X_{1}, \dots, X_{n}$ whose distribution function (df) is $F$. We are concerned with the functional asymptotic behaviour of the sequence of stochastic processes \begin{equation} T_{n}(f,s)=\sum_{j=1}^{j=k}f(j)\left(\log X_{n-j+1,n}-\log X_{n-j,n}\right)^{s}, \end{equation} indexed by some classes $\mathcal{F}$ of functions $f:\mathbb{N}^{\ast}\longmapsto \mathbb{R}_{+}$ and by $s \in ]0,+\infty[$, where $k=k(n)$ satisfies \begin{equation*} 1\leq k\leq n, \quad k/n\rightarrow 0 \text{ as } n\rightarrow \infty . \end{equation*}
We show that this is a stochastic process whose margins generate estimators of the extreme value index when $F$ is in the extreme domain of attraction. We focus in this paper on its finite-dimensional asymptotic law and provide a class of new estimators of the extreme value index whose performances are compared to analogous ones. The results are then particularized for one explicit class $\mathcal{F}$.
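As a rough illustration of the statistic $T_{n}(f,s)$ defined in this abstract, the sketch below evaluates it on a simulated sample. The function names, the Pareto sample and the choice $k=100$ are assumptions made for the example, and the code requires $k<n$ so that $X_{n-k,n}$ exists. With $f(j)=j$ and $s=1$, summation by parts shows that $T_{n}(f,1)/k$ equals the classical Hill estimator, which the last lines check numerically.

```python
import math
import random


def double_indexed_hill(sample, k, f, s):
    """T_n(f, s) = sum_{j=1}^{k} f(j) * (log X_{n-j+1,n} - log X_{n-j,n})**s."""
    xs = sorted(sample)  # xs[i - 1] is the order statistic X_{i,n}
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    total = 0.0
    for j in range(1, k + 1):
        spacing = math.log(xs[n - j]) - math.log(xs[n - j - 1])
        total += f(j) * spacing ** s
    return total


def hill_estimator(sample, k):
    """Classical Hill estimator built on the k largest observations."""
    xs = sorted(sample)
    n = len(xs)
    top_logs = sum(math.log(xs[n - j]) for j in range(1, k + 1))
    return top_logs / k - math.log(xs[n - k - 1])


# Assumed toy data: a Pareto sample with shape 2 (extreme value index 1/2).
random.seed(0)
sample = [random.paretovariate(2.0) for _ in range(2000)]
k = 100

# Special case f(j) = j, s = 1: T_n / k coincides with the Hill estimator.
t_value = double_indexed_hill(sample, k, f=lambda j: j, s=1) / k
hill_value = hill_estimator(sample, k)
```

Other choices of `f` and `s` give other members of the family; the identity with the Hill estimator holds only for the special case shown.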
Submitted 16 April, 2016;
originally announced April 2016.
-
Asymptotic confidence bands for copulas based on the local linear kernel estimator
Authors:
Diam Ba,
Cheikh Tidiane Seck,
Gane Samb Lo
Abstract:
In this paper we establish asymptotic simultaneous confidence bands for copulas based on the local linear kernel estimator proposed by Chen and Huang [1]. To this end, we prove, under smoothness conditions on the copula function, a uniform-in-bandwidth law of the iterated logarithm for the maximal deviation of this estimator from its expectation. We also show that the bias term converges uniformly to zero with a precise rate. The performance of these bands is illustrated in a simulation study. An application based on pseudo-panel data is also provided for modeling dependence.
Submitted 30 September, 2015;
originally announced October 2015.
-
Consistency bands for the mean excess function and application to graphical goodness of fit test for financial data
Authors:
Gane Samb Lo,
Diadie Ba,
Elhadji Deme,
Cheikh Seck
Abstract:
In this paper, we use the modern setting of functional empirical processes and recent techniques on uniform estimation for nonparametric objects to derive consistency bands for the mean excess function in the i.i.d. case. We apply our results to modelling financial data, in particular the Dow Jones database, to see how well generalized hyperbolic distribution models fit monthly data.
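The abstract does not spell out the estimator, so the following sketch assumes the standard empirical mean excess function, that is, the average of $X_i - u$ over the observations exceeding a threshold $u$. The function name and the toy data are illustrative only, and the consistency bands derived in the paper are not reproduced here.

```python
def mean_excess(sample, u):
    """Empirical mean excess at threshold u: average of (x - u) over x > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold u")
    return sum(excesses) / len(excesses)


# Assumed toy data; a graphical check would plot the second coordinate
# against the first over a grid of thresholds.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
thresholds = [0.5, 2.0, 4.0]
curve = [(u, mean_excess(data, u)) for u in thresholds]
```

In a graphical goodness-of-fit check of the kind the abstract describes, such an empirical curve would be compared with the mean excess function of the candidate model.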
Submitted 13 October, 2015; v1 submitted 21 September, 2015;
originally announced September 2015.
-
Probabilistic, statistical and algorithmic aspects of the similarity of texts and application to Gospels comparison
Authors:
Gane Samb Lo,
Soumaila Dembele
Abstract:
The fundamental problem of similarity studies, in the frame of data mining, is to examine and detect similar items in very large articles, papers and books. In this paper, we are interested in the probabilistic, statistical and algorithmic aspects of the study of texts. We use the approach of $k$-shinglings, a $k$-shingling being defined as a sequence of $k$ consecutive characters extracted from a text ($k\geq 1$). The main challenge in this field is to find accurate algorithms that compute the similarity quickly. This is achieved by using approximation methods. The first approximation method is statistical and is based on the Glivenko-Cantelli theorem. The second is the banding technique. The third is a modification of the algorithm proposed by Rajaraman et al., denoted here by (RUM). The Jaccard index is the similarity measure used in this paper. We finally illustrate the results of the paper on the four Gospels. The results are very conclusive.
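The two basic ingredients named in this abstract, $k$-shinglings and the Jaccard index, can be sketched as follows. This is the exact, brute-force computation on two short strings, given only for orientation; it does not implement the statistical, banding or RUM approximations studied in the paper, and the function names are illustrative.

```python
def shingles(text, k):
    """Return the set of k-shinglings: all runs of k consecutive characters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (taken as 1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


first = shingles("abcd", 2)   # {"ab", "bc", "cd"}
second = shingles("bcde", 2)  # {"bc", "cd", "de"}
similarity = jaccard(first, second)  # 2 shared shinglings out of 4 distinct ones
```

For texts the size of a book, the shingle sets become very large, which is the motivation for the approximation methods the abstract lists.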
Submitted 15 August, 2015;
originally announced August 2015.
-
Strong limits related to the oscillation modulus of the empirical process based on the k-spacing process
Authors:
Gane Samb Lo
Abstract:
Recently, several strong limit theorems for the oscillation moduli of the empirical process have been given in the i.i.d. case. We show that, with very slight differences, those strong results are also obtained for some representation of the reduced empirical process based on the (non-overlapping) k-spacings generated by a sequence of independent random variables (rv's) uniformly distributed on $(0,1)$. This yields weak limits for the mentioned process. Our study includes the case where the step k is unbounded. The results are mainly derived from several properties concerning the increments of gamma functions with parameters k and one.
Submitted 28 June, 2014;
originally announced June 2014.
-
Gaussian Approximations and Related Questions for the Spacings process
Authors:
Gane Samb Lo
Abstract:
All the available results on the approximation of the k-spacings process by Gaussian processes have used only one approach, namely that of Shorack and Pyke. Here, it is shown that this approach cannot yield a rate better than $\left( N/\log \log N\right)^{-\frac{1}{4}}\left( \log N\right)^{\frac{1}{2}}$. Strong and weak bounds for that rate are specified both where k is fixed and where $k\rightarrow +\infty$. A Glivenko-Cantelli theorem is given, while Stute's result for the increments of the empirical process based on independent and identically distributed random variables is extended to the spacings process. One of the Mason-Wellner-Shorack cases is also obtained.
Submitted 28 June, 2014;
originally announced June 2014.
-
The weak limiting behavior of the de Haan-Resnick estimator of the exponent of a stable distribution
Authors:
Gane Samb Lo
Abstract:
The problem of estimating the exponent of a stable law has received considerable attention in the recent literature. Here, we deal with an estimate of such an exponent introduced by De Haan and Resnick when the corresponding distribution function belongs to the Gumbel domain of attraction. This study makes it possible to construct new statistical tests. Examples and simulations are given. The limiting law is shown to be the Gumbel law, and particular cases are given with norming constants expressed with iterated logarithms and exponentials.
Submitted 26 May, 2014;
originally announced May 2014.
-
Asymptotic Representation Theorems for Poverty Indices
Authors:
Gane Samb Lo,
Serigne Touba Sall
Abstract:
We set general conditions under which the general poverty index, which summarizes all the available indices, is asymptotically represented with some empirical processes. This representation theorem offers a general key, in most directions, for the asymptotics of the bulk of poverty indices and issues in poverty analysis. Our representation results uniformly hold on a large collection of poverty indices. They enable the continuous measure of poverty with longitudinal data.
Submitted 21 May, 2014;
originally announced May 2014.
-
A simple note on some empirical stochastic process as a tool in uniform L-statistics weak laws
Authors:
Gane Samb Lo
Abstract:
In this paper, we are concerned with the stochastic process \begin{equation} β_{n}(q_{t},t)=β_{n}(t)=\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\left\{G_{t,n}(Y(t))-G_{t}(Y_{j}(t))\right\} q_{t}(Y_{j}(t)), \tag{A} \end{equation} where, for $n\geq1$ and $T>0$, the sequences $\{Y_{1}(t),Y_{2}(t),...,Y_{n}(t),t\in [0,T]\}$ are independent observations of some real stochastic process $\{Y(t),t\in [0,T]\}$; for each $t \in [0,T]$, $G_{t}$ is the distribution function of $Y(t)$ and $G_{t,n}$ is the empirical distribution function based on $Y_{1}(t),Y_{2}(t),...,Y_{n}(t)$; and finally $q_{t}$ is a bounded real function defined on $\mathbb{R}$. This process appears when investigating some time-dependent L-statistics, which are expressed as a function of some functional empirical process and the process (A). Since the functional empirical process is widely investigated in the literature, the process (A) reveals itself as an important key for the laws of L-statistics. In this paper, we present an extended study of this process, give complete calculations of the first moments and the covariance function, and find conditions for asymptotic tightness.
Submitted 21 May, 2014;
originally announced May 2014.
-
High moments Jarque-Bera tests for arbitrary distribution functions
Authors:
Gane Samb Lo,
Oumar Thiam,
Mohamed Cheikh Haidara
Abstract:
The Jarque-Bera test of fit for normality is a celebrated and powerful one. In this paper, we consider general Jarque-Bera tests for any distribution function df having at least 4k finite moments for k greater than 2. The tests use as many moments as possible, whereas the classical JB test is supposed to test only skewness and kurtosis for normal variates. But our results unveil the relations between the coefficients in the classical JB test and the moments, showing that it really depends on the first eight moments. This is a new explanation for the power of such tests. General chi-square tests for an arbitrary model, not only the normal one, are also derived. We make use of the modern functional empirical processes approach, which makes it easier to handle statistics based on high moments and allows the generalization of the JB test both in the number of moments involved and in the underlying distribution. Simulation studies are provided, and comparisons with the Kolmogorov-Smirnov tests and the classical JB test are given.
Submitted 1 December, 2014; v1 submitted 21 May, 2014;
originally announced May 2014.
-
A Review on asymptotic normality of sums of associated random variables
Authors:
Gane Samb Lo,
Harouna Sangaré,
Cheikhna Hamallah Ndiaye
Abstract:
In this document, we survey the theory of asymptotic normality of sums of associated random variables in a coherent approach, with a view to supporting further contributions by new researchers in the field. (Version 01)
Submitted 17 November, 2018; v1 submitted 16 May, 2014;
originally announced May 2014.
-
A Robust Model-free Approach for Rare Variants Association Studies Incorporating Gene-Gene and Gene-Environmental Interactions
Authors:
Ruixue Fan,
Shaw-Hwa Lo
Abstract:
A growing body of evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare variants association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and the aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants and do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method designed specifically for detecting both marginal effects and effects due to gene-gene (G-G) and gene-environmental (G-E) interactions in rare variants association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G-G or G-E interactions. Second, it does not sacrifice marginal detection power; when rare variants have only marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can readily tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G-G and G-E interactions.
Submitted 2 December, 2013;
originally announced December 2013.
-
A supermartingale argument for characterizing the Functional Hill process weak law for small parameters
Authors:
Gane Samb Lo,
Adja Mbarka Fall,
Cheikhna Hamallah Ndiaye,
Akym Adekpejou
Abstract:
The paper deals with the asymptotic laws of functionals of standard random variables. These classes of statistics are closely related to estimators of the extreme value index when the underlying distribution function is in the Weibull domain of attraction. We use techniques based on martingale theory to describe the non-Gaussian asymptotic distribution of the aforementioned statistics. We provide results of a simulation study as well as statistical tests that may be of interest in light of the proposed results.
Submitted 19 November, 2016; v1 submitted 23 June, 2013;
originally announced June 2013.
-
Functional weak laws for the weighted mean losses or gains and applications
Authors:
Gane Samb Lo,
Serigne Touba Sall,
Pape Djiby Mergane
Abstract:
We show in this paper that many risk measures arising in Actuarial Sciences, Finance, Medicine, Welfare analysis, etc. are gathered in classes of Weighted Mean Loss or Gain (WMLG) statistics. Some of them are Upper Threshold Based (UTH) or Lower Threshold Based (LTH). These statistics may be time-dependent when the situation is monitored over time, and they depend on specific functions $w$ and $d$. This paper provides time-dependent and uniformly functional weak asymptotic laws that allow temporal and spatial studies of the risk as well as comparisons between statistics in terms of dependence and mutual influence. The results are particularised for the usual statistics of that kind, such as those of Kakwani and Shorrocks. Data-driven applications based on pseudo-panel data are provided.
Submitted 21 May, 2014; v1 submitted 23 June, 2013;
originally announced June 2013.
-
On the influence of the Theil-like inequality measure on the growth
Authors:
Pape Djiby Mergane,
Gane Samb LO
Abstract:
In this paper we set out a coherent theory based on functional empirical processes that treats both poverty and inequality indices within one Gaussian field, making it possible to study the influence of one on the other. We use the General Poverty Index (\textit{GPI}), a class of poverty indices covering the most common ones, and a functional class of inequality measures including the Entropy Measure, the Mean Logarithmic Deviation, and the different inequality measures of Atkinson, Champernowne, Kolm and Theil, called Theil-like Inequality Measures (\textit{TLIM}). Our results are given in a unified approach with respect to the two classes rather than their particular elements. We provide the asymptotic laws of the variations of each class over two given periods and of the ratio of these variations, and derive confidence intervals for them. Although the variances may seem somewhat complicated, we provide R code for their computation and apply the results to pseudo-panel data for Senegal with a simple analysis.
Submitted 11 October, 2012;
originally announced October 2012.