Search | arXiv e-print repository

Boarding for ISS: Imbalanced Self-Supervised: Discovery of a Scaled Autoencoder for Mixed Tabular Datasets

Authors: Samuel Stocksieker, Denys Pommeret, Arthur Charpentier

Abstract: The field of imbalanced self-supervised learning, especially in the context of tabular data, has not been extensively studied. Existing research has predominantly focused on image datasets. This paper aims to fill this gap by examining the specific challenges posed by data imbalance in self-supervised learning in the domain of tabular data, with a primary focus on autoencoders. Autoencoders are widely employed for learning and constructing a new representation of a dataset, particularly for dimensionality reduction. They are also often used for generative model learning, as seen in variational autoencoders. When dealing with mixed tabular data, qualitative variables are often encoded using a one-hot encoder with a standard loss function (MSE or Cross Entropy). In this paper, we analyze the drawbacks of this approach, especially when categorical variables are imbalanced. We propose a novel metric to balance learning: a Multi-Supervised Balanced MSE. This approach reduces the reconstruction error by balancing the influence of variables. Finally, we empirically demonstrate that this new metric, compared to the standard MSE: i) outperforms when the dataset is imbalanced, especially when the learning process is insufficient, and ii) provides similar results in the opposite case.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2402.07790 [pdf, other]

From Uncertainty to Precision: Enhancing Binary Classifier Performance through Calibration

Authors: Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu

Abstract: The assessment of binary classifier performance traditionally centers on discriminative ability using metrics such as accuracy. However, these metrics often disregard the model's inherent uncertainty, especially when dealing with sensitive decision-making domains, such as finance or healthcare. Given that model-predicted scores are commonly seen as event probabilities, calibration is crucial for accurate interpretation. In our study, we analyze the sensitivity of various calibration measures to score distortions and introduce a refined metric, the Local Calibration Score. Comparing recalibration methods, we advocate for local regressions, emphasizing their dual role as effective recalibration tools and facilitators of smoother visualizations. We apply these findings in a real-world scenario using a Random Forest classifier and regressor to predict credit default while simultaneously measuring calibration during performance optimization.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.16197 [pdf, other]

Geospatial Disparities: A Case Study on Real Estate Prices in Paris

Authors: Agathe Fernandes Machado, François Hu, Philipp Ratz, Ewen Gallic, Arthur Charpentier

Abstract: Driven by an increasing prevalence of trackers, ever more IoT sensors, and the declining cost of computing power, geospatial information has come to play a pivotal role in contemporary predictive models. While enhancing prognostic performance, geospatial data also has the potential to perpetuate many historical socio-economic patterns, raising concerns about a resurgence of biases and exclusionary practices, with their disproportionate impacts on society. Addressing this, our paper emphasizes the crucial need to identify and rectify such biases and calibration errors in predictive models, particularly as algorithms become more intricate and less interpretable. The increasing granularity of geospatial information further introduces ethical concerns, as choosing different geographical scales may exacerbate disparities akin to redlining and exclusionary zoning. To address these issues, we propose a toolkit for identifying and mitigating biases arising from geospatial data. Extending classical fairness definitions, we incorporate an ordinal regression case with spatial attributes, deviating from the binary classification focus. This extension allows us to gauge disparities stemming from data aggregation levels and advocates for a less interfering correction approach. Illustrating our methodology using a Parisian real estate dataset, we showcase practical applications and scrutinize the implications of choosing geographical aggregation levels for fairness and calibration measures.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2311.11900 [pdf, other]

Measuring and Mitigating Biases in Motor Insurance Pricing

Authors: Mulah Moriah, Franck Vermet, Arthur Charpentier

Abstract: The non-life insurance sector operates within a highly competitive and tightly regulated framework, confronting a pivotal juncture in the formulation of pricing strategies. Insurers are compelled to harness a range of statistical methodologies and available data to construct optimal pricing structures that align with the overarching corporate strategy while accommodating the dynamics of market competition. Given the fundamental societal role played by insurance, premium rates are subject to rigorous scrutiny by regulatory authorities. These rates must conform to principles of transparency, explainability, and ethical considerations. Consequently, the act of pricing transcends mere statistical calculations and carries the weight of strategic and societal factors. These multifaceted concerns may drive insurers to establish equitable premiums, taking into account various variables. For instance, regulations mandate the provision of equitable premiums, considering factors such as policyholder gender or mutualist group dynamics in accordance with respective corporate strategies. Age-based premium fairness is also mandated. In certain insurance domains, variables such as the presence of serious illnesses or disabilities are emerging as new dimensions for evaluating fairness.
Regardless of the motivating factor prompting an insurer to adopt fairer pricing strategies for a specific variable, the insurer must possess the capability to define, measure, and ultimately mitigate any ethical biases inherent in its pricing practices while upholding standards of consistency and performance. This study seeks to provide a comprehensive set of tools for these endeavors and assess their effectiveness through practical application in the context of automobile insurance.

Submitted 20 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2310.20508 [pdf, other]

Parametric Fairness with Statistical Guarantees

Authors: François Hu, Philipp Ratz, Arthur Charpentier

Abstract: Algorithmic fairness has gained prominence due to societal and regulatory concerns about biases in Machine Learning models. Common group fairness metrics like Equalized Odds for classification or Demographic Parity for both classification and regression are widely used and a host of computationally advantageous post-processing methods have been developed around them. However, these metrics often limit users from incorporating domain knowledge. Despite meeting traditional fairness criteria, they can obscure issues related to intersectional fairness and even replicate unwanted intra-group biases in the resulting fair solution. To avoid this narrow perspective, we extend the concept of Demographic Parity to incorporate distributional properties in the predictions, allowing expert knowledge to be used in the fair solution. We illustrate the use of this new metric through a practical example of wages, and develop a parametric method that efficiently addresses practical challenges like limited training data and constraints on total spending, offering a robust solution for real-life applications.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2309.06627 [pdf, other]

doi 10.1609/aaai.v38i11.29143

A Sequentially Fair Mechanism for Multiple Sensitive Attributes

Authors: François Hu, Philipp Ratz, Arthur Charpentier

Abstract: In the standard use case of Algorithmic Fairness, the goal is to eliminate the relationship between a sensitive variable and a corresponding score. Throughout recent years, the scientific community has developed a host of definitions and tools to solve this task, which work well in many practical applications. However, the applicability and effectiveness of these tools and definitions become less straightforward in the case of multiple sensitive attributes. To tackle this issue, we propose a sequential framework, which allows us to progressively achieve fairness across a set of sensitive features. We accomplish this by leveraging multi-marginal Wasserstein barycenters, which extends the standard notion of Strong Demographic Parity to the case with multiple sensitive characteristics. This method also provides a closed-form solution for the optimal, sequentially fair predictor, permitting a clear interpretation of inter-sensitive feature correlations. Our approach seamlessly extends to approximate fairness, enveloping a framework accommodating the trade-off between risk and unfairness. This extension permits a targeted prioritization of fairness improvements for a specific attribute within a set of sensitive attributes, allowing for a case specific adaptation. A data-driven estimation procedure for the derived solution is developed, and comprehensive numerical experiments are conducted on both synthetic and real datasets. Our empirical findings decisively underscore the practical efficacy of our post-processing approach in fostering fair decision-making.

Submitted 14 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.11090 [pdf, other]

Fairness Explainability using Optimal Transport with Applications in Image Classification

Authors: Philipp Ratz, François Hu, Arthur Charpentier

Abstract: Ensuring trust and accountability in Artificial Intelligence systems demands explainability of its outcomes. Despite significant progress in Explainable AI, human biases still taint a substantial portion of its training data, raising concerns about unfairness or discriminatory tendencies. Current approaches in the field of Algorithmic Fairness focus on mitigating such biases in the outcomes of a model, but few attempts have been made to try to explain why a model is biased. To bridge this gap between the two fields, we propose a comprehensive approach that uses optimal transport theory to uncover the causes of discrimination in Machine Learning applications, with a particular emphasis on image classification. We leverage Wasserstein barycenters to achieve fair predictions and introduce an extension to pinpoint bias-associated regions. This allows us to derive a cohesive system which uses the enforced fairness to measure each feature's influence on the bias. Taking advantage of this interplay of enforcing and explaining fairness, our method holds significant implications for the development of trustworthy and unbiased AI systems, fostering transparency, accountability, and fairness in critical decision-making scenarios across diverse domains.

Submitted 31 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.02966 [pdf, other]

Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory

Authors: Samuel Stocksieker, Denys Pommeret, Arthur Charpentier

Abstract: In supervised learning, it is quite frequent to be confronted with real imbalanced datasets. This situation leads to a learning difficulty for standard algorithms. Research and solutions in imbalanced learning have mainly focused on classification tasks. Despite its importance, very few solutions exist for imbalanced regression. In this paper, we propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates which can be used in classification and regression. This general approach encompasses two large families of synthetic oversampling: those based on perturbations, such as Gaussian Noise, and those based on interpolations, such as SMOTE. It also provides an explicit form of these machine learning algorithms and an expression of their conditional densities, in particular for SMOTE. New synthetic data generators are deduced. We apply GOLIATH in imbalanced regression combining such generator procedures with a wild-bootstrap resampling technique for the target values. We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations. We empirically evaluate and compare our approach and demonstrate significant improvement over existing state-of-the-art techniques.

Submitted 5 August, 2023; originally announced August 2023.

Comments: This paper focuses specifically on the Imbalanced Regression issues but could be used for Imbalanced classification tasks
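
The two oversampling families named in the abstract above (perturbation-based, such as Gaussian noise, and interpolation-based, such as SMOTE) can be illustrated with a minimal NumPy sketch. This is not the GOLIATH algorithm and does not reproduce its kernel density formulation; the function names, the noise scale `sigma`, and the neighbor count `k` are assumptions chosen for illustration only.

```python
import numpy as np

def gaussian_noise_oversample(X, n_new, sigma, rng):
    """Perturbation family: resample rows of X and add isotropic Gaussian noise."""
    idx = rng.integers(0, len(X), size=n_new)
    return X[idx] + rng.normal(0.0, sigma, size=(n_new, X.shape[1]))

def smote_like_oversample(X, n_new, k, rng):
    """Interpolation family: draw points on segments between a seed and one of its k nearest neighbors."""
    # Pairwise Euclidean distances; the diagonal is masked so a point is never its own neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    seeds = rng.integers(0, len(X), size=n_new)
    picks = neighbors[seeds, rng.integers(0, k, size=n_new)]
    # Uniform interpolation weight in [0, 1) along each seed-neighbor segment.
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))
    return X[seeds] + lam * (X[picks] - X[seeds])
```

Interpolated points always stay inside the convex hull of the observed data, whereas noise-perturbed points can leave it, which is one practical difference between the two families.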

arXiv:2306.13633 [pdf, other]

Optimal Vaccination Policy to Prevent Endemicity: A Stochastic Model

Authors: Félix Foutel-Rodier, Arthur Charpentier, Hélène Guérin

Abstract: We examine here the effects of recurrent vaccination and waning immunity on the establishment of an endemic equilibrium in a population. An individual-based model that incorporates memory effects for transmission rate during infection and subsequent immunity is introduced, considering stochasticity at the individual level. By letting the population size go to infinity, we derive a set of equations describing the large scale behavior of the epidemic. The analysis of the model's equilibria reveals a criterion for the existence of an endemic equilibrium, which depends on the rate of immunity loss and the distribution of time between booster doses. The outcome of a vaccination policy in this context is influenced by the efficiency of the vaccine in blocking transmissions and the distribution pattern of booster doses within the population. Strategies with evenly spaced booster shots at the individual level prove to be more effective in preventing disease spread compared to irregularly spaced boosters, as longer intervals without vaccination increase susceptibility and facilitate more efficient disease transmission. We provide an expression for the critical fraction of the population required to adhere to the vaccination policy in order to eradicate the disease, which resembles a well-known threshold for preventing an outbreak with an imperfect vaccine. We also investigate the consequences of unequal vaccine access in a population and prove that, under reasonable assumptions, fair vaccine allocation is the optimal strategy to prevent endemicity.

Submitted 5 April, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

Comments: 51 pages, 7 figures

arXiv:2306.12912 [pdf, other]

Mitigating Discrimination in Insurance with Wasserstein Barycenters

Authors: Arthur Charpentier, François Hu, Philipp Ratz

Abstract: The insurance industry is heavily reliant on predictions of risks based on characteristics of potential customers. Although the use of said models is common, researchers have long pointed out that such practices perpetuate discrimination based on sensitive features such as gender or race. Given that such discrimination can often be attributed to historical data biases, an elimination or at least mitigation is desirable. With the shift from more traditional models to machine-learning based predictions, calls for greater mitigation have grown anew, as simply excluding sensitive variables in the pricing process can be shown to be ineffective. In this article, we first investigate why predictions are a necessity within the industry and why correcting biases is not as straightforward as simply identifying a sensitive variable. We then propose to ease the biases through the use of Wasserstein barycenters instead of simple scaling. To demonstrate the effects and effectiveness of the approach we employ it on real data and discuss its implications.

Submitted 22 June, 2023; originally announced June 2023.
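
As a rough illustration of the barycenter idea mentioned in the abstract above, the sketch below maps one-dimensional scores to the Wasserstein barycenter of the group-wise score distributions, using the fact that in one dimension the barycenter's quantile function is the weighted average of the group quantile functions. It is a minimal sketch under stated assumptions, not the authors' estimator: the function name `barycenter_scores`, the mid-rank convention, and the use of group proportions as weights are all illustrative choices.

```python
import numpy as np

def barycenter_scores(scores, groups):
    """Map each score to the 1-D Wasserstein barycenter of the group-wise score distributions."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()  # assumption: weight each group by its share of the sample
    fair = np.empty_like(scores)
    for g in labels:
        mask = groups == g
        s = scores[mask]
        # Probability level of each score within its own group (mid-rank, strictly inside (0, 1)).
        levels = (np.argsort(np.argsort(s)) + 0.5) / s.size
        # Barycenter quantile function: weighted average of every group's empirical quantile function.
        fair[mask] = sum(
            w * np.quantile(scores[groups == h], levels)
            for h, w in zip(labels, weights)
        )
    return fair
```

After the mapping, every group shares (approximately) the same score distribution while the ranking of individuals within each group is preserved, which is the sense in which this differs from rescaling group means.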

arXiv:2306.10155 [pdf, other]

doi 10.1007/978-3-031-43415-0_18

Fairness in Multi-Task Learning via Wasserstein Barycenters

Authors: François Hu, Philipp Ratz, Arthur Charpentier

Abstract: Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.

Submitted 6 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2302.09288 [pdf, other]

Data Augmentation for Imbalanced Regression

Authors: Samuel Stocksieker, Denys Pommeret, Arthur Charpentier

Abstract: In this work, we consider the problem of imbalanced data in a regression framework when the imbalanced phenomenon concerns continuous or discrete covariates. Such a situation can lead to biases in the estimates. In this case, we propose a data augmentation algorithm that combines a weighted resampling (WR) and a data augmentation (DA) procedure. In a first step, the DA procedure permits exploring a wider support than the initial one. In a second step, the WR method drives the exogenous distribution to a target one. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, an actuarial application is studied.

Submitted 18 February, 2023; originally announced February 2023.

Comments: paper accepted at the AISTATS 2023 conference, to be published in PMLR (Proceedings of Machine Learning Research)

arXiv:2301.07755 [pdf, other]

Optimal Transport for Counterfactual Estimation: A Method for Causal Inference

Authors: Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic

Abstract: Many problems ask a question that can be formulated as a causal question: "what would have happened if...?" For example, "would the person have had surgery if he or she had been Black?" To address this kind of question, calculating an average treatment effect (ATE) is often uninformative, because one would like to know how much impact a variable (such as skin color) has on a specific individual, characterized by certain covariates. Trying to calculate a conditional ATE (CATE) seems more appropriate. In causal inference, the propensity score approach assumes that the treatment is influenced by x, a collection of covariates. Here, we will have the dual view: doing an intervention, or changing the treatment (even just hypothetically, in a thought experiment, for example by asking what would have happened if a person had been Black) can have an impact on the values of x. We will see here that optimal transport allows us to change certain characteristics that are influenced by the variable we are trying to quantify the effect of. We propose here a mutatis mutandis version of the CATE, which will be done simply in dimension one by saying that the CATE must be computed relative to a level of probability, associated to the proportion of x (a single covariate) in the control population, and by looking for the equivalent quantile in the test population.
In higher dimension, it will be necessary to go through transport, and an application will be proposed on the impact of some variables on the probability of having an unnatural birth (the fact that the mother smokes, or that the mother is Black).

Submitted 18 January, 2023; originally announced January 2023.
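
The one-dimensional construction described in the abstract above (take the probability level of x in the control population, then read off the equivalent quantile in the test population) can be sketched with empirical CDFs and quantiles. This is a hedged illustration, not the paper's estimator: the function name `transport_1d` and the plain empirical CDF are assumptions.

```python
import numpy as np

def transport_1d(x, x_control, x_treated):
    """Send x to the treated-population value that sits at the same probability level."""
    x_control = np.sort(np.asarray(x_control, dtype=float))
    # Probability level of x in the control population (empirical CDF).
    u = np.searchsorted(x_control, x, side="right") / x_control.size
    # Equivalent quantile in the treated (test) population.
    return np.quantile(np.asarray(x_treated, dtype=float), u)
```

A mutatis mutandis effect at x would then compare the treated-group prediction evaluated at `transport_1d(x, ...)` with the control-group prediction evaluated at x, rather than evaluating both at the same x.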

arXiv:2212.09868 [pdf, other]

Quantifying fairness and discrimination in predictive models

Authors: Arthur Charpentier

Abstract: The analysis of discrimination has long interested economists and lawyers. In recent years, the literature in computer science and machine learning has become interested in the subject, offering an interesting re-reading of the topic. These questions are the consequences of numerous criticisms of algorithms used to translate texts or to identify people in images. With the arrival of massive data, and the use of increasingly opaque algorithms, it is not surprising to have discriminatory algorithms, because it has become easy to have a proxy of a sensitive variable, by enriching the data indefinitely. According to Kranzberg (1986), "technology is neither good nor bad, nor is it neutral", and therefore, "machine learning won't give you anything like gender neutrality `for free' that you didn't explicitly ask for", as claimed by Kearns et al. (2019). In this article, we will come back to the general context, for predictive models in classification. We will present the main concepts of fairness, called group fairness, based on independence between the sensitive variable and the prediction, possibly conditioned on this or that information. We will finish by going further, by presenting the concepts of individual fairness. Finally, we will see how to correct a potential discrimination, in order to guarantee that a model is more ethical.

Submitted 19 December, 2022; originally announced December 2022.

Comments: Classifier; Demographic Parity; Discrimination; Equal Opportunity; Fairness; Penalized regression; Proxy; Statistical Discrimination

arXiv:2212.09192 [pdf, other]

Multiarmed Bandits Problem Under the Mean-Variance Setting

Authors: Hongda Hu, Arthur Charpentier, Mario Ghossoub, Alexander Schied

Abstract: The classical multi-armed bandit (MAB) problem involves a learner and a collection of K independent arms, each with its own ex ante unknown independent reward distribution. At each one of a finite number of rounds, the learner selects one arm and receives new information. The learner often faces an exploration-exploitation dilemma: exploiting the current information by playing the arm with the highest estimated reward versus exploring all arms to gather more reward information. The design objective aims to maximize the expected cumulative reward over all rounds. However, such an objective does not account for a risk-reward tradeoff, which is often a fundamental precept in many areas of applications, most notably in finance and economics. In this paper, we build upon Sani et al. (2012) and extend the classical MAB problem to a mean-variance setting. Specifically, we relax the assumptions of independent arms and bounded rewards made in Sani et al. (2012) by considering sub-Gaussian arms. We introduce the Risk Aware Lower Confidence Bound (RALCB) algorithm to solve the problem, and study some of its properties. Finally, we perform a number of numerical simulations to demonstrate that, in both independent and dependent scenarios, our suggested approach performs better than the algorithm suggested by Sani et al. (2012).

Submitted 3 May, 2024; v1 submitted 18 December, 2022; originally announced December 2022.

arXiv:2207.01010 [pdf, other]

Government Intervention in Catastrophe Insurance Markets: A Reinforcement Learning Approach

Authors: Menna Hassan, Nourhan Sakr, Arthur Charpentier

Abstract: This paper designs a sequential repeated game of a micro-founded society with three types of agents: individuals, insurers, and a government. Nascent to economics literature, we use Reinforcement Learning (RL), closely related to multi-armed bandit problems, to learn the welfare impact of a set of proposed policy interventions per $1 spent on them. The paper rigorously discusses the desirability of the proposed interventions by comparing them against each other on a case-by-case basis. The paper provides a framework for algorithmic policy evaluation using calibrated theoretical models which can assist in feasibility studies.

Submitted 3 July, 2022; originally announced July 2022.

arXiv:2205.08112 [pdf, ps, other]

The Fairness of Machine Learning in Insurance: New Rags for an Old Man?

Authors: Laurence Barry, Arthur Charpentier

Abstract: Since the beginning of their history, insurers have been known to use data to classify and price risks. As such, they were confronted early on with the problem of fairness and discrimination associated with data. This issue is becoming increasingly important with access to more granular and behavioural data, and is evolving to reflect current technologies and societal concerns. By looking into earlier debates on discrimination, we show that some algorithmic biases are a renewed version of older ones, while others show a reversal of the previous order. Paradoxically, while the insurance practice has not deeply changed nor are most of these biases new, the machine learning era still deeply shakes the conception of insurance fairness.

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2202.12008 [pdf, other]

A Fair Pricing Model via Adversarial Learning

Authors: Vincent Grari, Arthur Charpentier, Marcin Detyniecki

Abstract: At the core of insurance business lies classification between risky and non-risky insureds, actuarial fairness meaning that risky insureds should contribute more and pay a higher premium than non-risky or less-risky ones. Actuaries, therefore, use econometric or machine learning techniques to classify, but the distinction between a fair actuarial classification and "discrimination" is subtle. For this reason, there is a growing interest in fairness and discrimination in the actuarial community (Lindholm, Richman, Tsanakas, and Wuthrich, 2022). Presumably, non-sensitive characteristics can serve as substitutes or proxies for protected attributes. For example, the color and model of a car, combined with the driver's occupation, may lead to an undesirable gender bias in the prediction of car insurance prices. Surprisingly, we will show that (1) debiasing the predictor alone may be insufficient to maintain adequate accuracy. Indeed, the traditional pricing model is currently built in a two-stage structure that considers many potentially biased components such as car or geographic risks. We will show that this traditional structure has significant limitations in achieving fairness. For this reason, we have developed a novel pricing model approach. Recently, some approaches (Blier-Wong, Cossette, Lamontagne, and Marceau, 2021; Wuthrich and Merz, 2021) have shown the value of autoencoders in pricing. In this paper, we will show that (2) this can be generalized to multiple pricing factors (geographic, car type), and (3) it is perfectly adapted to a fairness context (since it allows debiasing the set of pricing components). We extend this main idea to a general framework in which a single whole pricing model is trained by generating the geographic and car pricing components needed to predict the pure premium while mitigating the unwanted bias according to the desired metric.

Submitted 26 December, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: 20 pages, 12 figures

arXiv:2108.04737 [pdf, other]

Weighted asymmetric least squares regression with fixed-effects

Authors: Amadou Barry, Karim Oualkacha, Arthur Charpentier

Abstract: The fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to summarize the variable relationships in the presence of heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects model and propose a new model: expectile regression with fixed-effects (ERFE). The ERFE model applies the within transformation strategy to concentrate out the incidental parameter and estimates the regressor effects on the expectiles of the response distribution. The ERFE model captures the data heteroscedasticity and eliminates any bias resulting from the correlation between the regressors and the omitted factors. We derive the asymptotic properties of the ERFE estimators and suggest robust estimators of its covariance matrix. Our simulations show that the ERFE estimator is unbiased and outperforms its competitors. Our real data analysis shows its ability to capture data heteroscedasticity (see our R package, github.com/AmBarry/erfe).

Submitted 10 August, 2021; originally announced August 2021.

MSC Class: 62Jxx; 62J05 ACM Class: G.3.2
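As a rough illustration of the asymmetric least squares idea named in the abstract above, the sketch below computes a sample expectile by iteratively reweighted averaging. It is not the ERFE estimator and not the API of the authors' R package: the function name `expectile`, the tolerance, and the iteration cap are assumptions made for this example, and there are no regressors or fixed effects.

```python
def expectile(y, tau, tol=1e-12, max_iter=200):
    """Sample tau-expectile of y via asymmetric least squares (illustrative sketch).

    Minimises sum_i w_i * (y_i - m)^2 with w_i = tau if y_i > m else 1 - tau.
    For a fixed set of weights the minimiser is a weighted mean, so we iterate.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(y) / len(y)  # start from the mean, which is the 0.5-expectile
    for _ in range(max_iter):
        weights = [tau if yi > m else 1.0 - tau for yi in y]
        new_m = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(expectile(sample, 0.5))  # the mean of the sample
    print(expectile(sample, 0.9))  # pulled towards the upper tail
```

Varying `tau` traces out the response distribution in the same spirit as quantiles, which is what lets expectile-based models describe heteroscedastic data rather than only the mean.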

arXiv:2107.07668 [pdf, other]

doi 10.5194/nhess-22-2401-2022

Predicting Drought and Subsidence Risks in France

Authors: Arthur Charpentier, Molly James, Hani Ali

Abstract: The economic consequences of drought episodes are increasingly important, although they are often difficult to apprehend, in part because of the complexity of the underlying mechanisms. In this article, we will study one of the consequences of drought, namely the risk of subsidence (or more specifically clay shrinkage induced subsidence), for which insurance has been mandatory in France for several decades. Using data obtained from several insurers, representing about a quarter of the household insurance market, over the past twenty years, we propose some statistical models to predict the frequency but also the intensity of these droughts, for insurers, showing that climate change will probably have major economic consequences on this risk. But even if we use more advanced models than standard regression-type models (here random forests to capture non-linearity and cross effects), it is still difficult to predict the economic cost of subsidence claims, even if all geophysical and climatic information is available.

Submitted 15 July, 2021; originally announced July 2021.

arXiv:2107.02764 [pdf, other]

Collaborative Insurance Sustainability and Network Structure

Authors: Arthur Charpentier, Lariosse Kouakou, Matthias Löwe, Philipp Ratz, Franck Vermet

Abstract: The peer-to-peer (P2P) economy has been growing with the advent of the Internet, with well-known brands such as Uber or Airbnb being examples thereof. In the insurance sector the approach is still in its infancy, but some companies have started to explore P2P-based collaborative insurance products (e.g. Lemonade in the U.S. or Inspeer in France). The actuarial literature only recently started to consider those risk sharing mechanisms, as in Denuit and Robert (2021) or Feng et al. (2021). In this paper, we describe and analyse such a P2P product, with some reciprocal risk sharing contracts. Here, we consider the case where policyholders still have an insurance contract, but the first self-insurance layer, below the deductible, can be shared with friends. We study the impact of the shape of the network (through the distribution of degrees) on the risk reduction. We also consider some optimal setting of the reciprocal commitments, and discuss the introduction of contracts with friends of friends to mitigate some possible drawbacks of having people without enough connections to exchange risks.

Submitted 12 September, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:2103.03635 [pdf, other]

Autocalibration and Tweedie-dominance for Insurance Pricing with Machine Learning

Authors: Michel Denuit, Arthur Charpentier, Julien Trufin

Abstract: Boosting techniques and neural networks are particularly effective machine learning methods for insurance pricing. Often in practice, there are nevertheless endless debates about the choice of the right loss function to be used to train the machine learning model, as well as about the appropriate metric to assess the performances of competing models. Also, the sum of fitted values can depart from the observed totals to a large extent and this often confuses actuarial analysts. The lack of balance inherent to training models by minimizing deviance outside the familiar GLM with canonical link setting has been empirically documented in Wüthrich (2019, 2020) who attributes it to the early stopping rule in gradient descent methods for model fitting. The present paper aims to further study this phenomenon when learning proceeds by minimizing Tweedie deviance. It is shown that minimizing deviance involves a trade-off between the integral of weighted differences of lower partial moments and the bias measured on a specific scale. Autocalibration is then proposed as a remedy. This new method to correct for bias adds an extra local GLM step to the analysis. Theoretically, it is shown that it implements the autocalibration concept in pure premium calculation and ensures that balance also holds on a local scale, not only at portfolio level as with existing bias-correction techniques. The convex order appears to be the natural tool to compare competing models, putting a new light on the diagnostic graphs and associated metrics proposed by Denuit et al. (2019).

Submitted 9 July, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

arXiv:2102.06075 [pdf, ps, other]

doi 10.1287/moor.2015.0736

Local Utility and Multivariate Risk Aversion

Authors: Arthur Charpentier, Alfred Galichon, Marc Henry

Abstract: We revisit Machina's local utility as a tool to analyze attitudes to multivariate risks. We show that for non-expected utility maximizers choosing between multivariate prospects, aversion to multivariate mean preserving increases in risk is equivalent to the concavity of the local utility functions, thereby generalizing Machina's result in Machina (1982). To analyze comparative risk attitudes within the multivariate extension of rank dependent expected utility of Galichon and Henry (2011), we extend Quiggin's monotone mean and utility preserving increases in risk and show that the useful characterization given in Landsberger and Meilijson (1994) still holds in the multivariate case.

Submitted 22 February, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: 18 pages

Journal ref: Mathematics of Operations Research 41-2 (2016) pp. 377-744

arXiv:2006.08446 [pdf, other]

Modeling Joint Lives within Families

Authors: Olivier Cabrignac, Arthur Charpentier, Ewen Gallic

Abstract: Family history is usually seen as a significant factor insurance companies look at when someone applies for a life insurance policy. Where it is used, family history of cardiovascular diseases, death by cancer, or family history of high blood pressure and diabetes could result in higher premiums or no coverage at all. In this article, we use massive (historical) data to study dependencies between life lengths within families. While joint life contracts (between a husband and a wife) have long been studied in actuarial literature, little is known about dependencies between children and parents. We illustrate those dependencies using 19th century family trees in France, and quantify implications in annuities computations. For parents and children, we observe a modest but significant positive association between life lengths. It yields different estimates for remaining life expectancy, present values of annuities, or whole life insurance guarantee, given information about the parents (such as the number of parents alive). A similar but weaker pattern is observed when using information on grandparents.

Submitted 15 June, 2020; originally announced June 2020.

arXiv:2005.06526 [pdf, other]

COVID-19 pandemic control: balancing detection policy and lockdown intervention under ICU sustainability

Authors: Arthur Charpentier, Romuald Elie, Mathieu Laurière, Viet Chi Tran

Abstract: We consider here an extended SIR model, including several features of the recent COVID-19 outbreak: in particular the infected and recovered individuals can either be detected (+) or undetected (-) and we also integrate an intensive care unit (ICU) capacity. Our model enables a tractable quantitative analysis of the optimal policy for the control of the epidemic dynamics using both lockdown and detection intervention levers. With parametric specification based on literature on COVID-19, we investigate the sensitivities of various quantities on the optimal strategies, taking into account the subtle trade-off between the sanitary and the socio-economic cost of the pandemic, together with the limited capacity level of ICU. We identify the optimal lockdown policy as an intervention structured in 4 successive phases: first, a quick and strong lockdown intervention to stop the exponential growth of the contagion; second, a short transition phase to reduce the prevalence of the virus; third, a long period with full ICU capacity and stable virus prevalence; finally, a return to normal social interactions with disappearance of the virus. The optimal scenario hereby avoids the second wave of infection, provided the lockdown is released sufficiently slowly. We also provide optimal intervention measures with increasing ICU capacity, as well as optimization over the effort on detection of infectious and immune individuals. Whenever massive resources are introduced to detect infected individuals, the pressure on social distancing can be released, whereas the impact of detection of immune individuals proves to be more moderate.

Submitted 21 May, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

MSC Class: 49N90; 92D30; 34H05
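For readers unfamiliar with the baseline that the abstract above extends, here is a minimal sketch of the plain SIR dynamics, integrated with an explicit Euler scheme. It deliberately omits everything specific to the paper (detected/undetected compartments, ICU capacity, lockdown and detection controls); the function name `simulate_sir` and all parameter values are hypothetical choices for illustration only.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Euler integration of the basic SIR model on population fractions.

    dS/dt = -beta * S * I
    dI/dt =  beta * S * I - gamma * I
    dR/dt =  gamma * I
    Returns one (S, I, R) tuple per day, including day 0.
    """
    s, i, r = s0, i0, r0
    path = [(s, i, r)]
    steps_per_day = round(1.0 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        path.append((s, i, r))
    return path


if __name__ == "__main__":
    # Hypothetical parameters: basic reproduction number beta / gamma = 3.
    trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, days=100)
    peak_day, peak = max(enumerate(trajectory), key=lambda item: item[1][1])
    print(f"infection peaks on day {peak_day} at {peak[1]:.1%} of the population")
```

In this uncontrolled baseline, a lockdown would act by temporarily lowering `beta`; the paper's contribution concerns how to choose such interventions optimally under an ICU constraint, which this sketch does not attempt.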

arXiv:2003.10014 [pdf, other]

Reinforcement Learning in Economics and Finance

Authors: Arthur Charpentier, Romuald Elie, Carl Remlinger

Abstract: Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent policy provides him some running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when an agent picks an action, he cannot infer ex-post the rewards induced by other action choices. In reinforcement learning, his actions have consequences: they influence not only rewards, but also future states of the world. The goal of reinforcement learning is to find an optimal policy (a mapping from the states of the world to the set of actions) in order to maximize cumulative reward, which is a long-term strategy. Exploring might be sub-optimal on a short-term horizon but could lead to optimal long-term ones. Many problems of optimal control, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, provided in particular by deep learning algorithms, can be used by economists in order to solve complex behavioral problems. In this article, we propose a state-of-the-art of reinforcement learning techniques, and present applications in economics, game theory, operation research and finance.

Submitted 22 March, 2020; originally announced March 2020.

arXiv:1912.11736 [pdf, other]

Pareto models for risk management

Authors: Arthur Charpentier, Emmanuel Flachaire

Abstract: The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (Value-at-Risk, Expected Shortfall) or reinsurance premiums and related quantities (Large Claim Index, Return Period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above a (possibly very) large threshold. Therefore, it could be interesting to take into account second order behavior to provide a better fit. In this article, we present how to go from a strict Pareto model to Pareto-type distributions. We discuss inference, and derive formulas for various measures and indices, and finally provide applications on insurance losses and financial risks.

Submitted 25 December, 2019; originally announced December 2019.
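To make the "simple analytical formulas" mentioned in the abstract above concrete, the sketch below evaluates Value-at-Risk and Expected Shortfall for a strict Pareto distribution with survival function (x/u)^(-alpha) for x >= u. These are the standard textbook closed forms, not code from the paper; the function names and the parameter values in the usage lines are assumptions for illustration.

```python
def pareto_var(p, u, alpha):
    """Value-at-Risk at level p for a strict Pareto(u, alpha) loss.

    Solves (x / u) ** (-alpha) = 1 - p, giving x = u * (1 - p) ** (-1 / alpha).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if u <= 0.0 or alpha <= 0.0:
        raise ValueError("u and alpha must be positive")
    return u * (1.0 - p) ** (-1.0 / alpha)


def pareto_es(p, u, alpha):
    """Expected Shortfall at level p: mean loss beyond the VaR.

    Finite only when alpha > 1, in which case ES = alpha / (alpha - 1) * VaR.
    """
    if alpha <= 1.0:
        raise ValueError("Expected Shortfall is infinite when alpha <= 1")
    return alpha / (alpha - 1.0) * pareto_var(p, u, alpha)


if __name__ == "__main__":
    # Hypothetical threshold and tail index.
    u, alpha = 1.0, 2.0
    print(pareto_var(0.99, u, alpha), pareto_es(0.99, u, alpha))
```

The ratio ES/VaR depends only on `alpha` under the strict model, which is one reason departures from strict Pareto behaviour above the threshold matter for the fitted risk measures.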

arXiv:1907.02320 [pdf, other]

Optimal transport on large networks, a practitioner's guide

Authors: Arthur Charpentier, Alfred Galichon, Lucas Vernet

Abstract: This article presents a set of tools for the modeling of a spatial allocation problem in a large geographic market and gives examples of applications. In our settings, the market is described by a network that maps the cost of travel between each pair of adjacent locations. Two types of agents are located at the nodes of this network. The buyers choose the most competitive sellers depending on their prices and the cost to reach them. Their utility is assumed additive in both these quantities. Each seller, taking as given other sellers' prices, sets her own price to have a demand equal to the one we observed. We give a linear programming formulation for the equilibrium conditions. After formally introducing our model, we apply it to two examples: prices offered by petrol stations and quality of services provided by maternity wards. These examples illustrate the applicability of our model to aggregate demand, rank prices and estimate cost structure over the network. We insist on the possibility of applications to large scale data sets using modern linear programming solvers such as Gurobi. In addition to this paper, we released an R toolbox to implement our results and an online tutorial (http://optimalnetwork.github.io).

Submitted 22 August, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

arXiv:1905.10267 [pdf, other]

Extended Scale-Free Networks

Authors: Arthur Charpentier, Emmanuel Flachaire

Abstract: Recently, Broido & Clauset (2019) mentioned that (strict) scale-free networks were rare in real life. This might be related to the statement of Stumpf, Wiuf & May (2005) that sub-networks of scale-free networks are not scale-free. In the latter, those sub-networks are asymptotically scale-free, but one should not forget about second-order deviation (possibly also third order, actually). In this article, we introduce a concept of extended scale-free network, inspired by the extended Pareto distribution, that may be more realistic for describing real networks than the strict scale-free property. This property is consistent with Stumpf, Wiuf & May (2005): sub-networks of larger scale-free networks are not strictly scale-free, but extended scale-free.

Submitted 28 May, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

arXiv:1810.09214 [pdf, other]

A new GEE method to account for heteroscedasticity, using asymmetric least-square regressions

Authors: Amadou Barry, Karim Oualkacha, Arthur Charpentier

Abstract: Generalized estimating equations (GEE) are widely used to analyze longitudinal data; however, they are not appropriate for heteroscedastic data, because they only estimate regressor effects on the mean response, and therefore do not account for data heterogeneity. Here, we combine the GEE with the asymmetric least squares (expectile) regression to derive a new class of estimators, which we call generalized expectile estimating equations (GEEE). The GEEE model estimates regressor effects on the expectiles of the response distribution, which provides a detailed view of regressor effects on the entire response distribution. In addition to capturing data heteroscedasticity, the GEEE extends the various working correlation structures to account for within-subject dependence. We derive the asymptotic properties of the GEEE estimators and propose a robust estimator of its covariance matrix for inference (see our R package, github.com/AmBarry/expectgee). Our simulations show that the GEEE estimator is non-biased and efficient, and our real data analysis shows it captures heteroscedasticity.

Submitted 24 December, 2020; v1 submitted 22 October, 2018; originally announced October 2018.

Comments: 40 pages, 14 figures and all section modified

arXiv:1807.08991 [pdf, other]

Internal Migrations in France in the Nineteenth Century

Authors: Arthur Charpentier, Ewen Gallic

Abstract: The digital age allows data collection to be done on a large scale and at low cost. This is the case of genealogy trees, which flourish on numerous digital platforms thanks to the collaboration of a mass of individuals wishing to trace their origins and share them with other users. The family trees constituted in this way contain information on the links between individuals and their ancestors, which can be used in historical demography, and more particularly to study migration phenomena. This article proposes to use the family trees of 238,009 users of the Geneanet website, or 2.5 million (unique) individuals, to study internal migration. The case of 19th century France is taken as an example. Using the geographical coordinates of the birthplaces of individuals born in France between 1800 and 1804 and those of their descendants, we study migration between generations at several geographical scales. We start with a broad scale, that of the departments, to reach a much finer one, that of the cities. Our results are consistent with those of the literature traditionally based on parish or civil status registers. The results show that the use of collaborative genealogy data not only makes it possible to recover known facts in the literature, but also to enrich them.

Submitted 24 July, 2018; originally announced July 2018.

arXiv:1708.06992 [pdf, other]

Econométrie et Machine Learning

Authors: Arthur Charpentier, Emmanuel Flachaire, Antoine Ly

Abstract: Econometrics and machine learning seem to have one common goal: to construct a predictive model, for a variable of interest, using explanatory variables (or features). However, these two fields developed in parallel, thus creating two different cultures, to paraphrase Breiman (2001). The first was to build probabilistic models to describe economic phenomena. The second uses algorithms that will learn from their mistakes, with the aim, most often, to classify (sounds, images, etc.). Recently, however, learning models have proven to be more effective than traditional econometric techniques (at the price of less explanatory power), and above all, they manage to handle much larger data. In this context, it becomes necessary for econometricians to understand what these two cultures are, what opposes them and especially what brings them closer together, in order to appropriate tools developed by the statistical learning community and integrate them into econometric models.

Submitted 19 March, 2018; v1 submitted 26 July, 2017; originally announced August 2017.

Comments: in French

arXiv:1707.07607 [pdf, other]

We are not alone ! (at least, most of us). Homonymy in large scale social groups

Authors: Arthur Charpentier, Baptiste Coulmont

Abstract: This article brings forward an estimation of the proportion of homonyms in large scale groups based on the distribution of first names and last names in a subset of these groups. The estimation is based on the generalization of the "birthday paradox problem". The main result is that, in societies such as France or the United States, identity collisions (based on first + last names) are frequent. The large majority of the population has at least one homonym. But in smaller settings, it is much less frequent: even if small groups of a few thousand people have at least one couple of homonyms, only a few individuals have a homonym.

Submitted 24 July, 2017; originally announced July 2017.
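The contrast drawn in the abstract above (a group almost surely contains some pair of homonyms, yet a given individual rarely has one) already appears in the classical birthday problem. The sketch below assumes all identities are equally likely, which is a simplification: the paper works with the empirical distribution of first and last names. The function names and the numbers in the usage lines are illustrative assumptions.

```python
def prob_any_collision(n, num_identities):
    """Probability that at least two of n people share an identity.

    Classical birthday-problem product, assuming identities are uniform
    and independent: 1 - prod_{k=0}^{n-1} (1 - k / num_identities).
    """
    p_no_collision = 1.0
    for k in range(n):
        p_no_collision *= max(0.0, 1.0 - k / num_identities)
    return 1.0 - p_no_collision


def prob_person_has_homonym(n, num_identities):
    """Probability that one given person shares an identity with someone else."""
    return 1.0 - (1.0 - 1.0 / num_identities) ** (n - 1)


if __name__ == "__main__":
    # Hypothetical setting: 3,000 people, one million equally likely full names.
    n, names = 3000, 1_000_000
    print(f"some pair of homonyms exists: {prob_any_collision(n, names):.3f}")
    print(f"a given person has a homonym: {prob_person_has_homonym(n, names):.4f}")
```

With non-uniform name frequencies, collisions become more likely than under this uniform baseline, so the uniform case gives a lower bound on how often homonyms occur.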

arXiv:1602.08773 [pdf, other]

Macro vs. Micro Methods in Non-Life Claims Reserving (an Econometric Perspective)

Authors: Arthur Charpentier, Mathieu Pigeon

Abstract: Traditionally, actuaries have used run-off triangles to estimate reserves ("macro" models, on aggregated data). But it is possible to model payments related to individual claims. If those models provide similar estimations, we investigate uncertainty related to reserves, with "macro" and "micro" models. We study theoretical properties of econometric models (Gaussian, Poisson and quasi-Poisson) on individual data, and clustered data. Finally, applications to claims reserving are considered.

Submitted 28 February, 2016; originally announced February 2016.

arXiv:1404.4414 [pdf, other]

Probit transformation for nonparametric kernel estimation of the copula density

Authors: Gery Geenens, Arthur Charpentier, Davy Paindaveine

Abstract: Copula modelling has become ubiquitous in modern statistics. Here, the problem of nonparametrically estimating a copula density is addressed. Arguably the most popular nonparametric density estimator, the kernel estimator is not suitable for the unit-square-supported copula densities, mainly because it is heavily affected by boundary bias issues. In addition, most common copulas admit unbounded densities, and kernel methods are not consistent in that case. In this paper, a kernel-type copula density estimator is proposed. It is based on the idea of transforming the uniform marginals of the copula density into normal distributions via the probit function, estimating the density in the transformed domain, which can be accomplished without boundary problems, and obtaining an estimate of the copula density through back-transformation. Although natural, a raw application of this procedure was, however, seen not to perform very well in the earlier literature. Here, it is shown that, if combined with local likelihood density estimation methods, the idea yields very good and easy to implement estimators, fixing boundary issues in a natural way and able to cope with unbounded copula densities. The asymptotic properties of the suggested estimators are derived, and a practical way of selecting the crucially important smoothing parameters is devised. Finally, extensive simulation studies and a real data analysis evidence their excellent performance compared to their main competitors.

Submitted 16 April, 2014; originally announced April 2014.
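
The three steps named in the abstract above (probit-transform the uniform margins, estimate the density in the transformed domain, back-transform) can be sketched as follows. This is a minimal illustration of the raw procedure only, which the abstract itself says does not perform very well on its own; the local likelihood refinement proposed in the paper is not shown. The function names, the rank-based pseudo-observations, the Gaussian product kernel, and the fixed bandwidth of 0.3 are all assumptions made for this sketch, not details taken from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf, cdf, inv_cdf (probit)


def pseudo_observations(values):
    """Map a sample to (0, 1) via ranks / (n + 1). Assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]


def probit_copula_density(u, v, x, y, bandwidth=0.3):
    """Naive probit-transformation estimate of the copula density at (u, v).

    u and v must lie strictly inside (0, 1). The bandwidth is an
    arbitrary illustrative value, not a recommended choice.
    """
    # Step 1: probit-transform the pseudo-observations to the normal scale.
    s_obs = [STD_NORMAL.inv_cdf(p) for p in pseudo_observations(x)]
    t_obs = [STD_NORMAL.inv_cdf(p) for p in pseudo_observations(y)]
    s, t = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)

    # Step 2: standard kernel density estimate in the transformed domain,
    # where the support is the whole plane and there is no boundary.
    n = len(s_obs)
    kernel_sum = sum(
        STD_NORMAL.pdf((s - si) / bandwidth) * STD_NORMAL.pdf((t - ti) / bandwidth)
        for si, ti in zip(s_obs, t_obs)
    )
    f_hat = kernel_sum / (n * bandwidth ** 2)

    # Step 3: back-transform by dividing by the Jacobian of the probit map.
    return f_hat / (STD_NORMAL.pdf(s) * STD_NORMAL.pdf(t))
```

For two independent samples the estimate at the centre of the unit square should be close to 1, the density of the independence copula, up to smoothing bias and sampling noise.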

arXiv:1112.0929 [pdf, other]

Multivariate integer-valued autoregressive models applied to earthquake counts

Authors: Mathieu Boudreault, Arthur Charpentier

Abstract: In various situations in the insurance industry, in finance, in epidemiology, etc., one needs to represent the joint evolution of the number of occurrences of an event. In this paper, we present a multivariate integer-valued autoregressive (MINAR) model, derive its properties and apply the model to earthquake occurrences across various pairs of tectonic plates. The model is an extension of Pedelis & Karlis (2011) where cross autocorrelation (spatial contagion in a seismic context) is considered. We fit various bivariate count models and find that for many contiguous tectonic plates, spatial contagion is significant in both directions. Furthermore, ignoring cross autocorrelation can underestimate the potential for high numbers of occurrences over the short-term. Our overall findings seem to further confirm Parsons & Velasco (2001).

Submitted 5 December, 2011; originally announced December 2011.

arXiv:1010.2621 [pdf, ps, other]

An Asymmetric Fingerprinting Scheme based on Tardos Codes

Authors: Ana Charpentier, Caroline Fontaine, Teddy Furon, Ingemar Cox

Abstract: Tardos codes are currently the state-of-the-art in the design of practical collusion-resistant fingerprinting codes. Tardos codes rely on a secret vector drawn from a publicly known probability distribution in order to generate each Buyer's fingerprint. For security purposes, this secret vector must not be revealed to the Buyers. To prevent an untrustworthy Provider forging a copy of a Work with an innocent Buyer's fingerprint, previous asymmetric fingerprinting algorithms enforce the idea of the Buyers generating their own fingerprint. Applying this concept to Tardos codes is challenging since the fingerprint must be based on this vector secret. This paper provides the first solution for an asymmetric fingerprinting protocol dedicated to Tardos codes. The motivations come from a new attack, in which an untrustworthy Provider by modifying his secret vector frames an innocent Buyer.

Submitted 13 October, 2010; originally announced October 2010.

Comments: 6 pages, 2 figures

arXiv:0901.1521 [pdf, ps, other]

Tails of multivariate Archimedean copulas

Authors: Arthur Charpentier, Johan Segers

Abstract: A complete and user-friendly directory of tails of Archimedean copulas is presented which can be used in the selection and construction of appropriate models with desired properties. The results are synthesized in the form of a decision tree: Given the values of some readily computable characteristics of the Archimedean generator, the upper and lower tails of the copula are classified into one of three classes each, one corresponding to asymptotic dependence and the other two to asymptotic independence. For a long list of single-parameter families, the relevant tail quantities are computed so that the corresponding classes in the decision tree can easily be determined. In addition, new models with tailor-made upper and lower tails can be constructed via a number of transformation methods. The frequently occurring category of asymptotic independence turns out to conceal a surprisingly rich variety of tail dependence structures.

Submitted 12 January, 2009; originally announced January 2009.

Comments: to appear in the Journal of Multivariate Analysis

Report number: Univ catholique de Louvain, Institut de statistique DP0808

MSC Class: 60G70; 62E20

Showing 1–38 of 38 results for author: Charpentier, A