-
Lomas: A Platform for Confidential Analysis of Private Data
Authors:
Damien Aymon,
Dan-Thuy Lam,
Lancelot Marti,
Pauline Maury-Laribière,
Christine Choirat,
Raphaël de Fondeville
Abstract:
Public services collect massive volumes of data to fulfill their missions. These data fuel the generation of regional, national, and international statistics across various sectors. However, their immense potential remains largely untapped due to strict and legitimate privacy regulations. In this context, Lomas is a novel open-source platform designed to realize the full potential of the data held by public administrations. It enables authorized users, such as approved researchers and government analysts, to execute algorithms on confidential datasets without directly accessing the data. The Lomas platform is designed to operate within a trusted computing environment, such as governmental IT infrastructure. Authorized users access the platform remotely to submit their algorithms for execution on private datasets. Lomas executes these algorithms without revealing the data to the user and returns the results protected by Differential Privacy, a framework that introduces controlled noise to the results, rendering any attempt to extract identifiable information unreliable. Differential Privacy allows for the mathematical quantification and control of the risk of disclosure while allowing complete transparency about how data are protected and used. The contributions of this project will significantly transform how data held by public services are used, unlocking valuable insights from previously inaccessible data. Lomas empowers research, informs policy development (e.g., public health interventions), and drives innovation across sectors, all while upholding the highest data confidentiality standards.
Submitted 24 June, 2024;
originally announced June 2024.
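The "controlled noise" idea in the Lomas abstract can be illustrated with the Laplace mechanism, one standard way to achieve Differential Privacy for numeric queries. The sketch below is not Lomas's interface: the names `laplace_noise` and `dp_count` are hypothetical, and the choice of a counting query (sensitivity 1) is an assumption made to keep the example small.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values, predicate, epsilon, rng):
    """Return a noisy count of the records satisfying `predicate`.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so Laplace noise with scale
    1 / epsilon gives epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 35, 41, 58, 62, 67, 71]  # illustrative data only
    print(dp_count(ages, lambda a: a >= 60, epsilon=0.5, rng=rng))
```

A smaller `epsilon` means a larger noise scale and therefore stronger protection at the cost of accuracy, which is the quantifiable trade-off the abstract refers to.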
-
Optimal sequential sampling design for environmental extremes
Authors:
Raphaël de Fondeville,
Matthieu Wilhelm
Abstract:
The Sihl river, located near the city of Zurich in Switzerland, is under continuous and tight surveillance as it flows directly under the city's main railway station. To issue early warnings and conduct accurate risk quantification, a dense network of monitoring stations is necessary inside the river basin. However, as of 2021 only three automatic stations are operated in this region, naturally raising the question: how to extend this network for optimal monitoring of extreme rainfall events?
So far, existing methodologies for station network design have mostly focused on maximizing interpolation accuracy or minimizing the uncertainty of a model's parameter estimates. In this work, we propose new principles inspired by extreme value theory for optimal monitoring of extreme events. For stationary processes, we study the theoretical properties of the induced sampling design that yields non-trivial point patterns resulting from a compromise between a boundary effect and the maximization of inter-location distances. For general applications, we propose a theoretically justified functional peak-over-threshold model and provide an algorithm for sequential station selection. We then issue recommendations for possible extensions of the Sihl river monitoring network, by efficiently leveraging both station and radar measurements available in this region.
Submitted 24 June, 2021;
originally announced June 2021.
-
Influence of advanced footwear technology on sub-2 hour marathon and other top running performances
Authors:
Andreu Arderiu,
Raphaël de Fondeville
Abstract:
In 2019, Eliud Kipchoge ran a sub-two hour marathon wearing Nike's Alphafly shoes. Despite being the fastest marathon time ever recorded, it wasn't officially recognized as race conditions were tightly controlled to maximize his success. Besides, Kipchoge's use of Alphafly shoes was controversial, with some experts claiming that they might have provided an unfair competitive advantage. In this work, we assess the potential influence of advanced footwear technology and the likelihood of a sub-two hour marathon in official races, by studying the evolution of top running performances from 2001 to 2019 for long distances ranging from 10km to marathon. The analysis is performed using extreme value theory, a field of statistics dealing with the analysis of rare events. We find significant evidence of a performance-enhancing effect, with a 10% increase in the probability that a new world record in the men's marathon is set in 2021. However, results suggest that achieving a sub-two hour marathon in an official race in 2021 is still very unlikely, and exceeds 10% probability only by 2025.
Submitted 2 March, 2022; v1 submitted 17 April, 2021;
originally announced April 2021.
-
Functional Peaks-over-threshold Analysis
Authors:
Raphaël de Fondeville,
Anthony C. Davison
Abstract:
Peaks-over-threshold analysis using the generalized Pareto distribution is widely applied in modelling tails of univariate random variables, but much information may be lost when complex extreme events are studied using univariate results. In this paper, we extend peaks-over-threshold analysis to extremes of functional data. Threshold exceedances defined using a functional $r$ are modelled by the generalized $r$-Pareto process, a functional generalization of the generalized Pareto distribution that covers the three classical regimes for the decay of tail probabilities, and that is the only possible continuous limit for $r$-exceedances of a properly rescaled process. We give construction rules, simulation algorithms and inference procedures for generalized $r$-Pareto processes, discuss model validation, and use the new methodology to study extreme European windstorms and heavy spatial rainfall.
Submitted 13 January, 2022; v1 submitted 7 February, 2020;
originally announced February 2020.
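As background for the abstract above, the univariate peaks-over-threshold baseline that the paper generalizes can be sketched as follows: keep the excesses above a threshold and fit a generalized Pareto distribution to them. This is only an illustration of the classical univariate step, not the functional $r$-Pareto methodology of the paper. The function names are hypothetical, and the method-of-moments estimator (valid only for shape $\xi < 1/2$) is a simple stand-in chosen for brevity rather than the inference procedure the authors use.

```python
import math
import random
import statistics


def exceedances(sample, threshold):
    """Return the excesses x - threshold for all x above the threshold."""
    return [x - threshold for x in sample if x > threshold]


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (xi, sigma) for a generalized Pareto fit.

    Uses mean = sigma / (1 - xi) and
    variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)), which require xi < 1/2.
    """
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v  # equals 1 - 2 xi under the model
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    return xi, sigma


def gpd_survival(x, xi, sigma):
    """P(excess > x) for x >= 0 under a generalized Pareto distribution."""
    if abs(xi) < 1e-12:
        return math.exp(-x / sigma)  # exponential limit as xi -> 0
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        return 0.0  # beyond the finite upper endpoint when xi < 0
    return base ** (-1.0 / xi)


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.expovariate(0.5) for _ in range(20000)]  # synthetic data
    xi_hat, sigma_hat = fit_gpd_moments(exceedances(data, 1.0))
    print(xi_hat, sigma_hat, gpd_survival(3.0, xi_hat, sigma_hat))
```

The three classical tail regimes mentioned in the abstract correspond to the sign of the shape parameter: `xi > 0` (heavy tail), `xi = 0` (exponential-type tail), and `xi < 0` (bounded tail).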
-
Evaluating probabilistic forecasts of extremes using continuous ranked probability score distributions
Authors:
Maxime Taillardat,
Anne-Laure Fougères,
Philippe Naveau,
Raphaël de Fondeville
Abstract:
Verifying probabilistic forecasts for extreme events is a highly active research area because popular media and public opinions are naturally focused on extreme events, and biased conclusions are readily made. In this context, classical verification methods tailored for extreme events, such as thresholded and weighted scoring rules, have undesirable properties that cannot be mitigated, and the well-known continuous ranked probability score (CRPS) is no exception.
In this paper, we define a formal framework for assessing the behavior of forecast evaluation procedures with respect to extreme events, which we use to demonstrate that assessment based on the expectation of a proper score is not suitable for extremes. Alternatively, we propose studying the properties of the CRPS as a random variable by using extreme value theory to address extreme event verification. An index is introduced to compare calibrated forecasts, which summarizes the ability of probabilistic forecasts for predicting extremes. The strengths and limitations of this method are discussed using both theoretical arguments and simulations.
Submitted 8 February, 2023; v1 submitted 10 May, 2019;
originally announced May 2019.
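For readers unfamiliar with the CRPS discussed in the abstract above, a minimal sketch of its usual ensemble estimator, $\mathrm{CRPS}(F, y) = E|X - y| - \tfrac{1}{2} E|X - X'|$ with $X, X'$ independent draws from the forecast $F$, is given below. The function names are hypothetical. Returning the per-forecast scores, rather than only their average, loosely reflects the abstract's point of treating the CRPS as a random variable; it does not reproduce the index proposed in the paper.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS of an ensemble forecast for a single observation.

    Implements E|X - y| - 0.5 * E|X - X'| with expectations taken over
    the empirical distribution of the ensemble members.
    """
    n = len(members)
    if n == 0:
        raise ValueError("ensemble must contain at least one member")
    accuracy = sum(abs(x - observation) for x in members) / n
    spread = sum(abs(a - b) for a in members for b in members) / (2.0 * n * n)
    return accuracy - spread


def crps_scores(forecasts, observations):
    """Return one CRPS value per (ensemble, observation) pair."""
    return [crps_ensemble(f, y) for f, y in zip(forecasts, observations)]


if __name__ == "__main__":
    ensembles = [[0.0, 2.0], [1.0, 1.5, 3.0]]  # illustrative forecasts
    observed = [1.0, 2.0]
    print(crps_scores(ensembles, observed))
```

Lower values indicate a better forecast, and a one-member ensemble reduces to the absolute error.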
-
Extremal Behavior of Aggregated Data with an Application to Downscaling
Authors:
Sebastian Engelke,
Raphael de Fondeville,
Marco Oesting
Abstract:
The distribution of spatially aggregated data from a stochastic process $X$ may exhibit a different tail behavior than its marginal distributions. For a large class of aggregating functionals $\ell$ we introduce the $\ell$-extremal coefficient that quantifies this difference as a function of the extremal spatial dependence in $X$. We also obtain the joint extremal dependence for multiple aggregation functionals applied to the same process. Explicit formulas for the $\ell$-extremal coefficients and multivariate dependence structures are derived in important special cases. The results provide a theoretical link between the extremal distribution of the aggregated data and the corresponding underlying process, which we exploit to develop a method for statistical downscaling. We apply our framework to downscale daily temperature maxima in the south of France from a gridded data set and use our model to generate high resolution maps of the warmest day during the 2003 heatwave.
Submitted 28 December, 2017;
originally announced December 2017.
-
High-dimensional peaks-over-threshold inference
Authors:
Raphaël de Fondeville,
Anthony C. Davison
Abstract:
Max-stable processes are increasingly widely used for modelling complex extreme events, but existing fitting methods are computationally demanding, limiting applications to a few dozen variables. $r$-Pareto processes are mathematically simpler and have the potential advantage of incorporating all relevant extreme events, by generalizing the notion of a univariate exceedance. In this paper we investigate score matching for performing high-dimensional peaks-over-threshold inference, focusing on extreme value processes associated with log-Gaussian random functions, and discuss the behaviour of the proposed estimators for regularly-varying distributions with normalized marginals. Their performance is assessed on grids with several hundred locations, simulating from both the true model and from its domain of attraction. We illustrate the potential and flexibility of our methods by modelling extreme rainfall on a grid with $3600$ locations, based on risks for exceedances over local quantiles and for large spatially accumulated rainfall, and briefly discuss diagnostics of model fit. The differences between the two fitted models highlight the importance of the choice of risk and its impact on the dependence structure.
Submitted 13 June, 2017; v1 submitted 27 May, 2016;
originally announced May 2016.