-
Conformal Prediction Sets for Populations of Graphs
Authors:
Anna Calissano,
Matteo Fontana,
Gianluca Zeni,
Simone Vantini
Abstract:
The analysis of graph data has gained increasing attention in recent years, owing to the numerous applications in which such data appear. Several methods exist to predict graphs, but far fewer quantify the uncertainty of the prediction. The present work proposes an uncertainty quantification methodology for graphs, based on conformal prediction. The method works both for graphs with the same set of nodes (labelled graphs) and graphs with no clear correspondence between the sets of nodes across the observed graphs (unlabelled graphs). The unlabelled case is handled by creating prediction sets embedded in a quotient space. The proposed method does not rely on distributional assumptions, achieves finite-sample validity, and identifies interpretable prediction sets. To explore the features of this novel forecasting technique, we perform two simulation studies that illustrate the methodology in both the labelled and the unlabelled case. We showcase the applicability of the method by analysing the performance of different teams during the FIFA 2018 football world championship via their player passing networks.
Submitted 29 April, 2024;
originally announced April 2024.
-
Local inference for functional data on manifold domains using permutation tests
Authors:
Niels Lundtorp Olsen,
Alessia Pini,
Simone Vantini
Abstract:
Pini and Vantini (2017) introduced the interval-wise testing procedure, which performs local inference for functional data defined on an interval domain and whose output is an adjusted p-value function that controls for type I errors. We extend this idea to a general setting where the domain is a Riemannian manifold. This requires new methodology, such as how to define adjustment sets on product manifolds and how to approximate the test statistic when the domain has non-zero curvature. We propose to use permutation tests for inference and apply the procedure in three settings: a simulation on a "chameleon-shaped" manifold and two applications related to climate change where the manifolds are a complex subset of $S^2$ and $S^2 \times S^1$, respectively. We note the tradeoff between type I and type II errors: increasing the adjustment set reduces the type I error but also results in smaller areas of significance. However, some areas still remain significant even at maximal adjustment.
Submitted 13 June, 2023;
originally announced June 2023.
-
funLOCI: a local clustering algorithm for functional data
Authors:
Jacopo Di Iorio,
Simone Vantini
Abstract:
Nowadays, more and more problems involve data with one infinite continuous dimension: functional data. In this paper, we introduce the funLOCI algorithm, which identifies functional local clusters, or functional loci, i.e., subsets/groups of functions exhibiting similar behaviour across the same continuous subset of the domain. The definition of functional local clusters leverages ideas from multivariate and functional clustering and biclustering, and it is based on an additive model which takes into account the shape of the curves. funLOCI is a three-step algorithm based on divisive hierarchical clustering. The use of dendrograms makes it possible to visualize and guide the search procedure and the selection of cutting thresholds. To deal with the large quantity of local clusters, an extra step is implemented to reduce the number of results to the minimum.
Submitted 22 May, 2023;
originally announced May 2023.
-
An Evaluation of Researchers' Migration Patterns in Europe using Digital Trace Data
Authors:
Jacopo Ghirri,
Marta Mastropietro,
Simone Vantini,
Francesca Ieva,
Matteo Fontana
Abstract:
Understanding the mechanisms behind the mobility of skilled workers is of paramount importance for policy making. The lack of official measurements motivates the use of digital trace data extracted from ORCID public records. We use such data to investigate European regions, studied at the NUTS2 level, over the period from 2009 to 2020. We present a novel perspective in which regions' roles are dictated by the overall activity of the research community, contradicting the common brain drain interpretation of the phenomenon. We find that high mobility is usually correlated with strong university prestige, a high magnitude of investments and an overall good schooling level in a region.
Submitted 15 February, 2023;
originally announced February 2023.
-
Conformal Prediction Bands for Two-Dimensional Functional Time Series
Authors:
Niccolò Ajroldi,
Jacopo Diquigiovanni,
Matteo Fontana,
Simone Vantini
Abstract:
Time-evolving surfaces can be modeled as two-dimensional functional time series, exploiting the tools of functional data analysis. Leveraging this approach, a forecasting framework for such complex data is developed. The main focus is on Conformal Prediction, a versatile nonparametric paradigm used to quantify uncertainty in prediction problems. Building upon recent variations of Conformal Prediction for functional time series, a probabilistic forecasting scheme for two-dimensional functional time series is presented, together with an extension of Functional Autoregressive Processes of order one to this setting. Estimation techniques for the latter process are introduced and their performances are compared in terms of the resulting prediction regions. Finally, the proposed forecasting procedure and the uncertainty quantification technique are applied to a real dataset collecting daily observations of Sea Level Anomalies of the Black Sea.
Submitted 18 July, 2023; v1 submitted 27 July, 2022;
originally announced July 2022.
-
funcharts: Control charts for multivariate functional data in R
Authors:
Christian Capezza,
Fabio Centofanti,
Antonio Lepore,
Alessandra Menafoglio,
Biagio Palumbo,
Simone Vantini
Abstract:
Modern statistical process monitoring (SPM) applications focus on profile monitoring, i.e., the monitoring of process quality characteristics that can be modeled as profiles, also known as functional data. Despite the large interest in the profile monitoring literature, there is still a lack of software to facilitate its practical application. This article introduces the funcharts R package that implements recent developments on the SPM of multivariate functional quality characteristics, possibly adjusted by the influence of additional variables, referred to as covariates. The package also implements the real-time version of all control charting procedures to monitor profiles partially observed up to an intermediate domain point. The package is illustrated both through its built-in data generator and a real-case study on the SPM of Ro-Pax ship CO2 emissions during navigation, which is based on the ShipNavigation data provided in the Supplementary Material.
Submitted 19 July, 2022;
originally announced July 2022.
-
conformalInference.multi and conformalInference.fd: Twin Packages for Conformal Prediction
Authors:
Paolo Vergottini,
Matteo Fontana,
Jacopo Diquigiovanni,
Aldo Solari,
Simone Vantini
Abstract:
Building on top of a regression model, Conformal Prediction methods produce distribution-free prediction sets, requiring only i.i.d. data. While R packages implementing such methods for the univariate response framework have been developed, this is not the case for multivariate and functional responses. conformalInference.multi and conformalInference.fd address this void by extending classical and more advanced conformal prediction methods, such as full conformal, split conformal, jackknife+ and multi split conformal, to the multivariate and functional case. The packages fully embrace the flexibility of conformal prediction: they do not require any specific regression model, so users can pass in any regression function as input, while basic regression models are provided as reference. Finally, the issue of visualisation is addressed by providing embedded plotting functions to visualize prediction regions.
Submitted 29 June, 2022;
originally announced June 2022.
-
Robust Functional ANOVA with Application to Additive Manufacturing
Authors:
Fabio Centofanti,
Bianca Maria Colosimo,
Marco Luigi Grasso,
Alessandra Menafoglio,
Biagio Palumbo,
Simone Vantini
Abstract:
The development of data acquisition systems is facilitating the collection of data that are apt to be modelled as functional data. In some applications, the interest lies in the identification of significant differences in group functional means defined by varying experimental conditions, which is known as functional analysis of variance (FANOVA). With real data, it is common that the sample under study is contaminated by some outliers, which can strongly bias the analysis. In this paper, we propose a new robust nonparametric functional ANOVA method (RoFANOVA) that reduces the weights of outlying functional data on the results of the analysis. It is implemented through a permutation test based on a test statistic obtained via a functional extension of the classical robust $M$-estimator. By means of an extensive Monte Carlo simulation study, the proposed test is compared with some alternatives already presented in the literature, in both one-way and two-way designs. The performance of the RoFANOVA is demonstrated in the framework of a motivating real-case study in the field of additive manufacturing that deals with the analysis of spatter ejections. The RoFANOVA method is implemented in the R package rofanova, available online at https://github.com/unina-sfere/rofanova.
Submitted 20 December, 2021;
originally announced December 2021.
-
Distribution-Free Prediction Bands for Multivariate Functional Time Series: an Application to the Italian Gas Market
Authors:
Jacopo Diquigiovanni,
Matteo Fontana,
Simone Vantini
Abstract:
Uncertainty quantification in forecasting represents a topic of great importance in energy trading, as understanding the status of the energy market would enable traders to directly evaluate the impact of their own offers/bids. To this end, we propose a scalable procedure that outputs closed-form simultaneous prediction bands for multivariate functional response variables in a time series setting, which is able to guarantee performance bounds in terms of unconditional coverage and asymptotic exactness, both under some conditions. After evaluating its performance on synthetic data, the method is used to build multivariate prediction bands for daily demand and offer curves in the Italian gas market.
Submitted 15 January, 2024; v1 submitted 1 July, 2021;
originally announced July 2021.
-
Conformal Prediction Bands for Multivariate Functional Data
Authors:
Jacopo Diquigiovanni,
Matteo Fontana,
Simone Vantini
Abstract:
Motivated by the pressing demand for methods able to create prediction sets in a general regression framework for a multivariate functional response, and by new methodological advancements in non-parametric prediction for functional data, we propose a set of conformal predictors that produce finite-sample valid or exact multivariate simultaneous prediction bands under the mild assumption of exchangeable regression pairs. The fact that the prediction bands can be built around any regression estimator and can easily be found in closed form yields a widely usable method that is fairly straightforward to implement. In addition, we introduce and describe a specific conformal predictor that guarantees an asymptotic result in terms of efficiency and induces prediction bands able to modulate their width based on the local behavior and magnitude of the functional data. The method is investigated and analyzed through a simulation study and a real-world application in the field of urban mobility.
Submitted 3 June, 2021;
originally announced June 2021.
-
The Importance of Being a Band: Finite-Sample Exact Distribution-Free Prediction Sets for Functional Data
Authors:
Jacopo Diquigiovanni,
Matteo Fontana,
Simone Vantini
Abstract:
Functional Data Analysis represents a field of growing interest in statistics. Although several studies have led to fundamental results, the problem of obtaining valid and efficient prediction sets has not been thoroughly covered. Indeed, the great majority of methods currently in the literature rely on strong distributional assumptions (e.g., Gaussianity), dimension reduction techniques and/or asymptotic arguments. In this work, we propose a new nonparametric approach in the field of Conformal Prediction based on a new family of nonconformity measures inducing conformal predictors able to create closed-form finite-sample valid or exact prediction sets under very minimal distributional assumptions. In addition, our proposal ensures that the prediction sets obtained are bands, an essential feature in the functional setting that allows the visualization and interpretation of such sets. The procedure is also fast and scalable, does not rely on functional dimension reduction techniques, and allows the user to select different nonconformity measures depending on the problem at hand while always obtaining valid bands. Within this family of measures, we also propose a specific measure leading to prediction bands asymptotically no less efficient than those with constant width.
Submitted 12 April, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Adaptive Smoothing Spline Estimator for the Function-on-Function Linear Regression Model
Authors:
Fabio Centofanti,
Antonio Lepore,
Alessandra Menafoglio,
Biagio Palumbo,
Simone Vantini
Abstract:
In this paper, we propose an adaptive smoothing spline (AdaSS) estimator for the function-on-function linear regression model where each value of the response, at any domain point, depends on the full trajectory of the predictor. The AdaSS estimator is obtained by the optimization of an objective function with two spatially adaptive penalties, based on initial estimates of the partial derivatives of the regression coefficient function. This allows the proposed estimator to adapt more easily to the true coefficient function over regions of large curvature and not to be undersmoothed over the remaining part of the domain. A novel evolutionary algorithm is developed ad hoc to obtain the optimization tuning parameters. Extensive Monte Carlo simulations have been carried out to compare the AdaSS estimator with competitors that have already appeared in the literature. The results show that our proposal mostly outperforms the competitors in terms of estimation and prediction accuracy. Lastly, those advantages are also illustrated on two real-data benchmark examples.
Submitted 24 November, 2020;
originally announced November 2020.
-
Ridge regression with adaptive additive rectangles and other piecewise functional templates
Authors:
Edoardo Belli,
Simone Vantini
Abstract:
We propose an $L_{2}$-based penalization algorithm for functional linear regression models, where the coefficient function is shrunk towards a data-driven shape template $γ$, which is constrained to belong to a class of piecewise functions by restricting its basis expansion. In particular, we focus on the case where $γ$ can be expressed as a sum of $q$ rectangles that are adaptively positioned with respect to the regression error. As the problem of finding the optimal knot placement of a piecewise function is nonconvex, the proposed parametrization reduces the number of variables in the global optimization scheme, resulting in a fitting algorithm that alternates between approximating a suitable template and solving a convex ridge-like problem. The predictive power and interpretability of our method are shown on multiple simulations and two real-world case studies.
Submitted 2 November, 2020;
originally announced November 2020.
-
Measure Inducing Classification and Regression Trees for Functional Data
Authors:
Edoardo Belli,
Simone Vantini
Abstract:
We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis, which makes it possible to leverage representation learning and multiple splitting rules at the node level, reducing generalization error while retaining the interpretability of a tree. This is achieved by learning a weighted functional $L^{2}$ space by means of constrained convex optimization, which is then used to extract multiple weighted integral features from the input functions, in order to determine the binary split for each internal node of the tree. The approach is designed to manage multiple functional inputs and/or outputs, by defining suitable splitting rules and loss functions that can depend on the specific problem and can also be combined with scalar and categorical data, as the tree is grown with the original greedy CART algorithm. We focus on the case of scalar-valued functional inputs defined on unidimensional domains and illustrate the effectiveness of our method in both classification and regression tasks, through a simulation study and four real-world applications.
Submitted 30 October, 2020;
originally announced November 2020.
-
Smooth Lasso Estimator for the Function-on-Function Linear Regression Model
Authors:
Fabio Centofanti,
Matteo Fontana,
Antonio Lepore,
Simone Vantini
Abstract:
A new estimator, named S-LASSO, is proposed for the coefficient function of the Function-on-Function linear regression model. The S-LASSO estimator is shown to be able to increase the interpretability of the model, by better locating regions where the coefficient function is zero, and to smoothly estimate non-zero values of the coefficient function. The sparsity of the estimator is ensured by a functional LASSO penalty, which pointwise shrinks the coefficient function toward zero, while the smoothness is provided by two roughness penalties that penalize the curvature of the final estimator. The resulting estimator is proved to be estimation and pointwise sign consistent. Via an extensive Monte Carlo simulation study, the estimation and predictive performance of the S-LASSO estimator are shown to be better than (or at worst comparable with) competing estimators already presented in the literature. Practical advantages of the S-LASSO estimator are illustrated through the analysis of the Canadian weather, Swedish mortality and ship CO2 emission data. The S-LASSO method is implemented in the R package slasso, openly available online on CRAN.
Submitted 14 July, 2022; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Global Sensitivity and Domain-Selective Testing for Functional-Valued Responses: An Application to Climate Economy Models
Authors:
Matteo Fontana,
Massimo Tavoni,
Simone Vantini
Abstract:
Understanding the dynamics and evolution of climate change and associated uncertainties is key for designing robust policy actions. Computer models, which have now reached a high level of sophistication and complexity, are key tools in this scientific effort. Model auditing is needed in order to better understand their results, and to deal with the fact that such models are increasingly opaque with respect to their inner workings. Current techniques such as Global Sensitivity Analysis (GSA) are limited to dealing either with multivariate outputs, stochastic ones, or finite-change inputs. This limits their applicability to time-varying variables such as future pathways of greenhouse gases. To provide additional semantics in the analysis of a model ensemble, we provide an extension of GSA methodologies tackling the case of stochastic functional outputs with finite-change inputs. To deal with finite-change inputs and functional outputs, we propose an extension of currently available GSA methodologies, while we deal with the stochastic part by introducing a novel, domain-selective inferential technique for sensitivity indices. Our method is explored via a simulation study that shows its robustness and efficacy in detecting sensitivity patterns. We apply it to real-world data, where it can provide practitioners and policymakers with additional information about the time dynamics of sensitivity patterns, as well as information about robustness.
Submitted 29 April, 2024; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Conformal Prediction: a Unified Review of Theory and New Challenges
Authors:
Matteo Fontana,
Gianluca Zeni,
Simone Vantini
Abstract:
In this work we provide a review of basic ideas and novel developments about Conformal Prediction -- an innovative distribution-free, non-parametric forecasting method, based on minimal assumptions -- that is able to yield, in a very straightforward way, prediction sets that are valid in a statistical sense also in the finite-sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of Conformal Prediction, and then proceeds to list the more advanced developments and adaptations of the original idea.
Submitted 29 July, 2022; v1 submitted 16 May, 2020;
originally announced May 2020.
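The finite-sample validity mentioned in the abstract above can be illustrated with a minimal split conformal sketch for a scalar response. This is not code from the paper: the function name `split_conformal_interval`, the least-squares line used as the regression model, and the absolute-residual nonconformity score are all assumptions chosen for illustration.

```python
import math


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal interval for a scalar response (illustrative sketch).

    Assumptions: a simple least-squares line is the regression model and the
    nonconformity score is the absolute residual.
    """
    # Fit a least-squares line on the training split only.
    n = len(x_train)
    mean_x = sum(x_train) / n
    mean_y = sum(y_train) / n
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(x):
        return intercept + slope * x

    # Nonconformity scores on the calibration split.
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    m = len(scores)

    # Use the ceil((1 - alpha) * (m + 1))-th smallest calibration score.
    k = math.ceil((1 - alpha) * (m + 1))
    center = predict(x_new)
    if k > m:
        # Too few calibration points for this level: the set is unbounded.
        return (-math.inf, math.inf)
    radius = scores[k - 1]
    return (center - radius, center + radius)
```

Under exchangeability of the calibration and test pairs, an interval built this way covers the new response with probability at least 1 - alpha, whatever regression model is plugged in.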
-
False Discovery Rate for Functional Data
Authors:
Niels Lundtorp Olsen,
Alessia Pini,
Simone Vantini
Abstract:
Since Benjamini and Hochberg introduced the false discovery rate (FDR) in their seminal paper, this has become a very popular approach to the multiple comparisons problem. An increasingly popular topic within functional data analysis is local inference, i.e., the continuous statistical testing of a null hypothesis along the domain. The principal issue in this topic is the infinite number of tested hypotheses, which can be seen as an extreme case of the multiple comparisons problem. In this paper we define and discuss the notion of false discovery rate in a very general functional data setting. Moreover, a continuous version of the Benjamini-Hochberg procedure is introduced along with a definition of adjusted p-value function. Some general conditions are stated, under which the functional Benjamini-Hochberg procedure provides control of the functional FDR. Two different simulation studies are presented; the first study has a one-dimensional domain and a comparison with another state-of-the-art method, and the second study has a planar two-dimensional domain. Finally, the proposed method is applied to satellite measurements of Earth temperature. In detail, we aim at identifying the regions of the planet where temperature has significantly increased in the last decades. After adjustment, large areas are still significant.
Submitted 13 August, 2019;
originally announced August 2019.
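For orientation, the classical Benjamini-Hochberg step-up procedure for a finite set of p-values, which the abstract above extends to a continuum of hypotheses, can be sketched as follows. This is the standard discrete procedure, not the functional version proposed in the paper; the function name `benjamini_hochberg` and its arguments are assumptions chosen for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Classical (discrete) Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    at nominal FDR level q. Illustrative sketch only.
    """
    m = len(p_values)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a p-value above its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.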