-
BayesMortalityPlus: A package in R for Bayesian graduation of mortality modelling
Authors:
Lucas M. F. Silva,
Luiz F. V. Figueiredo,
Viviana G. R. Lobo,
Thaís C. O. Fonseca,
Mariane B. Alves
Abstract:
The BayesMortalityPlus package provides a framework for modelling and predicting mortality data. The package includes tools for the construction of life tables based on Heligman-Pollard laws, and also on dynamic linear smoothers. Flexibility is available in terms of modelling so that the response variable may be modeled as Poisson, Binomial or Gaussian. If temporal data is available, the package provides a Bayesian implementation for the well-known Lee-Carter model that allows for estimation, projection of mortality over time, and assessment of uncertainty of any linear or nonlinear function of parameters such as life expectancy. Illustrations are considered to show the capability of the proposed package to model mortality data.
Submitted 2 June, 2023;
originally announced June 2023.
-
Lapse risk modelling in insurance: a Bayesian mixture approach
Authors:
Viviana G. R. Lobo,
Thais C. O. Fonseca,
Mariane B. Alves
Abstract:
This paper focuses on modelling surrender time for policyholders in the context of life insurance. In this setup, a large lapse rate in the first months of a contract is often observed, with a decrease in this rate after some months. The modelling of the time to cancellation must account for this specific behaviour. Another stylised fact is that policies which are not cancelled in the study period are considered censored. To account for both censoring and heterogeneous lapse rates, this work assumes a Bayesian survival model with a mixture of regressions. The inference is based on data augmentation, allowing for fast computations even for data sets of over a million clients. Moreover, scalable point estimation based on the EM algorithm is also presented. An illustrative example emulates a typical behaviour for life insurance contracts and a simulation study investigates the properties of the proposed model. In particular, the observed censoring in the insurance context might be up to 50% of the data, which is very unusual for survival models in other fields such as epidemiology. This aspect is exploited in our simulation study.
Submitted 14 July, 2022;
originally announced July 2022.
-
Dynamical non-Gaussian modelling of spatial processes
Authors:
Thaís C. O. da Fonseca,
Viviana G. R. Lobo,
Alexandra M. Schmidt
Abstract:
Spatio-temporal processes in environmental applications are often assumed to follow a Gaussian model, possibly after some transformation. However, heterogeneity in space and time might have a pattern that will not be accommodated by transforming the data. In this scenario, modelling the variance laws is an appealing alternative. This work adds flexibility to the usual Multivariate Dynamic Gaussian model by defining the process as a scale mixture of Gaussian and log-Gaussian processes. The scale is represented by a process varying smoothly over space and time which is allowed to depend on covariates. State-space equations define the dynamics over time for both mean and variance processes, resulting in feasible inference and prediction. Analysis of artificial datasets shows that the parameters are identifiable and simpler models are well recovered by the general proposed model. The analyses of two important environmental processes, maximum temperature and maximum ozone, illustrate the effectiveness of our proposal in improving the uncertainty quantification in the prediction of spatio-temporal processes.
Submitted 14 October, 2021;
originally announced October 2021.
-
A decision support system for addressing food security in the UK
Authors:
Martine J. Barons,
Thais C. O. Fonseca,
Andy Davis,
Jim Q. Smith
Abstract:
This paper presents an integrating decision support system to model food security in the UK. In ever-larger dynamic systems, such as the food system, it is increasingly difficult for decision-makers to effectively account for all the variables within the system that may influence the outcomes of interest under enactments of various candidate policies. Each of the influencing variables is likely itself to be a dynamic sub-system with expert domains supported by sophisticated probabilistic models. Recent increases in food poverty in the UK have raised questions about the main drivers of food insecurity, how this may be changing over time and how evidence can be used in evaluating policy for decision support. In this context, an integrating decision support system is proposed for household food security to allow decision-makers to compare several candidate policies which may affect the outcome of food insecurity at household level.
Submitted 14 April, 2020;
originally announced April 2020.
-
Dynamic clustering of time series data
Authors:
Victhor S. Sartório,
Thaís C. O. Fonseca
Abstract:
We propose a new method for clustering multivariate time-series data based on Dynamic Linear Models. Whereas usual time-series clustering methods obtain static membership parameters, our proposal allows each time-series to dynamically change its cluster membership over time. In this context, a mixture model is assumed for the time series and a flexible Dirichlet evolution for mixture weights allows for smooth membership changes over time. Posterior estimates and predictions can be obtained through Gibbs sampling, but a more efficient method for obtaining point estimates is presented, based on Stochastic Expectation-Maximization and Gradient Descent. Finally, two applications illustrate the usefulness of our proposed model for both univariate and multivariate time-series: World Bank indicators for the renewable energy consumption of EU nations and the famous Gapminder dataset containing life-expectancy and GDP per capita for various countries.
Submitted 28 January, 2020;
originally announced February 2020.
-
The effects of degrees of freedom estimation in the Asymmetric GARCH model with Student-t Innovations
Authors:
T. C. O. Fonseca,
V. S. Cerqueira,
H. S. Migon,
C. A. C. Torres
Abstract:
This work investigates the effects of using the independent Jeffreys prior for the degrees of freedom parameter of a Student-t model in the asymmetric generalised autoregressive conditional heteroskedasticity (GARCH) model. To capture asymmetry in the reaction to past shocks, smooth transition models are assumed for the variance. We adopt the fully Bayesian approach for inference, prediction and model selection. We discuss problems related to the estimation of degrees of freedom in the Student-t model and propose a solution based on independent Jeffreys priors which correct problems in the likelihood function. A simulation study is presented to investigate how the estimation of model parameters in the Student-t GARCH model is affected by small sample sizes, prior distributions and misspecification regarding the sampling distribution. An application to the Dow Jones stock market data illustrates the usefulness of the asymmetric GARCH model with Student-t errors.
Submitted 3 October, 2019;
originally announced October 2019.
-
Space-time calibration of wind speed forecasts from regional climate models
Authors:
Luiz E. S. Gomes,
Thaís C. O. Fonseca,
Kelly C. M. Gonçalves,
Ramiro Ruiz-Cárdenas
Abstract:
Numerical weather predictions (NWP) are systematically subject to errors due to the deterministic solutions used by numerical models to simulate the atmosphere. Statistical postprocessing techniques are widely used nowadays for NWP calibration. However, time-varying bias is usually not accommodated by such models. Their calibration performance is also sensitive to the temporal window used for training. This paper proposes space-time models that extend the main statistical postprocessing approaches to calibrate NWP model outputs. Trans-Gaussian random fields are considered to account for meteorological variables with asymmetric behavior. Data augmentation is used to account for censoring in the response variable. The benefits of the proposed extensions are illustrated through the calibration of hourly 10 m wind speed forecasts in Southeastern Brazil coming from the Eta model.
Submitted 3 September, 2020; v1 submitted 27 September, 2019;
originally announced September 2019.
-
Reference Bayesian analysis for hierarchical models
Authors:
Thaís C. O. Fonseca,
Helio S. Migon,
Heudson Mirandola
Abstract:
This paper proposes an alternative approach for constructing invariant Jeffreys prior distributions tailored for hierarchical or multilevel models. In particular, our proposal is based on a flexible decomposition of the Fisher information for hierarchical models which overcomes the marginalization step of the likelihood of model parameters. The Fisher information matrix for the hierarchical model is derived from the Hessian of the Kullback-Leibler (KL) divergence for the model in a neighborhood of the parameter value of interest. Properties of the KL divergence are used to prove the proposed decomposition. Our proposal takes advantage of the hierarchy and leads to an alternative way of computing Jeffreys priors for the hyperparameters and an upper bound for the prior information. While the Jeffreys prior gives the minimum information about parameters, the proposed bound gives an upper limit for the information put in any prior distribution. A prior with information above that limit may be considered too informative. From a practical point of view, the proposed prior may be evaluated computationally as part of an MCMC algorithm. This property might be essential for modeling setups with many levels in which analytic marginalization is not feasible. We illustrate the usefulness of our proposal with examples in mixture models, in model selection priors such as lasso and in the Student-t model.
Submitted 25 April, 2019;
originally announced April 2019.
-
Bayesian cross-validation of geostatistical models
Authors:
Viviana G R Lobo,
Thaís C O da Fonseca,
Fernando A S Moura
Abstract:
The problem of validating or criticising models for georeferenced data is challenging, since the conclusions can vary significantly depending on the locations of the validation set. This work proposes the use of cross-validation techniques to assess the goodness of fit of spatial models in different regions of the spatial domain to account for uncertainty in the choice of the validation sets. An obvious problem with the basic cross-validation scheme is that it is based on selecting only a few out-of-sample locations to validate the model, possibly making the conclusions sensitive to which partition of the data into training and validation cases is utilized. A possible solution to this issue would be to consider all possible configurations of data divided into training and validation observations. From a Bayesian point of view, this could be computationally demanding, as estimation of parameters usually requires Markov chain Monte Carlo methods. To deal with this problem, we propose the use of estimated discrepancy functions considering all configurations of data partition in a computationally efficient manner based on sampling importance resampling. In particular, we consider uncertainty in the locations by assigning a prior distribution to them. Furthermore, we propose a stratified cross-validation scheme to take into account spatial heterogeneity, reducing the total variance of estimated predictive discrepancy measures considered for model assessment. We illustrate the advantages of our proposal with simulated examples of homogeneous and inhomogeneous spatial processes to investigate the effects of our proposal in scenarios of preferential sampling designs. The methods are illustrated with an application to a rainfall dataset.
Submitted 16 February, 2018;
originally announced February 2018.
-
Bayesian covariance modeling of multivariate spatial random fields
Authors:
Rafael S. Erbisti,
Thais C. O. Fonseca,
Mariane B. Alves
Abstract:
In this work we present full Bayesian inference for a new flexible nonseparable class of cross-covariance functions for multivariate spatial data. A Bayesian test is proposed for separability of covariance functions which is much more interpretable than parameters related to separability. Spatial models have been increasingly applied in several areas, such as environmental science, climate science and agriculture. These data are usually available in space, time and possibly for several processes. In this context the modeling of dependence is crucial for correct uncertainty quantification and reliable predictions. In particular, for multivariate spatial data we need to specify a valid cross-covariance function, which defines the dependence between the components of a response vector for all locations in the spatial domain. However, cross-covariance functions are not easily specified and the computational burden is a limitation for model complexity. In this work, we propose a nonseparable covariance function that is based on the convex combination of separable covariance functions and on a latent dimensions representation of the vector components. The proposed covariance structure is valid and flexible. We simulate four different scenarios for different degrees of separability and compute the posterior probability of separability. It turns out that the posterior probability is much easier to interpret than actual model parameters. We illustrate our methodology with a weather dataset from Ceará, Brazil.
Submitted 20 July, 2017;
originally announced July 2017.