Search | arXiv e-print repository

A computationally efficient procedure for combining ecological datasets by means of sequential consensus inference

Authors: Mario Figueira, David Conesa, Antonio López-Quílez, Iosu Paradinas

Abstract: Combining data has become an indispensable tool for managing the current diversity and abundance of data. But, as data complexity and data volume swell, the computational demands of previously proposed models for combining data escalate proportionally, posing a significant challenge to practical implementation. This study presents a sequential consensus Bayesian inference procedure that allows for a flexible definition of models, aiming to emulate the versatility of integrated models while significantly reducing their computational cost. The method is based on updating the distribution of the fixed effects and hyperparameters from their marginal posterior distribution throughout a sequential inference procedure, and performing a consensus on the random effects after the sequential inference is completed. The applicability, together with its strengths and limitations, is outlined in the methodological description of the procedure. The sequential consensus method is presented in two distinct algorithms. The first algorithm performs a sequential updating and consensus from the stored values of the marginal or joint posterior distribution of the random effects. The second algorithm performs an extra step, addressing the deficiencies that may arise when the model partition does not share the whole latent field. The performance of the procedure is shown by three different examples -- one simulated and two with real data -- intending to expose its strengths and limitations.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 36 pages, 15 figures, 1 table and 2 algorithms

MSC Class: 62L10

arXiv:2307.07094 [pdf, other]

How to perform modeling with independent and preferential data jointly?

Authors: Mario Figueira, David Conesa, Antonio López-Quílez, Iosu Paradinas

Abstract: Continuous space species distribution models (SDMs) have a long-standing history as a valuable tool in ecological statistical analysis. Geostatistical and preferential models are both common models in ecology. Geostatistical models are employed when the process under study is independent of the sampling locations, while preferential models are employed when sampling locations are dependent on the process under study. But what if we have both types of data collected over the same process? Can we combine them? If so, how should we combine them? This study investigated the suitability of both geostatistical and preferential models, as well as a mixture model that accounts for the different sampling schemes. Results suggest that in general the preferential and mixture models have satisfactory and close results in most cases, while the geostatistical models present systematically worse estimates at higher spatial complexity, smaller number of samples and lower proportion of completely random samples.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 8 pages, 3 figures and 2 tables

arXiv:2305.17922 [pdf, other]

Bayesian feedback in the framework of ecological sciences

Authors: Mario Figueira-Pereira, Xavier Barber, David Conesa, Antonio López-Quílez, Joaquín Martínez-Minaya, Iosu Paradinas, Maria Grazia Pennino

Abstract: In ecology we may find scenarios where the same phenomenon (species occurrence, species abundance, etc.) is observed using two different types of samplers. For instance, species data can be collected from scientific sampling with a completely random sample pattern, but also from opportunistic sampling (e.g., whale or bird watching, fishery commercial vessels), in which observers tend to look for a specific species in areas where they expect to find it. Species Distribution Models (SDMs) are a widely used tool for analyzing this kind of ecological data. Specifically, we have two models available for the above data: a geostatistical model (GM) for the data coming from a completely random sampler and a preferential model (PM) for data from opportunistic sampling. Integration of information coming from different sources can be handled via expert elicitation and integrated models. We focus here on a sequential Bayesian procedure to connect two models through the update of prior distributions. Implementation of the Bayesian paradigm is done through the integrated nested Laplace approximation (INLA) methodology, a good option to make inference and prediction in spatial models with high performance and low computational costs. This sequential approach has been evaluated by simulating several scenarios and comparing the results of sharing information from one model to another using different criteria. The procedure has also been exemplified with a real dataset.
Our main results imply that, in general, it is better to share information from the independent (completely random) to the preferential model than the other way around. However, it depends on different factors such as the spatial range or the spatial arrangement of sampling locations.

Submitted 24 April, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 36 pages, 13 figures and 3 tables

MSC Class: 62P10

arXiv:2007.07095 [pdf, other]

Private Sources of Mobility Data Under COVID-19

Authors: Raquel Pérez Arnal, David Conesa, Sergio Alvarez-Napagao, Toyotaro Suzumura, Martí Català, Enric Alvarez, Dario Garcia-Gasulla

Abstract: The COVID-19 pandemic is changing the world in unprecedented and unpredictable ways. Human mobility is at the epicenter of that change, as the greatest facilitator for the spread of the virus. To study the change in mobility, to evaluate the efficiency of mobility restriction policies, and to facilitate a better response to possible future crises, we need to properly understand all mobility data sources at our disposal. Our work is dedicated to the study of private mobility sources, gathered and released by large technological companies. This data is of special interest because, unlike most public sources, it is focused on people, not transportation means; i.e., its unit of measurement is the closest thing to a person in a western society: a phone. Furthermore, the sample of society they cover is large and representative. On the other hand, this sort of data is not directly accessible for anonymity reasons. Thus, properly interpreting its patterns demands caution. Aware of that, we set forth to explore the behavior and inter-relations of private sources of mobility data in the context of Spain. This country represents a good experimental setting because of its large and fast pandemic peak, and for its implementation of a sustained, generalized lockdown. We find private mobility sources to be both correlated and complementary. Using them, we evaluate the efficiency of implemented policies, and provide insights into what new normal means in Spain.

Submitted 14 July, 2020; originally announced July 2020.

Comments: 14 pages, 8 figures, 1 table

arXiv:1907.04059 [pdf, other]

doi 10.1080/10618600.2022.2144330

The Integrated Nested Laplace Approximation for fitting Dirichlet regression models

Authors: Joaquín Martínez-Minaya, Finn Lindgren, Antonio López-Quílez, Daniel Simpson, David Conesa

Abstract: This paper introduces a Laplace approximation to Bayesian inference in Dirichlet regression models, which can be used to analyze a set of variables on a simplex exhibiting skewness and heteroscedasticity, without having to transform the data. These data, which mainly consist of proportions or percentages of disjoint categories, are widely known as compositional data and are common in areas such as ecology, geology, and psychology. We provide both the theoretical foundations and a description of how Laplace approximation can be implemented in the case of Dirichlet regression. The paper also introduces the package dirinla in the R-language that extends the R-INLA package, which cannot deal directly with Dirichlet likelihoods. Simulation studies are presented to validate the good behaviour of the proposed method, while a real data case-study is used to show how this approach can be applied.

Submitted 1 November, 2022; v1 submitted 9 July, 2019; originally announced July 2019.

Journal ref: Journal of Computational and Graphical Statistics (2023)

Showing 1–5 of 5 results for author: Conesa, D