Search | arXiv e-print repository

A sequential Monte Carlo algorithm for data assimilation problems in ecology

Authors: Kwaku Peprah Adjei, Rob Cooke, Nick Isaac, Robert B. O'Hara

Abstract: 1. Temporal trends in species distributions are necessary for monitoring changes in biodiversity, which aids policymakers and conservationists in making informed decisions. Dynamic species distribution models are often fitted to ecological time series data using Markov Chain Monte Carlo algorithms to produce these temporal trends. However, the fitted models can be time-consuming to produce and run, making it inefficient to refit them as new observations become available. 2. We propose an algorithm that updates model parameters and the latent state distribution (e.g. true occupancy) using the saved information from a previously fitted model. This algorithm capitalises on the strength of importance sampling to generate new posterior samples of interest by updating the model output. The algorithm was validated with simulation studies on linear Gaussian state space models and occupancy models, and we applied the framework to Crested Tits in Switzerland and Yellow Meadow Ants in the UK. 3. We found that models updated with the proposed algorithm captured the true model parameters and latent state values as good as the models refitted to the expanded dataset. Moreover, the updated models were much faster to run and preserved the trajectory of the derived quantities. 4. The proposed approach serves as an alternative to conventional methods for updating state-space models (SSMs), and it is most beneficial when the fitted SSMs have a long run time.
Overall, we provide a Monte Carlo algorithm to efficiently update complex models, a key issue in developing biodiversity models and indicators.

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2311.06755 [pdf, other]

The Point Process Framework for Integrated Modelling of Biodiversity Data

Authors: Kwaku Peprah Adjei, Philip Mostert, Jorge Sicacha Parada, Emma Skarstein, Robert B. O'Hara

Abstract: The quantity and types of biodiversity data being collected have increased in recent years. If we are to model and monitor biodiversity effectively, we need to respect how different data sets were collected, and effectively integrate these data types together. The framework that has emerged to do this is based on a point process formulation, with individuals as points and their distribution as a realisation of a random field. We describe this formulation and how the process model for the actual distribution is linked to the data that is collected through observation models. The observation models describe the data collection process and its uncertainties and biases. We provide an example of using these methods to model species of Norwegian freshwater fish, which shows how integrated models can be adapted to the data we can collect. We summarise the modelling issues that arise and the approaches that could be taken to solve them.

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2305.01989 [pdf, other]

Modelling heterogeneity in the classification process in multi-species distribution models can improve predictive performance

Authors: Kwaku Peprah Adjei, Robert B. O'Hara, Wouter Koch, Anders Finstad

Abstract: 1. Species distribution models and maps from large-scale biodiversity data are necessary for conservation management. One current issue is that biodiversity data are prone to taxonomic misclassifications. Methods to account for these misclassifications in multispecies distribution models have assumed that the classification probabilities are constant throughout the study. In reality, classification probabilities are likely to vary with several covariates. Failure to account for such heterogeneity can lead to bias in parameter estimates. 2. Here we present a general multispecies distribution model that accounts for heterogeneity in the classification process. The proposed model assumes a multinomial generalised linear model for the classification confusion matrix. We compare the performance of the heterogeneous classification model to that of the homogeneous classification model by assessing how well they estimate the parameters in the model and their predictive performance on hold-out samples. We applied the model to gull data from Norway, Denmark and Finland, obtained from GBIF. 3. Our simulation study showed that accounting for heterogeneity in the classification process increased precision by 30% and reduced accuracy and recall by 6%. Applying the model framework to the gull dataset did not improve the predictive performance between the homogeneous and heterogeneous models due to the smaller misclassified sample sizes.
However, when machine learning predictive scores are used as weights to inform the species distribution models about the classification process, the precision increases by 70%. 4. We recommend multiple multinomial regression to be used to model the variation in the classification process when the data contains relatively larger misclassified samples. Machine prediction scores should be used when the data contains relatively smaller misclassified samples.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2206.12179 [pdf]

doi 10.48550/arXiv.2206.12179

How is model-related uncertainty quantified and reported in different disciplines?

Authors: Emily G. Simmonds, Kwaku Peprah Adjei, Christoffer Wold Andersen, Janne Cathrin Hetle Aspheim, Claudia Battistin, Nicola Bulso, Hannah Christensen, Benjamin Cretois, Ryan Cubero, Ivan A. Davidovich, Lisa Dickel, Benjamin Dunn, Etienne Dunn-Sigouin, Karin Dyrstad, Sigurd Einum, Donata Giglio, Haakon Gjerlow, Amelie Godefroidt, Ricardo Gonzalez-Gil, Soledad Gonzalo Cogno, Fabian Grosse, Paul Halloran, Mari F. Jensen, John James Kennedy, Peter Egge Langsaether, et al. (18 additional authors not shown)

Abstract: How do we know how much we know? Quantifying uncertainty associated with our modelling work is the only way we can answer how much we know about any phenomenon. With quantitative science now highly influential in the public sphere and the results from models translating into action, we must support our conclusions with sufficient rigour to produce useful, reproducible results. Incomplete consideration of model-based uncertainties can lead to false conclusions with real world impacts. Despite these potentially damaging consequences, uncertainty consideration is incomplete both within and across scientific fields. We take a unique interdisciplinary approach and conduct a systematic audit of model-related uncertainty quantification from seven scientific fields, spanning the biological, physical, and social sciences. Our results show no single field is achieving complete consideration of model uncertainties, but together we can fill the gaps. We propose opportunities to improve the quantification of uncertainty through use of a source framework for uncertainty consideration, model type specific guidelines, improved presentation, and shared best practice. We also identify shared outstanding challenges (uncertainty in input data, balancing trade-offs, error propagation, and defining how much uncertainty is required).
Finally, we make nine concrete recommendations for current practice (following good practice guidelines and an uncertainty checklist, presenting uncertainty numerically, and propagating model-related uncertainty into conclusions), future research priorities (uncertainty in input data, quantifying uncertainty in complex models, and the importance of missing uncertainty in different contexts), and general research standards across the sciences (transparency about study limitations and dedicated uncertainty sections of manuscripts).

Submitted 1 July, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

Comments: 40 Pages (including supporting information), 3 Figures, 2 Boxes, 1 Table

arXiv:2204.03708

Accounting for Misclassification in Multispecies Distribution Models

Authors: Kwaku Peprah Adjei, Robert Bob O'Hara, Anders G. Finstad, Wouter Koch

Abstract: 1. Species identification errors may have severe implications for the inference of species distributions. Accounting for misclassification in species distributions is an important topic of biodiversity research. With an increasing amount of biodiversity that comes from Citizen Science projects, where identification is not verified by preserved specimens, this issue is becoming more important. This has often been dealt with by accounting for false positives in species distribution models. However, the problem should account for misclassifications in general. 2. Here we present a flexible framework that accounts for misclassification in the distribution models and provides estimates of uncertainty around these estimates. The model was applied to data on viceroy, queen and monarch butterflies in the United States. The data were obtained from the iNaturalist database in the period 2019 to 2020. 3. Simulations and analysis of butterfly data showed that the proposed model was able to correct the reported abundance distribution for misclassification and also predict the true state for misclassified state.

Submitted 3 May, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: The article has been replaced by another submitted pre-print

Showing 1–5 of 5 results for author: Adjei, K P