-
Bayesian Projection of Refugee and Asylum Seeker Populations
Authors:
Herbert Susmann,
Adrian E. Raftery
Abstract:
Estimates of future migration patterns are a crucial input to world population projections. Forced migration, including that of refugees and asylum seekers, plays an important role in overall migration patterns, but is notoriously difficult to forecast. We propose a modeling pipeline based on Bayesian hierarchical time-series modeling for projecting combined refugee and asylum seeker populations by country of origin using data from the United Nations High Commissioner for Refugees (UNHCR). Our approach is based on a conceptual model of refugee crises following growth and decline phases, separated by a peak. The growth and decline phases are modeled by logistic growth and decline through an interrupted logistic process model. We evaluate our method through a set of validation exercises that show it has good performance for forecasts at 1-, 5-, and 10-year horizons, and we present projections for 35 countries with ongoing refugee crises.
Submitted 10 May, 2024;
originally announced May 2024.
-
Local Longitudinal Modified Treatment Policies
Authors:
Herbert Susmann,
Iván Díaz
Abstract:
Longitudinal Modified Treatment Policies (LMTPs) provide a framework for defining a broad class of causal target parameters for continuous and categorical exposures. We propose Local LMTPs, a generalization of LMTPs to settings where the target parameter is conditional on subsets of units defined by the treatment or exposure. Such parameters have wide scientific relevance, with well-known parameters such as the Average Treatment Effect on the Treated (ATT) falling within the class. We provide a formal causal identification result that expresses the Local LMTP parameter in terms of sequential regressions, and derive the efficient influence function of the parameter which defines its semi-parametric and local asymptotic minimax efficiency bound. Efficient semi-parametric inference of Local LMTP parameters requires estimating the ratios of functions of complex conditional probabilities (or densities). We propose an estimator for Local LMTP parameters that directly estimates these required ratios via empirical loss minimization, drawing on the theory of Riesz representers. The estimator is implemented using a combination of ensemble machine learning algorithms and deep neural networks, and evaluated via simulation studies. We illustrate in simulation that estimation of the density ratios using Riesz representation might provide more stable estimators in finite samples in the presence of empirical violations of the overlap/positivity assumption.
Submitted 9 May, 2024;
originally announced May 2024.
-
AdaptiveConformal: An R Package for Adaptive Conformal Inference
Authors:
Herbert Susmann,
Antoine Chambaz,
Julie Josse
Abstract:
Conformal Inference (CI) is a popular approach for generating finite sample prediction intervals based on the output of any point prediction method when data are exchangeable. Adaptive Conformal Inference (ACI) algorithms extend CI to the case of sequentially observed data, such as time series, and exhibit strong theoretical guarantees without having to assume exchangeability of the observed data. The common thread that unites algorithms in the ACI family is that they adaptively adjust the width of the generated prediction intervals in response to the observed data. We provide a detailed description of five ACI algorithms and their theoretical guarantees, and test their performance in simulation studies. We then present a case study of producing prediction intervals for influenza incidence in the United States based on black-box point forecasts. Implementations of all the algorithms are released as an open-source R package, AdaptiveConformal, which also includes tools for visualizing and summarizing conformal prediction intervals.
Submitted 1 December, 2023;
originally announced December 2023.
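To illustrate how an ACI algorithm adapts interval width, here is a minimal Python sketch of the basic update rule from the original ACI algorithm (Gibbs and Candès), in which the working miscoverage level moves in response to each observed miss or cover. This is not the API of the AdaptiveConformal R package; the function name `aci_alpha_path` and the parameter names are hypothetical, and the step size `gamma` is an arbitrary illustrative value.

```python
def aci_alpha_path(errors, alpha=0.1, gamma=0.05):
    """Track the working miscoverage level under the basic ACI update.

    errors: sequence of 0/1 flags, 1 when the interval at that time step
            failed to cover the observation (hypothetical input format).
    alpha:  target miscoverage level (e.g. 0.1 for 90% intervals).
    gamma:  step size (learning rate), assumed fixed here.
    """
    alpha_t = alpha
    path = [alpha_t]
    for err in errors:
        # A miss (err = 1) lowers alpha_t, which widens the next interval;
        # a cover (err = 0) raises alpha_t, which narrows it.
        alpha_t = alpha_t + gamma * (alpha - err)
        path.append(alpha_t)
    return path


if __name__ == "__main__":
    # One miss followed by one cover.
    print(aci_alpha_path([1, 0]))
```

In practice the interval at each step would be formed from a quantile of past conformity scores at level `1 - alpha_t`; that step is omitted here to keep the sketch focused on the adaptive update.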
-
Quantile Super Learning for independent and online settings with application to solar power forecasting
Authors:
Herbert Susmann,
Antoine Chambaz
Abstract:
Estimating quantiles of an outcome conditional on covariates is of fundamental interest in statistics with broad application in probabilistic prediction and forecasting. We propose an ensemble method for conditional quantile estimation, Quantile Super Learning, that combines predictions from multiple candidate algorithms based on their empirical performance measured with respect to a cross-validated empirical risk of the quantile loss function. We present theoretical guarantees for both iid and online data scenarios. The performance of our approach for quantile estimation and in forming prediction intervals is tested in simulation studies. Two case studies related to solar energy are used to illustrate Quantile Super Learning: in an iid setting, we predict the physical properties of perovskite materials for photovoltaic cells, and in an online setting we forecast ground solar irradiance based on output from dynamic weather ensemble models.
Submitted 30 October, 2023;
originally announced October 2023.
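As a rough illustration of ranking candidate algorithms by empirical risk under the quantile loss, the Python sketch below computes the standard quantile (pinball) loss and picks the candidate with the lowest mean loss on held-out predictions. This is a simplified discrete selector, not the ensemble weighting or cross-validation scheme of Quantile Super Learning itself; all function names and the toy data are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss of predicting quantile q for outcome y at level tau."""
    diff = y - q
    # Under-prediction (diff > 0) is weighted by tau,
    # over-prediction (diff < 0) by (1 - tau).
    return max(tau * diff, (tau - 1) * diff)


def mean_pinball_loss(ys, qs, tau):
    """Empirical risk: average pinball loss over paired outcomes and predictions."""
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, qs)) / len(ys)


def select_candidate(ys, candidate_preds, tau):
    """Return the name of the candidate with the lowest empirical risk, plus all risks.

    candidate_preds maps a candidate name to its held-out quantile predictions
    (assumed to come from validation folds, which are not constructed here).
    """
    risks = {
        name: mean_pinball_loss(ys, preds, tau)
        for name, preds in candidate_preds.items()
    }
    best = min(risks, key=risks.get)
    return best, risks


if __name__ == "__main__":
    ys = [10, 12, 11]
    candidates = {"low": [8, 9, 9], "high": [13, 14, 13]}
    best, risks = select_candidate(ys, candidates, tau=0.9)
    print(best, risks)
```

At a high quantile level such as 0.9, under-prediction is penalized more heavily than over-prediction, so the candidate that predicts above the outcomes incurs the smaller risk in this toy example.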
-
A Bayesian analysis of current duration data with reporting issues: an application to estimating the distribution of time-between-sex from time-since-last-sex data as collected in cross-sectional surveys in low- and middle-income countries
Authors:
Chi Hyun Lee,
Herbert Susmann,
Leontine Alkema
Abstract:
Aggregate measures of family planning are used to monitor demand for and usage of contraceptive methods in populations globally, for example as part of the FP2030 initiative. Family planning measures for low- and middle-income countries are typically based on data collected through cross-sectional household surveys. Recently proposed measures account for sexual activity through assessment of the distribution of time-between-sex (TBS) in the population of interest. In this paper, we propose a statistical approach to estimate the distribution of TBS using data typically available in low- and middle-income countries, while addressing two major challenges. The first challenge is that information on the timing of sex is typically limited to women's time-since-last-sex (TSLS) data collected in the cross-sectional survey. In our proposed approach, we adopt the current duration method to estimate the distribution of TBS using the available TSLS data, from which the frequency of sex at the population level can be derived. The second challenge is that the observed TSLS data are subject to reporting issues because they can be reported in different units and may be rounded off. To apply the current duration approach and account for these data reporting issues, we develop a flexible Bayesian model, and provide a detailed technical description of the proposed modeling approach.
Submitted 2 February, 2023;
originally announced February 2023.
-
Flexible Modeling of Demographic Transition Processes with a Bayesian Hierarchical B-splines Model
Authors:
Herbert Susmann,
Leontine Alkema
Abstract:
Several demographic and health indicators, including the total fertility rate (TFR) and modern contraceptive use rate (mCPR), evolve similarly over time, characterized by a transition between stable states. Existing approaches for estimation or projection of transitions in multiple populations have successfully used parametric functions to capture the relation between the rate of change of an indicator and its level. However, incorrect parametric forms may result in bias or incorrect coverage in long-term projections. We propose a new class of models to capture demographic transitions in multiple populations. Our proposal, the B-spline Transition Model (BTM), models the relationship between the rate of change of an indicator and its level using B-splines, allowing for data-adaptive estimation of transition functions. Bayesian hierarchical models are used to share information on the transition function between populations. We apply the BTM to estimate and project country-level TFR and mCPR and compare the results against those from extant parametric models. For TFR, BTM projections have generally lower error than the comparison model. For mCPR, while results are comparable between BTM and a parametric approach, the B-spline model generally improves out-of-sample predictions. The case studies suggest that the BTM may be considered for demographic applications.
Submitted 24 May, 2024; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Temporal models for demographic and global health outcomes in multiple populations: Introducing a new framework to review and standardize documentation of model assumptions and facilitate model comparison
Authors:
Herbert Susmann,
Monica Alexander,
Leontine Alkema
Abstract:
There is growing interest in producing estimates of demographic and global health indicators in populations with limited data. Statistical models are needed to combine data from multiple data sources into estimates and projections with uncertainty. Diverse modeling approaches have been applied to this problem, making comparisons between models difficult. We propose a model class, Temporal Models for Multiple Populations (TMMPs), to facilitate documentation of model assumptions in a standardized way and comparison across models. The class distinguishes between latent trends and the observed data, which may be noisy or exhibit systematic biases. We provide general formulations of the process model, which describes the latent trend of the indicator of interest. We show how existing models for a variety of indicators can be written as TMMPs and how the TMMP-based description can be used to compare and contrast model assumptions. We end with a discussion of outstanding questions and future directions.
Submitted 19 February, 2021;
originally announced February 2021.