-
Bayesian Linear Models: A compact general set of results
Authors:
J Andres Christen
Abstract:
I present all the details in calculating the posterior distribution of the conjugate Normal-Gamma prior in Bayesian Linear Models (BLM), including correlated observations, prediction, model selection and comments on efficient numeric implementations. A Python implementation is also presented. These results have been presented and are available in many books and texts but, I believe, a general, compact and simple presentation is always welcome and not always simple to find. Since correlated observations are also included, these results may also be useful for time series analysis and spatial statistics. Other particular cases presented include regression, Gaussian processes and Bayesian Dynamic Models.
Submitted 3 June, 2024;
originally announced June 2024.
-
A Physics Based Surrogate Model in Bayesian Uncertainty Quantification involving Elliptic PDEs
Authors:
A. Galaviz,
J. A. Christen,
A. Capella
Abstract:
The paper addresses Bayesian inference in inverse problems with uncertainty quantification involving a computationally expensive forward map associated with solving a partial differential equation. To mitigate the computational cost, the paper proposes a new surrogate model informed by the physics of the problem, specifically when the forward map involves solving a linear elliptic partial differential equation. The study establishes the consistency of the posterior distribution for this surrogate model and demonstrates its effectiveness through numerical examples with synthetic data. The results indicate a substantial improvement in computational speed, reducing the processing time from several months with the exact forward map to a few minutes, while maintaining negligible loss of accuracy in the posterior distribution.
Submitted 14 December, 2023;
originally announced December 2023.
-
Climate-sensitive Urban Planning through Optimization of Tree Placements
Authors:
Simon Schrodi,
Ferdinand Briegel,
Max Argus,
Andreas Christen,
Thomas Brox
Abstract:
Climate change is increasing the intensity and frequency of many extreme weather events, including heatwaves, which results in increased thermal discomfort and mortality rates. While global mitigation action is undoubtedly necessary, so is climate adaptation, e.g., through climate-sensitive urban planning. Among the most promising strategies is harnessing the benefits of urban trees in shading and cooling pedestrian-level environments. Our work investigates the challenge of optimal placement of such trees. Physical simulations can estimate the radiative and thermal impact of trees on human thermal comfort but induce high computational costs. This rules out optimization of tree placements over large areas and considering effects over longer time scales. Hence, we employ neural networks to simulate the point-wise mean radiant temperatures--a driving factor of outdoor human thermal comfort--across various time scales, spanning from daily variations to extended time scales of heatwave events and even decades. To optimize tree placements, we harness the innate local effect of trees within the iterated local search framework with tailored adaptations. We show the efficacy of our approach across a wide spectrum of study areas and time scales. We believe that our approach is a step towards empowering decision-makers, urban designers and planners to proactively and effectively assess the potential of urban trees to mitigate heat stress.
Submitted 9 October, 2023;
originally announced October 2023.
-
Dynamic survival analysis: modelling the hazard function via ordinary differential equations
Authors:
J. A. Christen,
F. J. Rubio
Abstract:
The hazard function represents one of the main quantities of interest in the analysis of survival data. We propose a general approach for parametrically modelling the dynamics of the hazard function using systems of autonomous ordinary differential equations (ODEs). This modelling approach can be used to provide qualitative and quantitative analyses of the evolution of the hazard function over time. Our proposal capitalises on the extensive literature on ODEs which, in particular, allows for establishing basic rules or laws on the dynamics of the hazard function via the use of autonomous ODEs. We show how to implement the proposed modelling framework in cases where there is an analytic solution to the system of ODEs or where an ODE solver is required to obtain a numerical solution. We focus on the use of a Bayesian modelling approach, but the proposed methodology can also be coupled with maximum likelihood estimation. A simulation study is presented to illustrate the performance of these models and the interplay of sample size and censoring. Two case studies using real data are presented to illustrate the use of the proposed approach and to highlight the interpretability of the corresponding models. We conclude with a discussion on potential extensions of our work and strategies to include covariates in our framework. Although we focus on examples in Medical Statistics, the proposed framework is applicable in any context where the interest lies in estimating and interpreting the dynamics of the hazard function.
Submitted 25 May, 2024; v1 submitted 9 August, 2023;
originally announced August 2023.
-
A surrogate model for studying random field energy release rates in 2D brittle fractures
Authors:
Luis Blanco-Cocom,
Marcos A. Capistrán,
Jaroslaw Knap,
J. Andrés Christen
Abstract:
This article proposes a weighted-variational model as an approximate surrogate model to lessen numerical complexity and lower the execution times of brittle fracture simulations. Consequently, Monte Carlo studies of brittle fractures become possible when energy release rates are modelled as a random field. In the weighted-variational model, we propose applying a Gaussian random field with a Matérn covariance function to simulate a non-homogeneous energy release rate ($G_c$) of a material. Numerical solutions to the weighted-variational model, along with the more standard but computationally demanding hybrid phase-field models, are obtained using the FEniCS open-source software. The results indicate that the weighted-variational model is a competitive surrogate model of the hybrid phase-field method to mimic brittle fractures in real structures. This method reduces execution times by 90\%. We conducted a similar study and compared our results with an actual brittle fracture laboratory experiment. We present an example where a Monte Carlo study is carried out, modelling $G_c$ as a Gaussian Process, obtaining a distribution of possible fractures and load-displacement curves.
Submitted 29 February, 2024; v1 submitted 16 May, 2021;
originally announced May 2021.
-
Analytical Solutions for Radiation-Driven Winds in Massive Stars II: The $δ$-slow Regime
Authors:
I. Araya,
A. Christen,
M. Curé,
L. S. Cidale,
R. O. J. Venero,
C. Arcos,
A. C. Gormaz-Matamala,
M. Haucke,
P. Escárate,
H. Clavería
Abstract:
Accurate mass-loss rates and terminal velocities from massive star winds are essential to obtain synthetic spectra from radiative transfer calculations and to determine the evolutionary path of massive stars. From a theoretical point of view, analytical expressions for the wind parameters and velocity profile would have many advantages over numerical calculations that solve the complex non-linear set of hydrodynamic equations. In a previous work, we obtained an analytical description for the fast wind regime. Now, we propose an approximate expression for the line-force in terms of new parameters and obtain a closed-form solution for the velocity profile (in terms of the Lambert $W$ function) for the $δ$-slow regime. Using this analytical velocity profile, we were able to obtain the mass-loss rates based on the m-CAK theory. Moreover, we established a relation between this new set of line-force parameters and the known stellar and m-CAK line-force parameters. For this purpose, we calculated a grid of numerical hydrodynamical models and performed a multivariate multiple regression. The numerical models and our analytical description yield values in good agreement.
Submitted 7 April, 2021;
originally announced April 2021.
-
Bayesian sequential data assimilation for COVID-19 forecasting
Authors:
Maria L. Daza-Torres,
Marcos A. Capistrán,
Antonio Capella,
J. Andrés Christen
Abstract:
We introduce a Bayesian sequential data assimilation method for COVID-19 forecasting. It is assumed that suitable transmission, epidemic and observation models are available and previously validated and the transmission and epidemic models are coded into a dynamical system. The observation model depends on the dynamical system state variables and parameters, and is cast as a likelihood function. We elicit prior distributions of the effective population size, the dynamical system initial conditions and infectious contact rate, and use Markov Chain Monte Carlo sampling to make inference and prediction of quantities of interest (QoI) at the onset of the epidemic outbreak. The forecast is sequentially updated over a sliding window of epidemic records as new data becomes available. Prior distributions for the state variables at the new forecasting time are assembled using the dynamical system, calibrated for the previous forecast. Moreover, changes in the contact rate and effective population size are naturally introduced through auto-regressive models on the corresponding parameters. We show our forecasting method's performance using a SEIR type model and COVID-19 data from several Mexican localities.
Submitted 10 March, 2021;
originally announced March 2021.
-
A simulation study to compare 210Pb dating data analyses
Authors:
Marco A Aquino-López,
Nicole K. Sanderson,
Maarten Blaauw,
Joan-Albert Sanchez-Cabeza,
Ana Carolina Ruiz-Fernandez,
J Andrés Christen
Abstract:
The increasing interest in understanding anthropogenic impacts on the environment has led to a considerable number of studies focusing on sedimentary records for the last $\sim$ 100 - 200 years. Dating this period is often complicated by the poor resolution and large errors associated with radiocarbon (14C) ages, which is the most popular dating technique. To improve age-depth model resolution for the recent period, sediment dating with lead-210 ($^{210}$Pb) is widely used as it provides absolute and continuous dates for the last $\sim$ 100 - 150 years. The $^{210}$Pb dating method has traditionally relied on the Constant Rate of Supply (CRS, also known as Constant Flux - CF) model, which uses the radioactive decay equation as an age-depth relationship, resulting in a restrictive model to approximate dates. In this work, we compare the classical approach to $^{210}$Pb dating (CRS) and its Bayesian alternative (\textit{Plum}). To do so, we created simulated $^{210}$Pb profiles following three different sedimentation processes, complying with the assumptions imposed by the CRS model, and analysed them using both approaches. Results indicate that the CRS model does not capture the true values even with a high dating resolution for the sediment, nor does its accuracy improve as more information is available. On the other hand, the Bayesian alternative (\textit{Plum}) provides consistently more accurate results even with few samples, and its accuracy and precision constantly improve as more information is available.
Submitted 12 December, 2020;
originally announced December 2020.
-
Uncertainty quantification for fault slip inversion
Authors:
J. Cricelio Montesinos-López,
Antonio Capella,
J. Andrés Christen,
Josué Tago
Abstract:
We propose an efficient Bayesian approach to infer a fault displacement from geodetic data in a slow slip event. Our physical model of the slip process reduces to a multiple linear regression subject to constraints. Assuming a Gaussian model for the geodetic data and considering a multivariate truncated normal prior distribution for the unknown fault slip, the resulting posterior distribution is also multivariate truncated normal. Regarding the posterior, we propose an algorithm based on Optimal Directional Gibbs that allows us to efficiently sample from the resulting high-dimensional posterior distribution of along dip and along strike movements of our fault grid division. A synthetic fault slip example illustrates the flexibility and accuracy of the proposed approach. The methodology is also applied to a real data set, for the 2006 Guerrero, Mexico, Slow Slip Event, where the objective is to recover the fault slip on a known interface that produces displacements observed at ground geodetic stations. As a by-product of our approach, we are able to estimate moment magnitude for the 2006 Guerrero Event with uncertainty quantification.
Submitted 19 March, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Penalised t-walk MCMC
Authors:
Felipe J Medina-Aguayo,
J Andrés Christen
Abstract:
Handling multimodality that commonly arises from complicated statistical models remains a challenge. Current Markov chain Monte Carlo (MCMC) methodology tackling this subject is based on an ensemble of chains targeting a product of power-tempered distributions. Despite the theoretical validity of such methods, practical implementations typically suffer from bad mixing and slow convergence due to the high computational cost involved. In this work we study novel extensions of the t-walk algorithm, an existing MCMC method that is inexpensive and invariant to affine transformations of the state space, for dealing with multimodal distributions. We acknowledge that the effectiveness of the new method will be problem dependent and that it might struggle in complex scenarios; for such cases we propose a post-processing technique based on pseudo-marginal theory for combining isolated samples.
Submitted 3 December, 2020;
originally announced December 2020.
-
Filtering and improved Uncertainty Quantification in the dynamic estimation of effective reproduction numbers
Authors:
Marcos A. Capistrán,
Antonio Capella,
J. Andrés Christen
Abstract:
The effective reproduction number $R_t$ measures an infectious disease's transmissibility as the number of secondary infections in one reproduction time in a population having both susceptible and non-susceptible hosts. Current approaches do not quantify the uncertainty correctly in estimating $R_t$, as expected by the observed variability in contagion patterns. We elaborate on the Bayesian estimation of $R_t$ by improving on the Poisson sampling model of Cori et al. (2013). By adding an autoregressive latent process, we build a Dynamic Linear Model on the log of observed $R_t$'s, resulting in a filtering-type Bayesian inference. We use a conjugate analysis, and all calculations are explicit. Results show an improved uncertainty quantification on the estimation of $R_t$'s, with a reliable method that could safely be used by non-experts and within other forecasting systems. We illustrate our approach with recent data from the current COVID-19 epidemic in Mexico.
Submitted 3 December, 2020;
originally announced December 2020.
-
Forecasting hospital demand during COVID-19 pandemic outbreaks
Authors:
Marcos A. Capistran,
Antonio Capella,
J. Andres Christen
Abstract:
We present a compartmental SEIRD model aimed at forecasting hospital occupancy in metropolitan areas during the current COVID-19 outbreak. The model features asymptomatic and symptomatic infections with detailed hospital dynamics. We explicitly model branching probabilities and non-exponential residence times in each latent and infected compartment. Using both hospital admittance confirmed cases and deaths, we infer the contact rate and the initial conditions of the dynamical system, considering break points to model lockdown interventions. Our Bayesian approach allows us to produce timely probabilistic forecasts of hospital demand. The model has been used by the federal government of Mexico to assist public policy, and has been applied for the analysis of more than 70 metropolitan areas and the 32 states in the country.
Submitted 5 June, 2020; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Systematic statistical analysis of microbial data from dilution series
Authors:
J Andrés Christen,
Al Parker
Abstract:
In microbial studies, samples are often treated under different experimental conditions and then tested for microbial survival. A technique, dating back to the 1880s, consists of diluting the samples several times and incubating each dilution to verify the existence of microbial Colony Forming Units, or CFUs, seen by the naked eye. The main problem in the dilution series data analysis is the uncertainty quantification of the simple point estimate of the original number of CFUs in the sample (i.e., at dilution zero). Common approaches such as log-normal or Poisson models do not seem to handle extreme cases with low or high counts well, among other issues. We build a novel binomial model, based on the actual design of the experimental procedure including the dilution series. For repetitions we construct a hierarchical model for experimental results from a single lab and in turn a higher hierarchy for inter-lab analyses. Results seem promising, with a systematic treatment of all data cases, including zeros, censored data, repetitions, and intra- and inter-laboratory studies. Using a Bayesian approach, a robust and efficient MCMC method is used to analyze several real data sets.
Submitted 19 March, 2020;
originally announced March 2020.
-
Error control in the numerical posterior distribution in the Bayesian UQ analysis of a semilinear evolution PDE
Authors:
Maria L. Daza-Torres,
J. Cricelio Montesinos-López,
Marcos A. Capistrán,
J. Andrés Christen,
Heikki Haario
Abstract:
We elaborate on results obtained in \cite{christen2018} for controlling the numerical posterior error for Bayesian UQ problems, now considering forward maps arising from the solution of a semilinear evolution partial differential equation. Results in \cite{christen2018} demand an estimate for the absolute global error (AGE) of the numeric forward map. Our contribution is a numerical method for computing the AGE for semilinear evolution PDEs, which shows the potential applicability of \cite{christen2018} in this important and wide-ranging family of PDEs. Numerical examples are given to illustrate the efficiency of the proposed method, obtaining numerical posterior distributions for unknown parameters that are nearly identical to the corresponding theoretical posterior, by keeping their Bayes factor close to 1.
Submitted 5 November, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
AutoRegressive Planet Search: Application to the Kepler Mission
Authors:
Gabriel A. Caceres,
Eric D. Feigelson,
G. Jogesh Babu,
Natalia Bahamonde,
Alejandra Christen,
Karine Bertin,
Cristian Meza,
Michel Curé
Abstract:
The 4-year light curves of 156,717 stars observed with NASA's Kepler mission are analyzed using the AutoRegressive Planet Search (ARPS) methodology described by Caceres et al. (2019). The three stages of processing are: maximum likelihood ARIMA modeling of the light curves to reduce stellar brightness variations; constructing the Transit Comb Filter periodogram to identify transit-like periodic dips in the ARIMA residuals; Random Forest classification trained on Kepler Team confirmed planets using several dozen features from the analysis. Orbital periods between 0.2 and 100 days are examined. The result is a recovery of 76% of confirmed planets, 97% when period and transit depth constraints are added. The classifier is then applied to the full Kepler dataset; 1,004 previously noticed and 97 new stars have light curve criteria consistent with the confirmed planets, after subjective vetting removes clear False Alarms and False Positive cases. The 97 Kepler ARPS Candidate Transits mostly have periods $P<10$ days; many are UltraShort Period hot planets with radii $<1$% of the host star. Extensive tabular and graphical output from the ARPS time series analysis is provided to assist in other research relating to the Kepler sample.
Submitted 23 May, 2019;
originally announced May 2019.
-
FATSO: A family of operators for variable selection in linear models
Authors:
Nicolás E. Kuschinski,
J. Andrés Christen
Abstract:
In linear models it is common to have situations where several regression coefficients are zero. In these situations a common tool to perform regression is a variable selection operator. One of the most common such operators is the LASSO operator, which promotes point estimates that are zero. The LASSO operator and similar approaches, however, give little in terms of easily interpretable parameters to determine the degree of variable selectivity. In this paper we propose a new family of selection operators which builds on the geometry of LASSO but which yields an easily interpretable way to tune selectivity. These operators correspond to Bayesian prior densities and hence are suitable for Bayesian inference. We present some examples using simulated and real data, with promising results.
Submitted 11 April, 2019;
originally announced April 2019.
-
Bayesian Experimental Design for Oral Glucose Tolerance Tests (OGTT)
Authors:
Nicolás E. Kuschinski,
J. Andrés Christen,
Adriana Monroy,
Silvestre Alavez
Abstract:
OGTT is a common test, frequently used to diagnose insulin resistance or diabetes, in which a patient's blood sugar is measured at various times over the course of a few hours. Recent developments in the study of OGTT results have framed it as an inverse problem which has been the subject of Bayesian inference. This is a powerful new tool for analyzing the results of an OGTT test, and the question arises as to whether the test itself can be improved. It is of particular interest to discover whether the times at which a patient's glucose is measured can be changed to improve the effectiveness of the test. The purpose of this paper is to explore the possibility of finding a better experimental design, that is, a set of times to perform the test. We review the theory of Bayesian experimental design and propose an estimator for the expected utility of a design. We then study the properties of this estimator and propose a new method for quantifying the uncertainty in comparisons between designs. We implement this method to find a new design, and the proposed design compares favorably to the usual testing scheme.
Submitted 27 March, 2019;
originally announced March 2019.
-
A method to deconvolve stellar rotational velocities III. The probability distribution function via Maximum Likelihood utilizing Finite Distribution Mixtures
Authors:
Rafael Orellana,
Pedro Escarate,
Michel Cure,
Alejandra Christen,
Rodrigo Carvajal,
Juan Carlos Agüero
Abstract:
The study of accurate methods to estimate the distribution of stellar rotational velocities is important for understanding many aspects of stellar evolution. From such observations we obtain the projected rotational speed v sin(i) in order to recover the true distribution of the rotational velocity. To that end, we need to solve a difficult inverse problem that can be posed as a Fredholm integral of the first kind. In this work we have used a novel approach based on Maximum Likelihood (ML) estimation to obtain an approximation of the true rotational velocity probability density function expressed as a sum of known distribution families. In our proposal, the measurements have been treated as random variables drawn from the projected rotational velocity probability density function. We analyzed the case of Maxwellian sum approximation, where we estimated the parameters that define the sum of distributions. The performance of the proposed method is analyzed using Monte Carlo simulations considering two theoretical cases for the probability density function of the true rotational stellar velocities: i) a unimodal Maxwellian probability density distribution and ii) a bimodal Maxwellian probability density distribution. The results show that the proposed method yielded more accurate estimates in comparison with the Tikhonov regularization method, especially for small sample length (N = 50). Our proposal was evaluated using real data from three sets of measurements, and our findings were validated using three statistical tests. The ML approach with Maxwellian sum approximation is an accurate method to deconvolve the rotational velocity probability density function, even when the sample length is small (N = 50).
Submitted 8 March, 2019;
originally announced March 2019.
-
AutoRegressive Planet Search: Methodology
Authors:
Gabriel A. Caceres,
Eric D. Feigelson,
G. Jogesh Babu,
Natalia Bahamonde,
Alejandra Christen,
Karine Bertin,
Cristian Meza,
Michel Curé
Abstract:
The detection of periodic signals from transiting exoplanets is often impeded by extraneous aperiodic photometric variability, either intrinsic to the star or arising from the measurement process. Frequently, these variations are autocorrelated, wherein later flux values are correlated with previous ones. In this work, we present the methodology of the Autoregressive Planet Search (ARPS) project which uses Autoregressive Integrated Moving Average (ARIMA) and related statistical models that treat a wide variety of stochastic processes, as well as nonstationarity, to improve detection of new planetary transits. Provided a time series is evenly spaced or can be placed on an evenly spaced grid with missing values, these low-dimensional parametric models can prove very effective. We introduce a planet-search algorithm to detect periodic transits in the residuals after the application of ARIMA models. Our matched-filter algorithm, the Transit Comb Filter (TCF), is closely related to the traditional Box-fitting Least Squares and provides an analogous periodogram. Finally, if a previously identified or simulated sample of planets is available, selected scalar features from different stages of the analysis -- the original light curves, ARIMA fits, TCF periodograms, and folded light curves -- can be collectively used with a multivariate classifier to identify promising candidates while efficiently rejecting false alarms. We use Random Forests for this task, in conjunction with Receiver Operating Characteristic (ROC) curves, to define discovery criteria for new, high fidelity planetary candidates. The ARPS methodology can be applied to both evenly spaced satellite light curves and densely cadenced ground-based photometric surveys.
Submitted 14 May, 2019; v1 submitted 15 January, 2019;
originally announced January 2019.
-
A computational geometry method for the inverse scattering problem
Authors:
Maria L. Daza-Torres,
Juan Antonio Infante del Río,
Marcos A. Capistrán,
J. Andrés Christen
Abstract:
In this paper we demonstrate a computational method to solve the inverse scattering problem for a star-shaped, smooth, penetrable obstacle in 2D. Our method is based on classical ideas from computational geometry. First, we approximate the support of a scatterer by a point cloud. Secondly, we use the Bayesian paradigm to model the joint conditional probability distribution of the non-convex hull of the point cloud and the constant refractive index of the scatterer given near field data. Of note, we use the non-convex hull of the point cloud as spline control points to evaluate, on a finer mesh, the volume potential arising in the integral equation formulation of the direct problem. Finally, in order to sample the arising posterior distribution, we propose a probability transition kernel that commutes with affine transformations of space. Our findings indicate that our method reliably retrieves the support and constant refractive index of the scatterer simultaneously. Indeed, our sampling method is robust in estimating a quantity of interest such as the area of the scatterer. We conclude by pointing out a series of generalizations of our method.
Submitted 23 July, 2018;
originally announced July 2018.
-
Posterior distribution existence and error control in Banach spaces in the Bayesian approach to UQ in inverse problems
Authors:
J. Andrés Christen,
Marcos A. Capistrán,
M. Luisa Daza-Torres,
Hugo Flores-Argüedas,
J. Cricelio Montesinos-López
Abstract:
We generalize the results of \cite{Capistran2016} on expected Bayes factors (BF) to control the numerical error in the posterior distribution to an infinite dimensional setting when considering Banach functional spaces and now in a prior setting. The main result is a bound on the absolute global error to be tolerated by the Forward Map numerical solver, to keep the BF of the numerical vs. the theoretical model near to 1, now in this more general setting, possibly including a truncated, finite dimensional approximate prior measure. In so doing we found a far more general setting to define and prove existence of the infinite dimensional posterior distribution than that depicted in, for example, \cite{Stuart2010}. Discretization consistency and rates of convergence are also investigated in this general setting for the Bayesian inverse problem.
Submitted 12 October, 2018; v1 submitted 8 December, 2017;
originally announced December 2017.
-
Bayesian analysis of 210Pb dating
Authors:
Marco A Aquino-López,
Maarten Blaauw,
J Andrés Christen,
Nicole K. Sanderson
Abstract:
In many studies of environmental change of the past few centuries, 210Pb dating is used to obtain chronologies for sedimentary sequences. One of the most commonly used approaches to estimate the ages of depths in a sequence is to assume a constant rate of supply (CRS) or influx of 'unsupported' 210Pb from the atmosphere, together with a constant or varying amount of 'supported' 210Pb. Current 210Pb dating models do not use a proper statistical framework and thus provide poor estimates of errors. Here we develop a new model for 210Pb dating, where both ages and values of supported and unsupported 210Pb form part of the parameters. We apply our model to a case study from Canada as well as to some simulated examples. Our model can extend beyond the current CRS approach, deal with asymmetric errors and mix 210Pb with other types of dating, thus obtaining more robust, realistic and statistically better defined estimates.
Submitted 9 October, 2017;
originally announced October 2017.
-
Sampling hyperparameters in hierarchical models: improving on Gibbs for high-dimensional latent fields and large data sets
Authors:
Richard A. Norton,
J. Andres Christen,
Colin Fox
Abstract:
We consider posterior sampling in the very common Bayesian hierarchical model in which observed data depends on high-dimensional latent variables that, in turn, depend on relatively few hyperparameters. When the full conditional over the latent variables has a known form, the marginal posterior distribution over hyperparameters is accessible and can be sampled using a Markov chain Monte Carlo (MCMC) method on a low-dimensional parameter space. This may improve computational efficiency over standard Gibbs sampling since computation is not over the high-dimensional space of latent variables and correlations between hyperparameters and latent variables become irrelevant. When the marginal posterior over hyperparameters depends on a fixed-dimensional sufficient statistic, precomputation of the sufficient statistic renders the cost of the low-dimensional MCMC independent of data size. Then, when the hyperparameters are the primary variables of interest, inference may be performed in big-data settings at modest cost. Moreover, since the form of the full conditional for the latent variables does not depend on the form of the hyperprior distribution, the method imposes no restriction on the hyperprior, unlike Gibbs sampling that typically requires conjugate distributions. We demonstrate these efficiency gains in four computed examples.
Submitted 20 October, 2016;
originally announced October 2016.
-
A method to deconvolve stellar rotational velocities II
Authors:
A. Christen,
P. Escarate,
M. Cure,
D. F. Rial,
J. Cassetti
Abstract:
Knowing the distribution of stellar rotational velocities is essential for understanding stellar evolution. Because we measure the projected rotational speed v sin i, we need to solve an ill-posed problem given by a Fredholm integral of the first kind to recover the true rotational velocity distribution. After discretization of the Fredholm integral, we apply the Tikhonov regularization method to obtain directly the probability distribution function for stellar rotational velocities. We propose a simple and straightforward procedure to determine the Tikhonov parameter. We applied Monte Carlo simulations to show that the Tikhonov method is a consistent and asymptotically unbiased estimator. This method is applied to a sample of cluster stars. We obtain confidence intervals using the bootstrap method. Our results are in good agreement with those obtained using the Lucy method in recovering the probability density distribution of rotational velocities. Furthermore, the Lucy estimate lies inside our confidence interval. Tikhonov regularization is a very robust method that deconvolves the rotational velocity probability density function from a sample of v sin i data straightforwardly, without needing any convergence criteria.
Submitted 15 September, 2016;
originally announced September 2016.
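The regularization step named in this abstract can be illustrated with a minimal sketch. It assumes the Fredholm integral has already been discretized into a matrix `A` and a data vector `b`, and it uses the standard identity-penalty form of Tikhonov regularization. The function name `tikhonov_solve`, the parameter `lam`, and the toy operator are illustrative choices, not the authors' code or their procedure for choosing the Tikhonov parameter.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the regularized normal equations."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n_unknowns = A.shape[1]
    # (A^T A + lam I) x = A^T b is well conditioned for lam > 0,
    # even when A^T A itself is close to singular.
    return np.linalg.solve(A.T @ A + lam * np.eye(n_unknowns), A.T @ b)

# Toy check with an identity operator (hypothetical data):
# the regularized solution is simply b / (1 + lam).
x_reg = tikhonov_solve(np.eye(2), [1.0, 2.0], lam=1.0)
```

Larger values of `lam` shrink the solution toward zero and stabilize it against noise in `b`; smaller values follow the data more closely.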
-
Numerical posterior distribution error control and expected Bayes Factors in the bayesian Uncertainty Quantification of Inverse Problems
Authors:
J. Andrés Christen,
Marcos A. Capistrán,
Miguel Ángel Moreles
Abstract:
In most relevant cases in the Bayesian analysis of inverse problems, the forward maps (FM, or regressor function) are defined in terms of a system of (O, P)DEs with intractable solutions. These necessarily involve a numerical method to find approximate versions of such solutions and lead to a numerical/approximate posterior distribution. Recently several results have been published on the regularity conditions required on such numerical methods to ensure convergence of the numerical to the theoretical posterior. However, more practical guidelines are needed to ensure a suitable working numerical posterior. [Capistran2016] prove for ODEs that the Bayes Factor of the approximate vs the theoretical model tends to 1 in the same order as the numerical method order. In this work we generalize the latter paper in that we consider 1) also PDEs, 2) correlated observations, 3) practical guidelines in a multidimensional setting and 4) explore the use of expected Bayes Factors. This permits us to obtain bounds on the absolute global errors to be tolerated by the FM numerical solver, which we illustrate with some examples. Since the Bayes Factor is kept above 0.95 we expect that the resulting numerical posterior is basically indistinguishable from the theoretical posterior, even though we are using an approximate numerical FM. The method is illustrated with some examples using synthetic data.
Submitted 29 August, 2017; v1 submitted 7 July, 2016;
originally announced July 2016.
-
Modeling Oral Glucose Tolerance Test (OGTT) data and its Bayesian Inverse Problem
Authors:
Nicolás Kuschinski,
J. Andrés Christen,
Adriana Monroy,
Silvestre Alavez
Abstract:
One common way to test for diabetes is the Oral Glucose Tolerance Test or OGTT. Most common methods for the analysis of the data on this test are wasteful of much of the information contained therein. We propose to model blood glucose during an OGTT using a compartmental dynamic model with a system of ODEs. Our model works well in describing most scenarios that occur during an OGTT considering only 4 parameters. Fitting the model to data is an inverse problem, which is suitable for Bayesian inference. Priors are specified and posterior inference results are shown using real data.
Submitted 14 May, 2019; v1 submitted 18 January, 2016;
originally announced January 2016.
-
Quantitative exponential bounds for the renewal theorem with spread-out distributions
Authors:
J. -B Bardet,
A Christen,
J Fontbona
Abstract:
We establish explicit exponential convergence estimates for the renewal theorem, in terms of a uniform component of the interarrival distribution, of its Laplace transform which is assumed finite on a positive interval, and of the Laplace transform of some related random variable. Our proof is based on a coupling construction relying on discrete-time Markovian structures that underlie the renewal processes and on Lyapunov-Doeblin type arguments.
Submitted 30 November, 2016; v1 submitted 23 April, 2015;
originally announced April 2015.
-
A method to deconvolve mass ratio distribution from binary stars
Authors:
Michel Cure,
Diego F. Rial,
Alejandra Christen,
Julia Cassetti,
Henri M. J. Boffin
Abstract:
To better understand the evolution of stars in binary systems as well as to constrain the formation of binary stars, it is important to know the binary mass-ratio distribution. However, in most cases, i.e. for single-lined spectroscopic binaries, the mass ratio cannot be measured directly but only derived as the convolution of a function that depends on the mass ratio and the unknown inclination angle of the orbit on the plane of the sky. We extend our previous method to deconvolve this inverse problem (Cure et al. 2014), i.e., we obtain as an integral the cumulative distribution function (CDF) for the mass ratio distribution. After a suitable transformation of variables it turns out that this problem is the same as the one for rotational velocities $v \sin i$, allowing a closed analytic formulation for the CDF. We then apply our method to two real datasets: a sample of Am star binary systems, and a sample of massive spectroscopic binaries in the Cyg OB2 Association. We are able to reproduce the previous results of Boffin (2010) for the sample of Am stars, while we show that the mass ratio distribution of massive stars shows an excess of small mass ratio systems, contrary to what was claimed by Kobulnicky et al. (2014). Our method proves very robust and deconvolves the distribution from a sample in just a single step.
Submitted 1 December, 2014;
originally announced December 2014.
-
A method to deconvolve stellar rotational velocities
Authors:
Michel Cure,
Diego F. Rial,
Alejandra Christen,
Julia Cassetti
Abstract:
Rotational speed is an important physical parameter of stars, and knowing the distribution of stellar rotational velocities is essential for understanding stellar evolution. However, it cannot be measured directly; only the convolution of the rotational speed and the sine of the inclination angle, $v \sin i$, is observed. We developed a method to deconvolve this inverse problem and obtain the cumulative distribution function (CDF) for stellar rotational velocities extending the work of Chandrasekhar & Münch (1950). This method is applied a) to theoretical synthetic data recovering the original velocity distribution with very small error; b) to a sample of about 12,000 field main-sequence stars, corroborating that the velocity distribution function is non-Maxwellian, but is better described by distributions based on the concept of maximum entropy, such as Tsallis or Kaniadakis distribution functions. This is a very robust and novel method that deconvolves the rotational velocity cumulative distribution function from a sample of $v \sin i$ data in just one single step without needing any convergence criteria.
Submitted 19 March, 2014; v1 submitted 6 January, 2014;
originally announced January 2014.
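To make the projection effect concrete, here is a small Monte Carlo sketch of the forward problem only (it is not the deconvolution method of the paper). It assumes rotation axes are randomly oriented, so that cos i is uniform on [0, 1], and it uses a hypothetical sample in which every star rotates at the same true speed. The function name and the sample values are illustrative.

```python
import math
import random

def simulate_projected_speeds(true_speeds, rng):
    """Project each true speed v onto the line of sight as v * sin(i).

    Assumption: axes are randomly oriented, so cos(i) is uniform on [0, 1].
    """
    projected = []
    for v in true_speeds:
        cos_i = rng.random()
        projected.append(v * math.sqrt(1.0 - cos_i * cos_i))
    return projected

rng = random.Random(0)                      # fixed seed for repeatability
true_speeds = [100.0] * 200_000             # hypothetical: all stars at 100 km/s
projected = simulate_projected_speeds(true_speeds, rng)

# Under the random-orientation assumption, E[sin i] = pi / 4,
# so this ratio approaches pi / 4 for large samples.
mean_ratio = sum(projected) / sum(true_speeds)
```

Even with a single true speed, the observed $v \sin i$ values spread over the whole range from 0 to v, which is why the true distribution has to be recovered by deconvolution.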
-
An analysis of the interaction between influenza and respiratory syncytial virus based on acute respiratory infection records
Authors:
Yendry N. Arguedas-Flatts,
Marcos A. Capistrán,
J. Andrés Christen,
Daniel E. Noyola
Abstract:
Under the hypothesis that influenza and respiratory syncytial virus (RSV) are the two leading causes of acute respiratory infections (ARI), in this paper we have used a standard two-pathogen epidemic model as a regressor to explain, on a yearly basis, high season ARI data in terms of the contact rates and initial conditions of the mathematical model. The rationale is that ARI high season is a transient regime of a noisy system, e.g., the system is driven away from equilibrium every year by fluctuations in variables such as humidity, temperature, viral mutations and human behavior. Using the value of the replacement number as a phenotypic trait associated with fitness, we provide evidence that influenza and RSV coexist throughout the ARI high season through superinfection.
Submitted 29 November, 2013;
originally announced December 2013.
-
Bayesian Analysis of ODE's: solver optimal accuracy and Bayes factors
Authors:
Marcos Capistrán,
J. Andrés Christen,
Sophie Donnet
Abstract:
In most relevant cases in the Bayesian analysis of ODE inverse problems, a numerical solver needs to be used. Therefore, we cannot work with the exact theoretical posterior distribution but only with an approximate posterior deriving from the error in the numerical solver. To compare a numerical and the theoretical posterior distributions we propose to use Bayes Factors (BF), considering both of them as models for the data at hand. We prove that the theoretical vs a numerical posterior BF tends to 1, in the same order (of the step size used) as the numerical forward map solver does. For higher order solvers (e.g., Runge-Kutta) the Bayes Factor is already nearly 1 for step sizes that would take far less computational effort. Considerable CPU time may be saved by using coarser solvers that nevertheless produce practically error-free posteriors. Two examples are presented where nearly 90% CPU time is saved while all inference results are identical to using a solver with a much finer time step.
Submitted 10 November, 2013;
originally announced November 2013.
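The idea of comparing a numerical and an exact forward map through a Bayes factor can be sketched on a toy problem. Everything below is an illustrative assumption and not taken from the paper: the ODE dy/dt = -theta * y with y(0) = 1, a forward Euler solver, noise-free synthetic observations, a known Gaussian noise level `SIGMA`, and a uniform prior on a grid of theta values. The marginal likelihood of each model is approximated by averaging the likelihood over the prior grid, and the normalizing constants cancel in the ratio.

```python
import math

TIMES = [0.5 * k for k in range(1, 11)]               # observation times
SIGMA = 0.05                                          # assumed known noise level
THETA_GRID = [0.2 + 0.002 * j for j in range(1401)]   # uniform prior grid on [0.2, 3.0]

def exact_solution(theta, t):
    """Exact solution of dy/dt = -theta * y, y(0) = 1."""
    return math.exp(-theta * t)

def euler_solution(theta, t, h):
    """Forward Euler approximation of the same ODE with step size h."""
    n_steps = round(t / h)
    return (1.0 - theta * h) ** n_steps

def marginal_likelihood(forward, data):
    """Average the Gaussian likelihood (up to a constant) over the prior grid."""
    total = 0.0
    for theta in THETA_GRID:
        sq = sum((y - forward(theta, t)) ** 2 for t, y in zip(TIMES, data))
        total += math.exp(-0.5 * sq / SIGMA ** 2)
    return total / len(THETA_GRID)

# Hypothetical noise-free observations generated with theta = 1.
data = [exact_solution(1.0, t) for t in TIMES]
z_exact = marginal_likelihood(exact_solution, data)

def bayes_factor(h):
    """Bayes factor of the Euler-based model against the exact model."""
    z_numeric = marginal_likelihood(lambda th, t: euler_solution(th, t, h), data)
    return z_numeric / z_exact

bf_coarse = bayes_factor(0.25)    # coarse solver
bf_fine = bayes_factor(0.005)     # fine solver
```

In this toy setting the Bayes factor for the finer step should sit closer to 1 than the one for the coarser step, which is the qualitative behavior the abstract describes; a higher order solver would be expected to reach a given closeness with a larger step.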
-
On optimal direction gibbs sampling
Authors:
J. Andrés Christen,
Colin Fox,
Diego Andrés Pérez-Ruiz,
Mario Santana-Cibrian
Abstract:
Generalized Gibbs kernels are those that may take any direction not necessarily bounded to each axis along the parameters of the objective function. We study how to optimally choose such directions in a Directional, random scan, Gibbs sampler setting. The optimal direction is chosen by minimizing the mutual information (Kullback-Leibler divergence) of two steps of the MCMC for a truncated Normal objective function. The result is generalized to be used when a Multivariate Normal (local) approximation is available for the objective function. Three Gibbs direction distributions are tested in highly skewed non-normal objective functions.
Submitted 17 May, 2012;
originally announced May 2012.
-
Total variation estimates for the TCP process
Authors:
Jean-Baptiste Bardet,
Alejandra Christen,
Arnaud Guillin,
Florent Malrieu,
Pierre-André Zitt
Abstract:
The TCP window size process appears in the modeling of the famous Transmission Control Protocol used for data transmission over the Internet. This continuous time Markov process takes its values in [0, \infty), is ergodic and irreversible. The sample paths are piecewise linear deterministic and the whole randomness of the dynamics comes from the jump mechanism. The aim of the present paper is to provide quantitative estimates for the exponential convergence to equilibrium, in terms of the total variation and Wasserstein distances.
Submitted 21 September, 2012; v1 submitted 29 December, 2011;
originally announced December 2011.
-
Towards Uncertainty Quantification and Inference in the stochastic SIR Epidemic Model
Authors:
Marcos A. Capistrán,
J. Andrés Christen,
Jorge X. Velasco-Hernández
Abstract:
In this paper we introduce a novel method to conduct inference with models defined through a continuous-time Markov process, and we apply these results to a classical stochastic SIR model as a case study. Using the inverse-size expansion of van Kampen we obtain approximations for first and second moments for the state variables. These approximate moments are in turn matched to the moments of an imputed generic discrete distribution aimed at generating an approximate likelihood that is valid both for low count or high count data. We conduct a full Bayesian inference to estimate epidemic parameters using informative priors. Excellent estimations and predictions are obtained both in a synthetic data scenario and in two Dengue fever case studies.
Submitted 9 November, 2011;
originally announced November 2011.
-
A Generic Multivariate Distribution for Counting Data
Authors:
Marcos Capistrán,
J. Andrés Christen
Abstract:
Motivated by the need, in some Bayesian likelihood-free inference problems, of imputing a multivariate counting distribution based on its vector of means and variance-covariance matrix, we define a generic multivariate discrete distribution. Based on blending the Binomial, Poisson and Negative-Binomial distributions, and using a normal multivariate copula, the required distribution is defined. This distribution tends to the Multivariate Normal for large counts and has an approximate pmf version that is quite simple to evaluate.
Submitted 24 March, 2011;
originally announced March 2011.
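As a univariate illustration of why these three families are natural building blocks, the sketch below picks a family from a target mean and variance using textbook moment relations: a Binomial is underdispersed (variance below the mean), a Poisson is equidispersed, and a Negative-Binomial is overdispersed. This is not the blended distribution or the copula construction of the paper. The function name `match_count_family` and the returned parameter names are hypothetical, and the Binomial `n` is returned as a real number that may need rounding in practice.

```python
def match_count_family(mean, variance):
    """Pick a count family whose first two moments match (mean, variance).

    Negative-Binomial convention assumed here: number of failures before
    the r-th success, with success probability p, so that
    mean = r * (1 - p) / p and variance = r * (1 - p) / p**2.
    """
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    if variance < mean:
        # Binomial: mean = n * p, variance = n * p * (1 - p)
        p = 1.0 - variance / mean
        return ("binomial", {"n": mean / p, "p": p})
    if variance > mean:
        p = mean / variance
        return ("negative_binomial", {"r": mean * mean / (variance - mean), "p": p})
    # Poisson: mean = variance = rate
    return ("poisson", {"rate": mean})

under = match_count_family(10.0, 5.0)    # underdispersed target
equal = match_count_family(4.0, 4.0)     # equidispersed target
over = match_count_family(10.0, 20.0)    # overdispersed target
```

The three branches cover every positive mean and variance pair, which is one way to see how a single generic count distribution can be specified from its first two moments.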