-
LocalCop: An R package for local likelihood inference for conditional copulas
Authors:
Elif F. Acar,
Martin Lysy,
Alan Kuchinsky
Abstract:
Conditional copula models allow the dependence structure between multiple response variables to be modelled as a function of covariates. LocalCop (Acar & Lysy, 2024) is an R/C++ package for computationally efficient semiparametric conditional copula modelling using a local likelihood inference framework developed in Acar, Craiu, & Yao (2011), Acar, Craiu, & Yao (2013) and Acar, Czado, & Lysy (2019).
Submitted 27 March, 2024;
originally announced March 2024.
-
Plant-Capture Methods for Estimating Population Size from Uncertain Plant Captures
Authors:
Yiran Wang,
Martin Lysy,
Audrey Béliveau
Abstract:
Plant-capture is a variant of classical capture-recapture methods used to estimate the size of a population. In this method, decoys referred to as "plants" are introduced into the population in order to estimate the capture probability. The method has shown considerable success in estimating population sizes from limited samples in many epidemiological, ecological, and demographic studies. However, previous plant-recapture studies have not systematically accounted for uncertainty in the capture status of each individual plant. In this work, we propose various approaches to formally incorporate uncertainty into the plant-capture model arising from (i) the capture status of plants and (ii) the heterogeneity between multiple survey sites. We present two inference methods and compare their performance in simulation studies. We then apply our methods to estimate the size of the homeless population in several US cities using the large-scale "S-night" study conducted by the US Census Bureau.
Submitted 6 March, 2024;
originally announced March 2024.
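The basic plant-capture idea described in the abstract above can be illustrated with a minimal sketch. It assumes every plant's capture status is known with certainty, which is exactly the assumption the paper relaxes, so it is not the authors' method. The function name and the numbers are hypothetical.

```python
def plant_capture_estimate(n_captured, plants_total, plants_captured):
    """Simple moment-type estimate of population size (plants excluded).

    The capture probability is estimated by the fraction of plants that
    were captured, and the observed count is scaled up accordingly:
    N_hat = n_captured / (plants_captured / plants_total).
    """
    if plants_total <= 0:
        raise ValueError("plants_total must be positive")
    if not 0 < plants_captured <= plants_total:
        raise ValueError("plants_captured must be in 1..plants_total")
    # Written as a single ratio to avoid an intermediate rounding step.
    return n_captured * plants_total / plants_captured


# Hypothetical survey: 120 individuals counted, 30 of 50 plants captured.
estimate = plant_capture_estimate(n_captured=120, plants_total=50, plants_captured=30)
print(estimate)
```

With an estimated capture probability of 30/50, the 120 observed individuals scale up to an estimated population of 200.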
-
BlackJAX: Composable Bayesian inference in JAX
Authors:
Alberto Cabezas,
Adrien Corenflos,
Junpeng Lao,
Rémi Louf,
Antoine Carnec,
Kaustubh Chaudhari,
Reuben Cohn-Gordon,
Jeremie Coullon,
Wei Deng,
Sam Duffield,
Gerardo Durán-Martín,
Marcin Elantkowski,
Dan Foreman-Mackey,
Michele Gregori,
Carlos Iguaran,
Ravin Kumar,
Martin Lysy,
Kevin Murphy,
Juan Camilo Orduz,
Karm Patel,
Xi Wang,
Rob Zinkov
Abstract:
BlackJAX is a library implementing sampling and variational inference algorithms commonly used in Bayesian computation. It is designed for ease of use, speed, and modularity by taking a functional approach to the algorithms' implementation. BlackJAX is written in Python, using JAX to compile and run NumPy-like samplers and variational methods on CPUs, GPUs, and TPUs. The library integrates well with probabilistic programming languages by working directly with the (un-normalized) target log density function. BlackJAX is intended as a collection of low-level, composable implementations of basic statistical 'atoms' that can be combined to perform well-defined Bayesian inference, but also provides high-level routines for ease of use. It is designed for users who need cutting-edge methods, researchers who want to create complex sampling methods, and people who want to learn how these work.
Submitted 22 February, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
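To illustrate why a sampler can work directly with an un-normalized log density, as the abstract above describes, here is a plain-Python random-walk Metropolis sketch. It is not BlackJAX code and does not use the BlackJAX API. The function names, the target density, and the tuning values are assumptions chosen for illustration.

```python
import math
import random


def log_density(x):
    # Un-normalized log density of a Normal(2, 1) target (constant dropped).
    return -0.5 * (x - 2.0) ** 2


def random_walk_metropolis(logdensity_fn, x0, n_steps, step_size, seed=0):
    """Random-walk Metropolis for a scalar target.

    Only differences of log densities enter the accept/reject step,
    so the normalizing constant of the target is never needed.
    """
    rng = random.Random(seed)
    x, logp = x0, logdensity_fn(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        logp_prop = logdensity_fn(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples.append(x)
    return samples


samples = random_walk_metropolis(log_density, x0=0.0, n_steps=20000, step_size=2.4)
mean = sum(samples) / len(samples)
print(round(mean, 2))
```

The sampler takes the log density as an argument rather than hard-coding it, which loosely mirrors the functional, composable style the abstract mentions.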
-
Data-Adaptive Probabilistic Likelihood Approximation for Ordinary Differential Equations
Authors:
Mohan Wu,
Martin Lysy
Abstract:
Estimating the parameters of ordinary differential equations (ODEs) is of fundamental importance in many scientific applications. While ODEs are typically approximated with deterministic algorithms, new research on probabilistic solvers indicates that they produce more reliable parameter estimates by better accounting for numerical errors. However, many ODE systems are highly sensitive to their parameter values. This produces deep local maxima in the likelihood function -- a problem which existing probabilistic solvers have yet to resolve. Here we present a novel probabilistic ODE likelihood approximation, DALTON, which can dramatically reduce parameter sensitivity by learning from noisy ODE measurements in a data-adaptive manner. Our approximation scales linearly in both ODE variables and time discretization points, and is applicable to ODEs with both partially-unobserved components and non-Gaussian measurement models. Several examples demonstrate that DALTON produces more accurate parameter estimates via numerical optimization than existing probabilistic ODE solvers, and even in some cases than the exact ODE likelihood itself.
Submitted 6 December, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Functional Connectivity: Continuous-Time Latent Factor Models for Neural Spike Trains
Authors:
Meixi Chen,
Martin Lysy,
David Moorman,
Reza Ramezan
Abstract:
Modelling the dynamics of interactions in a neuronal ensemble is an important problem in functional connectivity research. One popular framework is latent factor models (LFMs), which have achieved notable success in decoding neuronal population dynamics. However, most LFMs are specified in discrete time, where the choice of bin size significantly impacts inference results. In this work, we present what is, to the best of our knowledge, the first continuous-time multivariate spike train LFM for studying neuronal interactions and functional connectivity. We present an efficient parameter inference algorithm for our biologically justifiable model which (1) scales linearly in the number of simultaneously recorded neurons and (2) bypasses time binning and related issues. Simulation studies show that parameter estimation using the proposed model is highly accurate. Applying our LFM to experimental data from a classical conditioning study on the prefrontal cortex in rats, we found that coordinated neuronal activities are affected by (1) the onset of the cue for reward delivery, and (2) the sub-region within the frontal cortex (OFC/mPFC). These findings shed new light on our understanding of cue and outcome value encoding.
Submitted 17 May, 2023;
originally announced May 2023.
-
A Multivariate Point Process Model for Simultaneously Recorded Neural Spike Trains
Authors:
Reza Ramezan,
Meixi Chen,
Martin Lysy,
Paul Marriott
Abstract:
The current state-of-the-art in neurophysiological data collection allows for simultaneous recording of tens to hundreds of neurons, for which point processes are an appropriate statistical modelling framework. However, existing point process models lack multivariate generalizations which are both flexible and computationally tractable. This paper introduces a multivariate generalization of the Skellam process with resetting (SPR), a point process tailored to model individual neural spike trains. The multivariate SPR (MSPR) is biologically justified as it mimics the process of neural integration. Its flexible dependence structure and a fast parameter estimation method make it well-suited for the analysis of simultaneously recorded spike trains from multiple neurons. The strengths and weaknesses of the MSPR are demonstrated through simulation and analysis of experimental data.
Submitted 20 June, 2022;
originally announced June 2022.
-
Multimodel Bayesian Analysis of Load Duration Effects in Lumber Reliability
Authors:
Yunfeng Yang,
Martin Lysy,
Samuel W. K. Wong
Abstract:
This paper evaluates the reliability of lumber, accounting for the duration-of-load (DOL) effect under different load profiles based on a multimodel Bayesian approach. Three individual DOL models previously used for reliability assessment are considered: the US model, the Canadian model, and the Gamma process model. Procedures for stochastic generation of residential, snow, and wind loads are also described. We propose Bayesian model-averaging (BMA) as a method for combining the reliability estimates of individual models under a given load profile that coherently accounts for statistical uncertainty in the choice of model and parameter values. The method is applied to the analysis of a Hemlock experimental dataset, where the BMA results are illustrated via estimated reliability indices together with 95% interval bands.
Submitted 22 October, 2021;
originally announced October 2021.
-
Fast and Scalable Inference for Spatial Extreme Value Models
Authors:
Meixi Chen,
Reza Ramezan,
Martin Lysy
Abstract:
The generalized extreme value (GEV) distribution is a popular model for analyzing and forecasting extreme weather data. To increase prediction accuracy, spatial information is often pooled via a latent Gaussian process (GP) on the GEV parameters. Inference for GEV-GP models is typically carried out using Markov chain Monte Carlo (MCMC) methods, or using approximate inference methods such as the integrated nested Laplace approximation (INLA). However, MCMC becomes prohibitively slow as the number of spatial locations increases, whereas INLA is only applicable in practice to a limited subset of GEV-GP models. In this paper, we revisit the original Laplace approximation for fitting spatial GEV models. In combination with a popular sparsity-inducing spatial covariance approximation technique, we show through simulations that our approach accurately estimates the Bayesian predictive distribution of extreme weather events, is scalable to several thousand spatial locations, and is several orders of magnitude faster than MCMC. A case study in forecasting extreme snowfall across Canada is presented.
Submitted 16 May, 2024; v1 submitted 13 October, 2021;
originally announced October 2021.
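For readers unfamiliar with the GEV distribution named in the abstract above, the following sketch gives its standard distribution and quantile functions for location `mu`, scale `sigma`, and shape `xi`. It covers only the marginal GEV model, not the latent Gaussian process or the Laplace approximation used in the paper. The parameter values in the example are hypothetical.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Inverse of gev_cdf for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma * ((-math.log(p)) ** (-xi) - 1.0) / xi


# Hypothetical annual-maximum model: the 100-year return level is the
# quantile exceeded with probability 1/100 in any given year.
return_level = gev_quantile(1.0 - 1.0 / 100.0, mu=10.0, sigma=2.0, xi=0.1)
print(return_level)
```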
-
Measurement Error Correction in Particle Tracking Microrheology
Authors:
Yun Ling,
Martin Lysy,
Ian Seim,
Jay M. Newby,
David B. Hill,
Jeremy Cribb,
M. Gregory Forest
Abstract:
In diverse biological applications, particle tracking of passive microscopic species has become the experimental measurement of choice -- when either the materials are of limited volume, or so soft as to deform uncontrollably when manipulated by traditional instruments. In a wide range of particle tracking experiments, a ubiquitous finding is that the mean squared displacement (MSD) of particle positions exhibits a power-law signature, the parameters of which reveal valuable information about the viscous and elastic properties of various biomaterials. However, MSD measurements are typically contaminated by complex and interacting sources of instrumental noise. As these often affect the high-frequency bandwidth to which MSD estimates are particularly sensitive, inadequate error correction can lead to severe bias in power law estimation and thereby, the inferred viscoelastic properties. In this article, we propose a novel strategy to filter high-frequency noise from particle tracking measurements. Our filters are shown theoretically to cover a broad spectrum of high-frequency noises, and lead to a parametric estimator of MSD power-law coefficients for which an efficient computational implementation is presented. Based on numerous analyses of experimental and simulated data, results suggest our methods perform very well compared to other denoising procedures.
Submitted 14 November, 2019;
originally announced November 2019.
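The MSD power-law signature mentioned in the abstract above can be made concrete with a naive estimator: compute the time-averaged MSD of a path and fit a straight line on the log-log scale. This sketch is the uncorrected baseline that the abstract warns is sensitive to high-frequency noise, not the filtering method the paper proposes. The function names and the simulated Brownian path are assumptions for illustration.

```python
import math
import random


def empirical_msd(path, max_lag):
    """Time-averaged mean squared displacement of a 1-D path at lags 1..max_lag."""
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [(path[i + lag] - path[i]) ** 2 for i in range(len(path) - lag)]
        msd.append(sum(sq) / len(sq))
    return msd


def fit_power_law(msd, dt):
    """Least-squares fit of log MSD(t) = log(c) + alpha * log(t); returns (alpha, c)."""
    xs = [math.log((k + 1) * dt) for k in range(len(msd))]
    ys = [math.log(m) for m in msd]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    alpha = sxy / sxx
    return alpha, math.exp(ybar - alpha * xbar)


# Simulated noise-free Brownian motion, for which MSD(t) = t (alpha = 1).
rng = random.Random(0)
dt = 0.01
path = [0.0]
for _ in range(20000):
    path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))

alpha, coef = fit_power_law(empirical_msd(path, max_lag=10), dt)
print(alpha, coef)
```

On a clean path like this one the fitted exponent should be close to 1. Adding measurement noise to `path` distorts the short-lag MSD values, which is the problem the paper addresses.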
-
A Convergence Diagnostic for Bayesian Clustering
Authors:
Masoud Asgharian,
Martin Lysy,
Vahid Partovi Nia
Abstract:
In many applications of Bayesian clustering, posterior sampling on the discrete state space of cluster allocations is achieved via Markov chain Monte Carlo (MCMC) techniques. As it is typically challenging to design transition kernels to explore this state space efficiently, MCMC convergence diagnostics for clustering applications are especially important. For general MCMC problems, state-of-the-art convergence diagnostics involve comparisons across multiple chains. However, single-chain alternatives can be appealing for computationally intensive and slowly-mixing MCMC, as is typically the case for Bayesian clustering. Thus, we propose here a single-chain convergence diagnostic specifically tailored to discrete-space MCMC. Namely, we consider a Hotelling-type statistic on the highest probability states, and use regenerative sampling theory to derive its equilibrium distribution. By leveraging information from the unnormalized posterior, our diagnostic protects against seemingly convergent chains in which the relative frequency of visited states is incorrect. The methodology is illustrated with a Bayesian clustering analysis of genetic mutants of the flowering plant Arabidopsis thaliana.
Submitted 12 June, 2019; v1 submitted 7 December, 2017;
originally announced December 2017.
-
Robust and Efficient Parametric Spectral Estimation in Atomic Force Microscopy
Authors:
Bryan Yates,
Aleksander Labuda,
Martin Lysy
Abstract:
An atomic force microscope (AFM) is capable of producing ultra-high resolution measurements of nanoscopic objects and forces. It is an indispensable tool for various scientific disciplines such as molecular engineering, solid-state physics, and cell biology. Prior to a given experiment, the AFM must be calibrated by fitting a spectral density model to baseline recordings. However, since AFM experiments typically collect large amounts of data, parameter estimation by maximum likelihood can be prohibitively expensive. Thus, practitioners routinely employ a much faster least-squares estimation method, at the cost of substantially reduced statistical efficiency. Additionally, AFM data is often contaminated by periodic electronic noise, to which parameter estimates are highly sensitive. This article proposes a two-stage estimator to address these issues. Preliminary parameter estimates are first obtained by a variance-stabilizing procedure, by which the simplicity of least-squares combines with the efficiency of maximum likelihood. A test for spectral periodicities then eliminates high-impact outliers, considerably and robustly protecting the second-stage estimator from the effects of electronic noise. Simulation and experimental results indicate that a two- to ten-fold reduction in mean squared error can be expected by applying our methodology.
Submitted 27 June, 2017;
originally announced June 2017.
-
Maximum Likelihood Estimation for Single Particle, Passive Microrheology Data with Drift
Authors:
John W. R. Mellnik,
Martin Lysy,
Paula A. Vasquez,
Natesh S. Pillai,
David B. Hill,
Jeremy Cribb,
Scott A. McKinley,
M. Gregory Forest
Abstract:
Volume limitations and low yield thresholds of biological fluids have led to widespread use of passive microparticle rheology. The mean-squared-displacement (MSD) statistics of bead position time series (bead paths) are either applied directly to determine the creep compliance [Xu et al. (1998)] or transformed to determine dynamic storage and loss moduli [Mason & Weitz (1995)]. A prevalent hurdle arises when there is a non-diffusive experimental drift in the data. Commensurate with the magnitude of drift relative to diffusive mobility, quantified by a Péclet number, the MSD statistics are distorted, and thus the path data must be "corrected" for drift. The standard approach is to estimate and subtract the drift from particle paths, and then calculate MSD statistics. We present an alternative, parametric approach using maximum likelihood estimation that simultaneously fits drift and diffusive model parameters from the path data; the MSD statistics (and consequently the compliance and dynamic moduli) then follow directly from the best-fit model. We illustrate and compare both methods on simulated path data over a range of Péclet numbers, where exact answers are known. We choose fractional Brownian motion as the numerical model because it affords tunable, sub-diffusive MSD statistics consistent with typical 30-second-long experimental observations of microbeads in several biological fluids. Finally, we apply and compare both methods on data from human bronchial epithelial cell culture mucus.
Submitted 21 February, 2016; v1 submitted 10 September, 2015;
originally announced September 2015.
-
A Heteroscedastic Accelerated Failure Time Model for Survival Analysis
Authors:
Yifan Wang,
Tian You,
Martin Lysy
Abstract:
Nonparametric and semiparametric methods are commonly used in survival analysis to mitigate the bias due to model misspecification. However, such methods often cannot estimate upper-tail survival quantiles when a sizable proportion of the data are censored, in which case parametric likelihood-based estimators present a viable alternative. In this article, we extend a popular family of parametric survival models which make the Accelerated Failure Time (AFT) assumption to account for heteroscedasticity in the survival times. The conditional variances can depend on arbitrary covariates, thus adding considerable flexibility to the homoscedastic model. We present an Expectation-Conditional-Maximization (ECM) algorithm to efficiently compute the HAFT maximum likelihood estimator with right-censored data. The methodology is applied to the heavily censored data from a colon cancer clinical trial, for which a new type of highly stringent model residuals is proposed. Based on these, the HAFT model was found to eliminate most outliers from its homoscedastic counterpart.
Submitted 17 July, 2019; v1 submitted 20 August, 2015;
originally announced August 2015.
-
Model comparison and assessment for single particle tracking in biological fluids
Authors:
Martin Lysy,
Natesh S. Pillai,
David B. Hill,
M. Gregory Forest,
John Mellnik,
Paula Vasquez,
Scott A. McKinley
Abstract:
State-of-the-art techniques in passive particle-tracking microscopy provide high-resolution path trajectories of diverse foreign particles in biological fluids. For particles on the order of 1 micron in diameter, these paths are generally inconsistent with simple Brownian motion. Yet, despite an abundance of data confirming these findings and their wide-ranging scientific implications, stochastic modeling of the complex particle motion has received comparatively little attention. Even among posited models, there is virtually no literature on likelihood-based inference, model comparisons, and other quantitative assessments. In this article, we develop a rigorous and computationally efficient Bayesian methodology to address this gap. We analyze two of the most prevalent candidate models for 30-second paths of 1-micron-diameter tracer particles in human lung mucus: fractional Brownian motion (fBM) and a Generalized Langevin Equation (GLE) consistent with viscoelastic theory. Our model comparisons distinctly favor GLE over fBM, with the former describing the data remarkably well up to the timescales for which we have reliable information.
Submitted 29 November, 2015; v1 submitted 22 July, 2014;
originally announced July 2014.
-
Statistical Inference for Stochastic Differential Equations with Memory
Authors:
Martin Lysy,
Natesh S. Pillai
Abstract:
In this paper we construct a framework for doing statistical inference for discretely observed stochastic differential equations (SDEs) where the driving noise has 'memory'. Classical SDE models for inference assume the driving noise to be Brownian motion, or "white noise", thus implying a Markov assumption. We focus on the case when the driving noise is a fractional Brownian motion, which is a common continuous-time modeling device for capturing long-range memory. Since the likelihood is intractable, we proceed via data augmentation, adapting a familiar discretization and missing data approach developed for the white noise case. In addition to the other SDE parameters, we take the Hurst index to be unknown and estimate it from the data. Posterior sampling is performed via a Hybrid Monte Carlo algorithm on both the parameters and the missing data simultaneously so as to improve mixing. We point out that, due to the long-range correlations of the driving noise, careful discretization of the underlying SDE is necessary for valid inference. Our approach can be adapted to other types of rough-path driving processes such as Gaussian "colored" noise. The methodology is used to estimate the evolution of the memory parameter in US short-term interest rates.
Submitted 3 July, 2013;
originally announced July 2013.
-
Shrinkage Estimation in Multilevel Normal Models
Authors:
Carl N. Morris,
Martin Lysy
Abstract:
This review traces the evolution of theory that started when Charles Stein in 1955 [In Proc. 3rd Berkeley Sympos. Math. Statist. Probab. I (1956) 197--206, Univ. California Press] showed that using each separate sample mean from $k\ge3$ Normal populations to estimate its own population mean $μ_i$ can be improved upon uniformly for every possible $μ=(μ_1,...,μ_k)'$. The dominating estimators, referred to here as being "Model-I minimax," can be found by shrinking the sample means toward any constant vector. Admissible minimax shrinkage estimators were derived by Stein and others as posterior means based on a random effects model, "Model-II" here, wherein the $μ_i$ values have their own distributions. Section 2 centers on Figure 2, which organizes a wide class of priors on the unknown Level-II hyperparameters that have been proved to yield admissible Model-I minimax shrinkage estimators in the "equal variance case." Putting a flat prior on the Level-II variance is unique in this class for its scale-invariance and for its conjugacy, and it induces Stein's harmonic prior (SHP) on $μ_i$.
Submitted 26 March, 2012;
originally announced March 2012.
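The shrinkage phenomenon that the review above starts from can be sketched with the positive-part James-Stein estimator, which shrinks $k\ge3$ sample means toward zero in the equal-variance case. This is the textbook estimator, offered as an illustration only. It is not the admissible hierarchical estimator based on Stein's harmonic prior that the review discusses, and the function name and inputs are hypothetical.

```python
def james_stein(y, sigma2=1.0):
    """Positive-part James-Stein estimate of k >= 3 Normal means.

    y      : observed sample means, each with known variance sigma2
    returns: y shrunk toward the zero vector by a common factor
    """
    k = len(y)
    if k < 3:
        raise ValueError("shrinkage only improves on the sample means for k >= 3")
    s = sum(v * v for v in y)
    if s == 0.0:
        return [0.0] * k
    # The factor is clipped at zero so the estimate never flips sign.
    factor = max(0.0, 1.0 - (k - 2) * sigma2 / s)
    return [factor * v for v in y]


shrunk = james_stein([1.0, 2.0, 3.0, 4.0])
print(shrunk)
```

Here the sum of squares is 30 and $k-2=2$, so every coordinate is multiplied by $1-2/30=14/15$. Shrinking toward another constant vector amounts to subtracting that vector first and adding it back afterwards.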