Analysing spatial point patterns in digital pathology: immune cells in high-grade serous ovarian carcinomas

Authors: Jonatan A. González, Julia Wrobel, Simon Vandekar, Paula Moraga

Abstract: Multiplex immunofluorescence (mIF) imaging technology facilitates the study of the tumour microenvironment in cancer patients. Due to the capabilities of this emerging bioimaging technique, it is possible to statistically analyse, for example, the co-varying location and functions of multiple different types of immune cells. Complex spatial relationships between different immune cells have been shown to correlate with patient outcomes and may reveal new pathways for targeted immunotherapy treatments. This tutorial reviews methods and procedures relating to spatial point patterns for complex data analysis. We consider tissue cells as a realisation of a spatial point process for each patient. We focus on proper functional descriptors for each observation and techniques that allow us to obtain information about inter-patient variation. Ovarian cancer is the deadliest gynaecological malignancy and can resist chemotherapy treatment effective in other cancers. We use a dataset of high-grade serous ovarian cancer samples from 51 patients. We examine the immune cell composition (T cells, B cells, macrophages) within tumours and additional information such as cell classification (tumour or stroma) and other patient clinical characteristics. Our analyses, supported by reproducible software, apply to other digital pathology datasets.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2302.12345 [pdf, ps, other]

RESI: An R Package for Robust Effect Sizes

Authors: Megan Jones, Kaidi Kang, Simon Vandekar

Abstract: Effect size indices are useful parameters that quantify the strength of association and are unaffected by sample size. There are many available effect size parameters and estimators, but it is difficult to compare effect sizes across studies as most are defined for a specific type of population parameter. We recently introduced a new Robust Effect Size Index (RESI) and confidence interval, which is advantageous because it is not model-specific. Here we present the RESI R package, which makes it easy to report the RESI and its confidence interval for many different model classes, with a consistent interpretation across parameters and model types. The package produces coefficient, ANOVA tables, and overall Wald tests for model inputs, appending the RESI estimate and confidence interval to each. The package also includes functions for visualization and conversions to and from other effect size measures. For illustration, we analyze and interpret three different model types.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 25 pages, 2 figures. Submitted to Journal of Statistical Software

arXiv:2111.05966 [pdf, other]

Accurate confidence interval estimation for non-centrality parameters and effect size indices

Authors: Kaidi Kang, Kristan Armstrong, Suzanne Avery, Maureen McHugo, Stephan Heckers, Simon Vandekar

Abstract: We recently proposed a robust effect size index (RESI) that is related to the non-centrality parameter of a test statistic. RESI is advantageous over common indices because (1) it is widely applicable to many types of data; (2) it can rely on a robust covariance estimate; (3) it can accommodate the existence of nuisance parameters. We provided a consistent estimator for the RESI; however, there is no established confidence interval (CI) estimation procedure for the RESI. Here, we use statistical theory and simulations to evaluate several CI estimation procedures for three estimators of the RESI. Our findings show (1) in contrast to common effect sizes, the robust estimator is consistent for the true effect size; (2) common CI procedures for effect sizes that are non-centrality parameters fail to cover the true effect size at the nominal level. Using the robust estimator along with the proposed bootstrap CI is generally accurate and applicable to conduct consistent estimation and valid inference for the RESI, especially when model assumptions may be violated. Based on the RESI, we propose a general framework for the analysis of effect size (ANOES), such that effect sizes and confidence intervals can be easily reported in an analysis of variance (ANOVA) table format for a wide range of models.

Submitted 10 November, 2021; originally announced November 2021.

arXiv:2110.13074 [pdf, other]

Faster estimation for constrained gamma mixture models using closed-form estimators

Authors: Jiangmei Xiong, Eliot McKinley, Joseph T. Roland, Robert Coffey, Martha J. Shrubsole, Ken S. Lau, Simon Vandekar

Abstract: Mixture models are useful in a wide array of applications to identify subpopulations in noisy overlapping distributions. For example, in multiplexed immunofluorescence (mIF), cell image intensities represent expression levels and the cell populations are a noisy mixture of expressed and unexpressed cells. Among mixture models, the gamma mixture model has the strength of being flexible in fitting skewed strictly positive data that occur in many biological measurements. However, the current estimation method uses numerical optimization within the expectation maximization algorithm and is computationally expensive. This makes it infeasible to be applied across many large data sets, as is necessary in mIF data. Powered by a recently developed closed-form estimator for the gamma distribution, we propose a closed-form gamma mixture model that is not only more computationally efficient, but can also incorporate constraints from known biological information to the fitted distribution. We derive the closed-form estimators for the gamma mixture model and use simulations to demonstrate that our model produces comparable results with the current model with significantly less time, and is excellent in constrained model fitting.

Submitted 25 October, 2021; originally announced October 2021.
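
Illustrative note (not part of the arXiv listing): the abstract above relies on a closed-form estimator for the gamma distribution. As a minimal sketch, the Python below implements one known closed-form estimator for a single gamma distribution with shape k and scale θ, namely θ̂ = (nΣxᵢ ln xᵢ − Σ ln xᵢ Σxᵢ)/n² and k̂ = nΣxᵢ/(nΣxᵢ ln xᵢ − Σ ln xᵢ Σxᵢ). It is an assumption that this is the estimator the authors build on, and the sketch does not reproduce their mixture model or its constraints. The function name `gamma_closed_form` is hypothetical.

```python
import math
import random


def gamma_closed_form(xs):
    """Closed-form (shape, scale) estimates for a single gamma sample.

    Hypothetical helper for illustration only; it handles one gamma
    component, not the constrained mixture described in the abstract.
    """
    n = len(xs)
    if n < 2 or any(x <= 0 for x in xs):
        raise ValueError("need at least two strictly positive observations")
    sum_x = sum(xs)
    sum_log = sum(math.log(x) for x in xs)
    sum_xlog = sum(x * math.log(x) for x in xs)
    # n * sum(x ln x) - sum(ln x) * sum(x) equals n^2 times the sample
    # covariance of x and ln x; it is zero only if all values are equal.
    denom = n * sum_xlog - sum_log * sum_x
    if denom <= 0:
        raise ValueError("degenerate sample: all observations are equal")
    shape = n * sum_x / denom
    scale = denom / (n * n)
    return shape, scale


if __name__ == "__main__":
    random.seed(0)
    # random.gammavariate(alpha, beta): alpha is the shape, beta the scale.
    sample = [random.gammavariate(3.0, 2.0) for _ in range(20000)]
    print(gamma_closed_form(sample))
```

No iterative optimisation is involved: the estimates need only three sums over the data, which is the kind of saving the abstract attributes to closed-form estimation. By construction, the product of the two estimates equals the sample mean.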

arXiv:1902.07232 [pdf, other]

A Robust Effect Size Index

Authors: Simon Vandekar, Ran Tao, Jeffrey Blume

Abstract: Effect size indices are useful tools in study design and reporting because they are unitless measures of association strength that do not depend on sample size. Existing effect size indices are developed for particular parametric models or population parameters. Here, we propose a robust effect size index based on M-estimators. This approach yields an index that is very generalizable because it is unitless across a wide range of models. We demonstrate that the new index is a function of Cohen's $d$, $R^2$, and standardized log odds ratio when each of the parametric models is correctly specified. We show that existing effect size estimators are biased when the parametric models are incorrect (e.g. under unknown heteroskedasticity). We provide simple formulas to compute power and sample size and use simulations to assess the bias and variance of the effect size estimator in finite samples. Because the new index is invariant across models, it has the potential to make communication and comprehension of effect size uniform across the behavioral sciences.

Submitted 30 October, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1808.07449 [pdf, other]

Robust Spatial Extent Inference with a Semiparametric Bootstrap Joint Testing Procedure

Authors: Simon N. Vandekar, Theodore D. Satterthwaite, Cedric H. Xia, Kosha Ruparel, Ruben C. Gur, Raquel E. Gur, Russell T. Shinohara

Abstract: Spatial extent inference (SEI) is widely used across neuroimaging modalities to study brain-phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF) based tools can have inflated family-wise error rates (FWERs). This has led to fervent discussion as to which preprocessing steps are necessary to control the FWER using GRF-based SEI. The failure of GRF-based methods is due to unrealistic assumptions about the covariance function of the imaging data. The permutation procedure is the most robust SEI tool because it estimates the covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi-) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use our methods to study the association between performance and executive functioning in a working memory fMRI study. The sPBJ procedure is robust to variance misspecification and maintains nominal FWER in small samples, in contrast to the GRF methods. The sPBJ also has equal or superior power to the PBJ and permutation procedures. We provide an R package https://github.com/simonvandekar/pbj to perform inference using the PBJ and sPBJ procedures.

Submitted 22 August, 2018; originally announced August 2018.

arXiv:1709.09009 [pdf, other]

doi 10.1080/01621459.2018.1448826

Interpretable High-Dimensional Inference Via Score Projection with an Application in Neuroimaging

Authors: Simon N. Vandekar, Philip T. Reiss, Russell T. Shinohara

Abstract: In the fields of neuroimaging and genetics, a key goal is testing the association of a single outcome with a very high-dimensional imaging or genetic variable. Often, summary measures of the high-dimensional variable are created to sequentially test and localize the association with the outcome. In some cases, the results for summary measures are significant, but subsequent tests used to localize differences are underpowered and do not identify regions associated with the outcome. Here, we propose a generalization of Rao's score test based on projecting the score statistic onto a linear subspace of a high-dimensional parameter space. In addition, we provide methods to localize signal in the high-dimensional space by projecting the scores to the subspace where the score test was performed. This allows for inference in the high-dimensional space to be performed on the same degrees of freedom as the score test, effectively reducing the number of comparisons. Simulation results demonstrate the test has competitive power relative to others commonly used. We illustrate the method by analyzing a subset of the Alzheimer's Disease Neuroimaging Initiative dataset. Results suggest cortical thinning of the frontal and temporal lobes may be a useful biological marker of Alzheimer's risk.

Submitted 26 September, 2017; originally announced September 2017.

arXiv:1708.05037 [pdf, other]

doi 10.1093/biostatistics/kxx051

Faster Family-wise Error Control for Neuroimaging with a Parametric Bootstrap

Authors: Simon N. Vandekar, Theodore D. Satterthwaite, Adon Rosen, Rastko Ciric, David R. Roalf, Kosha Ruparel, Ruben C. Gur, Raquel E. Gur, Russell T. Shinohara

Abstract: In neuroimaging, hundreds to hundreds of thousands of tests are performed across a set of brain regions or all locations in an image. Recent studies have shown that the most common family-wise error (FWE) controlling procedures in imaging, which rely on classical mathematical inequalities or Gaussian random field theory, yield FWE rates that are far from the nominal level. Depending on the approach used, the FWER can be exceedingly small or grossly inflated. Given the widespread use of neuroimaging as a tool for understanding neurological and psychiatric disorders, it is imperative that reliable multiple testing procedures are available. To our knowledge, only permutation joint testing procedures have been shown to reliably control the FWER at the nominal level. However, these procedures are computationally intensive due to the increasingly available large sample sizes and dimensionality of the images, and analyses can take days to complete. Here, we develop a parametric bootstrap joint testing procedure. The parametric bootstrap procedure works directly with the test statistics, which leads to much faster estimation of adjusted $p$-values than resampling-based procedures while reliably controlling the FWER in sample sizes available in many neuroimaging studies. We demonstrate that the procedure controls the FWER in finite samples using simulations, and present region- and voxel-wise analyses to test for sex differences in developmental trajectories of cerebral blood flow.

Submitted 18 August, 2017; v1 submitted 16 August, 2017; originally announced August 2017.

Showing 1–8 of 8 results for author: Vandekar, S