
Showing 1–20 of 20 results for author: Strobl, E V

  1. arXiv:2402.05802  [pdf, other]

    cs.LG stat.AP stat.ML

    Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence

    Authors: Thomas A. Lasko, John M. Still, Thomas Z. Li, Marco Barbero Mota, William W. Stead, Eric V. Strobl, Bennett A. Landman, Fabien Maldonado

    Abstract: Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 29 Pages, 8 figures

    ACM Class: I.2.6; I.2.1; J.3

  2. arXiv:2311.04787  [pdf]

    cs.LG cs.PF stat.ML

    Why Do Probabilistic Clinical Models Fail To Transport Between Sites?

    Authors: Thomas A. Lasko, Eric V. Strobl, William W. Stead

    Abstract: The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we present common sources for this failure to transport, which we divide into sources under the control of the experimenter and sources inherent to th…

    Submitted 28 December, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 20 pages, 3 figures

  3. arXiv:2307.05338  [pdf, other]

    q-bio.GN

    Root Causal Inference from Single Cell RNA Sequencing with the Negative Binomial

    Authors: Eric V. Strobl

    Abstract: Accurately inferring the root causes of disease from sequencing data can improve the discovery of novel therapeutic targets. However, existing root causal inference algorithms require perfectly measured continuous random variables. Single cell RNA sequencing (scRNA-seq) datasets contain large numbers of cells but non-negative counts measured by an error-prone process. We therefore introduce an alg…

    Submitted 10 July, 2023; originally announced July 2023.

  4. arXiv:2305.17574  [pdf, ps, other]

    cs.AI cs.LG q-bio.QM stat.AP stat.ML

    Counterfactual Formulation of Patient-Specific Root Causes of Disease

    Authors: Eric V. Strobl

    Abstract: Root causes of disease intuitively correspond to root vertices that increase the likelihood of a diagnosis. This description of a root cause nevertheless lacks the rigorous mathematical formulation needed for the development of computer algorithms designed to automatically detect root causes from data. Prior work defined patient-specific root causes of disease using an interventionalist account th…

    Submitted 31 May, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

  5. arXiv:2210.15340  [pdf, other]

    stat.ML cs.LG stat.AP

    Sample-Specific Root Causal Inference with Latent Variables

    Authors: Eric V. Strobl, Thomas A. Lasko

    Abstract: Root causal analysis seeks to identify the set of initial perturbations that induce an unwanted outcome. In prior work, we defined sample-specific root causes of disease using exogenous error terms that predict a diagnosis in a structural equation model. We rigorously quantified predictivity using Shapley values. However, the associated algorithms for inferring root causes assume no latent confoun…

    Submitted 27 October, 2022; originally announced October 2022.

  6. arXiv:2205.13085  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model

    Authors: Eric V. Strobl, Thomas A. Lasko

    Abstract: Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structu…

    Submitted 6 July, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

  7. arXiv:2205.11627  [pdf, other]

    stat.ML cs.LG stat.AP stat.ME

    Identifying Patient-Specific Root Causes of Disease

    Authors: Eric V. Strobl, Thomas A. Lasko

    Abstract: Complex diseases are caused by a multitude of factors that may differ between patients. As a result, hypothesis tests comparing all patients to all healthy controls can detect many significant variables with inconsequential effect sizes. A few highly predictive root causes may nevertheless generate disease within each patient. In this paper, we define patient-specific root causes as variables subj…

    Submitted 23 May, 2022; originally announced May 2022.

  8. arXiv:2111.13229  [pdf, other]

    stat.ML cs.LG stat.ME

    Generalizing Clinical Trials with Convex Hulls

    Authors: Eric V. Strobl, Thomas A. Lasko

    Abstract: Randomized clinical trials eliminate confounding but impose strict exclusion criteria that limit recruitment to a subset of the population. Observational datasets are more inclusive but suffer from confounding -- often providing overly optimistic estimates of treatment response over time due to partially optimized physician prescribing patterns. We therefore assume that the unconfounded treatment…

    Submitted 27 October, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

  9. arXiv:2105.00455  [pdf, other]

    stat.ML cs.LG stat.ME

    Synthesized Difference in Differences

    Authors: Eric V. Strobl, Thomas A. Lasko

    Abstract: We consider estimating the conditional average treatment effect for everyone by eliminating confounding and selection bias. Unfortunately, randomized clinical trials (RCTs) eliminate confounding but impose strict exclusion criteria that prevent sampling of the entire clinical population. Observational datasets are more inclusive but suffer from confounding. We therefore analyze RCT and observation…

    Submitted 11 June, 2021; v1 submitted 2 May, 2021; originally announced May 2021.

    Comments: Accepted to ACM BCB 2021

  10. arXiv:2011.01889  [pdf, other]

    stat.ML cs.LG

    Automated Hyperparameter Selection for the PC Algorithm

    Authors: Eric V. Strobl

    Abstract: The PC algorithm infers causal relations using conditional independence tests that require a pre-specified Type I $α$ level. PC is however unsupervised, so we cannot tune $α$ using traditional cross-validation. We therefore propose AutoPC, a fast procedure that optimizes $α$ directly for a user-chosen metric. We in particular force PC to double-check its output by executing a second run on the rec…

    Submitted 22 December, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Under consideration at Pattern Recognition Letters

  11. arXiv:1909.05418  [pdf, ps, other]

    math.ST

    The Global Markov Property for a Mixture of DAGs

    Authors: Eric V. Strobl

    Abstract: Real causal processes may contain feedback loops and change over time. In this paper, we model cycles and non-stationary distributions using a mixture of directed acyclic graphs (DAGs). We then study the conditional independence (CI) relations induced by a density that factorizes according to a mixture of DAGs in two steps. First, we generalize d-separation for a single DAG to mixture d-separation…

    Submitted 12 September, 2019; v1 submitted 11 September, 2019; originally announced September 2019.

  12. arXiv:1905.10330  [pdf, other]

    stat.ML cs.LG stat.ME

    Dirac Delta Regression: Conditional Density Estimation with Clinical Trials

    Authors: Eric V. Strobl, Shyam Visweswaran

    Abstract: Personalized medicine seeks to identify the causal effect of treatment for a particular patient as opposed to a clinical population at large. Most investigators estimate such personalized treatment effects by regressing the outcome of a randomized clinical trial (RCT) on patient covariates. The realized value of the outcome may however lie far from the conditional expectation. We therefore introdu…

    Submitted 1 September, 2021; v1 submitted 24 May, 2019; originally announced May 2019.

  13. arXiv:1901.09475  [pdf, other]

    stat.ML cs.LG stat.AP

    Causal Discovery with a Mixture of DAGs

    Authors: Eric V. Strobl

    Abstract: Causal processes in biomedicine may contain cycles, evolve over time or differ between populations. However, many graphical models cannot accommodate these conditions. We propose to model causation using a mixture of directed acyclic graphs (DAGs), where the joint distribution in a population follows a DAG at any single point in time but potentially different DAGs across time. We also introduce an…

    Submitted 5 September, 2020; v1 submitted 27 January, 2019; originally announced January 2019.

  14. arXiv:1805.02087  [pdf, other]

    stat.ML cs.LG stat.ME

    A Constraint-Based Algorithm For Causal Discovery with Cycles, Latent Variables and Selection Bias

    Authors: Eric V. Strobl

    Abstract: Causal processes in nature may contain cycles, and real datasets may violate causal sufficiency as well as contain selection bias. No constraint-based causal discovery algorithm can currently handle cycles, latent variables and selection bias (CLS) simultaneously. I therefore introduce an algorithm called Cyclic Causal Inference (CCI) that makes sound inferences with a conditional independence ora…

    Submitted 5 May, 2018; originally announced May 2018.

  15. arXiv:1705.09031  [pdf, other]

    stat.ME stat.ML

    Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion

    Authors: Eric V. Strobl, Shyam Visweswaran, Peter L. Spirtes

    Abstract: Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain on…

    Submitted 24 May, 2017; originally announced May 2017.

  16. arXiv:1702.03877  [pdf, other]

    stat.ME stat.ML

    Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery

    Authors: Eric V. Strobl, Kun Zhang, Shyam Visweswaran

    Abstract: Constraint-based causal discovery (CCD) algorithms require fast and accurate conditional independence (CI) testing. The Kernel Conditional Independence Test (KCIT) is currently one of the most popular CI tests in the non-parametric setting, but many investigators cannot use KCIT with large datasets because the test scales cubically with sample size. We therefore devise two relaxations called the Ran…

    Submitted 12 April, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

    Comments: R package: github.com/ericstrobl/RCIT

  17. arXiv:1607.03975  [pdf, other]

    stat.ML stat.ME

    Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values

    Authors: Eric V. Strobl, Peter L. Spirtes, Shyam Visweswaran

    Abstract: The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and co…

    Submitted 9 May, 2017; v1 submitted 13 July, 2016; originally announced July 2016.

  18. arXiv:1509.03935  [pdf]

    math.ST stat.ME stat.ML

    Markov Boundary Discovery with Ridge Regularized Linear Models

    Authors: Eric V. Strobl, Shyam Visweswaran

    Abstract: Ridge regularized linear models (RRLMs), such as ridge regression and the SVM, are a popular group of methods that are used in conjunction with coefficient hypothesis testing to discover explanatory variables with a significant multivariate association to a response. However, many investigators are reluctant to draw causal interpretations of the selected variables due to the incomplete knowledge o…

    Submitted 13 September, 2015; originally announced September 2015.

    Comments: submitted to the Journal of Causal Inference

  19. arXiv:1407.7566  [pdf]

    q-bio.QM cs.LG stat.ML

    Dependence versus Conditional Dependence in Local Causal Discovery from Gene Expression Data

    Authors: Eric V. Strobl, Shyam Visweswaran

    Abstract: Motivation: Algorithms that discover variables which are causally related to a target may inform the design of experiments. With observational gene expression data, many methods discover causal variables by measuring each variable's degree of statistical dependence with the target using dependence measures (DMs). However, other methods measure each variable's ability to explain the statistical dep…

    Submitted 28 July, 2014; originally announced July 2014.

    Comments: 11 pages, 2 algorithms, 4 figures, 5 tables

  20. arXiv:1402.0108  [pdf]

    stat.ML cs.LG

    Markov Blanket Ranking using Kernel-based Conditional Dependence Measures

    Authors: Eric V. Strobl, Shyam Visweswaran

    Abstract: Developing feature selection algorithms that move beyond a pure correlational to a more causal analysis of observational data is an important problem in the sciences. Several algorithms attempt to do so by discovering the Markov blanket of a target, but they all contain a forward selection step which variables must pass in order to be included in the conditioning set. As a result, these algorithms…

    Submitted 2 May, 2014; v1 submitted 1 February, 2014; originally announced February 2014.

    Comments: 10 pages, 4 figures, 2 algorithms, NIPS 2013 Workshop on Causality, code: github.com/ericstrobl/
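Result 5 (Sample-Specific Root Causal Inference with Latent Variables) quantifies the predictivity of exogenous error terms with Shapley values. As background only, here is a minimal sketch of exact Shapley values computed by enumerating subsets. The value function `toy_value` and the labels `E1`, `E2`, `E3` are hypothetical stand-ins, not the paper's predictivity measure, and exact enumeration is practical only for a handful of variables.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions (small n only).

    players: list of hashable labels
    v: function mapping a frozenset of players to a real number
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # k = size of a coalition S that excludes i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                S = frozenset(members)
                total += weight * (v(S | {i}) - v(S))  # marginal contribution of i
        phi[i] = total
    return phi


# Hypothetical value function: additive terms plus one pairwise interaction.
def toy_value(S):
    base = {"E1": 1.0, "E2": 2.0, "E3": 0.0}
    value = sum(base[p] for p in S)
    if {"E1", "E3"} <= S:
        value += 3.0  # interaction credit, shared between E1 and E3
    return value


phi = shapley_values(["E1", "E2", "E3"], toy_value)
print(phi)
```

The three values sum to v(all players) minus v(empty set), the efficiency property that lets Shapley values divide a total among contributors; here the interaction credit of 3.0 is split evenly between `E1` and `E3`.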
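The title of result 9 (Synthesized Difference in Differences) refers to the difference-in-differences design. For orientation, this is a minimal sketch of the classic two-group, two-period estimator on group means. The outcome values are invented, the function name is hypothetical, and the synthesized variant that combines RCT and observational data is not reproduced here.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on group means.

    Under the parallel-trends assumption, the control group's change over
    time stands in for the change the treated group would have shown
    without treatment.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes for illustration only.
effect = did_estimate(
    treated_pre=[10.0, 12.0], treated_post=[6.0, 8.0],
    control_pre=[11.0, 13.0], control_post=[10.0, 12.0],
)
print(effect)  # -3.0: treated fell by 4, controls fell by 1
```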
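Result 10 (Automated Hyperparameter Selection for the PC Algorithm) notes that PC's conditional independence tests need a pre-specified Type I α level. To show where α enters, here is a sketch of one common choice for roughly Gaussian data, Fisher's z-test of (partial) correlation, limited to at most one conditioning variable and using only the standard library. It is an assumption that a test of this kind is used; this is not AutoPC, and the data are simulated.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (w - mb) for u, w in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((w - mb) ** 2 for w in b)
    return cov / math.sqrt(va * vb)


def fisher_z_test(x, y, z=None, alpha=0.05):
    """Fisher z-test of X independent of Y given Z (Z empty or one variable).

    Returns (p_value, judged_independent); PC would treat the pair as
    independent when the p-value exceeds alpha.
    """
    n = len(x)
    r = corr(x, y)
    k = 0  # size of the conditioning set
    if z is not None:
        rxz, ryz = corr(x, z), corr(y, z)
        # First-order partial correlation of X and Y given Z
        r = (r - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
        k = 1
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n - k - 3) * math.atanh(r)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value
    return p_value, p_value > alpha


# Simulated chain X -> Z -> Y (hypothetical data)
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]
marginal = fisher_z_test(x, y, alpha=0.05)
conditional = fisher_z_test(x, y, z, alpha=0.05)
print(marginal, conditional)
```

In this chain the marginal test rejects independence decisively, while the test given Z usually does not; whether it does at a particular α depends on the random draw. Moving α changes which pairs PC treats as independent, which is the tuning problem the abstract describes.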
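Result 11 (The Global Markov Property for a Mixture of DAGs) generalizes d-separation for a single DAG to mixture d-separation. As a reference point, this sketch checks ordinary single-DAG d-separation with the moralized ancestral graph criterion. The dictionary-of-parents representation and the function names are assumptions for illustration; mixture d-separation is not implemented.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, xs, ys, zs):
    """True if node sets xs and ys are d-separated given zs in a DAG.

    parents maps each node to a list of its parents. Restrict to the
    ancestors of xs | ys | zs, marry co-parents, drop edge directions,
    delete zs, and check whether xs can still reach ys.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:  # undirected copy of each edge
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(ps):  # "marry" parents of a common child
            for b in ps[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # Breadth-first search from xs that never enters the conditioning set
    seen = set(xs - zs)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                queue.append(w)
    return True


collider = {"C": ["X", "Y"], "D": ["C"]}  # X -> C <- Y, C -> D
print(d_separated(collider, {"X"}, {"Y"}, set()))  # True: the collider blocks the path
print(d_separated(collider, {"X"}, {"Y"}, {"C"}))  # False: conditioning on C opens it
```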
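Result 16 (Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery) notes that KCIT scales cubically with sample size, a cost that comes from working with n × n kernel matrices. One standard way to sidestep this, shown here as background rather than as code from the RCIT package, is to approximate an RBF kernel with random Fourier features: k(x, y) is approximated by an inner product of D random cosine features. The function names, σ = 1, and D = 2000 are illustrative assumptions.

```python
import math
import random


def rbf_kernel(x, y, sigma=1.0):
    """Exact RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))


def make_rff(dim, num_features, sigma=1.0, seed=0):
    """Return a random Fourier feature map approximating the RBF kernel."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density N(0, sigma^-2 I)
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return features


phi = make_rff(dim=2, num_features=2000, sigma=1.0)
x, y = [0.0, 0.5], [0.3, -0.2]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
print(rbf_kernel(x, y), approx)
```

With D features per sample, covariance-type statistics can be computed from an n × D feature matrix, roughly O(nD²) work, instead of the cubic cost of operating on the full n × n kernel matrix.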
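Result 17 (Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values) computes edge-specific p-values and then estimates and controls the FDR; the truncated abstract does not say how. As a generic illustration only, this sketch applies the Benjamini-Hochberg step-up procedure to a hypothetical set of edge p-values. PC-p may use a different procedure.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the keys rejected by the Benjamini-Hochberg step-up rule at level q.

    pvalues: dict mapping an item (here, an edge) to its p-value.
    """
    items = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(items)
    cutoff = 0
    for rank, (_, p) in enumerate(items, start=1):
        if p <= rank * q / m:
            cutoff = rank  # largest rank k with p_(k) <= k * q / m
    return {key for key, _ in items[:cutoff]}


# Hypothetical edge-specific p-values, for illustration only.
edge_p = {("A", "B"): 0.001, ("B", "C"): 0.012, ("C", "D"): 0.032,
          ("A", "D"): 0.036, ("B", "D"): 0.200}
print(sorted(benjamini_hochberg(edge_p, q=0.05)))
```

Edge C-D is kept even though its own p-value exceeds its rank threshold (0.032 > 3 × 0.05 / 5), because the step-up rule keeps every hypothesis up to the largest rank that passes, here rank 4.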
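Results 18 and 20 concern discovering the Markov boundary (or blanket) of a target from data. When the DAG is already known, the Markov blanket can be read off directly as the target's parents, children, and the children's other parents. This sketch does that for a hypothetical DAG; the papers instead estimate the blanket from observational data.

```python
def markov_blanket(parents, target):
    """Parents, children, and children's other parents of target in a DAG.

    parents maps each node to a list of its parents.
    """
    blanket = set(parents.get(target, ()))  # parents of the target
    for child, ps in parents.items():
        if target in ps:
            blanket.add(child)  # a child of the target
            blanket.update(p for p in ps if p != target)  # the child's other parents
    return blanket


# Hypothetical DAG: B -> A -> T, T -> C, S -> C, C -> D
dag = {"T": ["A"], "C": ["T", "S"], "D": ["C"], "A": ["B"]}
print(sorted(markov_blanket(dag, "T")))  # ['A', 'C', 'S']
```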