-
Efficiently Deciding Algebraic Equivalence of Bow-Free Acyclic Path Diagrams
Authors:
Thijs van Ommen
Abstract:
For causal discovery in the presence of latent confounders, constraints beyond conditional independences exist that can enable causal discovery algorithms to distinguish more pairs of graphs. Such constraints are not well-understood yet. In the setting of linear structural equation models without bows, we study algebraic constraints and argue that these provide the most fine-grained resolution achievable. We propose efficient algorithms that decide whether two graphs impose the same algebraic constraints, or whether the constraints imposed by one graph are a subset of those imposed by another graph.
Submitted 10 June, 2024;
originally announced June 2024.
-
The risks of risk assessment: causal blind spots when using prediction models for treatment decisions
Authors:
Nan van Geloven,
Ruth H Keogh,
Wouter van Amsterdam,
Giovanni Cinà,
Jesse H. Krijthe,
Niels Peek,
Kim Luijken,
Sara Magliacane,
Paweł Morzywołek,
Thijs van Ommen,
Hein Putter,
Matthew Sperrin,
Junfeng Wang,
Daniala L. Weir,
Vanessa Didelez
Abstract:
Prediction models are increasingly proposed for guiding treatment decisions, but most fail to address the special role of treatments, leading to inappropriate use. This paper highlights the limitations of using standard prediction models for treatment decision support. We identify `causal blind spots' in three common approaches to handling treatments in prediction modelling: including treatment as a predictor, restricting data based on treatment status and ignoring treatments. When predictions are used to inform treatment decisions, confounders, colliders and mediators, as well as changes in treatment protocols over time may lead to misinformed decision-making. We illustrate potential harmful consequences in several medical applications. We advocate for an extension of guidelines for development, reporting and evaluation of prediction models to ensure that the intended use of the model is matched to an appropriate risk estimand. When prediction models are intended to inform treatment decisions, prediction models should specify upfront the treatment decisions they aim to support and target a prediction estimand in line with that goal. This requires a shift towards developing predictions under the specific treatment options under consideration (`predictions under interventions'). Predictions under interventions need causal reasoning and inference techniques during development and validation. We argue that this will improve the efficacy of prediction models in guiding treatment decisions and prevent potential negative effects on patient outcomes.
Submitted 6 May, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Fundamental Properties of Causal Entropy and Information Gain
Authors:
Francisco N. F. Q. Simoes,
Mehdi Dastani,
Thijs van Ommen
Abstract:
Recent developments enable the quantification of causal control given a structural causal model (SCM). This has been accomplished by introducing quantities which encode changes in the entropy of one variable when intervening on another. These measures, named causal entropy and causal information gain, aim to address limitations in existing information theoretical approaches for machine learning tasks where causality plays a crucial role. They have not yet been properly mathematically studied. Our research contributes to the formal understanding of the notions of causal entropy and causal information gain by establishing and analyzing fundamental properties of these concepts, including bounds and chain rules. Furthermore, we elucidate the relationship between causal entropy and stochastic interventions. We also propose definitions for causal conditional entropy and causal conditional information gain. Overall, this exploration paves the way for enhancing causal machine learning tasks through the study of recently-proposed information theoretic quantities grounded in considerations about causality.
Submitted 19 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Risk-based decision making: estimands for sequential prediction under interventions
Authors:
Kim Luijken,
Paweł Morzywołek,
Wouter van Amsterdam,
Giovanni Cinà,
Jeroen Hoogland,
Ruth Keogh,
Jesse Krijthe,
Sara Magliacane,
Thijs van Ommen,
Niels Peek,
Hein Putter,
Maarten van Smeden,
Matthew Sperrin,
Junfeng Wang,
Daniala Weir,
Vanessa Didelez,
Nan van Geloven
Abstract:
Prediction models are used amongst others to inform medical decisions on interventions. Typically, individuals with high risks of adverse outcomes are advised to undergo an intervention while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant to inform such decisions: e.g., an individual may be estimated to be at low risk because similar individuals in the past received an intervention which lowered their risk. Therefore, prediction models supporting decisions should target risks belonging to defined intervention strategies. Previous works on prediction under interventions assumed that the prediction model was used only at one time point to make an intervention decision. In clinical practice, intervention decisions are rarely made only once: they might be repeated, deferred and re-evaluated. This requires estimated risks under interventions that can be reconsidered at several potential decision moments. In the current work, we highlight key considerations for formulating estimands in sequential prediction under interventions that can inform such intervention decisions. We illustrate these considerations by giving examples of estimands for a case study about choosing between vaginal delivery and cesarean section for women giving birth. Our formalization of prediction tasks in a sequential, causal, and estimand context provides guidance for future studies to ensure that the right question is answered and appropriate causal estimation approaches are chosen to develop sequential prediction models that can inform intervention decisions.
Submitted 29 November, 2023;
originally announced November 2023.
-
Causal Entropy and Information Gain for Measuring Causal Control
Authors:
Francisco Nunes Ferreira Quialheiro Simoes,
Mehdi Dastani,
Thijs van Ommen
Abstract:
Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.
Submitted 26 January, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Graphical Representations for Algebraic Constraints of Linear Structural Equations Models
Authors:
Thijs van Ommen,
Mathias Drton
Abstract:
The observational characteristics of a linear structural equation model can be effectively described by polynomial constraints on the observed covariance matrix. However, these polynomials can be exponentially large, making them impractical for many purposes. In this paper, we present a graphical notation for many of these polynomial constraints. The expressive power of this notation is investigated both theoretically and empirically.
Submitted 1 August, 2022;
originally announced August 2022.
-
Algebraic Equivalence of Linear Structural Equation Models
Authors:
Thijs van Ommen,
Joris M. Mooij
Abstract:
Despite their popularity, many questions about the algebraic constraints imposed by linear structural equation models remain open problems. For causal discovery, two of these problems are especially important: the enumeration of the constraints imposed by a model, and deciding whether two graphs define the same statistical model. We show how the half-trek criterion can be used to make progress in both of these problems. We apply our theoretical results to a small-scale model selection problem, and find that taking the additional algebraic constraints into account may lead to significant improvements in model selection accuracy.
Submitted 10 July, 2018;
originally announced July 2018.
-
Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions
Authors:
Sara Magliacane,
Thijs van Ommen,
Tom Claassen,
Stephan Bongers,
Philip Versteeg,
Joris M. Mooij
Abstract:
An important goal common to domain adaptation and causal inference is to make accurate predictions when the distributions for the source (or training) domain(s) and target (or test) domain(s) differ. In many cases, these different distributions can be modeled as different contexts of a single underlying system, in which each distribution corresponds to a different perturbation of the system, or in causal terms, an intervention. We focus on a class of such causal domain adaptation problems, where data for one or more source domains are given, and the task is to predict the distribution of a certain target variable from measurements of other variables in one or more target domains. We propose an approach for solving these problems that exploits causal inference and does not rely on prior knowledge of the causal graph, the type of interventions or the intervention targets. We demonstrate our approach by evaluating a possible implementation on simulated and real world data.
Submitted 29 October, 2018; v1 submitted 20 July, 2017;
originally announced July 2017.
-
Robust Probability Updating
Authors:
Thijs van Ommen,
Wouter M. Koolen,
Thijs E. Feenstra,
Peter D. Grünwald
Abstract:
This paper discusses an alternative to conditioning that may be used when the probability distribution is not fully specified. It does not require any assumptions (such as CAR: coarsening at random) on the unknown distribution. The well-known Monty Hall problem is the simplest scenario where neither naive conditioning nor the CAR assumption suffice to determine an updated probability distribution. This paper thus addresses a generalization of that problem to arbitrary distributions on finite outcome spaces, arbitrary sets of `messages', and (almost) arbitrary loss functions, and provides existence and characterization theorems for robust probability updating strategies. We find that for logarithmic loss, optimality is characterized by an elegant condition, which we call RCAR (reverse coarsening at random). Under certain conditions, the same condition also characterizes optimality for a much larger class of loss functions, and we obtain an objective and general answer to how one should update probabilities in the light of new information.
Submitted 2 May, 2016; v1 submitted 10 December, 2015;
originally announced December 2015.
-
Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It
Authors:
Peter Grünwald,
Thijs van Ommen
Abstract:
We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic, and observe that the posterior puts its mass on ever more high-dimensional models as the sample size increases. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the Safe Bayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates as soon as the standard posterior is not `cumulatively concentrated', and its results on our data are quite encouraging.
Submitted 29 October, 2018; v1 submitted 11 December, 2014;
originally announced December 2014.
-
Combining predictions from linear models when training and test inputs differ
Authors:
Thijs van Ommen
Abstract:
Methods for combining predictions from different models in a supervised learning setting must somehow estimate/predict the quality of a model's predictions at unknown future inputs. Many of these methods (often implicitly) make the assumption that the test inputs are identical to the training inputs, which is seldom reasonable. By failing to take into account that prediction will generally be harder for test inputs that did not occur in the training set, this leads to the selection of too complex models. Based on a novel, unbiased expression for KL divergence, we propose XAIC and its special case FAIC as versions of AIC intended for prediction that use different degrees of knowledge of the test inputs. Both methods substantially differ from and may outperform all the known versions of AIC even when the training and test inputs are iid, and are especially useful for deterministic inputs and under covariate shift. Our experiments on linear models suggest that if the test and training inputs differ substantially, then XAIC and FAIC predictively outperform AIC, BIC and several other methods including Bayesian model averaging.
Submitted 24 June, 2014;
originally announced June 2014.
-
A 5-GHz Southern Hemisphere VLBI Survey of Compact Radio Sources - II
Authors:
Z. -Q. Shen,
T. -S. Wan,
J. M. Moran,
D. L. Jauncey,
J. E. Reynolds,
A. K. Tzioumis,
R. G. Gough,
R. H. Ferris,
M. W. Sinclair,
D. -R. Jiang,
X. -Y. Hong,
S. -G. Liang,
P. G. Edwards,
M. E. Costa,
S. J. Tingay,
P. M. McCulloch,
J. E. J. Lovell,
E. A. King,
G. D. Nicolson,
D. W. Murphy,
D. L. Meier,
T. D. van Ommen,
G. L. White
Abstract:
We report the results of a 5-GHz southern-hemisphere snapshot VLBI observation of a sample of blazars. The observations were performed with the Southern Hemisphere VLBI Network plus the Shanghai station in 1993 May. Twenty-three flat-spectrum, radio-loud sources were imaged. These are the first VLBI images for 15 of the sources. Eight of the sources are EGRET (> 100 MeV) gamma-ray sources. The milliarcsecond morphology shows a core-jet structure for 12 sources, and a single compact core for the remaining 11. No compact doubles were seen. Compared with other radio images at different epochs and/or different frequencies, 3 core-jet blazars show evidence of bent jets, and there is some evidence for superluminal motion in the cases of 2 blazars. The detailed descriptions for individual blazars are given. This is the second part of a survey: the first part was reported by Shen et al. (AJ 114(1997)1999).
Submitted 10 March, 1998;
originally announced March 1998.
-
A 5 GHz Southern Hemisphere VLBI Survey of Compact Radio Sources - I
Authors:
Z. -Q. Shen,
T. -S. Wan,
J. M. Moran,
D. L. Jauncey,
J. E. Reynolds,
A. K. Tzioumis,
R. G. Gough,
R. H. Ferris,
M. W. Sinclair,
D. -R. Jiang,
X. -Y. Hong,
S. -G. Liang,
M. E. Costa,
S. J. Tingay,
P. M. McCulloch,
J. E. J. Lovell,
E. A. King,
G. D. Nicolson,
D. W. Murphy,
D. L. Meier,
T. D. van Ommen,
P. G. Edwards
Abstract:
We report the results of a 5 GHz southern hemisphere VLBI survey of compact extragalactic radio sources. These observations were undertaken with the SHEVE array plus Shanghai station in November 1992. A sample of 22 sources was observed and images of 20 of them were obtained. Of the 20 sources imaged, 15 showed core-jet structure, one had a two-sided jet and 4 had only single compact cores. Eleven of the 16 core-jet (including one two-sided jet) sources show some evidence of bent jets. No compact doubles were found. A comparison with previous images and the temporal variability of the radio flux density showed evidence for superluminal motion in 4 of the sources. Five sources were high energy (>100 MeV) gamma-ray sources. Statistical analysis showed the dominance of highly polarized quasars among the detected gamma-ray sources, which emphasizes the importance of beaming effect in the gamma-ray emission.
Submitted 14 September, 1997;
originally announced September 1997.
-
Interstellar Broadening of Images in the Gravitational Lens PKS 1830-211
Authors:
D. L. Jones,
R. A. Preston,
D. W. Murphy,
D. L. Jauncey,
J. E. Reynolds,
A. K. Tzioumis,
E. A. King,
P. M. McCulloch,
J. E. J. Lovell,
M. E. Costa,
T. D. van Ommen
Abstract:
The remarkably strong radio gravitational lens PKS 1830-211 consists of a one arcsecond diameter Einstein ring with two bright compact (milliarcsecond) components located on opposite sides of the ring. We have obtained 22 GHz VLBA data on this source to determine the intrinsic angular sizes of the compact components. Previous VLBI observations at lower frequencies indicate that the brightness temperatures of these components are significantly lower than 10E10 K, less than is typical for compact synchrotron radio sources and less than is implied by the short timescales of flux density variations. A possible explanation is that interstellar scattering is broadening the apparent angular size of the source and thereby reducing the observed brightness temperature. Our VLBA data support this hypothesis. At 22 GHz the measured brightness temperature is at least 10E11 K, and the deconvolved size of the core in the southwest compact component is proportional to the wavelength squared between 1.3 cm (22 GHz) and 18 cm (1.7 GHz). VLBI observations at still higher frequencies should be unaffected by interstellar scattering.
Submitted 5 August, 1996;
originally announced August 1996.
-
Discovery of a Sub-Parsec Radio Counterjet in the Nucleus of Centaurus A
Authors:
D. L. Jones,
S. J. Tingay,
D. W. Murphy,
D. L. Meier,
D. L. Jauncey,
J. E. Reynolds,
A. K. Tzioumis,
R. A. Preston,
P. M. McCulloch,
M. E. Costa,
A. J. Kemball,
G. D. Nicolson,
J. F. H. Quick,
E. A. King,
J. E. J. Lovell,
R. W. Clay,
R. H. Ferris,
R. G. Gough,
M. W. Sinclair,
S. P. Ellingsen,
P. G. Edwards,
P. A. Jones,
T. D. van Ommen,
P. Harbison,
V. Migenes
Abstract:
A sub-parsec scale radio counterjet has been detected in the nucleus of the closest radio galaxy, Centaurus A (NGC 5128), with VLBI imaging at 2.3 and 8.4 GHz. This is one of the first detections of a VLBI counterjet and provides new constraints on the kinematics of the radio jets emerging from the nucleus of Cen A. A bright, compact core is seen at 8.4 GHz, along with a jet extending along P.A. 51 degrees. The core is completely absorbed at 2.3 GHz. Our images show a much wider gap between the base of the main jet and the counterjet at 2.3 GHz than at 8.4 GHz and also that the core has an extraordinarily inverted spectrum. These observations provide evidence that the innermost 0.4-0.8 pc of the source is seen through a disk or torus of ionized gas which is opaque at low frequencies due to free-free absorption.
Submitted 4 June, 1996;
originally announced June 1996.