Search | arXiv e-print repository

Large Language Models for Constrained-Based Causal Discovery

Authors: Kai-Hendrik Cohrs, Gherardo Varando, Emiliano Diaz, Vasileios Sitokonstantinou, Gustau Camps-Valls

Abstract: Causality is essential for understanding complex systems, such as the economy, the brain, and the climate. Constructing causal graphs often relies on either data-driven or expert-driven approaches, both fraught with challenges. The former methods, like the celebrated PC algorithm, face issues with data requirements and assumptions of causal sufficiency, while the latter demand substantial time and… ▽ More Causality is essential for understanding complex systems, such as the economy, the brain, and the climate. Constructing causal graphs often relies on either data-driven or expert-driven approaches, both fraught with challenges. The former methods, like the celebrated PC algorithm, face issues with data requirements and assumptions of causal sufficiency, while the latter demand substantial time and domain knowledge. This work explores the capabilities of Large Language Models (LLMs) as an alternative to domain experts for causal graph generation. We frame conditional independence queries as prompts to LLMs and employ the PC algorithm with the answers. The performance of the LLM-based conditional independence oracle on systems with known causal graphs shows a high degree of variability. We improve the performance through a proposed statistical-inspired voting schema that allows some control over false-positive and false-negative rates. Inspecting the chain-of-thought argumentation, we find causal reasoning to justify its answer to a probabilistic query. We show evidence that knowledge-based CIT could eventually become a complementary tool for data-driven causal discovery. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.18306 [pdf, other]

Learning Staged Trees from Incomplete Data

Authors: Jack Storror Carter, Manuele Leonelli, Eva Riccomagno, Gherardo Varando

Abstract: Staged trees are probabilistic graphical models capable of representing any class of non-symmetric independence via a coloring of its vertices. Several structural learning routines have been defined and implemented to learn staged trees from data, under the frequentist or Bayesian paradigm. They assume a data set has been observed fully and, in practice, observations with missing entries are eithe… ▽ More Staged trees are probabilistic graphical models capable of representing any class of non-symmetric independence via a coloring of its vertices. Several structural learning routines have been defined and implemented to learn staged trees from data, under the frequentist or Bayesian paradigm. They assume a data set has been observed fully and, in practice, observations with missing entries are either dropped or imputed before learning the model. Here, we introduce the first algorithms for staged trees that handle missingness within the learning of the model. To this end, we characterize the likelihood of staged tree models in the presence of missing data and discuss pseudo-likelihoods that approximate it. A structural expectation-maximization algorithm estimating the model directly from the full likelihood is also implemented and evaluated. A computational experiment showcases the performance of the novel learning algorithms, demonstrating that it is feasible to account for different missingness patterns when learning staged trees. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.18298 [pdf, other]

Context-Specific Refinements of Bayesian Network Classifiers

Authors: Manuele Leonelli, Gherardo Varando

Abstract: Supervised classification is one of the most ubiquitous tasks in machine learning. Generative classifiers based on Bayesian networks are often used because of their interpretability and competitive accuracy. The widely used naive and TAN classifiers are specific instances of Bayesian network classifiers with a constrained underlying graph. This paper introduces novel classes of generative classifi… ▽ More Supervised classification is one of the most ubiquitous tasks in machine learning. Generative classifiers based on Bayesian networks are often used because of their interpretability and competitive accuracy. The widely used naive and TAN classifiers are specific instances of Bayesian network classifiers with a constrained underlying graph. This paper introduces novel classes of generative classifiers extending TAN and other famous types of Bayesian network classifiers. Our approach is based on staged tree models, which extend Bayesian networks by allowing for complex, context-specific patterns of dependence. We formally study the relationship between our novel classes of classifiers and Bayesian networks. We introduce and implement data-driven learning routines for our models and investigate their accuracy in an extensive computational study. The study demonstrates that models embedding asymmetric information can enhance classification accuracy. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2206.06970

arXiv:2403.14228 [pdf, other]

Recovering Latent Confounders from High-dimensional Proxy Variables

Authors: Nathan Mankovich, Homer Durand, Emiliano Diaz, Gherardo Varando, Gustau Camps-Valls

Abstract: Detecting latent confounders from proxy variables is an essential problem in causal effect estimation. Previous approaches are limited to low-dimensional proxies, sorted proxies, and binary treatments. We remove these assumptions and present a novel Proxy Confounder Factorization (PCF) framework for continuous treatment effect estimation when latent confounders manifest through high-dimensional, m… ▽ More Detecting latent confounders from proxy variables is an essential problem in causal effect estimation. Previous approaches are limited to low-dimensional proxies, sorted proxies, and binary treatments. We remove these assumptions and present a novel Proxy Confounder Factorization (PCF) framework for continuous treatment effect estimation when latent confounders manifest through high-dimensional, mixed proxy variables. For specific sample sizes, our two-step PCF implementation, using Independent Component Analysis (ICA-PCF), and the end-to-end implementation, using Gradient Descent (GD-PCF), achieve high correlation with the latent confounder and low absolute error in causal effect estimation with synthetic datasets in the high sample size regime. Even when faced with climate data, ICA-PCF recovers four components that explain $75.9\%$ of the variance in the North Atlantic Oscillation, a known confounder of precipitation patterns in Europe. Code for our PCF implementations and experiments can be found here: https://github.com/IPL-UV/confound_it. The proposed methodology constitutes a step** stone towards discovering latent confounders and can be applied to many problems in disciplines dealing with high-dimensional observed proxies, e.g., spatiotemporal fields. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.01865 [pdf, other]

Improving generalisation via anchor multivariate analysis

Authors: Homer Durand, Gherardo Varando, Nathan Mankovich, Gustau Camps-Valls

Abstract: We introduce a causal regularisation extension to anchor regression (AR) for improved out-of-distribution (OOD) generalisation. We present anchor-compatible losses, aligning with the anchor framework to ensure robustness against distribution shifts. Various multivariate analysis (MVA) algorithms, such as (Orthonormalized) PLS, RRR, and MLR, fall within the anchor framework. We observe that simple… ▽ More We introduce a causal regularisation extension to anchor regression (AR) for improved out-of-distribution (OOD) generalisation. We present anchor-compatible losses, aligning with the anchor framework to ensure robustness against distribution shifts. Various multivariate analysis (MVA) algorithms, such as (Orthonormalized) PLS, RRR, and MLR, fall within the anchor framework. We observe that simple regularisation enhances robustness in OOD settings. Estimators for selected algorithms are provided, showcasing consistency and efficacy in synthetic and real-world climate science problems. The empirical validation highlights the versatility of anchor regularisation, emphasizing its compatibility with MVA approaches and its role in enhancing replicability while guarding against distribution shifts. The extended AR framework advances causal inference methodologies, addressing the need for reliable OOD generalisation. △ Less

Submitted 11 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 21 pages, 15 figures

MSC Class: 62Hxx

arXiv:2402.13332 [pdf, other]

Causal hybrid modeling with double machine learning

Authors: Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Markus Reichstein, Gustau Camps-Valls

Abstract: Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. Nevertheless, equifinality and regularization biases pose challenges in hybrid modeling to achieve these purposes. This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing Double Machine… ▽ More Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. Nevertheless, equifinality and regularization biases pose challenges in hybrid modeling to achieve these purposes. This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing Double Machine Learning (DML) to estimate causal effects. We showcase its use for the Earth sciences on two problems related to carbon dioxide fluxes. In the $Q_{10}$ model, we demonstrate that DML-based hybrid modeling is superior in estimating causal parameters over end-to-end deep neural network (DNN) approaches, proving efficiency, robustness to bias from regularization methods, and circumventing equifinality. Our approach, applied to carbon flux partitioning, exhibits flexibility in accommodating heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships, advocating for this as a general best practice. We encourage the continued exploration of causality in hybrid models for more interpretable and trustworthy results in knowledge-guided machine learning. △ Less

Submitted 4 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2305.13341 [pdf, other]

Discovering Causal Relations and Equations from Data

Authors: Gustau Camps-Valls, Andreas Gerhardus, Urmi Ninad, Gherardo Varando, Georg Martius, Emili Balaguer-Ballester, Ricardo Vinuesa, Emiliano Diaz, Laure Zanna, Jakob Runge

Abstract: Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing t… ▽ More Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, causal and equation discovery fields have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised with the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: 137 pages

arXiv:2301.00629 [pdf, other]

Learning and interpreting asymmetry-labeled DAGs: a case study on COVID-19 fear

Authors: Manuele Leonelli, Gherardo Varando

Abstract: Bayesian networks are widely used to learn and reason about the dependence structure of discrete variables. However, they are only capable of formally encoding symmetric conditional independence, which in practice is often too strict to hold. Asymmetry-labeled DAGs have been recently proposed to both extend the class of Bayesian networks by relaxing the symmetric assumption of independence and den… ▽ More Bayesian networks are widely used to learn and reason about the dependence structure of discrete variables. However, they are only capable of formally encoding symmetric conditional independence, which in practice is often too strict to hold. Asymmetry-labeled DAGs have been recently proposed to both extend the class of Bayesian networks by relaxing the symmetric assumption of independence and denote the type of dependence existing between the variables of interest. Here, we introduce novel structural learning algorithms for this class of models which, whilst being efficient, allow for a straightforward interpretation of the underlying dependence structure. A comprehensive computational study highlights the efficiency of the algorithms. A real-world data application using data from the Fear of COVID-19 Scale collected in Italy showcases their use in practice. △ Less

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2206.06970 [pdf, other]

Highly Efficient Structural Learning of Sparse Staged Trees

Authors: Manuele Leonelli, Gherardo Varando

Abstract: Several structural learning algorithms for staged tree models, an asymmetric extension of Bayesian networks, have been defined. However, they do not scale efficiently as the number of variables considered increases. Here we introduce the first scalable structural learning algorithm for staged trees, which searches over a space of models where only a small number of dependencies can be imposed. A s… ▽ More Several structural learning algorithms for staged tree models, an asymmetric extension of Bayesian networks, have been defined. However, they do not scale efficiently as the number of variables considered increases. Here we introduce the first scalable structural learning algorithm for staged trees, which searches over a space of models where only a small number of dependencies can be imposed. A simulation study as well as a real-world application illustrate our routines and the practical use of such data-learned staged trees. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.04390

arXiv:2203.04390 [pdf, other]

Structural Learning of Simple Staged Trees

Authors: Manuele Leonelli, Gherardo Varando

Abstract: Bayesian networks faithfully represent the symmetric conditional independences existing between the components of a random vector. Staged trees are an extension of Bayesian networks for categorical random vectors whose graph represents non-symmetric conditional independences via vertex coloring. However, since they are based on a tree representation of the sample space, the underlying graph become… ▽ More Bayesian networks faithfully represent the symmetric conditional independences existing between the components of a random vector. Staged trees are an extension of Bayesian networks for categorical random vectors whose graph represents non-symmetric conditional independences via vertex coloring. However, since they are based on a tree representation of the sample space, the underlying graph becomes cluttered and difficult to visualize as the number of variables increases. Here we introduce the first structural learning algorithms for the class of simple staged trees, entertaining a compact coalescence of the underlying tree from which non-symmetric independences can be easily read. We show that data-learned simple staged trees often outperform Bayesian networks in model fit and illustrate how the coalesced graph is used to identify non-symmetric conditional independences. △ Less

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2108.01994 [pdf, other]

Staged trees and asymmetry-labeled DAGs

Authors: Gherardo Varando, Federico Carli, Manuele Leonelli

Abstract: Bayesian networks are a widely-used class of probabilistic graphical models capable of representing symmetric conditional independence between variables of interest using the topology of the underlying graph. For categorical variables, they can be seen as a special case of the much more general class of models called staged trees, which can represent any type of non-symmetric conditional independe… ▽ More Bayesian networks are a widely-used class of probabilistic graphical models capable of representing symmetric conditional independence between variables of interest using the topology of the underlying graph. For categorical variables, they can be seen as a special case of the much more general class of models called staged trees, which can represent any type of non-symmetric conditional independence. Here we formalize the relationship between these two models and introduce a minimal Bayesian network representation of the staged tree, which can be used to read conditional independences in an intutitive way. A new labeled graph termed asymmetry-labeled directed acyclic graph is defined, whose edges are labeled to denote the type of dependence existing between any two random variables. We also present a novel algorithm to learn staged trees which only enforces a specific subset of non-symmetric independences. Various datasets are used to illustrate the methodology, highlighting the need to construct models which more flexibly encode and represent non-symmetric structures. △ Less

Submitted 5 October, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

arXiv:2106.04416 [pdf, other]

Context-Specific Causal Discovery for Categorical Data Using Staged Trees

Authors: Manuele Leonelli, Gherardo Varando

Abstract: Causal discovery algorithms aim at untangling complex causal relationships from data. Here, we study causal discovery and inference methods based on staged tree models, which can represent complex and asymmetric causal relationships between categorical variables. We provide a first graphical representation of the equivalence class of a staged tree, by looking only at a specific subset of its under… ▽ More Causal discovery algorithms aim at untangling complex causal relationships from data. Here, we study causal discovery and inference methods based on staged tree models, which can represent complex and asymmetric causal relationships between categorical variables. We provide a first graphical representation of the equivalence class of a staged tree, by looking only at a specific subset of its underlying independences. We further define a new pre-metric, inspired by the widely used structural intervention distance, to quantify the closeness between two staged trees in terms of their corresponding causal inference statements. A simulation study highlights the efficacy of staged trees in uncovering complexes, asymmetric causal relationships from data, and real-world data applications illustrate their use in practical causal analysis. △ Less

Submitted 28 February, 2023; v1 submitted 8 June, 2021; originally announced June 2021.

arXiv:2012.13798 [pdf, other]

A new class of generative classifiers based on staged tree models

Authors: Federico Carli, Manuele Leonelli, Gherardo Varando

Abstract: Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly used and provide a clear graphical representation of the relationship among all variables. However, these have the disadvantage of highly restricting the type of… ▽ More Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly used and provide a clear graphical representation of the relationship among all variables. However, these have the disadvantage of highly restricting the type of relationships that could exist, by not allowing for context-specific independences. Here we introduce a new class of generative classifiers, called staged tree classifiers, which formally account for context-specific independence. They are constructed by a partitioning of the vertices of an event tree from which conditional independence can be formally read. The naive staged tree classifier is also defined, which extends the classic naive Bayes classifier whilst retaining the same complexity. An extensive simulation study shows that the classification accuracy of staged tree classifiers is competitive with that of state-of-the-art classifiers and an example showcases their use in practice. △ Less

Submitted 4 August, 2022; v1 submitted 26 December, 2020; originally announced December 2020.

arXiv:2006.03005 [pdf, other]

Learning DAGs without imposing acyclicity

Authors: Gherardo Varando

Abstract: We explore if it is possible to learn a directed acyclic graph (DAG) from data without imposing explicitly the acyclicity constraint. In particular, for Gaussian distributions, we frame structural learning as a sparse matrix factorization problem and we empirically show that solving an $\ell_1$-penalized optimization yields to good recovery of the true graph and, in general, to almost-DAG graphs.… ▽ More We explore if it is possible to learn a directed acyclic graph (DAG) from data without imposing explicitly the acyclicity constraint. In particular, for Gaussian distributions, we frame structural learning as a sparse matrix factorization problem and we empirically show that solving an $\ell_1$-penalized optimization yields to good recovery of the true graph and, in general, to almost-DAG graphs. Moreover, this approach is computationally efficient and is not affected by the explosion of combinatorial complexity as in classical structural learning algorithms. △ Less

Submitted 4 June, 2020; originally announced June 2020.

Comments: 16 pages, 5 figures

arXiv:2006.01448 [pdf, other]

doi 10.1109/ACCESS.2020.3018593

Sparse Cholesky covariance parametrization for recovering latent structure in ordered data

Authors: Irene Córdoba, Concha Bielza, Pedro Larrañaga, Gherardo Varando

Abstract: The sparse Cholesky parametrization of the inverse covariance matrix can be interpreted as a Gaussian Bayesian network; however its counterpart, the covariance Cholesky factor, has received, with few notable exceptions, little attention so far, despite having a natural interpretation as a hidden variable model for ordered signal data. To fill this gap, in this paper we focus on arbitrary zero patt… ▽ More The sparse Cholesky parametrization of the inverse covariance matrix can be interpreted as a Gaussian Bayesian network; however its counterpart, the covariance Cholesky factor, has received, with few notable exceptions, little attention so far, despite having a natural interpretation as a hidden variable model for ordered signal data. To fill this gap, in this paper we focus on arbitrary zero patterns in the Cholesky factor of a covariance matrix. We discuss how these models can also be extended, in analogy with Gaussian Bayesian networks, to data where no apparent order is available. For the ordered scenario, we propose a novel estimation method that is based on matrix loss penalization, as opposed to the existing regression-based approaches. The performance of this sparse model for the Cholesky factor, together with our novel estimator, is assessed in a simulation setting, as well as over spatial and temporal real data where a natural ordering arises among the variables. We give guidelines, based on the empirical results, about which of the methods analysed is more appropriate for each setting. △ Less

Submitted 19 August, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: 24 pages, 12 figures

Journal ref: IEEE Access, 8: 154614-154624, 2020

arXiv:2005.10483 [pdf, other]

Graphical continuous Lyapunov models

Authors: Gherardo Varando, Niels Richard Hansen

Abstract: The linear Lyapunov equation of a covariance matrix parametrizes the equilibrium covariance matrix of a stochastic process. This parametrization can be interpreted as a new graphical model class, and we show how the model class behaves under marginalization and introduce a method for structure learning via $\ell_1$-penalized loss minimization. Our proposed method is demonstrated to outperform alte… ▽ More The linear Lyapunov equation of a covariance matrix parametrizes the equilibrium covariance matrix of a stochastic process. This parametrization can be interpreted as a new graphical model class, and we show how the model class behaves under marginalization and introduce a method for structure learning via $\ell_1$-penalized loss minimization. Our proposed method is demonstrated to outperform alternative structure learning algorithms in a simulation study, and we illustrate its application for protein phosphorylation network reconstruction. △ Less

Submitted 21 May, 2020; originally announced May 2020.

Comments: 10 pages, 5 figures

arXiv:2002.09573 [pdf, ps, other]

Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values

Authors: Sebastian Weichwald, Martin E Jakobsen, Phillip B Mogensen, Lasse Petersen, Nikolaj Thams, Gherardo Varando

Abstract: In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth s… ▽ More In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth sciences data. In particular, we discuss a) a rationale for leveraging linear methods to identify causal links in non-linear systems, b) a simulation-backed explanation as to why large regression coefficients may predict causal links better in practice than small p-values and thus why normalising the data may sometimes hinder causal structure learning. For benchmark usage, we detail the algorithms here and provide implementations at https://github.com/sweichwald/tidybench . We propose the presented competition-proven methods for baseline benchmark comparisons to guide the development of novel algorithms for structure learning from time series. △ Less

Submitted 2 September, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

Journal ref: Proceedings of the NeurIPS 2019 Competition and Demonstration Track, Proceedings of Machine Learning Research, 123:27-36, 2020 ( http://proceedings.mlr.press/v123/weichwald20a.html )

arXiv:1811.04759 [pdf, ps, other]

Markov Property in Generative Classifiers

Authors: Gherardo Varando, Concha Bielza, Pedro Larrañaga, Eva Riccomagno

Abstract: We show that, for generative classifiers, conditional independence corresponds to linear constraints for the induced discrimination functions. Discrimination functions of undirected Markov network classifiers can thus be characterized by sets of linear constraints. These constraints are represented by a second order finite difference operator over functions of categorical variables. As an applicat… ▽ More We show that, for generative classifiers, conditional independence corresponds to linear constraints for the induced discrimination functions. Discrimination functions of undirected Markov network classifiers can thus be characterized by sets of linear constraints. These constraints are represented by a second order finite difference operator over functions of categorical variables. As an application we study the expressive power of generative classifiers under the undirected Markov property and we present a general method to combine discriminative and generative classifiers. △ Less

Submitted 12 November, 2018; originally announced November 2018.

Showing 1–18 of 18 results for author: Varando, G