-
Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm
Authors:
Mathieu Chevalley,
Patrick Schwab,
Arash Mehrjou
Abstract:
Targeted and uniform interventions to a system are crucial for unveiling causal relationships. While several methods have been developed to leverage interventional data for causal structure learning, their practical application in real-world scenarios often remains challenging. Recent benchmark studies have highlighted these difficulties, even when large numbers of single-variable intervention samples are available. In this work, we demonstrate, both theoretically and empirically, that such datasets contain a wealth of causal information that can be effectively extracted under realistic assumptions about the data distribution. More specifically, we introduce the notion of interventional faithfulness, which relies on comparisons between the marginal distributions of each variable across observational and interventional settings, and we introduce a score on causal orders. Under this assumption, we are able to prove strong theoretical guarantees on the optimum of our score that also hold for large-scale settings. To empirically verify our theory, we introduce Intersort, an algorithm designed to infer the causal order from datasets containing large numbers of single-variable interventions by approximately optimizing our score. Intersort outperforms baselines (GIES, PC and EASE) on almost all simulated data settings replicating common benchmarks in the field. Our proposed novel approach to modeling interventional datasets thus offers a promising avenue for advancing causal inference, highlighting significant potential for further enhancements under realistic assumptions.
Submitted 28 May, 2024;
originally announced May 2024.
-
The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data
Authors:
Mathieu Chevalley,
Jacob Sackett-Sanders,
Yusuf Roohani,
Pascal Notin,
Artemy Bakulin,
Dariusz Brzezinski,
Kaiwen Deng,
Yuanfang Guan,
Justin Hong,
Michael Ibrahim,
Wojciech Kotlowski,
Marcin Kowiel,
Panagiotis Misiakos,
Achille Nazaret,
Markus Püschel,
Chris Wendler,
Arash Mehrjou,
Patrick Schwab
Abstract:
In drug discovery, mapping interactions between genes within cellular systems is a crucial early step. This helps formulate hypotheses regarding molecular mechanisms that could potentially be targeted by future medicines. The CausalBench Challenge was an initiative to invite the machine learning community to advance the state of the art in constructing gene-gene interaction networks. These networks, derived from large-scale, real-world datasets of single cells under various perturbations, are crucial for understanding the causal mechanisms underlying disease biology. Using the framework provided by the CausalBench benchmark, participants were tasked with enhancing the capacity of state-of-the-art methods to leverage large-scale genetic perturbation data. This report provides an analysis and summary of the methods submitted during the challenge to give a partial image of the state of the art at the time of the challenge. The winning solutions significantly improved performance compared to previous baselines, establishing a new state of the art for this critical task in biology and medicine.
Submitted 29 August, 2023;
originally announced August 2023.
-
CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation Data
Authors:
Mathieu Chevalley,
Yusuf Roohani,
Arash Mehrjou,
Jure Leskovec,
Patrick Schwab
Abstract:
Causal inference is a vital aspect of multiple scientific disciplines and is routinely applied to high-impact applications such as medicine. However, evaluating the performance of causal inference methods in real-world environments is challenging due to the need for observations under both interventional and control conditions. Traditional evaluations conducted on synthetic datasets do not reflect the performance in real-world systems. To address this, we introduce CausalBench, a benchmark suite for evaluating network inference methods on real-world interventional data from large-scale single-cell perturbation experiments. CausalBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics. A systematic evaluation of state-of-the-art causal inference methods using our CausalBench suite highlights how poor scalability of current methods limits performance. Moreover, methods that use interventional information do not outperform those that only use observational data, contrary to what is observed on synthetic benchmarks. Thus, CausalBench opens new avenues in causal network inference research and provides a principled and reliable way to track progress in leveraging real-world interventional data.
Submitted 3 July, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Invariant Causal Mechanisms through Distribution Matching
Authors:
Mathieu Chevalley,
Charlotte Bunne,
Andreas Krause,
Stefan Bauer
Abstract:
Learning representations that capture the underlying data-generating process is a key problem for data-efficient and robust use of neural networks. One key property for robustness that the learned representation should capture, and which has recently received a lot of attention, is invariance. In this work, we provide a causal perspective and a new algorithm for learning invariant representations. Empirically, we show that this algorithm works well on a diverse set of tasks; in particular, we observe state-of-the-art performance on domain generalization, where we are able to significantly boost the score of existing models.
Submitted 23 June, 2022;
originally announced June 2022.
-
By the user, for the user: A user-centric approach to quantifying the privacy of websites
Authors:
Matius Chairani,
Mathieu Chevalley,
Abderrahmane Lazraq,
Sruti Bhagavatula
Abstract:
Third-party tracking is common on almost all commercially operated websites. Prior work has studied in detail the extent of third-party tracking on the web, detection of third-party trackers, and defending against third-party tracking. Existing research and tools have also attempted to inform web users of trackers and the extent of their privacy violations. However, existing tools do not take into account users' perceptions of and understanding of the extent of trackers on the web. Taking these factors into account is important for the usability of such tools so that users can be aware and protect themselves to a reasonable and necessary extent that aligns with their overall comfort with trackers. In this paper, we elicit user perceptions and preferences about different trackers on various websites through an online survey of 43 users. We use this data to bootstrap a privacy scoring system. This scoring system weights the usage of trackers and the dispersion of user data within a page to third parties, with the type of website being visited. Our work presents a proof-of-concept methodology and tool to calculate a user-centric privacy score with preliminary bootstrap user data. We conclude with concrete future directions.
Submitted 15 November, 2019; v1 submitted 13 November, 2019;
originally announced November 2019.