-
Hierarchical Bias-Driven Stratification for Interpretable Causal Effect Estimation
Authors:
Lucile Ter-Minassian,
Liran Szlak,
Ehud Karavani,
Chris Holmes,
Yishai Shimoni
Abstract:
Interpretability and transparency are essential for incorporating causal effect models from observational data into policy decision-making. They can provide trust for the model in the absence of ground truth labels to evaluate the accuracy of such models. To date, attempts at transparent causal effect estimation consist of applying post hoc explanation methods to black-box models, which are not interpretable. Here, we present BICauseTree: an interpretable balancing method that identifies clusters where natural experiments occur locally. Our approach builds on decision trees with a customized objective function to improve balancing and reduce treatment allocation bias. Consequently, it can additionally detect subgroups presenting positivity violations, exclude them, and provide a covariate-based definition of the target population we can infer from and generalize to. We evaluate the method's performance using synthetic and realistic datasets, explore its bias-interpretability tradeoff, and show that it is comparable with existing approaches.
Submitted 31 January, 2024;
originally announced January 2024.
-
Propensity score models are better when post-calibrated
Authors:
Rom Gutman,
Ehud Karavani,
Yishai Shimoni
Abstract:
Theoretical guarantees for causal inference using propensity scores are partly based on the scores behaving like conditional probabilities. However, scores between zero and one, especially when outputted by flexible statistical estimators, do not necessarily behave like probabilities. We perform a simulation study to assess the error in estimating the average treatment effect before and after applying a simple and well-established post-processing method to calibrate the propensity scores. We find that post-calibration reduces the error in effect estimation for expressive uncalibrated statistical estimators, and that this improvement is not mediated by better balancing. The larger the initial lack of calibration, the larger the improvement in effect estimation, with the effect on already-calibrated estimators being very small. Given the improvement in effect estimation and that post-calibration is computationally cheap, we recommend that it be adopted when modelling propensity scores with expressive models.
Submitted 2 November, 2022;
originally announced November 2022.
-
The influence of Equation of State on impact dynamics between Pluto-like bodies
Authors:
Yonatan Shimoni,
Oded Aharonson,
Raluca Rufu
Abstract:
Impacts between planetary-sized bodies can explain the origin of satellites orbiting large ($R>500$~km) trans-Neptunian objects. Their water rich composition, along with the complex phase diagram of water, make it important to accurately model the wide range of thermodynamic conditions material experiences during an impact event and in the debris disk. Since differences in the thermodynamics may influence the system dynamics, we seek to evaluate how the choice of an equation of state (EOS) alters the system's evolution. Specifically, we compare two EOSs that are constructed by different approaches: either by a simplified analytic description (Tillotson), or by interpolation of tabulated data (Sesame). Approximately $50$ pairs of Smoothed Particle Hydrodynamics impact simulations were performed, with similar initial conditions but different EOSs, in the parameter space in which the Pluto-Charon binary is thought to form (slow impacts between Pluto-size, water rich bodies). Generally, we show that impact outcomes (e.g., circumplanetary debris disk) are consistent between EOSs. Some differences arise, importantly in the production of satellitesimals (large intact clumps) that form in the post-impact debris disk. When utilizing an analytic EOS, the emergence of satellitesimals is highly certain, while when using the tabulated EOS it is less common. This is because for the typical densities and energies experienced in these impacts, the analytic EOS predicts very low pressure values, leading to particles artificially aggregating by a tensile instability.
Submitted 10 September, 2021;
originally announced September 2021.
-
A discriminative approach for finding and characterizing positivity violations using decision trees
Authors:
Ehud Karavani,
Peter Bak,
Yishai Shimoni
Abstract:
The assumption of positivity in causal inference (also known as common support and covariate overlap) is necessary to obtain valid causal estimates. Therefore, confirming it holds in a given dataset is an important first step of any causal analysis. Most common methods to date are insufficient for discovering non-positivity, as they do not scale for modern high-dimensional covariate spaces, or they cannot pinpoint the subpopulation violating positivity. To overcome these issues, we suggest harnessing decision trees for detecting violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the stratification rules that define each subpopulation, combined with the severity of positivity violation within it. We also provide an interactive version of the visualization that allows a deeper dive into the properties of each subspace.
Submitted 18 July, 2019;
originally announced July 2019.
-
An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference
Authors:
Yishai Shimoni,
Ehud Karavani,
Sivan Ravid,
Peter Bak,
Tan Hung Ng,
Sharon Hensley Alford,
Denise Meade,
Yaara Goldschmidt
Abstract:
Real world observational data, together with causal inference, allow the estimation of causal effects when randomized controlled trials are not available. To be accepted into practice, such predictive models must be validated for the dataset at hand, and thus require a comprehensive evaluation toolkit, as introduced here. Since effect estimation cannot be evaluated directly, we turn to evaluating the various observable properties of causal inference, namely the observed outcome and treatment assignment. We developed a toolkit that expands established machine learning evaluation methods and adds several causal-specific ones. Evaluations can be applied in cross-validation, in a train-test scheme, or on the training data. Multiple causal inference methods are implemented within the toolkit in a way that allows modular use of the underlying machine learning models. Thus, the toolkit is agnostic to the machine learning model that is used. We showcase our approach using a rheumatoid arthritis cohort (consisting of about 120K patients) extracted from the IBM MarketScan(R) Research Database. We introduce an iterative pipeline of data definition, model definition, and model evaluation. Using this pipeline, we demonstrate how each of the evaluation components helps drive model selection and refinement of data extraction criteria in a way that provides more reproducible results and ensures that the causal question is answerable with available data. Furthermore, we show how the evaluation toolkit can be used to ensure that performance is maintained when applied to subsets of the data, thus allowing exploration of questions that move towards personalized medicine.
Submitted 2 June, 2019;
originally announced June 2019.
-
Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis
Authors:
Yishai Shimoni,
Chen Yanover,
Ehud Karavani,
Yaara Goldschmidt
Abstract:
Causal inference analysis is the estimation of the effects of actions on outcomes. In the context of healthcare data this means estimating the outcome of counter-factual treatments (i.e. including treatments that were not observed) on a patient's outcome. Compared to classic machine learning methods, evaluation and validation of causal inference analysis is more challenging because ground truth data of counter-factual outcome can never be obtained in any real-world scenario. Here, we present a comprehensive framework for benchmarking algorithms that estimate causal effect. The framework includes unlabeled data for prediction, labeled data for validation, and code for automatic evaluation of algorithm predictions using both established and novel metrics. The data is based on real-world covariates, and the treatment assignments and outcomes are based on simulations, which provides the basis for validation. In this framework we address two questions: one of scaling, and the other of data-censoring. The framework is available as open source code at https://github.com/IBM-HRL-MLHLS/IBM-Causal-Inference-Benchmarking-Framework
Submitted 20 March, 2018; v1 submitted 14 February, 2018;
originally announced February 2018.
-
Stochastic analysis of bistability in coherent mixed feedback loops combining transcriptional and post-transcriptional regulations
Authors:
Mor Nitzan,
Yishai Shimoni,
Oded Rosolio,
Hanah Margalit,
Ofer Biham
Abstract:
Mixed feedback loops combining transcriptional and post-transcriptional regulations are common in cellular regulatory networks. They consist of two genes, encoding a transcription factor and a small non-coding RNA (sRNA), which mutually regulate each other's expression. We present a theoretical and numerical study of coherent mixed feedback loops of this type, in which both regulations are negative. Under suitable conditions, these feedback loops are expected to exhibit bistability, namely two stable states, one dominated by the transcriptional repressor and the other dominated by the sRNA. We use deterministic methods based on rate equation models, in order to identify the range of parameters in which bistability takes place. However, the deterministic models do not account for the finite lifetimes of the bistable states and the spontaneous, fluctuation-driven transitions between them. Therefore, we use stochastic methods to calculate the average lifetimes of the two states. It is found that these lifetimes strongly depend on rate coefficients such as the transcription rates of the transcriptional repressor and the sRNA. In particular, we show that the fraction of time the system spends in the sRNA dominated state follows a monotonically decreasing sigmoid function of the transcriptional repressor transcription rate. The biological relevance of these results is discussed in the context of such mixed feedback loops in {\it Escherichia coli}.
Submitted 2 December, 2014;
originally announced December 2014.
-
Entanglement of Periodic States, the Quantum Fourier Transform and Shor's Factoring Algorithm
Authors:
Yonatan Most,
Yishai Shimoni,
Ofer Biham
Abstract:
The preprocessing stage of Shor's algorithm generates a class of quantum states referred to as periodic states, on which the quantum Fourier transform is applied. Such states also play an important role in other quantum algorithms that rely on the quantum Fourier transform. Since entanglement is believed to be a necessary resource for quantum computational speedup, we analyze the entanglement of periodic states and the way it is affected by the quantum Fourier transform. To this end, we derive a formula that evaluates the Groverian entanglement measure for periodic states. Using this formula, we explain the surprising result that the Groverian entanglement of the periodic states built up during the preprocessing stage is only slightly affected by the quantum Fourier transform.
Submitted 20 May, 2010; v1 submitted 18 January, 2010;
originally announced January 2010.
-
Formation of Multipartite Entanglement Using Random Quantum Gates
Authors:
Yonatan Most,
Yishai Shimoni,
Ofer Biham
Abstract:
The formation of multipartite quantum entanglement by repeated operation of one and two qubit gates is examined. The resulting entanglement is evaluated using two measures: the average bipartite entanglement and the Groverian measure. A comparison is made between two geometries of the quantum register: a one dimensional chain in which two-qubit gates apply only locally between nearest neighbors and a non-local geometry in which such gates may apply between any pair of qubits. More specifically, we use a combination of random single qubit rotations and a fixed two-qubit gate such as the controlled-phase gate. It is found that in the non-local geometry the entanglement is generated at a higher rate. In both geometries, the Groverian measure converges to its asymptotic value more slowly than the average bipartite entanglement. These results are expected to have implications on different proposed geometries of future quantum computers with local and non-local interactions between the qubits.
Submitted 26 August, 2007;
originally announced August 2007.
-
Groverian Entanglement Measure of Pure Quantum States with Arbitrary Partitions
Authors:
Yishai Shimoni,
Ofer Biham
Abstract:
The Groverian entanglement measure of pure quantum states of $n$ qubits is generalized to the case in which the qubits are divided into any $m \le n$ parties and the entanglement between these parties is evaluated. To demonstrate this measure we apply it to general states of three qubits and to symmetric states with any number of qubits such as the Greenberger-Horne-Zeilinger state and the W state.
Submitted 18 February, 2007;
originally announced February 2007.
-
Entangled Quantum States Generated by Shor's Factoring Algorithm
Authors:
Yishai Shimoni,
Daniel Shapira,
Ofer Biham
Abstract:
The intermediate quantum states of multiple qubits, generated during the operation of Shor's factoring algorithm are analyzed. Their entanglement is evaluated using the Groverian measure. It is found that the entanglement is generated during the pre-processing stage of the algorithm and remains nearly constant during the quantum Fourier transform stage. The entanglement is found to be correlated with the speedup achieved by the quantum algorithm compared to classical algorithms.
Submitted 6 October, 2005;
originally announced October 2005.
-
The Groverian Measure of Entanglement for Mixed States
Authors:
Daniel Shapira,
Yishai Shimoni,
Ofer Biham
Abstract:
The Groverian entanglement measure introduced earlier for pure quantum states [O. Biham, M.A. Nielsen and T. Osborne, Phys. Rev. A 65, 062312 (2002)] is generalized to the case of mixed states, in a way that maintains its operational interpretation. The Groverian measure of a mixed state of n qubits is obtained by a purification procedure into a pure state of 2n qubits, followed by an optimization process based on Uhlmann's theorem, before the resulting state is fed into Grover's search algorithm. The Groverian measure, expressed in terms of the maximal success probability of the algorithm, provides an operational measure of entanglement of both pure and mixed quantum states of multiple qubits. These results may provide further insight into the role of entanglement in making quantum algorithms powerful.
Submitted 15 August, 2005;
originally announced August 2005.
-
Algebraic analysis of quantum search with pure and mixed states
Authors:
D. Shapira,
Y. Shimoni,
O. Biham
Abstract:
An algebraic analysis of Grover's quantum search algorithm is presented for the case in which the initial state is an arbitrary pure quantum state of n qubits. This approach reveals the geometrical structure of the quantum search process, which turns out to be confined to a four-dimensional subspace of the Hilbert space. This work unifies and generalizes earlier results on the time evolution of the amplitudes during the quantum search, the optimal number of iterations and the success probability. Furthermore, it enables a direct generalization to the case in which the initial state is a mixed state, providing an exact formula for the success probability.
Submitted 20 April, 2005;
originally announced April 2005.
-
Characterization of pure quantum states of multiple qubits using the Groverian entanglement measure
Authors:
Yishai Shimoni,
Daniel Shapira,
Ofer Biham
Abstract:
The Groverian entanglement measure, G(psi), is applied to characterize a variety of pure quantum states |psi> of multiple qubits. The Groverian measure is calculated analytically for certain states of high symmetry, while for arbitrary states it is evaluated using a numerical procedure. In particular, it is calculated for the class of Greenberger-Horne-Zeilinger states, the W states as well as for random pure states of n qubits. The entanglement generated by Grover's algorithm is evaluated by calculating G(psi) for the intermediate states that are obtained after t Grover iterations, for various initial states and for different sets of the marked states.
Submitted 7 September, 2003;
originally announced September 2003.
-
Analysis of Grover's quantum search algorithm as a dynamical system
Authors:
O. Biham,
D. Shapira,
Y. Shimoni
Abstract:
Grover's quantum search algorithm is analyzed for the case in which the initial state is an arbitrary pure quantum state $|φ>$ of $n$ qubits. It is shown that the optimal time to perform the measurement is independent of $| φ>$, namely, it is identical to the optimal time in the original algorithm in which $| φ> = | 0>$, with the same number of marked states, $r$. The probability of success $P_{\rm s}$ is obtained, in terms of the amplitudes of the state $| φ>$, and is shown to be independent of $r$. A class of states, which includes fixed points and cycles of the Grover iteration operator is identified. The relevance of these results in the context of using the success probability as an entanglement measure is discussed. In particular, the Groverian entanglement measure, previously limited to a single marked state, is generalized to the case of several marked states.
Submitted 20 July, 2003;
originally announced July 2003.