-
Mind the Graph When Balancing Data for Fairness or Robustness
Authors:
Jessica Schrouff,
Alexis Bellot,
Amal Rannen-Triki,
Alan Malek,
Isabela Albuquerque,
Arthur Gretton,
Alexander D'Amour,
Silvia Chiappa
Abstract:
Failures of fairness or robustness in machine learning predictive settings can be due to undesired dependencies between covariates, outcomes and auxiliary factors of variation. A common strategy to mitigate these failures is data balancing, which attempts to remove those undesired dependencies. In this work, we define conditions on the training distribution for data balancing to lead to fair or robust models. Our results show that, in many cases, the balanced distribution does not correspond to selectively removing the undesired dependencies in a causal graph of the task, leading to multiple failure modes and even interference with other mitigation techniques such as regularization. Overall, our results highlight the importance of taking the causal graph into account before performing data balancing.
Submitted 25 June, 2024;
originally announced June 2024.
-
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
Authors:
Virginia Aglietti,
Ira Ktena,
Jessica Schrouff,
Eleni Sgouritsa,
Francisco J. R. Ruiz,
Alan Malek,
Alexis Bellot,
Silvia Chiappa
Abstract:
The sample efficiency of Bayesian optimization algorithms depends on carefully crafted acquisition functions (AFs) guiding the sequential collection of function evaluations. The best-performing AF can vary significantly across optimization problems, often requiring ad-hoc and problem-specific choices. This work tackles the challenge of designing novel AFs that perform well across a variety of experimental settings. Based on FunSearch, a recent work using Large Language Models (LLMs) for discovery in mathematical sciences, we propose FunBO, an LLM-based method that can be used to learn new AFs written in computer code by leveraging access to a limited number of evaluations for a set of objective functions. We provide the analytic expression of all discovered AFs and evaluate them on various global optimization benchmarks and hyperparameter optimization tasks. We show how FunBO identifies AFs that generalize well in and out of the training distribution of functions, thus outperforming established general-purpose AFs and achieving competitive performance against AFs that are customized to specific function types and are learned via transfer-learning algorithms.
Submitted 1 July, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Towards Bounding Causal Effects under Markov Equivalence
Authors:
Alexis Bellot
Abstract:
Predicting the effect of unseen interventions is a fundamental research question across the data sciences. It is well established that in general such questions cannot be answered definitively from observational data. This realization has fuelled a growing literature introducing various identifying assumptions, for example in the form of a causal diagram among relevant variables. In practice, this paradigm is still too rigid for many practical applications as it is generally not possible to confidently delineate the true causal diagram. In this paper, we consider the derivation of bounds on causal effects given only observational data. We propose to take as input a less informative structure known as a Partial Ancestral Graph, which represents a Markov equivalence class of causal diagrams and is learnable from data. In this more "data-driven" setting, we provide a systematic algorithm to derive bounds on causal effects that exploit the invariant properties of the equivalence class, and that can be computed analytically. We demonstrate our method with synthetic and real data examples.
Submitted 24 May, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Functional Causal Bayesian Optimization
Authors:
Limor Gultchin,
Virginia Aglietti,
Alexis Bellot,
Silvia Chiappa
Abstract:
We propose functional causal Bayesian optimization (fCBO), a method for finding interventions that optimize a target variable in a known causal graph. fCBO extends the CBO family of methods to enable functional interventions, which set a variable to be a deterministic function of other variables in the graph. fCBO models the unknown objectives with Gaussian processes whose inputs are defined in a reproducing kernel Hilbert space, which allows distances among vector-valued functions to be computed. In turn, this makes it possible to sequentially select functions to explore by maximizing an expected improvement acquisition functional while keeping the typical computational tractability of standard BO settings. We introduce graphical criteria that establish when considering functional interventions allows attaining better target effects, and conditions under which selected interventions are also optimal for conditional target effects. We demonstrate the benefits of the method in a synthetic and in a real-world causal graph.
Submitted 10 June, 2023;
originally announced June 2023.
-
Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations
Authors:
Nabeel Seedat,
Fergus Imrie,
Alexis Bellot,
Zhaozhi Qian,
Mihaela van der Schaar
Abstract:
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer "what-if" questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice. To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. This leads to a new approach, the Treatment Effect Neural Controlled Differential Equation (TE-CDE), that allows the potential outcomes to be evaluated at any time point. In addition, adversarial training is used to adjust for time-dependent confounding which is critical in longitudinal settings and is an added challenge not encountered in conventional time-series. To assess solutions to this problem, we propose a controllable simulation environment based on a model of tumor growth for a range of scenarios with irregular sampling reflective of a variety of clinical scenarios. TE-CDE consistently outperforms existing approaches in all simulated scenarios with irregular sampling.
Submitted 16 June, 2022;
originally announced June 2022.
-
Generalization bounds and algorithms for estimating conditional average treatment effect of dosage
Authors:
Alexis Bellot,
Anish Dhir,
Giulia Prando
Abstract:
We investigate the task of estimating the conditional average causal effect of treatment-dosage pairs from a combination of observational data and assumptions on the causal relationships in the underlying system. This has been a longstanding challenge for fields of study such as epidemiology or economics that require a treatment-dosage pair to make decisions but may not be able to run randomized trials to precisely quantify their effect and heterogeneity across individuals. In this paper, we extend (Shalit et al, 2017) to give new bounds on the counterfactual generalization error in the context of a continuous dosage parameter which relies on a different approach to defining counterfactuals and assignment bias adjustment. This result then guides the definition of new learning objectives that can be used to train representation learning algorithms for which we show empirically new state-of-the-art performance results across several benchmark datasets for this problem, including in comparison to doubly-robust estimation methods.
Submitted 29 May, 2022;
originally announced May 2022.
-
MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms
Authors:
Trent Kyono,
Yao Zhang,
Alexis Bellot,
Mihaela van der Schaar
Abstract:
Missing data is an important problem in machine learning practice. Starting from the premise that imputation methods should preserve the causal structure of the data, we develop a regularization scheme that encourages any baseline imputation method to be causally consistent with the underlying data generating mechanism. Our proposal is a causally-aware imputation algorithm (MIRACLE). MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism, encouraging imputation to be consistent with the causal structure of the data. We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation over a variety of benchmark methods across all three missingness scenarios: at random, completely at random, and not at random.
Submitted 4 November, 2021;
originally announced November 2021.
-
Neural graphical modelling in continuous-time: consistency guarantees and algorithms
Authors:
Alexis Bellot,
Kim Branson,
Mihaela van der Schaar
Abstract:
The discovery of structure from time series data is a key problem in fields of study working with complex systems. Most identifiability results and learning algorithms assume the underlying dynamics to be discrete in time. Comparatively few, in contrast, explicitly define dependencies in infinitesimal intervals of time, independently of the scale of observation and of the regularity of sampling. In this paper, we consider score-based structure learning for the study of dynamical systems. We prove that for vector fields parameterized in a large class of neural networks, least squares optimization with adaptive regularization schemes consistently recovers directed graphs of local independencies in systems of stochastic differential equations. Using this insight, we propose a score-based learning algorithm based on penalized Neural Ordinary Differential Equations (modelling the mean process) that we show to be applicable to the general setting of irregularly-sampled multivariate time series and to outperform the state of the art across a range of dynamical systems.
Submitted 3 February, 2022; v1 submitted 6 May, 2021;
originally announced May 2021.
-
Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding
Authors:
Alexis Bellot,
Mihaela van der Schaar
Abstract:
Unobserved confounding is one of the greatest challenges for causal discovery. The case in which unobserved variables have a widespread effect on many of the observed ones is particularly difficult because most pairs of variables are conditionally dependent given any other subset, rendering the causal effect unidentifiable. In this paper we show that beyond conditional independencies, under the principle of independent mechanisms, unobserved confounding in this setting leaves a statistical footprint in the observed data distribution that allows for disentangling spurious and causal effects. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph among observed variables may be recovered approximately and propose an adjusted score-based causal discovery algorithm that may be implemented with general purpose solvers and scales to high-dimensional problems. We find, in addition, that despite the conditions we pose to guarantee causal recovery, performance in practice is robust to large deviations in model assumptions.
Submitted 25 May, 2021; v1 submitted 28 March, 2021;
originally announced March 2021.
-
Policy Analysis using Synthetic Controls in Continuous-Time
Authors:
Alexis Bellot,
Mihaela van der Schaar
Abstract:
Counterfactual estimation using synthetic controls is one of the most successful recent methodological developments in causal inference. Despite its popularity, the current description only considers time series aligned across units and synthetic controls expressed as linear combinations of observed control units. We propose a continuous-time alternative that models the latent counterfactual path explicitly using the formalism of controlled differential equations. This model is directly applicable to the general setting of irregularly-aligned multivariate time series and may be optimized in rich function spaces -- thereby improving on some limitations of existing approaches.
Submitted 2 February, 2021;
originally announced February 2021.
-
Accounting for Unobserved Confounding in Domain Generalization
Authors:
Alexis Bellot,
Mihaela van der Schaar
Abstract:
This paper investigates the problem of learning robust, generalizable prediction models from a combination of multiple datasets and qualitative assumptions about the underlying data-generating model. Part of the challenge of learning robust models lies in the influence of unobserved confounders that void many of the invariances and principles of minimum error presently used for this problem. Our approach is to define a different invariance property of causal solutions in the presence of unobserved confounders which, through a relaxation of this invariance, can be connected with an explicit distributionally robust optimization problem over a set of affine combinations of data distributions. Concretely, our objective takes the form of a standard loss, plus a regularization term that encourages partial equality of error derivatives with respect to model parameters. We demonstrate the empirical performance of our approach on healthcare data from different modalities, including image, speech and tabular data.
Submitted 3 February, 2022; v1 submitted 21 July, 2020;
originally announced July 2020.
-
Learning Overlapping Representations for the Estimation of Individualized Treatment Effects
Authors:
Yao Zhang,
Alexis Bellot,
Mihaela van der Schaar
Abstract:
The choice of making an intervention depends on its potential benefit or harm in comparison to alternatives. Estimating the likely outcome of alternatives from observational data is a challenging problem as all outcomes are never observed, and selection bias precludes the direct comparison of differently intervened groups. Despite their empirical success, we show that algorithms that learn domain-invariant representations of inputs (on which to make predictions) are often inappropriate, and develop generalization bounds that demonstrate the dependence on domain overlap and highlight the need for invertible latent maps. Based on these results, we develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state of the art on a variety of benchmark data sets.
Submitted 17 February, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes
Authors:
Zhaozhi Qian,
Ahmed M. Alaa,
Alexis Bellot,
Jem Rashbass,
Mihaela van der Schaar
Abstract:
Comorbid diseases co-occur and progress via complex temporal patterns that vary among individuals. In electronic health records we can observe the different diseases a patient has, but can only infer the temporal relationship between each co-morbid condition. Learning such temporal patterns from event data is crucial for understanding disease pathology and predicting prognoses. To this end, we develop deep diffusion processes (DDP) to model "dynamic comorbidity networks", i.e., the temporal relationships between comorbid disease onsets expressed through a dynamic graph. A DDP comprises events modelled as a multi-dimensional point process, with an intensity function parameterized by the edges of a dynamic weighted graph. The graph structure is modulated by a neural network that maps patient history to edge weights, enabling rich temporal representations for disease trajectories. The DDP parameters decouple into clinically meaningful components, which enables serving the dual purpose of accurate risk prediction and intelligible representation of disease pathology. We illustrate these features in experiments using cancer registry data.
Submitted 19 January, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
A Bayesian Approach to Modelling Longitudinal Data in Electronic Health Records
Authors:
Alexis Bellot,
Mihaela van der Schaar
Abstract:
Analyzing electronic health records (EHR) poses significant challenges because often few samples are available describing a patient's health and, when available, their information content is highly diverse. The problem we consider is how to integrate sparsely sampled longitudinal data, missing measurements informative of the underlying health status and fixed demographic information to produce estimated survival distributions updated through a patient's follow up. We propose a nonparametric probabilistic model that generates survival trajectories from an ensemble of Bayesian trees that learns variable interactions over time without specifying beforehand the longitudinal process. We show performance improvements on Primary Biliary Cirrhosis patient data.
Submitted 19 December, 2019;
originally announced December 2019.
-
Kernel Hypothesis Testing with Set-valued Data
Authors:
Alexis Bellot,
Mihaela van der Schaar
Abstract:
We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time series, collections of words in text or a batch of images of a given phenomenon. This observation pattern, however, differs from the common assumptions required for hypothesis testing: each set differs in size, may have differing levels of noise, and also may incorporate nuisance variability, irrelevant for the analysis of the phenomenon of interest; all features that bias test decisions if not accounted for. In this paper, we propose to interpret sets as independent samples from a collection of latent probability distributions, and introduce kernel two-sample and independence tests in this latent space of distributions. We prove the consistency of tests and observe them to outperform in a wide range of synthetic experiments. Finally, we showcase their use in practice with experiments of healthcare and climate data, where previously heuristics were needed for feature extraction and testing.
Submitted 2 February, 2021; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Conditional Independence Testing using Generative Adversarial Networks
Authors:
Alexis Bellot,
Mihaela van der Schaar
Abstract:
We consider the hypothesis testing problem of detecting conditional dependence, with a focus on high-dimensional feature spaces. Our contribution is a new test statistic based on samples from a generative adversarial network designed to approximate directly a conditional distribution that encodes the null hypothesis, in a manner that maximizes power (the rate of true positives). We show that such an approach requires only that density approximation be viable in order to ensure that we control type I error (the rate of false positives); in particular, no assumptions need to be made on the form of the distributions or feature dependencies. Using synthetic simulations with high-dimensional data we demonstrate significant gains in power over competing methods. In addition, we illustrate the use of our test to discover causal markers of disease in genetic data.
Submitted 18 December, 2019; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Gamma-ray imaging system for real-time measurements in nuclear waste characterisation
Authors:
L. Caballero,
F. Albiol Colomer,
A. Corbi Bellot,
P. Olleros Rodriguez,
J. Agramunt Ros,
C. Domingo-Pardo,
J. L. Leganes Nieto,
P. Contreras,
D. L. Perez Magan
Abstract:
A compact, portable and large field-of-view gamma camera to identify, locate and quantify gamma-ray emitting radioisotopes in real-time has been developed. The device delivers spectroscopic and imaging capabilities, which allow one to use it in a variety of nuclear waste characterisation scenarios, such as radioactivity monitoring in nuclear power plants and more specifically for the decommissioning of nuclear facilities. The technical development of this apparatus and some examples of its application in field measurements are reported in this article. The performance of the presented gamma-camera is also benchmarked against other conventional techniques.
Submitted 12 January, 2018;
originally announced January 2018.