Search | arXiv e-print repository

Causal Discovery with Fewer Conditional Independence Tests

Authors: Kirankumar Shiragur, Jiaqi Zhang, Caroline Uhler

Abstract: Many questions in science center around the fundamental problem of understanding causal relationships. However, most constraint-based causal discovery algorithms, including the well-celebrated PC algorithm, often incur an exponential number of conditional independence (CI) tests, posing limitations in various applications. Addressing this, our work focuses on characterizing what can be learned about the underlying causal graph with a reduced number of CI tests. We show that it is possible to learn a coarser representation of the hidden causal graph with a polynomial number of tests. This coarser representation, named Causal Consistent Partition Graph (CCPG), comprises a partition of the vertices and a directed graph defined over its components. CCPG satisfies consistency of orientations and additional constraints which favor finer partitions. Furthermore, it reduces to the underlying causal graph when the causal graph is identifiable. As a consequence, our results offer the first efficient algorithm for recovering the true causal graph with a polynomial number of tests, in special cases where the causal graph is fully identifiable through observational data and potentially additional interventions.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.19225 [pdf, other]

Synthetic Potential Outcomes for Mixtures of Treatment Effects

Authors: Bijan Mazaheri, Chandler Squires, Caroline Uhler

Abstract: Modern data analysis frequently relies on the use of large datasets, often constructed as amalgamations of diverse populations or data-sources. Heterogeneity across these smaller datasets constitutes two major challenges for causal inference: (1) the source of each sample can introduce latent confounding between treatment and effect, and (2) diverse populations may respond differently to the same treatment, giving rise to heterogeneous treatment effects (HTEs). The issues of latent confounding and HTEs have been studied separately but not in conjunction. In particular, previous works only report the conditional average treatment effect (CATE) among similar individuals (with respect to the measured covariates). CATEs cannot resolve mixtures of potential treatment effects driven by latent heterogeneity, which we call mixtures of treatment effects (MTEs). Inspired by method of moment approaches to mixture models, we propose "synthetic potential outcomes" (SPOs). Our new approach deconfounds heterogeneity while also guaranteeing the identifiability of MTEs. This technique bypasses full recovery of a mixture, which significantly simplifies its requirements for identifiability. We demonstrate the efficacy of SPOs on synthetic data.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2404.16907 [pdf, other]

Season combinatorial intervention predictions with Salt & Peper

Authors: Thomas Gaudelet, Alice Del Vecchio, Eli M Carrami, Juliana Cudini, Chantriolnt-Andreas Kapourani, Caroline Uhler, Lindsay Edwards

Abstract: Interventions play a pivotal role in the study of complex biological systems. In drug discovery, genetic interventions (such as CRISPR base editing) have become central to both identifying potential therapeutic targets and understanding a drug's mechanism of action. With the advancement of CRISPR and the proliferation of genome-scale analyses such as transcriptomics, a new challenge is to navigate the vast combinatorial space of concurrent genetic interventions. Addressing this, our work concentrates on estimating the effects of pairwise genetic combinations on the cellular transcriptome. We introduce two novel contributions: Salt, a biologically-inspired baseline that posits the mostly additive nature of combination effects, and Peper, a deep learning model that extends Salt's additive assumption to achieve unprecedented accuracy. Our comprehensive comparison against existing state-of-the-art methods, grounded in diverse metrics, and our out-of-distribution analysis highlight the limitations of current models in realistic settings. This analysis underscores the necessity for improved modelling techniques and data acquisition strategies, paving the way for more effective exploration of genetic intervention effects.

Submitted 25 April, 2024; originally announced April 2024.
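
Illustrative note on the entry above: the abstract describes Salt only as a baseline that assumes combination effects are mostly additive. The sketch below shows one plausible reading of such an assumption, in which the predicted profile for a pair of perturbations is the control profile plus the two single-perturbation shifts. The function name `additive_baseline` and the toy expression values are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
def additive_baseline(control, single_a, single_b):
    """Predict the profile under a joint perturbation of A and B.

    Assumption (not from the paper): each single perturbation shifts the
    control profile by (single - control), and the two shifts simply add.
    """
    return [c + (a - c) + (b - c) for c, a, b in zip(control, single_a, single_b)]


# Toy expression values for three genes (made up for illustration).
control = [1.0, 2.0, 3.0]
single_a = [1.5, 2.0, 2.0]   # profile after perturbing gene A alone
single_b = [1.0, 3.0, 3.5]   # profile after perturbing gene B alone

predicted_ab = additive_baseline(control, single_a, single_b)
print(predicted_ab)
```

A learned model would then only need to capture deviations from this additive prediction, which is one way to read the abstract's statement that Peper extends Salt's additive assumption.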

arXiv:2403.05759 [pdf, other]

Membership Testing in Markov Equivalence Classes via Independence Query Oracles

Authors: Jiaqi Zhang, Kirankumar Shiragur, Caroline Uhler

Abstract: Understanding causal relationships between variables is a fundamental problem with broad impact in numerous scientific fields. While extensive research has been dedicated to learning causal graphs from data, its complementary concept of testing causal relationships has remained largely unexplored. While learning involves the task of recovering the Markov equivalence class (MEC) of the underlying causal graph from observational data, the testing counterpart addresses the following critical question: Given a specific MEC and observational data from some causal graph, can we determine if the data-generating causal graph belongs to the given MEC? We explore constraint-based testing methods by establishing bounds on the required number of conditional independence tests. Our bounds are in terms of the size of the maximum undirected clique ($s$) of the given MEC. In the worst case, we show a lower bound of $\exp(\Omega(s))$ independence tests. We then give an algorithm that resolves the task with $\exp(O(s))$ tests, matching our lower bound. Compared to the learning problem, where algorithms often use a number of independence tests that is exponential in the maximum in-degree, this shows that testing is relatively easier. In particular, it requires exponentially fewer independence tests in graphs featuring high in-degrees and small clique sizes. Additionally, using the DAG associahedron, we provide a geometric interpretation of testing versus learning and discuss how our testing result can aid learning.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.14777 [pdf, other]

Causal Imputation for Counterfactual SCMs: Bridging Graphs and Latent Factor Models

Authors: Alvaro Ribot, Chandler Squires, Caroline Uhler

Abstract: We consider the task of causal imputation, where we aim to predict the outcomes of some set of actions across a wide range of possible contexts. As a running example, we consider predicting how different drugs affect cells from different cell types. We study the index-only setting, where the actions and contexts are categorical variables with a finite number of possible values. Even in this simple setting, a practical challenge arises, since often only a small subset of possible action-context pairs have been studied. Thus, models must extrapolate to novel action-context pairs, which can be framed as a form of matrix completion with rows indexed by actions, columns indexed by contexts, and matrix entries corresponding to outcomes. We introduce a novel SCM-based model class, where the outcome is expressed as a counterfactual, actions are expressed as interventions on an instrumental variable, and contexts are defined based on the initial state of the system. We show that, under a linearity assumption, this setup induces a latent factor model over the matrix of outcomes, with an additional fixed effect term. To perform causal prediction based on this model class, we introduce a simple extension to the Synthetic Interventions estimator (Agarwal et al., 2020). We evaluate several matrix completion approaches on the PRISM drug repurposing dataset, showing that our method outperforms all other considered matrix completion approaches.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 35 pages, 17 figures

arXiv:2402.08229 [pdf, other]

Causal Discovery under Off-Target Interventions

Authors: Davin Choo, Kirankumar Shiragur, Caroline Uhler

Abstract: Causal graph discovery is a significant problem with applications across various disciplines. However, with observational data alone, the underlying causal graph can only be recovered up to its Markov equivalence class, and further assumptions or interventions are necessary to narrow down the true graph. This work addresses the causal discovery problem under the setting of stochastic interventions with the natural goal of minimizing the number of interventions performed. We propose the following stochastic intervention model which subsumes existing adaptive noiseless interventions in the literature while capturing scenarios such as fat-hand interventions and CRISPR gene knockouts: any intervention attempt results in an actual intervention on a random subset of vertices, drawn from a distribution dependent on the attempted action. Under this model, we study the two fundamental problems in causal discovery of verification and search, provide approximation algorithms with polylogarithmic competitive ratios, and report some preliminary experimental results.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted into AISTATS 2024

arXiv:2312.00718 [pdf, other]

Removing Biases from Molecular Representations via Information Maximization

Authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi Jaakkola

Abstract: High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2310.20075 [pdf, other]

Meek Separators and Their Applications in Targeted Causal Discovery

Authors: Kirankumar Shiragur, Jiaqi Zhang, Caroline Uhler

Abstract: Learning causal structures from interventional data is a fundamental problem with broad applications across various fields. While many previous works have focused on recovering the entire causal graph, in practice, there are scenarios where learning only part of the causal graph suffices. This is called $targeted$ causal discovery. In our work, we focus on two such well-motivated problems: subset search and causal matching. We aim to minimize the number of interventions in both cases. Towards this, we introduce the $Meek~separator$, which is a subset of vertices that, when intervened, decomposes the remaining unoriented edges into smaller connected components. We then present an efficient algorithm to find Meek separators that are of small sizes. Such a procedure is helpful in designing various divide-and-conquer-based approaches. In particular, we propose two randomized algorithms that achieve logarithmic approximation for subset search and causal matching, respectively. Our results provide the first known average-case provable guarantees for both problems. We believe that this opens up possibilities to design near-optimal methods for many other targeted causal structure learning problems arising from various applications.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2307.06250 [pdf, other]

Identifiability Guarantees for Causal Disentanglement from Soft Interventions

Authors: Jiaqi Zhang, Chandler Squires, Kristjan Greenewald, Akash Srivastava, Karthikeyan Shanmugam, Caroline Uhler

Abstract: Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. We here show that identifiability can still be achieved with unobserved causal variables, given a generalized notion of faithfulness. Our results guarantee that we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions, in the limit of infinite data. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.

Submitted 8 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

arXiv:2305.19884 [pdf, ps, other]

Positivity in Linear Gaussian Structural Equation Models

Authors: Asad Lodhia, Jan-Christian Hütter, Caroline Uhler, Piotr Zwiernik

Abstract: We study a notion of positivity of Gaussian directed acyclic graphical models corresponding to a non-negativity constraint on the coefficients of the associated structural equation model. We prove that this constraint is equivalent to the distribution being conditionally increasing in sequence (CIS), a well-known subclass of positively associated random variables. These distributions require knowledge of a permutation, a CIS ordering, of the nodes for which the constraint of non-negativity holds. We provide an algorithm and prove in the noise-less setting that a CIS ordering can be recovered when it exists. We extend this result to the noisy setting and provide assumptions for recovering the CIS orderings. In addition, we provide a characterization of Markov equivalence for CIS DAG models. Further, we show that when a CIS ordering is known, the corresponding class of Gaussians lies in a family of distributions in which maximum likelihood estimation is a convex problem.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 22 pages, 5 figures

arXiv:2302.00993 [pdf, other]

Unpaired Multi-Domain Causal Representation Learning

Authors: Nils Sturma, Chandler Squires, Mathias Drton, Caroline Uhler

Abstract: The goal of causal representation learning is to find a representation of data that consists of causally related latent variables. We consider a setup where one has access to data from multiple domains that potentially share a causal representation. Crucially, observations in different domains are assumed to be unpaired, that is, we only observe the marginal distribution in each domain but not their joint distribution. In this paper, we give sufficient conditions for identifiability of the joint distribution and the shared causal graph in a linear setup. Identifiability holds if we can uniquely recover the joint distribution and the shared causal representation from the marginal distributions in each domain. We transform our identifiability results into a practical method to recover the shared latent causal graph.

Submitted 27 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.10814 [pdf, other]

Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler's Rotation Equation

Authors: Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, Caroline Uhler

Abstract: Protein-ligand binding prediction is a fundamental problem in AI-driven drug discovery. Prior work focused on supervised learning methods using a large set of binding affinity data for small molecules, but it is hard to apply the same strategy to other drug classes like antibodies as labelled data is limited. In this paper, we explore unsupervised approaches and reformulate binding energy prediction as a generative modeling task. Specifically, we train an energy-based model on a set of unlabelled protein-ligand complexes using SE(3) denoising score matching and interpret its log-likelihood as binding affinity. Our key contribution is a new equivariant rotation prediction network called Neural Euler's Rotation Equations (NERE) for SE(3) score matching. It predicts a rotation by modeling the force and torque between protein and ligand atoms, where the force is defined as the gradient of an energy function with respect to atom coordinates. We evaluate NERE on protein-ligand and antibody-antigen binding affinity prediction benchmarks. Our model outperforms all unsupervised baselines (physics-based and statistical potentials) and matches supervised learning methods in the antibody case.

Submitted 12 December, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

arXiv:2211.16467 [pdf, other]

Linear Causal Disentanglement via Interventions

Authors: Chandler Squires, Anna Seigal, Salil Bhate, Caroline Uhler

Abstract: Causal disentanglement seeks a representation of data involving latent variables that relate to one another via a causal model. A representation is identifiable if both the latent model and the transformation from latent to observed variables are unique. In this paper, we study observed variables that are a linear transformation of a linear latent causal model. Data from interventions are necessary for identifiability: if one latent variable is missing an intervention, we show that there exist distinct models that cannot be distinguished. Conversely, we show that a single intervention on each latent variable is sufficient for identifiability. Our proof uses a generalization of the RQ decomposition of a matrix that replaces the usual orthogonal and upper triangular conditions with analogues depending on a partial order on the rows of the matrix, with partial order determined by a latent causal model. We corroborate our theoretical results with a method for causal disentanglement that accurately recovers a latent causal model.

Submitted 11 June, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.00227 [pdf, other]

Transfer Learning with Kernel Methods

Authors: Adityanarayanan Radhakrishnan, Max Ruiz Luyten, Neha Prasad, Caroline Uhler

Abstract: Transfer learning refers to the process of adapting a model trained on a source task to a target task. While kernel methods are conceptually and computationally simple machine learning models that are competitive on a variety of tasks, it has been unclear how to perform transfer learning for kernel methods. In this work, we propose a transfer learning framework for kernel methods by projecting and translating the source model to the target task. We demonstrate the effectiveness of our framework in applications to image classification and virtual drug screening. In particular, we show that transferring modern kernels trained on large-scale image datasets can result in substantial performance increase as compared to using the same kernel trained directly on the target task. In addition, we show that transfer-learned kernels allow a more accurate prediction of the effect of drugs on cancer cell lines. For both applications, we identify simple scaling laws that characterize the performance of transfer-learned kernels as a function of the number of target examples. We explain this phenomenon in a simplified linear setting, where we are able to derive the exact scaling laws. By providing a simple and effective transfer learning framework for kernel methods, our work enables kernel methods trained on large datasets to be easily adapted to a variety of downstream target tasks.

Submitted 31 October, 2022; originally announced November 2022.

arXiv:2209.04744 [pdf, other]

Active Learning for Optimal Intervention Design in Causal Models

Authors: Jiaqi Zhang, Louis Cammarata, Chandler Squires, Themistoklis P. Sapsis, Caroline Uhler

Abstract: Sequential experimental design to discover interventions that achieve a desired outcome is a key problem in various domains including science, engineering and public policy. When the space of possible interventions is large, making an exhaustive search infeasible, experimental design strategies are needed. In this context, encoding the causal relationships between the variables, and thus the effect of interventions on the system, is critical for identifying desirable interventions more efficiently. Here, we develop a causal active learning strategy to identify interventions that are optimal, as measured by the discrepancy between the post-interventional mean of the distribution and a desired target mean. The approach employs a Bayesian update for the causal model and prioritizes interventions using a carefully designed, causally informed acquisition function. This acquisition function is evaluated in closed form, allowing for fast optimization. The resulting algorithms are theoretically grounded with information-theoretic bounds and provable consistency results for linear causal models with known causal graph. We apply our approach to both synthetic data and single-cell transcriptomic data from Perturb-CITE-seq experiments to identify optimal perturbations that induce a specific cell state transition. The causally informed acquisition function generally outperforms existing criteria allowing for optimal intervention design with fewer but carefully selected samples.

Submitted 16 August, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

arXiv:2207.01237 [pdf, other]

Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors

Authors: Chandler Squires, Annie Yun, Eshaan Nichani, Raj Agrawal, Caroline Uhler

Abstract: We consider the problem of learning the structure of a causal directed acyclic graph (DAG) model in the presence of latent variables. We define latent factor causal models (LFCMs) as a restriction on causal DAG models with latent variables, which are composed of clusters of observed variables that share the same latent parent and connections between these clusters given by edges pointing from the observed variables to latent variables. LFCMs are motivated by gene regulatory networks, where regulatory edges, corresponding to transcription factors, connect spatially clustered genes. We show identifiability results on this model and design a consistent three-stage algorithm that discovers clusters of observed nodes, a partial ordering over clusters, and finally, the entire structure over both observed and latent nodes. We evaluate our method in a synthetic setting, demonstrating its ability to almost perfectly recover the ground truth clustering even at relatively low sample sizes, as well as the ability to recover a significant number of the edges from observed variables to latent factors. Finally, we apply our method in a semi-synthetic setting to protein mass spectrometry data with a known ground truth network, and achieve almost perfect recovery of the ground truth variable clusters.

Submitted 5 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Causal Learning and Reasoning (CLeaR) 2022

arXiv:2206.01152 [pdf, other]

doi 10.1007/s10208-022-09581-9

Causal Structure Learning: a Combinatorial Perspective

Authors: Chandler Squires, Caroline Uhler

Abstract: In this review, we discuss approaches for learning causal structure from data, also called causal discovery. In particular, we focus on approaches for learning directed acyclic graphs (DAGs) and various generalizations which allow for some variables to be unobserved in the available data. We devote special attention to two fundamental combinatorial aspects of causal structure learning. First, we discuss the structure of the search space over causal graphs. Second, we discuss the structure of equivalence classes over causal graphs, i.e., sets of graphs which represent what can be learned from observational data alone, and how these equivalence classes can be refined by adding interventional data.

Submitted 19 December, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Journal ref: Foundations of Computational Mathematics, 2022

arXiv:2204.14126 [pdf, other]

doi 10.1073/pnas.2208779120

Wide and Deep Neural Networks Achieve Optimality for Classification

Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

Abstract: While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are optimal for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that achieve optimality. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and Neural Tangent Kernels, we provide explicit activation functions that can be used to construct networks that achieve optimality. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: (1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); (2) majority vote (model predictions are given by the label of the class with greatest representation in the training set); or (3) singular kernel classifiers (a set of classifiers containing those that achieve optimality). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.

Submitted 29 April, 2022; originally announced April 2022.

arXiv:2112.14872 [pdf, other]

Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size

Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

Abstract: Establishing a fast rate of convergence for optimization methods is crucial to their applicability in practice. With the increasing popularity of deep learning over the past decade, stochastic gradient descent and its adaptive variants (e.g. Adagrad, Adam, etc.) have become prominent methods of choice for machine learning practitioners. While a large number of works have demonstrated that these first order optimization methods can achieve sub-linear or linear convergence, we establish local quadratic convergence for stochastic gradient descent with adaptive step size for problems such as matrix inversion.

Submitted 29 December, 2021; originally announced December 2021.

Comments: ICML 2021 Workshop on Beyond first-order methods in ML systems

arXiv:2112.00816 [pdf, other]

Maximum Likelihood Estimation for Brownian Motion Tree Models Based on One Sample

Authors: Michael Truell, Jan-Christian Hütter, Chandler Squires, Piotr Zwiernik, Caroline Uhler

Abstract: We study the problem of maximum likelihood estimation given one data sample ($n=1$) over Brownian Motion Tree Models (BMTMs), a class of Gaussian models on trees. BMTMs are often used as a null model in phylogenetics, where the one-sample regime is common. Specifically, we show that, almost surely, the one-sample BMTM maximum likelihood estimator (MLE) exists, is unique, and corresponds to a fully observed tree. Moreover, we provide a polynomial time algorithm for its exact computation. We also consider the MLE over all possible BMTM tree structures in the one-sample case and show that it exists almost surely, that it coincides with the MLE over diagonally dominant M-matrices, and that it admits a unique closed-form solution that corresponds to a path graph. Finally, we explore statistical properties of the one-sample BMTM MLE through numerical experiments.

Submitted 24 November, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

MSC Class: 62F30; 62H12; 90C39; 62P10

arXiv:2108.00131 [pdf, other]

doi 10.1073/pnas.2115064119

Simple, Fast, and Flexible Framework for Matrix Completion with Infinite Width Neural Networks

Authors: Adityanarayanan Radhakrishnan, George Stefanakis, Mikhail Belkin, Caroline Uhler

Abstract: Matrix completion problems arise in many applications including recommendation systems, computer vision, and genomics. Increasingly larger neural networks have been successful in many of these applications, but at considerable computational costs. Remarkably, taking the width of a neural network to infinity allows for improved computational performance. In this work, we develop an infinite width neural network framework for matrix completion that is simple, fast, and flexible. Simplicity and speed come from the connection between the infinite width limit of neural networks and kernels known as neural tangent kernels (NTK). In particular, we derive the NTK for fully connected and convolutional neural networks for matrix completion. The flexibility stems from a feature prior, which allows encoding relationships between coordinates of the target matrix, akin to semi-supervised learning. The effectiveness of our framework is demonstrated through competitive results for virtual drug screening and image inpainting/reconstruction. We also provide an implementation in Python to make our framework accessible on standard hardware to a broad audience.

Submitted 21 February, 2022; v1 submitted 30 July, 2021; originally announced August 2021.

arXiv:2107.01850 [pdf, other]

Matching a Desired Causal State via Shift Interventions

Authors: Jiaqi Zhang, Chandler Squires, Caroline Uhler

Abstract: Transforming a causal system from a given initial state to a desired target state is an important task permeating multiple fields including control theory, biology, and materials science. In causal models, such transformations can be achieved by performing a set of interventions. In this paper, we consider the problem of identifying a shift intervention that matches the desired mean of a system through active learning. We define the Markov equivalence class that is identifiable from shift interventions and propose two active learning strategies that are guaranteed to exactly match a desired mean. We then derive a worst-case lower bound for the number of interventions required and show that these strategies are optimal for certain classes of graphs. In particular, we show that our strategies may require exponentially fewer interventions than the previously considered approaches, which optimize for structure learning in the underlying causal graph. In line with our theoretical results, we also demonstrate experimentally that our proposed active learning strategies require fewer interventions compared to several baselines.

Submitted 20 October, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:2106.15456 [pdf, other]

A Mechanism for Producing Aligned Latent Spaces with Autoencoders

Authors: Saachi Jain, Adityanarayanan Radhakrishnan, Caroline Uhler

Abstract: Aligned latent spaces, where meaningful semantic shifts in the input space correspond to a translation in the embedding space, play an important role in the success of downstream tasks such as unsupervised clustering and data imputation. In this work, we prove that linear and nonlinear autoencoders produce aligned latent spaces by stretching along the left singular vectors of the data. We fully characterize the amount of stretching in linear autoencoders and provide an initialization scheme to arbitrarily stretch along the top directions using these networks. We also quantify the amount of stretching in nonlinear autoencoders in a simplified setting. We use our theoretical results to align drug signatures across cell types in gene expression space and semantic shifts in word embedding spaces.

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2105.14024 [pdf, other]

Near-Optimal Multi-Perturbation Experimental Design for Causal Structure Learning

Authors: Scott Sussex, Andreas Krause, Caroline Uhler

Abstract: Causal structure learning is a key problem in many domains. Causal structures can be learnt by performing experiments on the system of interest. We address the largely unexplored problem of designing a batch of experiments that each simultaneously intervene on multiple variables. While potentially more informative than the commonly considered single-variable interventions, selecting such interventions is algorithmically much more challenging, due to the doubly-exponential combinatorial search space over sets of composite interventions. In this paper, we develop efficient algorithms for optimizing different objective functions quantifying the informativeness of a budget-constrained batch of experiments. By establishing novel submodularity properties of these objectives, we provide approximation guarantees for our algorithms. Our algorithms empirically perform superior to both random interventions and algorithms that only select single-variable interventions.

Submitted 24 November, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: 10 pages, 2 figures, appendix, to be published in 35th Conference on Neural Information Processing Systems (NeurIPS 2021), fixed typos and clarified wording

arXiv:2102.07921 [pdf, other]

The DeCAMFounder: Non-Linear Causal Discovery in the Presence of Hidden Variables

Authors: Raj Agrawal, Chandler Squires, Neha Prasad, Caroline Uhler

Abstract: Many real-world decision-making tasks require learning causal relationships between a set of variables. Traditional causal discovery methods, however, require that all variables are observed, which is often not feasible in practical scenarios. Without additional assumptions about the unobserved variables, it is not possible to recover any causal relationships from observational data. Fortunately, in many applied settings, additional structure among the confounders can be expected. In particular, pervasive confounding is commonly encountered and has been utilized for consistent causal estimation in linear causal models. In this paper, we present a provably consistent method to estimate causal relationships in the non-linear, pervasive confounding setting. The core of our procedure relies on the ability to estimate the confounding variation through a simple spectral decomposition of the observed data matrix. We derive a DAG score function based on this insight, prove its consistency in recovering a correct ordering of the DAG, and empirically compare it to previous approaches. We demonstrate improved performance on both simulated and real datasets by explicitly accounting for both confounders and non-linear effects.

Submitted 25 June, 2023; v1 submitted 15 February, 2021; originally announced February 2021.

Comments: To appear in Journal of the Royal Statistical Society Series B

arXiv:2101.05336 [pdf, other]

Identifying 3D Genome Organization in Diploid Organisms via Euclidean Distance Geometry

Authors: Anastasiya Belyaeva, Kaie Kubjas, Lawrence J. Sun, Caroline Uhler

Abstract: The spatial organization of the DNA in the cell nucleus plays an important role for gene regulation, DNA replication, and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, Hi-C) it is now possible to obtain the contact frequencies of the DNA at the whole-genome level. In this paper, we study the problem of reconstructing the 3D organization of the genome from such whole-genome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming (SDP) formulations to obtain the 3D configuration. However, neglected in such reconstructions is the fact that most eukaryotes including humans are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from distance measurements derived from contact frequencies in diploid organisms. In fact, there are infinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant and experimentally measurable constraints (including distances between neighboring genomic loci and higher-order interactions) and prove identifiability under these conditions. Furthermore, we provide SDP formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. Finally, we apply our algorithm to real pairwise and higher-order contact frequency data and show that we can recover known genome organization patterns.

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2011.03610 [pdf, other]

Efficient Permutation Discovery in Causal DAGs

Authors: Chandler Squires, Joshua Amaniampong, Caroline Uhler

Abstract: The problem of learning a directed acyclic graph (DAG) up to Markov equivalence is equivalent to the problem of finding a permutation of the variables that induces the sparsest graph. Without additional assumptions, this task is known to be NP-hard. Building on the minimum degree algorithm for sparse Cholesky decomposition, but utilizing DAG-specific problem structure, we introduce an efficient algorithm for finding such sparse permutations. We show that on jointly Gaussian distributions, our method with depth $w$ runs in $O(p^{w+3})$ time. We compare our method with $w = 1$ to algorithms for finding sparse elimination orderings of undirected graphs, and show that taking advantage of DAG-specific problem structure leads to a significant improvement in the discovered permutation. We also compare our algorithm to provably consistent causal structure learning algorithms, such as the PC algorithm, GES, and GSP, and show that our method achieves comparable performance with a shorter runtime. Thus, our method can be used on its own for causal structure discovery. Finally, we show that there exist dense graphs on which our method achieves almost perfect performance, so that unlike most existing causal structure learning algorithms, the situations in which our algorithm achieves both good performance and good runtime are not limited to sparse graphs.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2011.03127 [pdf, other]

Causal Imputation via Synthetic Interventions

Authors: Chandler Squires, Dennis Shen, Anish Agarwal, Devavrat Shah, Caroline Uhler

Abstract: Consider the problem of determining the effect of a compound on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable: given a large number of different actions (compounds) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the outcome for every pair while only having to perform experiments on a small subset of pairs. This task, which we label "causal imputation", is a generalization of the causal transportability problem. To address this challenge, we extend the recently introduced synthetic interventions (SI) estimator to handle more general data sparsity patterns. We prove that, under a latent factor model, our estimator provides valid estimates for the causal imputation task. We motivate this model by establishing a connection to the linear structural causal model literature. Finally, we consider the prominent CMAP dataset in predicting the effects of compounds on gene expression across cell types. We find that our estimator outperforms standard baselines, thus confirming its utility in biological applications.

Submitted 11 June, 2023; v1 submitted 5 November, 2020; originally announced November 2020.

arXiv:2010.09610 [pdf, other]

Increasing Depth Leads to U-Shaped Test Risk in Over-parameterized Convolutional Networks

Authors: Eshaan Nichani, Adityanarayanan Radhakrishnan, Caroline Uhler

Abstract: Recent works have demonstrated that increasing model capacity through width in over-parameterized neural networks leads to a decrease in test risk. For neural networks, however, model capacity can also be increased through depth, yet understanding the impact of increasing depth on test risk remains an open question. In this work, we demonstrate that the test risk of over-parameterized convolutional networks is a U-shaped curve (i.e. monotonically decreasing, then increasing) with increasing depth. We first provide empirical evidence for this phenomenon via image classification experiments using both ResNets and the convolutional neural tangent kernel (CNTK). We then present a novel linear regression framework for characterizing the impact of depth on test risk, and show that increasing depth leads to a U-shaped test risk for the linear CNTK. In particular, we prove that the linear CNTK corresponds to a depth-dependent linear transformation on the original space and characterize properties of this transformation. We then analyze over-parameterized linear regression under arbitrary linear transformations and, in simplified settings, provably identify the depths which minimize each of the bias and variance terms of the test risk.

Submitted 4 June, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 27 pages, 23 figures

arXiv:2010.08120 [pdf, other]

Joint Inference of Multiple Graphs from Matrix Polynomials

Authors: Madeline Navarro, Yuhao Wang, Antonio G. Marques, Caroline Uhler, Santiago Segarra

Abstract: Inferring graph structure from observations on the nodes is an important and popular network science task. Departing from the more common inference of a single graph and motivated by social and biological networks, we study the problem of jointly inferring multiple graphs from the observation of signals at their nodes (graph signals), which are assumed to be stationary in the sought graphs. From a mathematical point of view, graph stationarity implies that the mapping between the covariance of the signals and the sparse matrix representing the underlying graph is given by a matrix polynomial. A prominent example is that of Markov random fields, where the inverse of the covariance yields the sparse matrix of interest. From a modeling perspective, stationary graph signals can be used to model linear network processes evolving on a set of (not necessarily known) networks. Leveraging that matrix polynomials commute, a convex optimization method along with sufficient conditions that guarantee the recovery of the true graphs are provided when perfect covariance information is available. Particularly important from an empirical viewpoint, we provide high-probability bounds on the recovery error as a function of the number of signals observed and other key problem parameters. Numerical experiments using synthetic and real-world data demonstrate the effectiveness of the proposed method with perfect covariance information as well as its robustness in the noisy regime.

Submitted 15 October, 2020; originally announced October 2020.

Comments: 13 pages, 2 figures
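
The Markov random field example mentioned in the abstract above (the inverse of the covariance yields the sparse matrix encoding the graph) can be illustrated with a toy sketch. This is not the paper's joint-inference method: the 3-node chain, the covariance values, and the helper names `invert` and `edges_from_covariance` are all assumptions chosen for illustration, and exact rational arithmetic is used so that zeros in the inverse are exact.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix exactly with Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular (no pivot found).
    """
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edges_from_covariance(covariance):
    """Read undirected edges off the nonzero off-diagonal precision entries."""
    precision = invert(covariance)
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if precision[i][j] != 0]


# Hypothetical covariance of a Gaussian chain X0 - X1 - X2.
# Every entry is nonzero, yet the inverse has a zero at (0, 2).
covariance = [
    [Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)],
    [Fraction(1, 2), Fraction(1), Fraction(1, 2)],
    [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)],
]

edges = edges_from_covariance(covariance)
print(edges)
```

With estimated (noisy) covariances the inverse is generally not exactly sparse, so a practical method needs thresholding or a regularized estimator rather than the exact zero test used here.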

arXiv:2009.08574 [pdf, other]

Linear Convergence of Generalized Mirror Descent with Time-Dependent Mirrors

Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

Abstract: The Polyak-Lojasiewicz (PL) inequality is a sufficient condition for establishing linear convergence of gradient descent, even in non-convex settings. While several recent works use a PL-based analysis to establish linear convergence of stochastic gradient descent methods, the question remains as to whether a similar analysis can be conducted for more general optimization methods. In this work, we present a PL-based analysis for linear convergence of generalized mirror descent (GMD), a generalization of mirror descent with a possibly time-dependent mirror. GMD subsumes popular first order optimization methods including gradient descent, mirror descent, and preconditioned gradient descent methods such as Adagrad. Since the standard PL analysis cannot be extended naturally from GMD to stochastic GMD, we present a Taylor-series based analysis to establish sufficient conditions for linear convergence of stochastic GMD. As a corollary, our result establishes sufficient conditions and provides learning rates for linear convergence of stochastic mirror descent and Adagrad. Lastly, for functions that are locally PL*, our analysis implies existence of an interpolating solution and convergence of GMD to this solution.

Submitted 6 October, 2021; v1 submitted 17 September, 2020; originally announced September 2020.
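
The opening claim of the abstract above, that the PL inequality gives linear convergence of gradient descent even without convexity, can be checked numerically on a small example. The sketch below covers plain gradient descent only, not the paper's generalized mirror descent. The test function $f(x) = x^2 + 3\sin^2(x)$, the smoothness constant $L = 8$ (from $|f''(x)| = |2 + 6\cos(2x)| \le 8$), and the PL constant $\mu = 1/32$ are assumptions taken from a standard textbook-style example, not from this paper. Under them, a step size of $1/L$ should give $f(x_{k+1}) - f^* \le (1 - \mu/L)\,(f(x_k) - f^*)$ with $f^* = 0$.

```python
import math

L_SMOOTH = 8.0       # assumed smoothness constant: |f''(x)| <= 8
MU_PL = 1.0 / 32.0   # assumed PL constant for this test function


def f(x):
    """Non-convex test function with global minimum f(0) = 0."""
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    """Derivative of f: 2x + 6 sin(x) cos(x) = 2x + 3 sin(2x)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def gradient_descent(x0, steps):
    """Run gradient descent with step size 1/L and record f at each iterate."""
    x = x0
    values = [f(x)]
    for _ in range(steps):
        x -= grad_f(x) / L_SMOOTH
        values.append(f(x))
    return values


values = gradient_descent(3.0, 60)
rate = 1.0 - MU_PL / L_SMOOTH  # guaranteed per-step contraction factor

# Each step should shrink the suboptimality gap by at least `rate`.
contracts = all(b <= rate * a + 1e-12 for a, b in zip(values, values[1:]))
print(contracts, values[-1])
```

The guaranteed factor $1 - \mu/L$ is a worst-case bound; on this function the observed decrease per step is much faster than the bound suggests.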

arXiv:2007.12098 [pdf, other]

Optimal Transport using GANs for Lineage Tracing

Authors: Neha Prasad, Karren Yang, Caroline Uhler

Abstract: In this paper, we present Super-OT, a novel approach to computational lineage tracing that combines a supervised learning framework with optimal transport based on Generative Adversarial Networks (GANs). Unlike previous approaches to lineage tracing, Super-OT has the flexibility to integrate paired data. We benchmark Super-OT based on single-cell RNA-seq data against Waddington-OT, a popular approach for lineage tracing that also employs optimal transport. We show that Super-OT achieves gains over Waddington-OT in predicting the class outcome of cells during differentiation, since it allows the integration of additional information during training.

Submitted 5 January, 2022; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: 4 pages excluding references, 2 figures, 3 tables. Accepted at ICML 2020 Workshop on Computational Biology for Spotlight Presentation. Code can be found here: https://github.com/uhlerlab/superot

arXiv:2006.13431 [pdf, other]

Multiscale Simulations of Complex Systems by Learning their Effective Dynamics

Authors: Pantelis R. Vlachas, Georgios Arampatzis, Caroline Uhler, Petros Koumoutsakos

Abstract: Predictive simulations of complex systems are essential for applications ranging from weather forecasting to drug design. The veracity of these predictions hinges on their capacity to capture the effective system dynamics. Massively parallel simulations predict the system dynamics by resolving all spatiotemporal scales, often at a cost that prevents experimentation while their findings may not allow for generalisation. On the other hand reduced order models are fast but limited by the frequently adopted linearization of the system dynamics and/or the utilization of heuristic closures. Here we present a novel systematic framework that bridges large scale simulations and reduced order models to Learn the Effective Dynamics (LED) of diverse complex systems. The framework forms algorithmic alloys between non-linear machine learning algorithms and the Equation-Free approach for modeling complex systems. LED deploys autoencoders to formulate a mapping between fine and coarse-grained representations and evolves the latent space dynamics using recurrent neural networks. The algorithm is validated on benchmark problems and we find that it outperforms state of the art reduced order models in terms of predictability and large scale simulations in terms of cost. LED is applicable to systems ranging from chemistry to fluid mechanics and reduces the computational effort by up to two orders of magnitude while maintaining the prediction accuracy of the full system dynamics. We argue that LED provides a novel potent modality for the accurate prediction of complex systems.

Submitted 19 October, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: 39 pages (Appendix included)

arXiv:2006.08532 [pdf, other]

Improved Conditional Flow Models for Molecule to Image Synthesis

Authors: Karren Yang, Samuel Goldman, Wengong Jin, Alex Lu, Regina Barzilay, Tommi Jaakkola, Caroline Uhler

Abstract: In this paper, we aim to synthesize cell microscopy images under different molecular interventions, motivated by practical applications to drug development. Building on the recent success of graph neural networks for learning molecular embeddings and flow-based models for image generation, we propose Mol2Image: a flow-based generative model for molecule to cell image synthesis. To generate cell features at different resolutions and scale to high-resolution images, we develop a novel multi-scale flow architecture based on a Haar wavelet image pyramid. To maximize the mutual information between the generated images and the molecular interventions, we devise a training strategy based on contrastive learning. To evaluate our model, we propose a new set of metrics for biological image generation that are robust, interpretable, and relevant to practitioners. We show quantitatively that our method learns a meaningful embedding of the molecular intervention, which is translated into an image representation reflecting the biological effects of the intervention.

Submitted 15 June, 2020; originally announced June 2020.

MSC Class: 92-08

arXiv:2006.03735 [pdf, other]

doi 10.1038/s41467-021-21056-z

Causal Network Models of SARS-CoV-2 Expression and Aging to Identify Candidates for Drug Repurposing

Authors: Anastasiya Belyaeva, Louis Cammarata, Adityanarayanan Radhakrishnan, Chandler Squires, Karren Dai Yang, G. V. Shivashankar, Caroline Uhler

Abstract: Given the severity of the SARS-CoV-2 pandemic, a major challenge is to rapidly repurpose existing approved drugs for clinical interventions. While a number of data-driven and experimental approaches have been suggested in the context of drug repurposing, a platform that systematically integrates available transcriptomic, proteomic and structural data is missing. More importantly, given that SARS-CoV-2 pathogenicity is highly age-dependent, it is critical to integrate aging signatures into drug discovery platforms. We here take advantage of large-scale transcriptional drug screens combined with RNA-seq data of the lung epithelium with SARS-CoV-2 infection as well as the aging lung. To identify robust druggable protein targets, we propose a principled causal framework that makes use of multiple data modalities. Our analysis highlights the importance of serine/threonine and tyrosine kinases as potential targets that intersect the SARS-CoV-2 and aging pathways. By integrating transcriptomic, proteomic and structural data that is available for many diseases, our drug discovery platform is broadly applicable. Rigorous in vitro experiments as well as clinical trials are needed to validate the identified candidate drugs.

Submitted 5 June, 2020; originally announced June 2020.

arXiv:2003.06340 [pdf, other]

On Alignment in Deep Linear Neural Networks

Authors: Adityanarayanan Radhakrishnan, Eshaan Nichani, Daniel Bernstein, Caroline Uhler

Abstract: We study the properties of alignment, a form of implicit regularization, in linear neural networks under gradient descent. We define alignment for fully connected networks with multidimensional outputs and show that it is a natural extension of alignment in networks with 1-dimensional outputs as defined by Ji and Telgarsky, 2018. While in fully connected networks, there always exists a global minimum corresponding to an aligned solution, we analyze alignment as it relates to the training process. Namely, we characterize when alignment is an invariant of training under gradient descent by providing necessary and sufficient conditions for this invariant to hold. In such settings, the dynamics of gradient descent simplify, thereby allowing us to provide an explicit learning rate under which the network converges linearly to a global minimum. We then analyze networks with layer constraints such as convolutional networks. In this setting, we prove that gradient descent is equivalent to projected gradient descent, and that alignment is impossible with sufficiently large datasets.

Submitted 16 June, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

arXiv:2001.11940 [pdf, other]

Causal Structure Discovery from Distributions Arising from Mixtures of DAGs

Authors: Basil Saeed, Snigdha Panigrahi, Caroline Uhler

Abstract: We consider distributions arising from a mixture of causal models, where each model is represented by a directed acyclic graph (DAG). We provide a graphical representation of such mixture distributions and prove that this representation encodes the conditional independence relations of the mixture distribution. We then consider the problem of structure learning based on samples from such distributions. Since the mixing variable is latent, we consider causal structure discovery algorithms such as FCI that can deal with latent variables. We show that such algorithms recover a "union" of the component DAGs and can identify variables whose conditional distributions vary across the component DAGs. We demonstrate our results on synthetic and real data showing that the inferred graph identifies nodes that vary between the different mixture components. As an immediate application, we demonstrate how retrieval of this causal information can be used to cluster samples according to each mixture component.

Submitted 9 August, 2020; v1 submitted 31 January, 2020; originally announced January 2020.

arXiv:1910.09014 [pdf, other]

Ordering-Based Causal Structure Learning in the Presence of Latent Variables

Authors: Daniel Irving Bernstein, Basil Saeed, Chandler Squires, Caroline Uhler

Abstract: We consider the task of learning a causal graph in the presence of latent confounders given i.i.d. samples from the model. While current algorithms for causal structure discovery in the presence of latent confounders are constraint-based, we here propose a score-based approach. We prove that under assumptions weaker than faithfulness, any sparsest independence map (IMAP) of the distribution belongs to the Markov equivalence class of the true model. This motivates the Sparsest Poset formulation - that posets can be mapped to minimal IMAPs of the true model such that the sparsest of these IMAPs is Markov equivalent to the true model. Motivated by this result, we propose a greedy algorithm over the space of posets for causal structure discovery in the presence of latent confounders and compare its performance to the current state-of-the-art algorithms FCI and FCI+ on synthetic data.

Submitted 24 March, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

Comments: To appear in AISTATS 2020

arXiv:1910.09007 [pdf, other]

Permutation-Based Causal Structure Learning with Unknown Intervention Targets

Authors: Chandler Squires, Yuhao Wang, Caroline Uhler

Abstract: We consider the problem of estimating causal DAG models from a mix of observational and interventional data, when the intervention targets are partially or completely unknown. This problem is highly relevant for example in genomics, since gene knockout technologies are known to have off-target effects. We characterize the interventional Markov equivalence class of DAGs that can be identified from interventional data with unknown intervention targets. In addition, we propose a provably consistent algorithm for learning the interventional Markov equivalence class from such data. The proposed algorithm greedily searches over the space of permutations to minimize a novel score function. The algorithm is nonparametric, which is particularly important for applications to genomics, where the relationships between variables are often non-linear and the distribution non-Gaussian. We demonstrate the performance of our algorithm on synthetic and biological datasets. Links to an implementation of our algorithm and to a reproducible code base for our experiments can be found at https://uhlerlab.github.io/causaldag/utigsp.

Submitted 20 June, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

arXiv:1909.12362 [pdf, other]

doi 10.1073/pnas.2005013117

Overparameterized Neural Networks Implement Associative Memory

Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Caroline Uhler

Abstract: Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience. Our main finding is that standard overparameterized deep neural networks trained using standard optimization methods implement such a mechanism for real-valued data. Empirically, we show that: (1) overparameterized autoencoders store training samples as attractors, and thus, iterating the learned map leads to sample recovery; (2) the same mechanism allows for encoding sequences of examples, and serves as an even more efficient mechanism for memory than autoencoding. Theoretically, we prove that when trained on a single example, autoencoders store the example as an attractor. Lastly, by treating a sequence encoder as a composition of maps, we prove that sequence encoding provides a more efficient mechanism for memory than autoencoding.

Submitted 9 September, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

arXiv:1909.04222 [pdf, other]

Covariance Matrix Estimation under Total Positivity for Portfolio Selection

Authors: Raj Agrawal, Uma Roy, Caroline Uhler

Abstract: Selecting the optimal Markowitz portfolio depends on estimating the covariance matrix of the returns of $N$ assets from $T$ periods of historical data. Problematically, $N$ is typically of the same order as $T$, which makes the sample covariance matrix estimator perform poorly, both empirically and theoretically. While various other general purpose covariance matrix estimators have been introduced in the financial economics and statistics literature for dealing with the high dimensionality of this problem, we here propose an estimator that exploits the fact that assets are typically positively dependent. This is achieved by imposing that the joint distribution of returns be multivariate totally positive of order 2 ($\text{MTP}_2$). This constraint on the covariance matrix not only enforces positive dependence among the assets, but also regularizes the covariance matrix, leading to desirable statistical properties such as sparsity. Based on stock-market data spanning over thirty years, we show that estimating the covariance matrix under $\text{MTP}_2$ outperforms previous state-of-the-art methods including shrinkage estimators and factor models.

Submitted 27 December, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: 23 pages, 4 figures
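
Note on terminology: several abstracts in this listing rely on the MTP2 property without defining it. As a reading aid, the standard definition can be written as below. The symbols $p$, $x$, $y$, $\Sigma$ and $K$ are notation chosen for this note and do not appear in the listing itself.

```latex
% A density p on R^N is multivariate totally positive of order 2 (MTP_2) if
\[
  p(x)\,p(y) \;\le\; p(x \wedge y)\,p(x \vee y)
  \qquad \text{for all } x, y \in \mathbb{R}^N ,
\]
% where \wedge and \vee denote the componentwise minimum and maximum.
% For a Gaussian distribution with covariance \Sigma and precision matrix
% K = \Sigma^{-1}, this is equivalent to all off-diagonal entries of K
% being non-positive:
\[
  K_{ij} \le 0 \qquad \text{for all } i \neq j .
\]
```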

arXiv:1906.09537 [pdf, other]

Algebraic Statistics in Practice: Applications to Networks

Authors: Marta Casanellas, Sonja Petrović, Caroline Uhler

Abstract: Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra and computational algebra), geometry and combinatorics to provide insight into knotty problems in mathematical statistics. In this survey we illustrate this on three problems related to networks, namely network models for relational data, causal structure discovery and phylogenetics. For each problem we give an overview of recent results in algebraic statistics with emphasis on the statistical achievements made possible by these tools and their practical relevance for applications to other scientific disciplines.

Submitted 22 June, 2019; originally announced June 2019.

MSC Class: 62-02; 13-02

arXiv:1906.05159 [pdf, other]

Learning High-dimensional Gaussian Graphical Models under Total Positivity without Adjustment of Tuning Parameters

Authors: Yuhao Wang, Uma Roy, Caroline Uhler

Abstract: We consider the problem of estimating an undirected Gaussian graphical model when the underlying distribution is multivariate totally positive of order 2 (MTP2), a strong form of positive dependence. Such distributions are relevant for example for portfolio selection, since assets are usually positively dependent. A large body of methods have been proposed for learning undirected graphical models without the MTP2 constraint. A major limitation of these methods is that their structure recovery guarantees in the high-dimensional setting usually require a particular choice of a tuning parameter, which is unknown a priori in real world applications. We here propose a new method to estimate the underlying undirected graphical model under MTP2 and show that it is provably consistent in structure recovery without adjusting the tuning parameters. This is achieved by a constraint-based estimator that infers the structure of the underlying graphical model by testing the signs of the empirical partial correlation coefficients. We evaluate the performance of our estimator in simulations and on financial data.

Submitted 19 March, 2020; v1 submitted 12 June, 2019; originally announced June 2019.

arXiv:1906.00928 [pdf, other]

Anchored Causal Inference in the Presence of Measurement Error

Authors: Basil Saeed, Anastasiya Belyaeva, Yuhao Wang, Caroline Uhler

Abstract: We consider the problem of learning a causal graph in the presence of measurement error. This setting is for example common in genomics, where gene expression is corrupted through the measurement process. We develop a provably consistent procedure for estimating the causal structure in a linear Gaussian structural equation model from corrupted observations on its nodes, under a variety of measurement error models. We provide an estimator based on the method-of-moments, which can be used in conjunction with constraint-based causal structure discovery algorithms. We prove asymptotic consistency of the procedure and also discuss finite-sample considerations. We demonstrate our method's performance through simulations and on real data, where we recover the underlying gene regulatory network from zero-inflated single-cell RNA-seq data.

Submitted 3 June, 2019; originally announced June 2019.

arXiv:1905.00516 [pdf, other]

doi 10.1214/20-AOS2007

Total positivity in exponential families with application to binary variables

Authors: Steffen Lauritzen, Caroline Uhler, Piotr Zwiernik

Abstract: We study exponential families of distributions that are multivariate totally positive of order 2 (MTP2), show that these are convex exponential families, and derive conditions for existence of the MLE. Quadratic exponential families of MTP2 distributions contain attractive Gaussian graphical models and ferromagnetic Ising models as special examples. We show that these are defined by intersecting the space of canonical parameters with a polyhedral cone whose faces correspond to conditional independence relations. Hence MTP2 serves as an implicit regularizer for quadratic exponential families and leads to sparsity in the estimated graphical model. We prove that the maximum likelihood estimator (MLE) in an MTP2 binary exponential family exists if and only if both of the sign patterns $(1,-1)$ and $(-1,1)$ are represented in the sample for every pair of variables; in particular, this implies that the MLE may exist with $n=d$ observations, in stark contrast to unrestricted binary exponential families where $2^d$ observations are required. Finally, we provide a novel and globally convergent algorithm for computing the MLE for MTP2 Ising models similar to iterative proportional scaling and apply it to the analysis of data from two psychological disorders.

Submitted 26 July, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

MSC Class: 60E15; 62H99; 15B48

Journal ref: Annals of Statistics 2021, Vol. 49, 1436-1459
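
Illustration: the MLE existence condition stated in the abstract of "Total positivity in exponential families with application to binary variables" can be checked directly on a sample. The sketch below is only a reading aid, not the authors' code. It assumes observations are coded as -1/+1, and the function name `mle_exists_mtp2_binary` is hypothetical.

```python
from itertools import combinations


def mle_exists_mtp2_binary(samples):
    """Check the pairwise sign-pattern condition quoted in the abstract.

    `samples` is a list of equal-length sequences with entries in {-1, +1}
    (an assumed encoding). Returns True if, for every pair of variables,
    both sign patterns (1, -1) and (-1, 1) occur in at least one observation.
    """
    if not samples:
        return False
    d = len(samples[0])
    for i, j in combinations(range(d), 2):
        # Collect the sign patterns observed for the pair (i, j).
        seen = {(row[i], row[j]) for row in samples}
        if (1, -1) not in seen or (-1, 1) not in seen:
            return False
    return True


# Three observations on three variables (n = d = 3) satisfy the condition.
small_sample = [(1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
print(mle_exists_mtp2_binary(small_sample))

# A sample in which all variables always agree fails the condition.
agreeing_sample = [(1, 1, 1), (-1, -1, -1)]
print(mle_exists_mtp2_binary(agreeing_sample))
```

The first sample shows how the condition can hold with as few as $n=d$ observations, which is the contrast with the unrestricted case drawn in the abstract.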

arXiv:1903.02054 [pdf, other]

Size of Interventional Markov Equivalence Classes in Random DAG Models

Authors: Dmitriy Katz, Karthikeyan Shanmugam, Chandler Squires, Caroline Uhler

Abstract: Directed acyclic graph (DAG) models are popular for capturing causal relationships. From observational and interventional data, a DAG model can only be determined up to its interventional Markov equivalence class (I-MEC). We investigate the size of MECs for random DAG models generated by uniformly sampling and ordering an Erdős-Rényi graph. For constant density, we show that the expected $\log$ observational MEC size asymptotically (in the number of vertices) approaches a constant. We characterize I-MEC size in a similar fashion in the above settings with high precision. We show that the asymptotic expected number of interventions required to fully identify a DAG is a constant. These results are obtained by exploiting Meek rules and coupling arguments to provide sharp upper and lower bounds on the asymptotic quantities, which are then calculated numerically up to high precision. Our results have important consequences for experimental design of interventions and the development of algorithms for causal inference.

Submitted 5 March, 2019; originally announced March 2019.

Comments: 19 pages, 5 figures. Accepted to AISTATS 2019

arXiv:1902.10347 [pdf, other]

ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery

Authors: Raj Agrawal, Chandler Squires, Karren Yang, Karthik Shanmugam, Caroline Uhler

Abstract: Determining the causal structure of a set of variables is critical for both scientific inquiry and decision-making. However, this is often challenging in practice due to limited interventional data. Given that randomized experiments are usually expensive to perform, we propose a general framework and theory based on optimal Bayesian experimental design to select experiments for targeted causal discovery. That is, we assume the experimenter is interested in learning some function of the unknown graph (e.g., all descendants of a target node) subject to design constraints such as limits on the number of samples and rounds of experimentation. While it is in general computationally intractable to select an optimal experimental design strategy, we provide a tractable implementation with provable guarantees on both approximation and optimization quality based on submodularity. We evaluate the efficacy of our proposed method on both synthetic and real datasets, thereby demonstrating that our method realizes considerable performance gains over baseline strategies such as random sampling.

Submitted 27 February, 2019; originally announced February 2019.

Comments: To appear in AISTATS 2019

arXiv:1902.09905 [pdf, other]

Brownian motion tree models are toric

Authors: Bernd Sturmfels, Caroline Uhler, Piotr Zwiernik

Abstract: Felsenstein's classical model for Gaussian distributions on a phylogenetic tree is shown to be a toric variety in the space of concentration matrices. We present an exact semialgebraic characterization of this model, and we demonstrate how the toric structure leads to exact methods for maximum likelihood estimation. Our results also give new insights into the geometry of ultrametric matrices.

Submitted 26 February, 2019; originally announced February 2019.

Comments: 22 pages, 4 figures

arXiv:1902.03515 [pdf, other]

Multi-Domain Translation by Learning Uncoupled Autoencoders

Authors: Karren D. Yang, Caroline Uhler

Abstract: Multi-domain translation seeks to learn a probabilistic coupling between marginal distributions that reflects the correspondence between different domains. We assume that data from different domains are generated from a shared latent representation based on a structural equation model. Under this assumption, we show that the problem of computing a probabilistic coupling between marginals is equivalent to learning multiple uncoupled autoencoders that embed to a given shared latent distribution. In addition, we propose a new framework and algorithm for multi-domain translation based on learning the shared latent distribution and training autoencoders under distributional constraints. A key practical advantage of our framework is that new autoencoders (i.e., new domains) can be added sequentially to the model without retraining on the other domains, which we demonstrate experimentally on image as well as genomics datasets.

Submitted 9 February, 2019; originally announced February 2019.

MSC Class: 68T01

arXiv:1810.11447 [pdf, other]

Scalable Unbalanced Optimal Transport using Generative Adversarial Networks

Authors: Karren D. Yang, Caroline Uhler

Abstract: Generative adversarial networks (GANs) are an expressive class of neural generative models with tremendous success in modeling high-dimensional continuous measures. In this paper, we present a scalable method for unbalanced optimal transport (OT) based on the generative-adversarial framework. We formulate unbalanced OT as a problem of simultaneously learning a transport map and a scaling factor that push a source measure to a target measure in a cost-optimal manner. In addition, we propose an algorithm for solving this problem based on stochastic alternating gradient updates, similar in practice to GANs. We also provide theoretical justification for this formulation, showing that it is closely related to an existing static formulation by Liero et al. (2018), and perform numerical experiments demonstrating how this methodology can be applied to population modeling.

Submitted 3 August, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

MSC Class: 68T99

Showing 1–50 of 87 results for author: Uhler, C