-
Directed Graph Embeddings in Pseudo-Riemannian Manifolds
Authors:
Aaron Sim,
Maciej Wiatrak,
Angus Brayne,
Páidí Creed,
Saee Paliwal
Abstract:
The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorpora…
▽ More
The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness
Authors:
Adam Foster,
Árpi Vezér,
Craig A Glastonbury,
Páidí Creed,
Sam Abujudeh,
Aaron Sim
Abstract:
Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive…
▽ More
Learning meaningful representations of data that can address challenges such as batch effect correction and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we show that marginal independence between the representation and a condition variable plays a key role in both of these challenges. We propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty defined in terms of mixtures of the variational posteriors to enforce this independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, and we prove counterfactual identifiability of CoMP under additional assumptions. We demonstrate state-of-the-art performance on a set of challenging tasks including aligning human tumour samples with cancer cell-lines, predicting transcriptome-level perturbation responses, and batch correction on single-cell RNA sequencing data. We also find parallels to fair representation learning and demonstrate that CoMP is competitive on a common task in the field.
△ Less
Submitted 26 June, 2022; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Interpretable Graph Convolutional Neural Networks for Inference on Noisy Knowledge Graphs
Authors:
Daniel Neil,
Joss Briody,
Alix Lacoste,
Aaron Sim,
Paidi Creed,
Amir Saffari
Abstract:
In this work, we provide a new formulation for Graph Convolutional Neural Networks (GCNNs) for link prediction on graph data that addresses common challenges for biomedical knowledge graphs (KGs). We introduce a regularized attention mechanism to GCNNs that not only improves performance on clean datasets, but also favorably accommodates noise in KGs, a pervasive issue in real-world applications. F…
▽ More
In this work, we provide a new formulation for Graph Convolutional Neural Networks (GCNNs) for link prediction on graph data that addresses common challenges for biomedical knowledge graphs (KGs). We introduce a regularized attention mechanism to GCNNs that not only improves performance on clean datasets, but also favorably accommodates noise in KGs, a pervasive issue in real-world applications. Further, we explore new visualization methods for interpretable modelling and to illustrate how the learned representation can be exploited to automate dataset denoising. The results are demonstrated on a synthetic dataset, the common benchmark dataset FB15k-237, and a large biomedical knowledge graph derived from a combination of noisy and clean data sources. Using these improvements, we visualize a learned model's representation of the disease cystic fibrosis and demonstrate how to interrogate a neural network to show the potential of PPARG as a candidate therapeutic target for rheumatoid arthritis.
△ Less
Submitted 1 December, 2018;
originally announced December 2018.
-
An Algebraic Theory of Complexity for Discrete Optimisation
Authors:
David A. Cohen,
Martin C. Cooper,
Paidi Creed,
Peter G. Jeavons,
Stanislav Zivny
Abstract:
Discrete optimisation problems arise in many different areas and are studied under many different names. In many such problems the quantity to be optimised can be expressed as a sum of functions of a restricted form. Here we present a unifying theory of complexity for problems of this kind. We show that the complexity of a finite-domain discrete optimisation problem is determined by certain algebr…
▽ More
Discrete optimisation problems arise in many different areas and are studied under many different names. In many such problems the quantity to be optimised can be expressed as a sum of functions of a restricted form. Here we present a unifying theory of complexity for problems of this kind. We show that the complexity of a finite-domain discrete optimisation problem is determined by certain algebraic properties of the objective function, which we call weighted polymorphisms. We define a Galois connection between sets of rational-valued functions and sets of weighted polymorphisms and show how the closed sets of this Galois connection can be characterised.
These results provide a new approach to studying the complexity of discrete optimisation. We use this approach to identify certain maximal tractable subproblems of the general problem, and hence derive a complete classification of complexity for the Boolean case.
△ Less
Submitted 28 July, 2012;
originally announced July 2012.
-
The number of Euler tours of a random directed graph
Authors:
Páidí Creed,
Mary Cryan
Abstract:
In this paper we obtain the expectation and variance of the number of Euler tours of a random Eulerian directed graph with fixed out-degree sequence. We use this to obtain the asymptotic distribution of the number of Euler tours of a random $d$-in/$d$-out graph and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Eule…
▽ More
In this paper we obtain the expectation and variance of the number of Euler tours of a random Eulerian directed graph with fixed out-degree sequence. We use this to obtain the asymptotic distribution of the number of Euler tours of a random $d$-in/$d$-out graph and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Euler tours yields algorithms running in expected polynomial time for almost every $d$-in/$d$-out graph. We make use of the BEST theorem of de Bruijn, van Aardenne-Ehrenfest, Smith and Tutte, which shows that the number of Euler tours of an Eulerian directed graph with out-degree sequence $\mathbf{d}$ is the product of the number of arborescences and the term $\frac{1}{n}[\prod_{v \in V}(d_v-1)!]$. Therefore most of our effort is towards estimating the moments of the number of arborescences of a random graph with fixed out-degree sequence.
△ Less
Submitted 9 February, 2012;
originally announced February 2012.
-
The tractability of CSP classes defined by forbidden patterns
Authors:
David A. Cohen,
Martin C. Cooper,
Páidí Creed,
András Z. Salamon
Abstract:
The constraint satisfaction problem (CSP) is a general problem central to computer science and artificial intelligence. Although the CSP is NP-hard in general, considerable effort has been spent on identifying tractable subclasses. The main two approaches consider structural properties (restrictions on the hypergraph of constraint scopes) and relational properties (restrictions on the language of…
▽ More
The constraint satisfaction problem (CSP) is a general problem central to computer science and artificial intelligence. Although the CSP is NP-hard in general, considerable effort has been spent on identifying tractable subclasses. The main two approaches consider structural properties (restrictions on the hypergraph of constraint scopes) and relational properties (restrictions on the language of constraint relations). Recently, some authors have considered hybrid properties that restrict the constraint hypergraph and the relations simultaneously.
Our key contribution is the novel concept of a CSP pattern and classes of problems defined by forbidden patterns (which can be viewed as forbidding generic subproblems). We describe the theoretical framework which can be used to reason about classes of problems defined by forbidden patterns. We show that this framework generalises relational properties and allows us to capture known hybrid tractable classes.
Although we are not close to obtaining a dichotomy concerning the tractability of general forbidden patterns, we are able to make some progress in a special case: classes of problems that arise when we can only forbid binary negative patterns (generic subproblems in which only inconsistent tuples are specified). In this case we are able to characterise very large classes of tractable and NP-hard forbidden patterns. This leaves the complexity of just one case unresolved and we conjecture that this last case is tractable.
△ Less
Submitted 8 July, 2014; v1 submitted 8 March, 2011;
originally announced March 2011.
-
Sampling Eulerian orientations of triangular lattice graphs
Authors:
Paidi Creed
Abstract:
We consider the problem of sampling from the uniform distribution on the set of Eulerian orientations of subgraphs of the triangular lattice. Although it is known that this can be achieved in polynomial time for any graph, the algorithm studied here is more natural in the context of planar Eulerian graphs. We analyse the mixing time of a Markov chain on the Eulerian orientations of a planar grap…
▽ More
We consider the problem of sampling from the uniform distribution on the set of Eulerian orientations of subgraphs of the triangular lattice. Although it is known that this can be achieved in polynomial time for any graph, the algorithm studied here is more natural in the context of planar Eulerian graphs. We analyse the mixing time of a Markov chain on the Eulerian orientations of a planar graph which moves between orientations by reversing the edges of directed faces. Using path coupling and the comparison method we obtain a polynomial upper bound on the mixing time of this chain for any solid subgraph of the triangular lattice. By considering the conductance of the chain we show that there exist subgraphs with holes for which the chain will always take an exponential amount of time to converge. Finally, as an additional justification for studying a Markov chain on the set of Eulerian orientations of planar graphs, we show that the problem of counting Eulerian orientations remains #P-complete when restricted to planar graphs.
A preliminary version of this work appeared as an extended abstract in the 2nd Algorithms and Complexity in Durham workshop.
△ Less
Submitted 7 March, 2007;
originally announced March 2007.