-
PyRelationAL: a python library for active learning research and development
Authors:
Paul Scherer,
Thomas Gaudelet,
Alison Pouplin,
Alice Del Vecchio,
Suraj M S,
Oliver Bolton,
Jyothish Soman,
Jake P. Taylor-King,
Lindsay Edwards
Abstract:
In constrained real-world scenarios, where it may be challenging or costly to generate data, disciplined methods for acquiring informative new data points are of fundamental importance for the efficient training of machine learning (ML) models. Active learning (AL) is a sub-field of ML focused on the development of methods to iteratively and economically acquire data through strategically querying new data points that are the most useful for a particular task. Here, we introduce PyRelationAL, an open-source library for AL research. We describe a modular toolkit that is compatible with diverse ML frameworks (e.g. PyTorch, scikit-learn, TensorFlow, JAX). Furthermore, the library implements a wide range of published methods and provides API access to wide-ranging benchmark datasets and AL task configurations based on existing literature. The library is supplemented by an expansive set of tutorials, demos, and documentation to help users get started. PyRelationAL is maintained using modern software engineering practices -- with an inclusive contributor code of conduct -- to promote long-term library quality and utilisation. PyRelationAL is available under a permissive Apache licence on PyPI and at https://github.com/RelationRx/pyrelational.
Submitted 17 February, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
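As a rough illustration of the iterative querying idea described in the abstract above, the following is a minimal sketch of pool-based active learning with uncertainty sampling. It does not use or reproduce the PyRelationAL API; the names `oracle`, `fit_threshold`, and `active_learning`, the 1-D threshold model, and the boundary value 0.62 are all hypothetical choices made only for this example.

```python
def oracle(x):
    """Hypothetical labelling function standing in for a costly experiment."""
    return 1 if x >= 0.62 else 0


def fit_threshold(labelled):
    """Fit a toy 1-D classifier: the midpoint between the largest point
    labelled 0 and the smallest point labelled 1."""
    zeros = [x for x, y in labelled if y == 0]
    ones = [x for x, y in labelled if y == 1]
    lo = max(zeros) if zeros else 0.0
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2.0


def active_learning(pool, seed_points, budget):
    """Run `budget` rounds of query -> label -> refit on a finite pool."""
    labelled = [(x, oracle(x)) for x in seed_points]
    unlabelled = [x for x in pool if x not in seed_points]
    for _ in range(budget):
        threshold = fit_threshold(labelled)
        # Uncertainty sampling: the model is least certain about the
        # unlabelled point closest to its current decision boundary.
        query = min(unlabelled, key=lambda x: abs(x - threshold))
        unlabelled.remove(query)
        labelled.append((query, oracle(query)))
    return fit_threshold(labelled), labelled


if __name__ == "__main__":
    pool = [i / 100 for i in range(101)]
    estimate, labelled = active_learning(pool, seed_points=[0.0, 1.0], budget=8)
    print(f"estimated boundary: {estimate:.3f} from {len(labelled)} labels")
```

In this toy setting the loop behaves like a bisection search, so a handful of queries localises the boundary that exhaustive labelling would need all 101 pool points to find. Real strategies and models are considerably richer than this sketch.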
-
RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro
Authors:
Paul Bertin,
Jarrid Rector-Brooks,
Deepak Sharma,
Thomas Gaudelet,
Andrew Anighoro,
Torsten Gross,
Francisco Martinez-Pena,
Eileen L. Tang,
Suraj M S,
Cristian Regep,
Jeremy Hayter,
Maksym Korablyov,
Nicholas Valiante,
Almer van der Sloot,
Mike Tyers,
Charles Roberts,
Michael M. Bronstein,
Luke L. Lairson,
Jake P. Taylor-King,
Yoshua Bengio
Abstract:
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state-of-the-art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased towards synergistic agents and these results do not necessarily generalise out of distribution. We employ a sequential model optimization search utilising a deep learning model to quickly discover synergistic drug combinations active against a cancer cell line, requiring substantially less screening than an exhaustive evaluation. Our small-scale wet lab experiments account for evaluation of only ~5% of the total search space. After only 3 rounds of ML-guided in vitro experimentation (including a calibration round), we find that the set of drug pairs queried is enriched for highly synergistic combinations; two additional rounds of ML-guided experiments were performed to ensure reproducibility of trends. Remarkably, we rediscover drug combinations later confirmed to be under study within clinical trials. Moreover, we find that drug embeddings generated using only structural information begin to reflect mechanisms of action. Prior in silico benchmarking suggests we can enrich search queries by a factor of ~5-10x for highly synergistic drug combinations by using sequential rounds of evaluation when compared to random selection, or by a factor of >3x when using a pretrained model selecting all drug combinations at a single time point.
Submitted 2 March, 2023; v1 submitted 6 February, 2022;
originally announced February 2022.
-
Utilising Graph Machine Learning within Drug Discovery and Development
Authors:
Thomas Gaudelet,
Ben Day,
Arian R. Jamasb,
Jyothish Soman,
Cristian Regep,
Gertrude Liu,
Jeremy B. R. Hayter,
Richard Vickers,
Charles Roberts,
Jian Tang,
David Roblin,
Tom L. Blundell,
Michael M. Bronstein,
Jake P. Taylor-King
Abstract:
Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures, the functional relationships between them, and integrate multi-omic datasets - amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarise work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones, including repurposed drugs entering in vivo studies, suggest graph machine learning will become a modelling framework of choice within biomedical machine learning.
Submitted 10 February, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Sparse Dynamic Distribution Decomposition: Efficient Integration of Trajectory and Snapshot Time Series Data
Authors:
Jake P. Taylor-King,
Cristian Regep,
Jyothish Soman,
Flawnson Tong,
Catalina Cangea,
Charlie Roberts
Abstract:
Dynamic Distribution Decomposition (DDD) was introduced in Taylor-King et al. (PLOS Comp Biol, 2020) as a variation on Dynamic Mode Decomposition. In brief, by using basis functions over a continuous state space, DDD allows for the fitting of continuous-time Markov chains over these basis functions and as a result continuously maps between distributions. The number of parameters in DDD scales with the square of the number of basis functions; we reformulate the problem and restrict the method to compact basis functions, which leads to the inference of sparse matrices only -- hence reducing the number of parameters. Finally, we demonstrate how DDD is suitable to integrate both trajectory time series (paired between subsequent time points) and snapshot time series (unpaired time points). Methods capable of integrating both scenarios are particularly relevant for the analysis of biomedical data, whereby studies observe populations at fixed time points (snapshots) and individual patient journeys with repeated follow-ups (trajectories).
Submitted 11 June, 2020; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Operator Fitting for Parameter Estimation of Stochastic Differential Equations
Authors:
Asbjørn N. Riseth,
Jake P. Taylor-King
Abstract:
Estimation of parameters is a crucial part of model development. When models are deterministic, one can minimise the fitting error; for stochastic systems one must be more careful. Broadly, parameterisation methods for stochastic dynamical systems fit into maximum likelihood estimation- and method of moments-inspired techniques. We propose a method where one matches a finite dimensional approximation of the Koopman operator with the implied Koopman operator as generated by an extended dynamic mode decomposition approximation. One advantage of this approach is that the objective evaluation cost can be independent of the number of samples for some dynamical systems. We test our approach on two simple systems in the form of stochastic differential equations, compare to benchmark techniques, and consider limited eigen-expansions of the operators being approximated. Other small variations on the technique are also considered, and we discuss the advantages of our formulation.
Submitted 11 April, 2018; v1 submitted 15 September, 2017;
originally announced September 2017.
-
Simulated Ablation for Detection of Cells Impacting Paracrine Signalling in Histology Analysis
Authors:
Jake P. Taylor-King,
Etienne Baratchart,
Andrew Dhawan,
Elizabeth A. Coker,
Inga Hansine Rye,
Hege Russnes,
S. Jon Chapman,
David Basanta,
Andriy Marusyk
Abstract:
Intra-tumour phenotypic heterogeneity limits the accuracy of clinical diagnostics and hampers the efficiency of anti-cancer therapies. Dealing with this cellular heterogeneity requires adequate understanding of its sources, which is extremely difficult, as phenotypes of tumour cells integrate hardwired (epi)mutational differences with the dynamic responses to microenvironmental cues. The latter come in the form of both direct physical interactions and inputs from gradients of secreted signalling molecules. Furthermore, tumour cells can not only receive microenvironmental cues, but also produce them. Despite the high biological and clinical importance of understanding spatial aspects of paracrine signalling, adequate research tools are largely lacking. Here, a partial differential equation (PDE) based mathematical model is developed that mimics the process of cell ablation. This model suggests how each cell might contribute to the microenvironment by either absorbing or secreting diffusible factors, and quantifies the extent to which observed intensities can be explained via diffusion mediated signalling. The model allows for the separation of phenotypic responses to signalling gradients within tumour microenvironments from the combined influence of responses mediated by direct physical contact and hardwired (epi)genetic differences. The differential equation is solved around cell membrane outlines using a finite element method (FEM). The method is applied to a multi-channel immunofluorescence in situ hybridization (iFISH) stained breast cancer histological specimen and correlations are investigated between: HER2 gene amplification; HER2 protein expression; and cell interaction with the diffusible microenvironment. This approach allows partial deconvolution of the complex inputs...
Submitted 14 April, 2017;
originally announced April 2017.
-
A Mean-Field Approach to Evolving Spatial Networks, with an Application to Osteocyte Network Formation
Authors:
Jake P. Taylor-King,
David Basanta,
S. Jonathan Chapman,
Mason A. Porter
Abstract:
We consider evolving networks in which each node can have various associated properties (a state) in addition to those that arise from network structure. For example, each node can have a spatial location and a velocity, or some more abstract internal property that describes something like a social trait. Edges between nodes are created and destroyed, and new nodes enter the system. We introduce a "local state degree distribution" (LSDD) as the degree distribution at a particular point in state space. We then make a mean-field assumption and thereby derive an integro-partial differential equation that is satisfied by the LSDD. We perform numerical experiments and find good agreement between solutions of the integro-differential equation and the LSDD from stochastic simulations of the full model. To illustrate our theory, we apply it to a simple continuum model for osteocyte network formation within bones, with a view to understanding changes that may take place during cancer. Our results suggest that increased rates of differentiation lead to higher densities of osteocytes but with a lower number of dendrites. To help provide biological context, we also include an introduction to osteocytes, the formation of osteocyte networks, and the role of osteocytes in bone metastasis.
Submitted 31 January, 2017;
originally announced February 2017.
-
A Fractional Diffusion Equation for an n-Dimensional Correlated Levy Walk
Authors:
J. P. Taylor-King,
R. Klages,
S. Fedotov,
R. A. Van Gorder
Abstract:
Levy walks define a fundamental concept in random walk theory which allows one to model diffusive spreading that is faster than Brownian motion. They have many applications across different disciplines. However, so far the derivation of a diffusion equation for an n-dimensional correlated Levy walk has remained elusive. Starting from a fractional Klein-Kramers equation, here we use a moment method combined with a Cattaneo approximation to derive a fractional diffusion equation for superdiffusive short range auto-correlated Levy walks in the large time limit, and solve it. Our derivation discloses different dynamical mechanisms leading to correlated Levy walk diffusion in terms of quantities that can be measured experimentally.
Submitted 10 June, 2016;
originally announced June 2016.
-
From birds to bacteria: generalised velocity jump processes with resting states
Authors:
Jake P. Taylor-King,
Emiel van Loon,
Gabriel Rosser,
S. Jon Chapman
Abstract:
There are various cases of animal movement where behaviour broadly switches between two modes of operation, corresponding to a long distance movement state and a resting or local movement state. Here a mathematical description of this process is formulated, adapted from Friedrich et al. (2006). The approach allows the specification of any running or waiting time distribution along with any angular and speed distributions. The resulting system of partial integro-differential equations is tumultuous, and it is therefore necessary both to simplify it and to derive summary statistics. An expression for the mean squared displacement is derived which shows good agreement with experimental data from the bacterium Escherichia coli and the gull Larus fuscus. Finally, a large time diffusive approximation is considered via a Cattaneo approximation (Hillen, 2004). This leads to the novel result that the effective diffusion constant is dependent on the mean and variance of the running time distribution but only on the mean of the waiting time distribution.
Submitted 30 July, 2014;
originally announced July 2014.
-
Mathematical Modelling of Turning Delays in Swarm Robotics
Authors:
Jake P. Taylor-King,
Benjamin Franz,
Christian A. Yates,
Radek Erban
Abstract:
We investigate the effect of turning delays on the behaviour of groups of differential wheeled robots and show that the group-level behaviour can be described by a transport equation with a suitably incorporated delay. The results of our mathematical analysis are supported by numerical simulations and experiments with e-puck robots. The experimental quantity we compare to our revised model is the mean time for robots to find the target area in an unknown environment. The transport equation with delay better predicts the mean time to find the target than the standard transport equation without delay.
Submitted 30 September, 2014; v1 submitted 12 January, 2014;
originally announced January 2014.