-
Season combinatorial intervention predictions with Salt & Peper
Authors:
Thomas Gaudelet,
Alice Del Vecchio,
Eli M Carrami,
Juliana Cudini,
Chantriolnt-Andreas Kapourani,
Caroline Uhler,
Lindsay Edwards
Abstract:
Interventions play a pivotal role in the study of complex biological systems. In drug discovery, genetic interventions (such as CRISPR base editing) have become central to both identifying potential therapeutic targets and understanding a drug's mechanism of action. With the advancement of CRISPR and the proliferation of genome-scale analyses such as transcriptomics, a new challenge is to navigate the vast combinatorial space of concurrent genetic interventions. Addressing this, our work concentrates on estimating the effects of pairwise genetic combinations on the cellular transcriptome. We introduce two novel contributions: Salt, a biologically-inspired baseline that posits the mostly additive nature of combination effects, and Peper, a deep learning model that extends Salt's additive assumption to achieve unprecedented accuracy. Our comprehensive comparison against existing state-of-the-art methods, grounded in diverse metrics, and our out-of-distribution analysis highlight the limitations of current models in realistic settings. This analysis underscores the necessity for improved modelling techniques and data acquisition strategies, paving the way for more effective exploration of genetic intervention effects.
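As a rough illustration of the additive assumption described above, the sketch below predicts the expression profile of a pairwise combination by adding the two single-intervention shifts to a control profile. This is only one plausible reading of "mostly additive", not the authors' Salt implementation; the function name `additive_prediction`, the gene labels, and the numbers are all hypothetical.

```python
def additive_prediction(control, single_effects, pair):
    """Predict a combination profile under a purely additive assumption.

    control        -- list of control expression values, one per gene
    single_effects -- dict mapping an intervention name to its shift
                      from control (same length as `control`)
    pair           -- tuple of two intervention names
    """
    first, second = pair
    return [
        base + shift_a + shift_b
        for base, shift_a, shift_b in zip(
            control, single_effects[first], single_effects[second]
        )
    ]


# Hypothetical toy data: three measured genes, two single interventions.
control = [1.0, 2.0, 3.0]
single_effects = {
    "geneA": [0.5, 0.0, -1.0],
    "geneB": [-0.25, 1.0, 0.0],
}

predicted = additive_prediction(control, single_effects, ("geneA", "geneB"))
print(predicted)
```

A baseline of this kind needs no training on combination data, which is what makes it a useful reference point for learned models.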
Submitted 25 April, 2024;
originally announced April 2024.
-
PyRelationAL: a python library for active learning research and development
Authors:
Paul Scherer,
Thomas Gaudelet,
Alison Pouplin,
Alice Del Vecchio,
Suraj M S,
Oliver Bolton,
Jyothish Soman,
Jake P. Taylor-King,
Lindsay Edwards
Abstract:
In constrained real-world scenarios, where it may be challenging or costly to generate data, disciplined methods for acquiring informative new data points are of fundamental importance for the efficient training of machine learning (ML) models. Active learning (AL) is a sub-field of ML focused on the development of methods to iteratively and economically acquire data through strategically querying new data points that are the most useful for a particular task. Here, we introduce PyRelationAL, an open source library for AL research. We describe a modular toolkit that is compatible with diverse ML frameworks (e.g. PyTorch, scikit-learn, TensorFlow, JAX). Furthermore, the library implements a wide range of published methods and provides API access to wide-ranging benchmark datasets and AL task configurations based on existing literature. The library is supplemented by an expansive set of tutorials, demos, and documentation to help users get started. PyRelationAL is maintained using modern software engineering practices, with an inclusive contributor code of conduct, to promote long-term library quality and utilisation. PyRelationAL is available under a permissive Apache licence on PyPI and at https://github.com/RelationRx/pyrelational.
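To make the idea of strategically querying the most useful data points concrete, here is a minimal sketch of one common query strategy, least-confidence sampling. It deliberately does not use the PyRelationAL API; the function `least_confident` and the toy probabilities are hypothetical and serve only to show the selection step of a pool-based active learning round.

```python
def least_confident(probabilities, k):
    """Return the indices of the k pool items whose top predicted
    class probability is lowest, i.e. where the model is least sure."""
    # Pair each item's highest class probability with its pool index.
    scored = [(max(probs), index) for index, probs in enumerate(probabilities)]
    # Ascending sort puts the least confident predictions first.
    scored.sort()
    return [index for _, index in scored[:k]]


# Hypothetical class probabilities for four unlabelled pool items.
pool_probs = [
    [0.90, 0.10],
    [0.55, 0.45],
    [0.60, 0.40],
    [0.99, 0.01],
]

query = least_confident(pool_probs, 2)
print(query)
```

In a full loop, the selected items would be labelled, added to the training set, and the model retrained before the next round of selection.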
Submitted 17 February, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
-
RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro
Authors:
Paul Bertin,
Jarrid Rector-Brooks,
Deepak Sharma,
Thomas Gaudelet,
Andrew Anighoro,
Torsten Gross,
Francisco Martinez-Pena,
Eileen L. Tang,
Suraj M S,
Cristian Regep,
Jeremy Hayter,
Maksym Korablyov,
Nicholas Valiante,
Almer van der Sloot,
Mike Tyers,
Charles Roberts,
Michael M. Bronstein,
Luke L. Lairson,
Jake P. Taylor-King,
Yoshua Bengio
Abstract:
For large libraries of small molecules, exhaustive combinatorial chemical screens become infeasible to perform when considering a range of disease models, assay conditions, and dose ranges. Deep learning models have achieved state of the art results in silico for the prediction of synergy scores. However, databases of drug combinations are biased towards synergistic agents and these results do not necessarily generalise out of distribution. We employ a sequential model optimization search utilising a deep learning model to quickly discover synergistic drug combinations active against a cancer cell line, requiring substantially less screening than an exhaustive evaluation. Our small scale wet lab experiments only account for evaluation of ~5% of the total search space. After only 3 rounds of ML-guided in vitro experimentation (including a calibration round), we find that the set of drug pairs queried is enriched for highly synergistic combinations; two additional rounds of ML-guided experiments were performed to ensure reproducibility of trends. Remarkably, we rediscover drug combinations later confirmed to be under study within clinical trials. Moreover, we find that drug embeddings generated using only structural information begin to reflect mechanisms of action. Prior in silico benchmarking suggests we can enrich search queries by a factor of ~5-10x for highly synergistic drug combinations by using sequential rounds of evaluation when compared to random selection, or by a factor of >3x when using a pretrained model selecting all drug combinations at a single time point.
Submitted 2 March, 2023; v1 submitted 6 February, 2022;
originally announced February 2022.
-
Utilising Graph Machine Learning within Drug Discovery and Development
Authors:
Thomas Gaudelet,
Ben Day,
Arian R. Jamasb,
Jyothish Soman,
Cristian Regep,
Gertrude Liu,
Jeremy B. R. Hayter,
Richard Vickers,
Charles Roberts,
Jian Tang,
David Roblin,
Tom L. Blundell,
Michael M. Bronstein,
Jake P. Taylor-King
Abstract:
Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures, the functional relationships between them, and integrate multi-omic datasets, amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarise work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones, including repurposed drugs entering in vivo studies, suggest graph machine learning will become a modelling framework of choice within biomedical machine learning.
Submitted 10 February, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Integrative Data Analytic Framework to Enhance Cancer Precision Medicine
Authors:
Thomas Gaudelet,
Noel Malod-Dognin,
Natasa Przulj
Abstract:
With the advancement of high-throughput biotechnologies, we increasingly accumulate biomedical data about diseases, especially cancer. There is a need for computational models and methods to sift through, integrate, and extract new knowledge from the diverse available data to improve the mechanistic understanding of diseases and patient care. To uncover molecular mechanisms and drug indications for specific cancer types, we develop an integrative framework able to harness a wide range of diverse molecular and pan-cancer data. We show that our approach outperforms competing methods and can identify new associations. Furthermore, through the joint integration of data sources, our framework can also uncover links between cancer types and molecular entities for which no prior knowledge is available. Our new framework is flexible and can be easily reformulated to study any biomedical problems.
Submitted 2 July, 2020;
originally announced July 2020.
-
Unveiling new disease, pathway, and gene associations via multi-scale neural networks
Authors:
Thomas Gaudelet,
Noel Malod-Dognin,
Jon Sanchez-Valle,
Vera Pancaldi,
Alfonso Valencia,
Natasa Przulj
Abstract:
Diseases involve complex processes and modifications to the cellular machinery. The gene expression profile of the affected cells contains characteristic patterns linked to a disease. Hence, biological knowledge pertaining to a disease can be derived from a patient cell's profile, improving our diagnostic ability, as well as our grasp of disease risks. This knowledge can be used for drug re-purposing, or by physicians to evaluate a patient's condition and co-morbidity risk. Here, we look at differential gene expression obtained from microarray technology for patients diagnosed with various diseases. Based on this data and cellular multi-scale organization, we aim to uncover disease-disease links, as well as disease-gene and disease-pathway associations. We propose neural networks with structures inspired by the multi-scale organization of a cell. We show that these models are able to correctly predict the diagnosis for the majority of the patients. Through the analysis of the trained models, we predict and validate disease-disease, disease-pathway, and disease-gene associations with comparisons to known interactions and literature search, proposing putative explanations for the novel predictions that come from our study.
Submitted 10 April, 2020; v1 submitted 28 January, 2019;
originally announced January 2019.
-
Higher order molecular organisation as a source of biological function
Authors:
Thomas Gaudelet,
Noel Malod-Dognin,
Natasa Przulj
Abstract:
Molecular interactions have widely been modelled as networks. The local wiring patterns around molecules in molecular networks are linked with their biological functions. However, networks model only pairwise interactions between molecules and cannot explicitly and directly capture the higher order molecular organisation, such as protein complexes and pathways. Hence, we ask if hypergraphs (hypernetworks), that directly capture entire complexes and pathways along with protein-protein interactions (PPIs), carry additional functional information beyond what can be uncovered from networks of pairwise molecular interactions. The mathematical formalism of a hypergraph has long been known, but not often used in studying molecular networks due to the lack of sophisticated algorithms for mining the underlying biological information hidden in the wiring patterns of molecular systems modelled as hypernetworks.
We propose a new, multi-scale, protein interaction hypernetwork model that utilizes hypergraphs to capture different scales of protein organization, including PPIs, protein complexes and pathways. In analogy to graphlets, we introduce hypergraphlets, small, connected, non-isomorphic, induced sub-hypergraphs of a hypergraph, to quantify the local wiring patterns of these multi-scale molecular hypergraphs and to mine them for new biological information. We apply them to model the multi-scale protein networks of baker's yeast and human and show that the higher order molecular organisation captured by these hypergraphs is strongly related to the underlying biology. Importantly, we demonstrate that our new models and data mining tools reveal different but complementary biological information compared to classical PPI networks. We apply our hypergraphlets to successfully predict biological functions of uncharacterised proteins.
Submitted 20 September, 2018; v1 submitted 13 April, 2018;
originally announced April 2018.