Search | arXiv e-print repository

arXiv:2201.11164 [pdf]

Focal cortical dysplasia as a cause of epilepsy: the current evidence of associated genes and future therapeutic treatments

Authors: Garrett Garner, Daniel Streetman, Joshua Fricker, Neal Patel, Nolan Brown, Shane Shahrestani, Julian Gendreau

Abstract: Focal cortical dysplasias (FCDs) are the most common cause of treatment resistant epilepsy affecting the pediatric population. Most individuals with FCD have seizure onset during the first five years of life and the majority will have seizures by the age of sixteen. Many cases of FCD are postulated to be the result of abnormal brain development in utero by germline or somatic gene mutations regulating neuronal growth and migration during corticogenesis. Other cases of FCD are thought to be related to infections during brain development, or even other causes still unable to be fully determined. Typical anti-seizure medications are oftentimes ineffective in FCD as well as surgery is unable to be successfully performed due to the involvement of eloquent areas of the brain or insufficient resection of the epileptogenic focus, posing a challenge for physicians. The genetic nature of FCD provides an avenue for drug development with several genetic and molecular targets undergoing study over the last two decades.

Submitted 26 January, 2022; originally announced January 2022.

arXiv:2105.00469 [pdf, other]

DRIVE: Machine Learning to Identify Drivers of Cancer with High-Dimensional Genomic Data & Imputed Labels

Authors: Adnan Akbar, Andrey Solovyev, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: Identifying the mutations that drive cancer growth is key in clinical decision making and precision oncology. As driver mutations confer selective advantage and thus have an increased likelihood of occurrence, frequency-based statistical models are currently favoured. These methods are not suited to rare, low frequency, driver mutations. The alternative approach to address this is through functional-impact scores, however methods using this approach are highly prone to false positives. In this paper, we propose a novel combination method for driver mutation identification, which uses the power of both statistical modelling and functional-impact based methods. Initial results show this approach outperforms the state-of-the-art methods in terms of precision, and provides comparable performance in terms of area under receiver operating characteristic curves (AU-ROC). We believe that data-driven systems based on machine learning, such as these, will become an integral part of precision oncology in the near future.

Submitted 2 May, 2021; originally announced May 2021.

Comments: Submission to the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:2011.06387 [pdf, other]

Dynamic allocation of limited memory resources in reinforcement learning

Authors: Nisheet Patel, Luigi Acerbi, Alexandre Pouget

Abstract: Biological brains are inherently limited in their capacity to process and store information, but are nevertheless capable of solving complex tasks with apparent ease. Intelligent behavior is related to these limitations, since resource constraints drive the need to generalize and assign importance differentially to features in the environment or memories of past experiences. Recently, there have been parallel efforts in reinforcement learning and neuroscience to understand strategies adopted by artificial and biological agents to circumvent limitations in information storage. However, the two threads have been largely separate. In this article, we propose a dynamical framework to maximize expected reward under constraints of limited resources, which we implement with a cost function that penalizes precise representations of action-values in memory, each of which may vary in its precision. We derive from first principles an algorithm, Dynamic Resource Allocator (DRA), which we apply to two standard tasks in reinforcement learning and a model-based planning task, and find that it allocates more resources to items in memory that have a higher impact on cumulative rewards. Moreover, DRA learns faster when starting with a higher resource budget than what it eventually allocates for performing well on tasks, which may explain why frontal cortical areas in biological brains appear more engaged in early stages of learning before settling to lower asymptotic levels of activity. Our work provides a normative solution to the problem of learning how to allocate costly resources to a collection of uncertain memories in a manner that is capable of adapting to changes in the environment.

Submitted 13 November, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: In Advances in Neural Information Processing Systems 33 (NeurIPS 2020). [16 pages: 9 main + 3 references + 4 supplementary; 4 figures: 3 main + 1 supplementary]

arXiv:1912.04174 [pdf, other]

Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

Authors: Geoffroy Dubourg-Felonneau, Omar Darwish, Christopher Parsons, Dami Rebergen, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: The emerging field of precision oncology relies on the accurate pinpointing of alterations in the molecular profile of a tumor to provide personalized targeted treatments. Current methodologies in the field commonly include the application of next generation sequencing technologies to a tumor sample, followed by the identification of mutations in the DNA known as somatic variants. The differentiation of these variants from sequencing error poses a classic classification problem, which has traditionally been approached with Bayesian statistics, and more recently with supervised machine learning methods such as neural networks. Although these methods provide greater accuracy, classic neural networks lack the ability to indicate the confidence of a variant call. In this paper, we explore the performance of deep Bayesian neural networks on next generation sequencing data, and their ability to give probability estimates for somatic variant calls. In addition to demonstrating similar performance in comparison to standard neural networks, we show that the resultant output probabilities make these better suited to the disparate and highly-variable sequencing data-sets these models are likely to encounter in the real world. We aim to deliver algorithms to oncologists for which model certainty better reflects accuracy, for improved clinical application. By moving away from point estimates to reliable confidence intervals, we expect the resultant clinical and treatment decisions to be more robust and more informed by the underlying reality of the tumor molecular profile.

Submitted 6 December, 2019; originally announced December 2019.

Comments: Bayesian Deep Learning Workshop at NeurIPS 2019. arXiv admin note: text overlap with arXiv:1912.02065

arXiv:1912.02065 [pdf, other]

Safety and Robustness in Decision Making: Deep Bayesian Recurrent Neural Networks for Somatic Variant Calling in Cancer

Authors: Geoffroy Dubourg-Felonneau, Omar Darwish, Christopher Parsons, Dami Rebergen, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: The genomic profile underlying an individual tumor can be highly informative in the creation of a personalized cancer treatment strategy for a given patient; a practice known as precision oncology. This involves next generation sequencing of a tumor sample and the subsequent identification of genomic aberrations, such as somatic mutations, to provide potential candidates of targeted therapy. The identification of these aberrations from sequencing noise and germline variant background poses a classic classification-style problem. This has been previously broached with many different supervised machine learning methods, including deep-learning neural networks. However, these neural networks have thus far not been tailored to give any indication of confidence in the mutation call, meaning an oncologist could be targeting a mutation with a low probability of being true. To address this, we present here a deep bayesian recurrent neural network for cancer variant calling, which shows no degradation in performance compared to standard neural networks. This approach enables greater flexibility through different priors to avoid overfitting to a single dataset. We will be incorporating this approach into software for oncologists to obtain safe, robust, and statistically confident somatic mutation calls for precision oncology treatment choices.

Submitted 4 December, 2019; originally announced December 2019.

Comments: Safety and Robustness in Decision Making Workshop at NeurIPS 2019

arXiv:1911.13259 [pdf, other]

Flatsomatic: A Method for Compression of Somatic Mutation Profiles in Cancer

Authors: Geoffroy Dubourg-Felonneau, Yasmeen Kussad, Dominic Kirkham, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: In this study, we present Flatsomatic - a Variational Auto Encoder (VAE) optimized to compress somatic mutations that allow for unbiased data compression whilst maintaining the signal. We compared two different neural network architectures for the VAE: Multilayer Perceptron (MLP) and bidirectional LSTM. The somatic profiles we used to train our models consisted of 8,062 Pan-Cancer patients from The Cancer Genome Atlas and 989 cell lines from the COSMIC cell line project. The profiles for each patient were represented by the genomic loci where somatic mutations occurred and, to reduce sparsity, the locations with a frequency <5 were removed. We enhanced the VAE performance by changing its evidence lower bound, and devised an F1-score based loss showing that it helps the VAE learn better than with binary cross-entropy. We also employed beta-VAE to weight the variational regularisation term in the loss function and showed the best performance through a preliminary function to increase the weight of the regularisation term with each epoch. We assessed the reconstruction ability of the VAE using the micro F1-score metric and showed that our best performing model was a 2-layer deep MLP VAE. Our analysis also showed that the size of the latent space did not have a significant effect on the VAE learning ability. We compared the Flatsomatic embeddings created to a lower dimension version of the data from principal component analysis, showing superior performance of Flatsomatic, and performed K-means clustering on both datasets to draw comparisons to known cancer types of each profile. Finally, we present results that confirm that the Flatsomatic representations of 64 dimensions maintain the same predictive power as the original 8,298 dimensions vector, through prediction of drug response.

Submitted 27 November, 2019; originally announced November 2019.

Comments: Learning Meaningful Representations of Life Workshop at NeurIPS 2019. arXiv admin note: substantial text overlap with arXiv:1911.09008

arXiv:1911.12774 [pdf, other]

Effective Sub-clonal Cancer Representation to Predict Tumor Evolution

Authors: Adnan Akbar, Geoffroy Dubourg-Felonneau, Andrey Solovyev, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: The majority of cancer treatments end in failure due to Intra-Tumor Heterogeneity (ITH). ITH in cancer is represented by clonal evolution where different sub-clones compete with each other for resources under conditions of Darwinian natural selection. Predicting the growth of these sub-clones within a tumour is among the key challenges of modern cancer research. Predicting tumor behavior enables the creation of risk profiles for patients and the optimisation of their treatment by therapeutically targeting sub-clones more likely to grow. Current research efforts in this space are focused on mathematical modelling of population genetics to quantify the selective advantage of sub-clones, thus enabling predictions of which sub-clones are more likely to grow. These tumor evolution models are based on assumptions which are not valid for real-world tumor micro-environment. Furthermore, these models are often fit on a single instance of a tumor, and hence prediction models cannot be validated. This paper presents an alternative approach for predicting cancer evolution using a data-driven machine learning method. Our proposed method is based on the intuition that if we can capture the true characteristics of sub-clones within a tumor and represent it in the form of features, a sophisticated machine learning algorithm can be trained to predict its behavior. The work presented here provides a novel approach to predicting cancer evolution, utilizing a data-driven approach. We strongly believe that the accumulation of data from microbiologists, oncologists and machine learning researchers could be used to encapsulate the true essence of tumor sub-clones, and can play a vital role in selecting the best cancer treatments for patients.

Submitted 28 November, 2019; originally announced November 2019.

Comments: Learning Meaningful Representations of Life Workshop at NeurIPS 2019

arXiv:1911.09008 [pdf, other]

Learning Embeddings from Cancer Mutation Sets for Classification Tasks

Authors: Geoffroy Dubourg-Felonneau, Yasmeen Kussad, Dominic Kirkham, John W Cassidy, Nirmesh Patel, Harry W Clifford

Abstract: Analysis of somatic mutation profiles from cancer patients is essential in the development of cancer research. However, the low frequency of most mutations and the varying rates of mutations across patients makes the data extremely challenging to statistically analyze as well as difficult to use in classification problems, for clustering, visualization or for learning useful information. Thus, the creation of low dimensional representations of somatic mutation profiles that hold useful information about the DNA of cancer cells will facilitate the use of such data in applications that will progress precision medicine. In this paper, we talk about the open problem of learning from somatic mutations, and present Flatsomatic: a solution that utilizes variational autoencoders (VAEs) to create latent representations of somatic profiles. The work done in this paper shows great potential for this method, with the VAE embeddings performing better than PCA for a clustering task, and performing equally well to the raw high dimensional data for a classification task. We believe the methods presented herein can be of great value in future research and in bringing data-driven models into precision oncology.

Submitted 20 November, 2019; originally announced November 2019.

Comments: Sets & Partitions Workshop at NeurIPS 2019

arXiv:1811.11674 [pdf, other]

Interlacing Personal and Reference Genomes for Machine Learning Disease-Variant Detection

Authors: Luke R Harries, Suyi Zhang, Geoffroy Dubourg-Felonneau, James H R Farmery, Jonathan Sinai, Belle Taylor, Nirmesh Patel, John W Cassidy, John Shawe-Taylor, Harry W Clifford

Abstract: DNA sequencing to identify genetic variants is becoming increasingly valuable in clinical settings. Assessment of variants in such sequencing data is commonly implemented through Bayesian heuristic algorithms. Machine learning has shown great promise in improving on these variant calls, but the input for these is still a standardized "pile-up" image, which is not always best suited. In this paper, we present a novel method for generating images from DNA sequencing data, which interlaces the human reference genome with personalized sequencing output, to maximize usage of sequencing reads and improve machine learning algorithm performance. We demonstrate the success of this in improving standard germline variant calling. We also furthered this approach to include somatic variant calling across tumor/normal data with Siamese networks. These approaches can be used in machine learning applications on sequencing data with the hope of improving clinical outcomes, and are freely available for noncommercial use at www.ccg.ai.

Submitted 26 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:cs/0101200

Report number: ML4H/2018/103

arXiv:1811.10455 [pdf, other]

A Framework for Implementing Machine Learning on Omics Data

Authors: Geoffroy Dubourg-Felonneau, Timothy Cannings, Fergal Cotter, Hannah Thompson, Nirmesh Patel, John W Cassidy, Harry W Clifford

Abstract: The potential benefits of applying machine learning methods to -omics data are becoming increasingly apparent, especially in clinical settings. However, the unique characteristics of these data are not always well suited to machine learning techniques. These data are often generated across different technologies in different labs, and frequently with high dimensionality. In this paper we present a framework for combining -omics data sets, and for handling high dimensional data, making -omics research more accessible to machine learning applications. We demonstrate the success of this framework through integration and analysis of multi-analyte data for a set of 3,533 breast cancers. We then use this data-set to predict breast cancer patient survival for individuals at risk of an impending event, with higher accuracy and lower variance than methods trained on individual data-sets. We hope that our pipelines for data-set generation and transformation will open up -omics data to machine learning researchers. We have made these freely available for noncommercial use at www.ccg.ai.

Submitted 26 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/102

arXiv:1703.01107 [pdf, other]

An intracardiac electrogram model to bridge virtual hearts and implantable cardiac devices

Authors: Weiwei Ai, Nitish Patel, Partha Roop, Avinash Malik, Nathan Allen, Mark L. Trew

Abstract: Virtual heart models have been proposed to enhance the safety of implantable cardiac devices through closed loop validation. To communicate with a virtual heart, devices have been driven by cardiac signals at specific sites. As a result, only the action potentials of these sites are sensed. However, the real device implanted in the heart will sense a complex combination of near and far-field extracellular potential signals. Therefore many device functions, such as blanking periods and refractory periods, are designed to handle these unexpected signals. To represent these signals, we develop an intracardiac electrogram (IEGM) model as an interface between the virtual heart and the device. The model can capture not only the local excitation but also far-field signals and pacing afterpotentials. Moreover, the sensing controller can specify unipolar or bipolar electrogram (EGM) sensing configurations and introduce various oversensing and undersensing modes. The simulation results show that the model is able to reproduce clinically observed sensing problems, which significantly extends the capabilities of the virtual heart model in the context of device validation.

Submitted 3 March, 2017; originally announced March 2017.

arXiv:1608.03530 [pdf]

Semi-Supervised Prediction of Gene Regulatory Networks Using Machine Learning Algorithms

Authors: Nihir Patel, Jason T. L. Wang

Abstract: Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabeled data for training. We investigate inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabeled data. We then apply our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluate the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

Submitted 11 August, 2016; originally announced August 2016.

Comments: 25 pages

arXiv:1603.05315 [pdf, ps, other]

Towards the Emulation of the Cardiac Conduction System for Pacemaker Testing

Authors: Eugene Yip, Sidharta Andalam, Partha S. Roop, Avinash Malik, Mark Trew, Weiwei Ai, Nitish Patel

Abstract: The heart is a vital organ that relies on the orchestrated propagation of electrical stimuli to coordinate each heart beat. Abnormalities in the heart's electrical behaviour can be managed with a cardiac pacemaker. Recently, the closed-loop testing of pacemakers with an emulation (real-time simulation) of the heart has been proposed. An emulated heart would provide realistic reactions to the pacemaker as if it were a real heart. This enables developers to interrogate their pacemaker design without having to engage in costly or lengthy clinical trials. Many high-fidelity heart models have been developed, but are too computationally intensive to be simulated in real-time. Heart models, designed specifically for the closed-loop testing of pacemakers, are too abstract to be useful in the testing of physical pacemakers. In the context of pacemaker testing, this paper presents a more computationally efficient heart model that generates realistic continuous-time electrical signals. The heart model is composed of cardiac cells that are connected by paths. Significant improvements were made to an existing cardiac cell model to stabilise its activation behaviour and to an existing path model to capture the behaviour of continuous electrical propagation. We provide simulation results that show our ability to faithfully model complex re-entrant circuits (that cause arrhythmia) that existing heart models can not.

Submitted 17 March, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

Showing 1–13 of 13 results for author: Patel, N