-
Whole genome sequencing identifies putative associations between genomic polymorphisms and clinical response to the antiepileptic drug levetiracetam
Authors:
DV Vavoulis,
AT Pagnamenta,
SJL Knight,
MM Pentony,
M Armstrong,
EC Galizia,
S Balestrini,
SM Sisodiya,
JC Taylor
Abstract:
In the context of pharmacogenomics, whole genome sequencing provides a powerful approach for identifying correlations between response variability to specific drugs and genomic polymorphisms in a population, in an unbiased manner. In this study, we employed whole genome sequencing of DNA samples from patients showing extreme response (n=72) and non-response (n=27) to the antiepileptic drug levetiracetam, in order to identify genomic variants that underlie response to the drug. Although no common SNP (MAF>5%) crossed the conventional genome-wide significance threshold of 5e-8, we found common polymorphisms in genes SPNS3, HDC, MDGA2, NSG1 and RASGEF1C, which collectively predict clinical response to levetiracetam in our cohort with ~91% predictive accuracy. Among these genes, HDC, NSG1, MDGA2 and RASGEF1C are potentially implicated in synaptic neurotransmission, while SPNS3 is an atypical solute carrier transporter homologous to SV2A, the known molecular target of levetiracetam. Furthermore, we performed gene- and pathway-based statistical analysis on sets of rare and low-frequency variants (MAF<5%) and identified associations between the following genes or pathways and response to levetiracetam: a) genes PRKCB and DLG2, which are involved in glutamatergic neurotransmission, a known target of anticonvulsants, including levetiracetam; b) genes FILIP1 and SEMA6D, which are involved in axon guidance and modelling of neural connections; and c) pathways with a role in synaptic neurotransmission, such as WNT5A-dependent internalization of FZD4 and disinhibition of SNARE formation. In summary, our approach of applying whole genome sequencing to subjects with extreme response phenotypes is a feasible route to generating plausible hypotheses for investigating the genetic factors underlying drug response variability in cases of pharmaco-resistant epilepsy.
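The abstract separates common variants (MAF>5%) from rare and low-frequency variants (MAF<5%) and then analyses the latter in gene-based sets. A minimal sketch of those two bookkeeping steps follows, assuming genotypes coded as alternate-allele counts (0, 1 or 2). The per-gene burden (collapsing) score is one common way to aggregate rare variants and is not necessarily the gene-based test used in the study; the function names and gene labels are hypothetical.

```python
import numpy as np

def split_by_maf(genotypes, threshold=0.05):
    """Minor allele frequency (MAF) per variant plus common/rare masks.

    genotypes: (n_samples, n_variants) array of alternate-allele counts coded 0, 1 or 2.
    Common means MAF > threshold; rare/low-frequency means MAF < threshold.
    """
    g = np.asarray(genotypes, dtype=float)
    alt_freq = g.sum(axis=0) / (2.0 * g.shape[0])
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return maf, maf > threshold, maf < threshold

def burden_scores(genotypes, rare_mask, gene_of_variant):
    """Per-sample count of minor alleles at rare variants, collapsed per gene."""
    g = np.asarray(genotypes, dtype=float)
    alt_freq = g.sum(axis=0) / (2.0 * g.shape[0])
    # Recode variants whose alternate allele is the major one, so we always count minor alleles.
    minor = np.where(alt_freq > 0.5, 2.0 - g, g)
    genes = np.asarray(gene_of_variant)
    rare_mask = np.asarray(rare_mask, dtype=bool)
    return {str(gene): minor[:, (genes == gene) & rare_mask].sum(axis=1)
            for gene in np.unique(genes)}
```

The per-gene scores could then be compared between responders and non-responders with whatever association test the analysis calls for.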
Submitted 23 September, 2019;
originally announced September 2019.
-
Exploring Bayesian approaches to eQTL mapping through probabilistic programming
Authors:
Dimitrios V Vavoulis
Abstract:
The discovery of genomic polymorphisms influencing gene expression (also known as expression quantitative trait loci or eQTLs) can be formulated as a sparse Bayesian multivariate/multiple regression problem. An important aspect in the development of such models is the implementation of bespoke inference methodologies, a process that can become quite laborious when multiple candidate models are being considered. We describe automatic, black-box inference in such models using Stan, a popular probabilistic programming language. The use of systems like Stan can facilitate model prototyping and testing, thus accelerating the data modelling process. The code described in this chapter can be found at https://github.com/dvav/eQTLBookChapter.
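One way to write the sparse multivariate regression described above is shown below. This is a sketch under stated assumptions: the symbols are introduced here for illustration, and the horseshoe prior is only one common sparsity-inducing choice, not necessarily the prior used in the chapter. Here $\mathbf{Y}$ is an $N \times M$ matrix of expression values ($N$ samples, $M$ genes), $\mathbf{X}$ is an $N \times P$ genotype matrix ($P$ SNPs), and a non-negligible coefficient $B_{pm}$ marks SNP $p$ as a putative eQTL for gene $m$.

```latex
\begin{aligned}
\mathbf{Y} &= \mathbf{X}\mathbf{B} + \mathbf{E}, & E_{nm} &\sim \mathcal{N}(0, \sigma_m^2),\\
B_{pm} \mid \lambda_{pm}, \tau_m &\sim \mathcal{N}(0, \lambda_{pm}^2 \tau_m^2), & \lambda_{pm},\ \tau_m &\sim \mathcal{C}^{+}(0, 1).
\end{aligned}
```

The heavy-tailed half-Cauchy local scales $\lambda_{pm}$ let a few SNP effects escape shrinkage while the per-gene global scale $\tau_m$ pulls the rest towards zero; a model in this form can be written almost verbatim in a probabilistic programming language such as Stan.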
Submitted 12 June, 2019;
originally announced June 2019.
-
DGEclust: differential expression analysis of clustered count data
Authors:
Dimitrios V Vavoulis,
Margherita Francescatto,
Peter Heutink,
Julian Gough
Abstract:
Most published studies on the statistical analysis of count data generated by next-generation sequencing technologies have paid surprisingly little attention to cluster analysis. We present a statistical methodology (DGEclust) for clustering digital expression data, which (unlike alternative methods) simultaneously addresses the problem of model selection (i.e. how many clusters are supported by the data) and uncertainty in parameter estimation. We show how this methodology can be utilised in differential expression analysis, and we demonstrate its applicability to a more general class of problems and its higher accuracy compared with popular alternatives. DGEclust is freely available at https://bitbucket.org/DimitrisVavoulis/dgeclust
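One natural way to turn posterior cluster assignments into a differential-expression score is to ask how often a gene lands in different clusters under two conditions across MCMC samples. The sketch below assumes, hypothetically, that a sampler has already produced per-condition cluster labels for every gene at every retained iteration; whether DGEclust computes exactly this quantity should be checked against its own documentation.

```python
import numpy as np

def posterior_prob_de(labels_a, labels_b):
    """Fraction of MCMC samples in which a gene's cluster label differs between conditions.

    labels_a, labels_b: arrays of shape (n_samples, n_genes) holding the sampled cluster
    label of each gene in condition A and condition B (hypothetical inputs).
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    same = labels_a == labels_b        # True where the gene co-clusters in that sample
    return 1.0 - same.mean(axis=0)     # per-gene posterior probability of differential expression
```

Because the number of clusters is itself sampled, this score automatically averages over uncertainty in model selection rather than conditioning on a single clustering.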
Submitted 4 May, 2014;
originally announced May 2014.
-
Non-parametric Bayesian modelling of digital gene expression data
Authors:
Dimitrios V. Vavoulis,
Julian Gough
Abstract:
Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model of over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the low number of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample to estimate the parameters of that distribution. The number of clusters is not fixed a priori but is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm to a public dataset obtained from cancerous and non-cancerous neural tissues.
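The blocked Gibbs idea can be illustrated with a truncated stick-breaking representation of a Dirichlet-process mixture. The sketch below deliberately simplifies the model: it uses a Poisson likelihood with a conjugate Gamma prior on cluster rates instead of the over-dispersed hierarchical model described above, and the truncation level `K`, concentration `alpha` and prior values `a`, `b` are hypothetical choices. It still shows the three blocked updates (cluster parameters, stick-breaking weights, gene labels) and how the number of occupied clusters emerges from the data rather than being fixed.

```python
import numpy as np

def blocked_gibbs_poisson_dp(y, K=20, alpha=1.0, a=1.0, b=0.01, n_iter=300, seed=0):
    """Truncated (blocked) Gibbs sampler for a Dirichlet-process mixture of Poissons.

    y: 1-D array of non-negative integer counts, one per gene (toy input).
    Returns the final cluster label of each gene and the final cluster rates.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    z = rng.integers(0, K, size=y.size)                  # random initial labels
    for _ in range(n_iter):
        counts = np.bincount(z, minlength=K)             # genes per cluster
        sums = np.bincount(z, weights=y, minlength=K)    # total counts per cluster
        # 1) Rates: conjugate Gamma(a + sum y, rate b + n_k); empty clusters draw from the prior.
        lam = rng.gamma(a + sums, 1.0 / (b + counts))
        # 2) Sticks: V_k ~ Beta(1 + n_k, alpha + sum_{j>k} n_j); last stick takes the remainder.
        tail = np.concatenate([np.cumsum(counts[::-1])[::-1][1:], [0]])
        v = rng.beta(1.0 + counts, alpha + tail)
        v[-1] = 1.0
        w = v * np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
        # 3) Labels: p(z_i = k) proportional to w_k * Poisson(y_i | lam_k), in log space
        #    (the log(y_i!) term is the same for every k and is dropped).
        log_lam = np.log(np.maximum(lam, 1e-300))
        logp = np.log(w + 1e-300)[None, :] + y[:, None] * log_lam[None, :] - lam[None, :]
        logp -= logp.max(axis=1, keepdims=True)
        p = np.exp(logp)
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(y.size)[:, None]
        z = np.minimum((p.cumsum(axis=1) < u).sum(axis=1), K - 1)
    return z, lam
```

On toy data with two well-separated expression levels, genes from each level should end up in their own cluster(s), and each gene's assigned rate `lam[z]` borrows strength from every other gene in its cluster.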
Submitted 17 January, 2013;
originally announced January 2013.
-
A self-organizing state-space-model approach for parameter estimation in Hodgkin-Huxley-type models of single neurons
Authors:
Dimitrios V. Vavoulis,
Volko A. Straub,
John A. D. Aston,
Jianfeng Feng
Abstract:
Traditionally, parameter estimation in biophysical neuron and neural network models adopts a global search algorithm, often combined with a local search method, to minimize a cost function that measures the discrepancy between various features of the available experimental data and the model output. In this study, we approach the problem of parameter estimation in conductance-based models of single neurons from a different perspective. By adopting a hidden-dynamical-systems formalism, we expressed parameter estimation as an inference problem in these systems, which can then be tackled using well-established statistical inference methods. The particular method we used was Kitagawa's self-organizing state-space model, which was applied to a number of Hodgkin-Huxley models using simulated or actual electrophysiological data. We showed that the algorithm can be used to estimate a large number of parameters, including maximal conductances, reversal potentials, kinetics of ionic currents and measurement noise, based on low-dimensional experimental data and sufficiently informative priors in the form of pre-defined constraints imposed on model parameters. The algorithm remained operational even when very noisy experimental data were used. Importantly, by combining the self-organizing state-space model with an adaptive sampling algorithm akin to the Covariance Matrix Adaptation Evolution Strategy, we achieved a significant reduction in the variance of parameter estimates. The algorithm did not require the explicit formulation of a cost function, and it was straightforward to apply to compartmental models and multiple data sets. Overall, the proposed methodology is particularly suitable for resolving high-dimensional inference problems based on noisy electrophysiological data and is, therefore, a potentially useful tool in the construction of biophysical neuron models.
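The core idea of a self-organizing state-space model is to append the unknown parameters to the hidden state and let a particle (Monte Carlo) filter infer both together, so no cost function is needed. A minimal sketch follows under heavy simplifying assumptions: a toy passive membrane with a single leak conductance `g` stands in for a Hodgkin-Huxley model, the parameter is given a small artificial random walk (`jitter`) to limit particle degeneracy, and all function names and numerical values are hypothetical. It omits the CMA-ES-like adaptive sampling mentioned above.

```python
import numpy as np

def simulate_leaky_membrane(g=0.5, E=-65.0, dt=0.1, T=600, proc_sd=0.2, obs_sd=1.0, seed=1):
    """Simulate a toy passive membrane dV/dt = -g (V - E) + I(t) with noisy voltage readings."""
    rng = np.random.default_rng(seed)
    I = 10.0 * np.sin(2.0 * np.pi * np.arange(T) / 200.0)   # known injected current
    V = np.empty(T)
    V[0] = E
    for k in range(1, T):
        V[k] = V[k - 1] + dt * (-g * (V[k - 1] - E) + I[k - 1]) + rng.normal(0.0, proc_sd)
    y = V + rng.normal(0.0, obs_sd, T)                       # observed voltage trace
    return I, y

def soss_estimate_g(I, y, E=-65.0, dt=0.1, n=2000, proc_sd=0.3, obs_sd=1.0, jitter=0.02, seed=2):
    """Particle filter on the augmented state (V, log g); returns the filtered mean of g."""
    rng = np.random.default_rng(seed)
    V = y[0] + rng.normal(0.0, obs_sd, n)                    # voltage particles
    log_g = rng.uniform(np.log(0.05), np.log(5.0), n)       # parameter particles from the prior
    for k in range(1, len(y)):
        g = np.exp(log_g)
        # Propagate the hidden voltage; give the parameter a small random walk.
        V = V + dt * (-g * (V - E) + I[k - 1]) + rng.normal(0.0, proc_sd, n)
        log_g = log_g + rng.normal(0.0, jitter, n)
        # Weight particles by the Gaussian likelihood of the current observation.
        logw = -0.5 * ((y[k] - V) / obs_sd) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Systematic resampling of the joint (V, log g) particles.
        positions = (rng.random() + np.arange(n)) / n
        idx = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
        V, log_g = V[idx], log_g[idx]
    return float(np.exp(log_g).mean())
```

Calling `soss_estimate_g(*simulate_leaky_membrane())` should return a value near the true conductance of 0.5; sampling `log g` rather than `g` keeps the conductance positive, which is one simple way to impose the kind of parameter constraint the abstract mentions.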
Submitted 29 October, 2011; v1 submitted 21 June, 2011;
originally announced June 2011.