Search | arXiv e-print repository

arXiv:1909.10580 [pdf]

Whole genome sequencing identifies putative associations between genomic polymorphisms and clinical response to the antiepileptic drug levetiracetam

Authors: DV Vavoulis, AT Pagnamenta, SJL Knight, MM Pentony, M Armstrong, EC Galizia, S Balestrini, SM Sisodiya, JC Taylor

Abstract: In the context of pharmacogenomics, whole genome sequencing provides a powerful approach for identifying correlations between response variability to specific drugs and genomic polymorphisms in a population, in an unbiased manner. In this study, we employed whole genome sequencing of DNA samples from patients showing extreme response (n=72) and non-response (n=27) to the antiepileptic drug levetiracetam, in order to identify genomic variants that underlie response to the drug. Although no common SNP (MAF>5%) crossed the conventional genome-wide significance threshold of 5e-8, we found common polymorphisms in genes SPNS3, HDC, MDGA2, NSG1 and RASGEF1C, which collectively predict clinical response to levetiracetam in our cohort with ~91% predictive accuracy. Among these genes, HDC, NSG1, MDGA2 and RASGEF1C are potentially implicated in synaptic neurotransmission, while SPNS3 is an atypical solute carrier transporter homologous to SV2A, the known molecular target of levetiracetam. Furthermore, we performed gene- and pathway-based statistical analysis on sets of rare and low-frequency variants (MAF<5%) and we identified associations between the following genes or pathways and response to levetiracetam: a) genes PRKCB and DLG2, which are involved in glutamatergic neurotransmission, a known target of anticonvulsants, including levetiracetam; b) genes FILIP1 and SEMA6D, which are involved in axon guidance and modelling of neural connections; and c) pathways with a role in synaptic neurotransmission, such as WNT5A-dependent internalization of FZD4 and disinhibition of SNARE formation. In summary, our approach to utilise whole genome sequencing on subjects with extreme response phenotypes is a feasible route to generate plausible hypotheses for investigating the genetic factors underlying drug response variability in cases of pharmaco-resistant epilepsy.

Submitted 23 September, 2019; originally announced September 2019.

arXiv:1906.05150 [pdf, other]

Exploring Bayesian approaches to eQTL mapping through probabilistic programming

Authors: Dimitrios V Vavoulis

Abstract: The discovery of genomic polymorphisms influencing gene expression (also known as expression quantitative trait loci or eQTLs) can be formulated as a sparse Bayesian multivariate/multiple regression problem. An important aspect in the development of such models is the implementation of bespoke inference methodologies, a process which can become quite laborious, when multiple candidate models are being considered. We describe automatic, black-box inference in such models using Stan, a popular probabilistic programming language. The utilisation of systems like Stan can facilitate model prototyping and testing, thus accelerating the data modelling process. The code described in this chapter can be found at https://github.com/dvav/eQTLBookChapter.

Submitted 12 June, 2019; originally announced June 2019.

Comments: 25 pages, 3 figures; to appear as a book chapter in "eQTL Analysis: Methods and Protocols", a volume for the series "Methods in Molecular Biology" published by Springer

arXiv:1405.0723 [pdf, other]

doi 10.1186/s13059-015-0604-6

DGEclust: differential expression analysis of clustered count data

Authors: Dimitrios V Vavoulis, Margherita Francescatto, Peter Heutink, Julian Gough

Abstract: Most published studies on the statistical analysis of count data generated by next-generation sequencing technologies have paid surprisingly little attention on cluster analysis. We present a statistical methodology (DGEclust) for clustering digital expression data, which (contrary to alternative methods) simultaneously addresses the problem of model selection (i.e. how many clusters are supported by the data) and uncertainty in parameter estimation. We show how this methodology can be utilised in differential expression analysis and we demonstrate its applicability on a more general class of problems and higher accuracy, when compared to popular alternatives. DGEclust is freely available at https://bitbucket.org/DimitrisVavoulis/dgeclust

Submitted 4 May, 2014; originally announced May 2014.

Comments: 26 pages, 7 figures

Journal ref: Genome Biology 2015, 16:39

arXiv:1301.4144 [pdf, other]

doi 10.4172/jcsb.1000131

Non-parametric Bayesian modelling of digital gene expression data

Authors: Dimitrios V. Vavoulis, Julian Gough

Abstract: Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for modelling over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the problem of low numbers of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample for estimating the parameters of this distribution. The number of clusters is not decided a priori, but it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm on a public dataset obtained from cancerous and non-cancerous neural tissues.

Submitted 17 January, 2013; originally announced January 2013.

Journal ref: J Comput Sci Syst Biol 7:001-009 (2013)

arXiv:1106.4317 [pdf, ps, other]

doi 10.1371/journal.pcbi.1002401

A self-organizing state-space-model approach for parameter estimation in Hodgkin-Huxley-type models of single neurons

Authors: Dimitrios V. Vavoulis, Volko A. Straub, John A. D. Aston, Jianfeng Feng

Abstract: Traditionally, parameter estimation in biophysical neuron and neural network models usually adopts a global search algorithm, often combined with a local search method in order to minimize the value of a cost function, which measures the discrepancy between various features of the available experimental data and model output. In this study, we approach the problem of parameter estimation in conductance-based models of single neurons from a different perspective. By adopting a hidden-dynamical-systems formalism, we expressed parameter estimation as an inference problem in these systems, which can then be tackled using well-established statistical inference methods. The particular method we used was Kitagawa's self-organizing state-space model, which was applied on a number of Hodgkin-Huxley models using simulated or actual electrophysiological data. We showed that the algorithm can be used to estimate a large number of parameters, including maximal conductances, reversal potentials, kinetics of ionic currents and measurement noise, based on low-dimensional experimental data and sufficiently informative priors in the form of pre-defined constraints imposed on model parameters. The algorithm remained operational even when very noisy experimental data were used. Importantly, by combining the self-organizing state-space model with an adaptive sampling algorithm akin to the Covariance Matrix Adaptation Evolution Strategy we achieved a significant reduction in the variance of parameter estimates. The algorithm did not require the explicit formulation of a cost function and it was straightforward to apply on compartmental models and multiple data sets. Overall, the proposed methodology is particularly suitable for resolving high-dimensional inference problems based on noisy electrophysiological data and, therefore, a potentially useful tool in the construction of biophysical neuron models.

Submitted 29 October, 2011; v1 submitted 21 June, 2011; originally announced June 2011.

Journal ref: Vavoulis DV, Straub VA, Aston JAD, Feng J (2012) A Self-Organizing State-Space-Model Approach for Parameter Estimation in Hodgkin-Huxley-Type Models of Single Neurons. PLoS Comput Biol 8(3): e1002401

Showing 1–5 of 5 results for author: Vavoulis, D