-
Interlacing Personal and Reference Genomes for Machine Learning Disease-Variant Detection
Authors:
Luke R Harries,
Suyi Zhang,
Geoffroy Dubourg-Felonneau,
James H R Farmery,
Jonathan Sinai,
Belle Taylor,
Nirmesh Patel,
John W Cassidy,
John Shawe-Taylor,
Harry W Clifford
Abstract:
DNA sequencing to identify genetic variants is becoming increasingly valuable in clinical settings. Assessment of variants in such sequencing data is commonly implemented through Bayesian heuristic algorithms. Machine learning has shown great promise in improving on these variant calls, but the input for these is still a standardized "pile-up" image, which is not always best suited. In this paper, we present a novel method for generating images from DNA sequencing data, which interlaces the human reference genome with personalized sequencing output, to maximize usage of sequencing reads and improve machine learning algorithm performance. We demonstrate the success of this in improving standard germline variant calling. We also furthered this approach to include somatic variant calling across tumor/normal data with Siamese networks. These approaches can be used in machine learning applications on sequencing data with the hope of improving clinical outcomes, and are freely available for noncommercial use at www.ccg.ai.
Submitted 26 November, 2018;
originally announced November 2018.
-
Stochasticity and the limits to confidence when estimating R_0 of Ebola and other emerging infectious diseases
Authors:
Bradford P Taylor,
Jonathan Dushoff,
Joshua S Weitz
Abstract:
Dynamic models - often deterministic in nature - were used to estimate the basic reproductive number, R_0, of the 2014-5 Ebola virus disease (EVD) epidemic outbreak in West Africa. Estimates of R_0 were then used to project the likelihood for large outbreak sizes, e.g., exceeding hundreds of thousands of cases. Yet fitting deterministic models can lead to over-confidence in the confidence intervals of the fitted R_0, and, in turn, the type and scope of necessary interventions. In this manuscript we propose a hybrid stochastic-deterministic method to estimate R_0 and associated confidence intervals (CIs). The core idea is that stochastic realizations of an underlying deterministic model can be used to evaluate the compatibility of candidate values of R_0 with observed epidemic curves. The compatibility is based on comparing the distribution of expected epidemic growth rates with the observed epidemic growth rate given "process noise", i.e., arising due to stochastic transmission, recovery and death events. By applying our method to reported EVD case counts from Guinea, Liberia and Sierra Leone, we show that prior estimates of R_0 based on deterministic fits appear to be more confident than analysis of stochastic trajectories suggests should be possible. Moving forward, we recommend including a hybrid stochastic-deterministic fitting procedure when quantifying the full R_0 CI at the onset of an epidemic due to multiple sources of noise.
Submitted 21 January, 2016;
originally announced January 2016.
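The compatibility idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear birth-death outbreak (susceptible depletion ignored, recovery and death lumped into one removal rate `gamma`), and the function names, parameter values, and 95% interval rule are all hypothetical choices made for illustration.

```python
import math
import random


def simulated_growth_rate(r0, gamma, i0, t_end, max_cases, rng):
    """Run one Gillespie realization of a linear birth-death outbreak.

    Returns the realized exponential growth rate, or None if the
    outbreak goes extinct before t_end. Assumes i0 < max_cases.
    """
    beta = r0 * gamma  # per-case transmission rate (assumed model)
    infected, t = i0, 0.0
    while infected < max_cases:
        total_rate = (beta + gamma) * infected
        dt = rng.expovariate(total_rate)
        if t + dt > t_end:
            t = t_end  # no further event inside the observation window
            break
        t += dt
        if rng.random() < beta / (beta + gamma):
            infected += 1  # transmission event
        else:
            infected -= 1  # removal event (recovery or death)
            if infected == 0:
                return None
    return math.log(infected / i0) / t


def is_compatible(r0, observed_rate, n_runs=200, seed=0,
                  gamma=0.1, i0=5, t_end=30.0, max_cases=500):
    """Check whether observed_rate lies inside the central 95% of
    growth rates produced by stochastic runs at this candidate R_0."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_runs):
        rate = simulated_growth_rate(r0, gamma, i0, t_end, max_cases, rng)
        if rate is not None:  # condition on outbreaks that persist
            rates.append(rate)
    if not rates:
        return False
    rates.sort()
    lower = rates[int(0.025 * (len(rates) - 1))]
    upper = rates[int(0.975 * (len(rates) - 1))]
    return lower <= observed_rate <= upper


if __name__ == "__main__":
    # Hypothetical observed growth rate of 0.1 per day.
    for candidate in (1.5, 2.0, 3.0, 5.0):
        print(candidate, is_compatible(candidate, observed_rate=0.1))
```

Scanning a grid of candidate values and keeping those for which `is_compatible` returns true gives a rough interval for R_0 that reflects process noise, which is the kind of widening relative to a deterministic fit that the abstract describes.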
-
The virus of my virus is my friend: ecological effects of virophage with alternative modes of coinfection
Authors:
Bradford P. Taylor,
Michael H. Cortez,
Joshua S. Weitz
Abstract:
Virophages are viruses that rely on the replication machinery of other viruses to reproduce within eukaryotic hosts. Two different modes of coinfection have been posited based on experimental observation. In one mode, the virophage and virus enter the host independently. In the other mode, the virophage adheres to the virus so both virophage and virus enter the host together. Here we ask: what are the ecological effects of these different modes of coinfection? In particular, what ecological effects are common to both infection modes, and what are the differences particular to each mode? We develop a pair of biophysically motivated ODE models of viral-host population dynamics, corresponding to dynamics arising from each mode of infection. We find both modes of coinfection allow for the coexistence of the virophage, virus, and host either at a stable fixed point or through cyclical dynamics. In both models, virophage tend to be the most abundant population and their presence always reduces the viral abundance and increases the host abundance. However, we do find qualitative differences between models. For example, via extensive sampling of biologically relevant parameter space, we only observe bistability when the virophage and virus enter the host together. We discuss how such differences may be leveraged to help identify modes of infection in natural environments from population level data.
Submitted 12 May, 2014; v1 submitted 21 December, 2013;
originally announced December 2013.
-
Evaluation of Competing J domain:Hsp70 Complex Models in Light of Existing Mutational and NMR Data
Authors:
Rui Sousa,
Jianwen Jiang,
Eileen M. Lafer,
Andrew P. Hinck,
Liping Wang,
Alexander B. Taylor,
E. Guy Maes
Abstract:
Ahmad et al. recently presented an NMR-based model for a bacterial DnaJ J domain:DnaK(Hsp70):ADP complex(1) that differs significantly from the crystal structure of a disulfide linked mammalian auxilin J domain:Hsc70 complex that we previously published(2). They claimed that their model could better account for existing mutational data, was in better agreement with previous NMR studies, and that the presence of a cross-link in our structure made it irrelevant to understanding J:Hsp70 interactions. Here we detail extensive NMR and mutational data relevant to understanding J:Hsp70 function and show that, in fact, our structure is much better able to account for the mutational data and is in much better agreement with a previous NMR study of a mammalian polyoma virus T-ag J domain:Hsc70 complex than is the Ahmad et al. complex, and that our structure is predictive and provides insight into J:Hsp70 interactions and mechanism of ATPase activation.
Submitted 14 December, 2011;
originally announced December 2011.