Showing 1–2 of 2 results for author: Chen, K C

Search v0.5.6 released 2020-02-24

arXiv:1506.01744 [pdf, other]

stat.ML cs.LG math.ST q-bio.GN

Spectral Learning of Large Structured HMMs for Comparative Epigenomics

Authors: Chicheng Zhang, Jimin Song, Kevin C Chen, Kamalika Chaudhuri

Abstract: We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types. A natural model for chromatin data in one cell type is a Hidden Markov Model (HMM); we model the relationship between multiple cell types by connecting their hidden states by a fixed tree of known structure. The main challenge with learning parameters of such models is that iterative methods such as EM are very slow, while naive spectral methods result in time and space complexity exponential in the number of cell types. We exploit properties of the tree structure of the hidden states to provide spectral algorithms that are more computationally efficient for current biological datasets. We provide sample complexity bounds for our algorithm and evaluate it experimentally on biological data from nine human cell types. Finally, we show that beyond our specific model, some of our algorithmic ideas can be applied to other graphical models.

Submitted 4 June, 2015; originally announced June 2015.

Comments: 27 pages, 3 figures

arXiv:1111.1426 [pdf, ps, other]

q-bio.GN cs.CE

SLIQ: Simple Linear Inequalities for Efficient Contig Scaffolding

Authors: Rajat S. Roy, Kevin C. Chen, Anirvan M. Sengupta, Alexander Schliep

Abstract: Scaffolding is an important subproblem in "de novo" genome assembly in which mate pair data are used to construct a linear sequence of contigs separated by gaps. Here we present SLIQ, a set of simple linear inequalities derived from the geometry of contigs on the line that can be used to predict the relative positions and orientations of contigs from individual mate pair reads and thus produce a contig digraph. The SLIQ inequalities can also filter out unreliable mate pairs and can be used as a preprocessing step for any scaffolding algorithm. We tested the SLIQ inequalities on five real data sets ranging in complexity from simple bacterial genomes to complex mammalian genomes and compared the results to the majority voting procedure used by many other scaffolding algorithms. SLIQ predicted the relative positions and orientations of the contigs with high accuracy in all cases and gave more accurate position predictions than majority voting for complex genomes, in particular the human genome. Finally, we present a simple scaffolding algorithm that produces linear scaffolds given a contig digraph. We show that our algorithm is very efficient compared to other scaffolding algorithms while maintaining high accuracy in predicting both contig positions and orientations for real data sets.

Submitted 9 November, 2011; v1 submitted 6 November, 2011; originally announced November 2011.

Comments: 16 pages, 6 figures, 7 tables
