-
Reply to "Issues arising from benchmarking single-cell RNA sequencing imputation methods"
Authors:
Mo Huang,
Nancy R. Zhang
Abstract:
In our Brief Communication (DOI: 10.1038/s41592-018-0033-z), we presented the method SAVER for recovering true gene expression levels in noisy single cell RNA sequencing data. We evaluated the performance of SAVER, along with comparable methods MAGIC and scImpute, in an RNA FISH validation experiment and a data downsampling experiment. In a Comment [arXiv:1908.07084v1], Li & Li were concerned with the use of the downsampled datasets, specifically focusing on clustering results obtained from the Zeisel et al. data. Here, we will address these comments and, furthermore, amend the data downsampling experiment to demonstrate that the findings from the data downsampling experiment in our Brief Communication are valid.
Submitted 5 September, 2019;
originally announced September 2019.
-
Scanning a Poisson Random Field for Local Signals
Authors:
Nancy R. Zhang,
Benjamin Yakir,
Charlie L. Xia,
David Siegmund
Abstract:
The detection of local genomic signals using high-throughput DNA sequencing data can be cast as a problem of scanning a Poisson random field for local changes in the rate of the process. We propose a likelihood-based framework for such scans, and derive formulas for false positive rate control and power calculations. The framework can also accommodate mixtures of Poisson processes to deal with over-dispersion. As a specific, detailed example, we consider the detection of insertions and deletions by paired-end DNA-sequencing. We propose several statistics for this problem, compare their power under current experimental designs, and illustrate their application on an Illumina Platinum Genomes data set.
Submitted 12 June, 2014;
originally announced June 2014.
-
Importance Sampling of Word Patterns in DNA and Protein Sequences
Authors:
Hock Peng Chan,
Nancy R. Zhang,
Louis H. Y. Chen
Abstract:
Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: Palindromes and inverted repeats, patterns arising from position specific weight matrices and co-occurrences of pairs of motifs.
Submitted 26 November, 2008;
originally announced November 2008.