-
Suppressing Background Radiation Using Poisson Principal Component Analysis
Authors:
P. Tandon,
P. Huggins,
A. Dubrawski,
S. Labov,
K. Nelson
Abstract:
Performance of nuclear threat detection systems based on gamma-ray spectrometry often depends strongly on the ability to identify the part of the measured signal that can be attributed to background radiation. We have successfully applied a method based on Principal Component Analysis (PCA) to obtain a compact null-space model of background spectra, using PCA projection residuals to derive a source detection score. We have shown the method's utility in a threat detection system using mobile spectrometers in urban scenes (Tandon et al. 2012). While measured photon counts are commonly assumed to follow a Poisson process, standard PCA makes a Gaussian assumption about the data distribution, which may be a poor approximation when photon counts are low. This paper studies whether, and under what conditions, PCA with a Poisson-based loss function (Poisson PCA) can outperform standard Gaussian PCA in modeling background radiation, enabling more sensitive and specific nuclear threat detection.
Submitted 26 May, 2016;
originally announced May 2016.
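As a hedged illustration of the residual-based score described in the Poisson PCA abstract above, the sketch below fits a standard (Gaussian) PCA subspace to background spectra with NumPy. It then scores a new spectrum in two ways: by its squared projection residual, and by the Poisson deviance of its counts against the PCA reconstruction. This is not Poisson PCA, which would fit the subspace itself under a Poisson loss; it only contrasts the two losses on the same model. The function names, the rank `k`, and the synthetic spectra are assumptions for illustration.

```python
import numpy as np

def fit_background_pca(B, k):
    """Fit a rank-k PCA model to background spectra B (n_spectra x n_bins)."""
    mu = B.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(B - mu, full_matrices=False)
    return mu, Vt[:k]

def reconstruct(x, mu, V):
    """Project spectrum x onto the background model (mean + span of V)."""
    return mu + V.T @ (V @ (x - mu))

def residual_score(x, mu, V):
    """Gaussian-style score: squared norm of the PCA projection residual."""
    r = x - reconstruct(x, mu, V)
    return float(r @ r)

def poisson_deviance_score(x, mu, V, eps=1e-6):
    """Poisson deviance of counts x against the clipped reconstruction."""
    lam = np.clip(reconstruct(x, mu, V), eps, None)
    # x * log(x / lam) is taken as 0 when x == 0.
    ratio = np.where(x > 0, x / lam, 1.0)
    return float(2.0 * np.sum(x * np.log(ratio) - (x - lam)))

# Hypothetical background: one spectral shape scaled by a per-spectrum intensity.
rng = np.random.default_rng(0)
shape = np.linspace(2.0, 8.0, 16)
scales = rng.uniform(0.5, 1.5, size=200)
B = rng.poisson(np.outer(scales, shape)).astype(float)
mu, V = fit_background_pca(B, k=1)

background = rng.poisson(shape).astype(float)
source = background.copy()
source[5] += 30.0  # hypothetical injected source counts in one energy bin
print(residual_score(background, mu, V), residual_score(source, mu, V))
print(poisson_deviance_score(background, mu, V), poisson_deviance_score(source, mu, V))
```

The squared residual weights every bin equally, whereas the Poisson deviance accounts for variance that grows with the expected count, which is the distinction the abstract raises for low-count spectra.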
-
Canonical Autocorrelation Analysis
Authors:
Maria De-Arteaga,
Artur Dubrawski,
Peter Huggins
Abstract:
We present an extension of sparse Canonical Correlation Analysis (CCA) designed to find many-to-many linear correlations within a single set of variables. Unlike CCA, which finds correlations between two data sets whose rows are matched exactly but whose columns represent separate sets of variables, the proposed method, Canonical Autocorrelation Analysis (CAA), finds multivariate correlations within just one set of variables. This is useful when looking for hidden parsimonious structures in data, each involving only a small subset of all features. In addition, the discovered correlations are highly interpretable because they are formed by pairs of sparse linear combinations of the original features. We show how CAA can serve as a tool for anomaly detection when anomalous data do not follow the expected correlation structure. We illustrate the utility of CAA in two application domains where single-class and unsupervised learning of correlation structures are particularly relevant: breast cancer diagnosis and radiation threat detection. When applied to the Wisconsin Breast Cancer data, single-class CAA is competitive with supervised methods used in the literature. On the radiation threat detection task, unsupervised CAA performs significantly better than an unsupervised alternative prevalent in the domain, while providing valuable additional insights for threat analysis.
Submitted 19 November, 2015;
originally announced November 2015.
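The sketch below illustrates the core idea of the CAA abstract on a toy scale: find two disjoint, sparse groups of features from the same variable set whose linear combinations are maximally correlated, then flag points that break that relation. It is not the authors' sparse CAA algorithm. It substitutes a brute-force search over tiny supports, with a hypothetical per-feature penalty `lam` standing in for sparsity, and computes the first canonical correlation between the two groups with NumPy. All function names and the synthetic data are illustrative assumptions.

```python
import itertools
import numpy as np

def first_canonical_pair(XA, XB):
    """First canonical correlation and weight vectors between two column blocks."""
    XA = XA - XA.mean(axis=0)
    XB = XB - XB.mean(axis=0)
    Saa, Sbb, Sab = XA.T @ XA, XB.T @ XB, XA.T @ XB
    # a solves Saa^{-1} Sab Sbb^{-1} Sba a = rho^2 a (the 1/(n-1) factors cancel).
    M = np.linalg.solve(Saa, Sab) @ np.linalg.solve(Sbb, Sab.T)
    vals, vecs = np.linalg.eig(M)
    i = int(np.argmax(vals.real))
    a = vecs[:, i].real
    b = np.linalg.solve(Sbb, Sab.T @ a)  # partner weights; X_B b correlates positively with X_A a
    return float(np.sqrt(max(vals[i].real, 0.0))), a, b

def sparse_autocorrelation_search(X, max_support=2, lam=0.05):
    """Brute-force search for disjoint feature sets A, B with high penalized correlation."""
    d = X.shape[1]
    best = None
    for ka in range(1, max_support + 1):
        for A in itertools.combinations(range(d), ka):
            rest = [j for j in range(d) if j not in A]
            for kb in range(1, max_support + 1):
                for B in itertools.combinations(rest, kb):
                    rho, a, b = first_canonical_pair(X[:, list(A)], X[:, list(B)])
                    score = rho - lam * (len(A) + len(B))  # hypothetical sparsity penalty
                    if best is None or score > best[0]:
                        best = (score, A, B, a, b)
    return best[1:]

def make_scorer(X, A, B, a, b):
    """Score points by how far they depart from the learned correlation."""
    A, B = list(A), list(B)
    s, t = X[:, A] @ a, X[:, B] @ b
    def score(x):
        zs = (x[A] @ a - s.mean()) / s.std()
        zt = (x[B] @ b - t.mean()) / t.std()
        return float(abs(zs - zt))  # near 0 when the two combinations agree
    return score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=500)  # hidden pairwise structure
A, B, a, b = sparse_autocorrelation_search(X)
score = make_scorer(X, A, B, a, b)
normal = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
anomalous = np.array([2.0, 0.0, 0.0, -2.0, 0.0])
print(A, B, score(normal), score(anomalous))
```

The exhaustive search is only feasible for a handful of features; it is meant to show what a "pair of sparse linear combinations" looks like, not how to find one at scale.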
-
Layered Heaps Beating Standard and Fibonacci Heaps in Practice
Authors:
Peter Huggins
Abstract:
We consider the classic problem of designing heaps. Standard binary heaps run faster in practice than Fibonacci heaps but have worse time guarantees. Here we present a new type of heap, a layered heap, that runs faster in practice than both standard binary and Fibonacci heaps, while having better asymptotic insert times than binary heaps. Our heap is defined recursively, and the maximum run-time speedup occurs at a recursion depth of 1, i.e., a heap of heaps.
Submitted 12 October, 2015;
originally announced October 2015.
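The layered-heap abstract does not spell out the construction, so the following is only a hypothetical two-level "heap of heaps" in Python, built on the standard `heapq` module and assuming one plausible design. New keys go into a small buffer heap of at most `block_size` elements; each full buffer is sealed and indexed in a top-level heap by its minimum. The class name `TwoLevelHeap` and the `block_size` parameter are assumptions, not the paper's.

```python
import heapq
import itertools
import random

class TwoLevelHeap:
    """Hypothetical heap of heaps (recursion depth 1); not the paper's exact design."""

    def __init__(self, block_size=64):
        self.block_size = block_size
        self.buffer = []               # small binary heap receiving new inserts
        self.top = []                  # binary heap of (min_key, tiebreak, sub_heap)
        self._ids = itertools.count()  # unique tiebreak so lists are never compared
        self._size = 0

    def __len__(self):
        return self._size

    def insert(self, key):
        heapq.heappush(self.buffer, key)  # O(log block_size)
        self._size += 1
        if len(self.buffer) >= self.block_size:
            # Seal the full buffer as a sub-heap, indexed by its minimum.
            heapq.heappush(self.top, (self.buffer[0], next(self._ids), self.buffer))
            self.buffer = []

    def extract_min(self):
        if self._size == 0:
            raise IndexError("extract_min from an empty heap")
        self._size -= 1
        # Take from the top level only if its best sub-heap beats the buffer.
        if self.top and (not self.buffer or self.top[0][0] < self.buffer[0]):
            _, _, sub = self.top[0]
            key = heapq.heappop(sub)
            if sub:
                # Re-index the shrunken sub-heap by its new minimum.
                heapq.heapreplace(self.top, (sub[0], next(self._ids), sub))
            else:
                heapq.heappop(self.top)
            return key
        return heapq.heappop(self.buffer)

h = TwoLevelHeap(block_size=4)
keys = random.Random(0).sample(range(100), 20)
for k in keys:
    h.insert(k)
out = [h.extract_min() for _ in range(len(h))]
print(out == sorted(keys))  # True
```

With a constant `block_size`, an insert costs O(log block_size) plus a top-level push of O(log(n / block_size)) once every `block_size` inserts. This illustrates the kind of trade-off a heap of heaps can exploit, though the paper's actual structure and bounds may differ.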
-
Bayes estimators for phylogenetic reconstruction
Authors:
Peter Huggins,
Wenbin Li,
David Haws,
Thomas Friedrich,
**ze Liu,
Ruriko Yoshida
Abstract:
Tree reconstruction methods are often judged by their accuracy, measured by how close they get to the true tree. Yet most reconstruction methods, such as maximum likelihood (ML), do not explicitly maximize this accuracy. To address this problem, we propose a Bayesian solution. Given tree samples, we propose finding the tree estimate that is closest on average to the samples. This "median" tree is known as the Bayes estimator (BE). The BE literally maximizes posterior expected accuracy, measured in terms of closeness (distance) to the true tree. We discuss a unified framework of BE trees, focusing especially on tree distances that are expressible as squared Euclidean distances. Notable examples include Robinson–Foulds distance, quartet distance, and squared path difference. Using simulated data, we show that Bayes estimators can be computed efficiently in practice by hill climbing. We also show that Bayes estimators achieve higher accuracy than maximum likelihood and neighbor joining.
Submitted 21 November, 2009; v1 submitted 3 November, 2009;
originally announced November 2009.
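When a tree distance is a squared Euclidean distance between tree embeddings, the Bayes estimator has a convenient characterization. For any vector x and samples y_1, ..., y_m with mean ȳ, (1/m) Σ ||x − y_i||² = ||x − ȳ||² + (1/m) Σ ||y_i − ȳ||², so the tree minimizing expected distance is the one whose embedding lies closest to the sample mean. The sketch below checks this for the Robinson–Foulds distance, which equals the squared Euclidean distance between 0/1 split-indicator vectors. It is a simplification under stated assumptions: trees are written as hypothetical nested tuples, and the search is restricted to the sampled topologies rather than using the hill climbing described in the abstract.

```python
import numpy as np

def splits(tree, taxa):
    """Nontrivial splits of an unrooted tree written as nested tuples of leaf names.

    Each split is stored as the side that excludes a fixed reference taxon.
    """
    ref, n = min(taxa), len(taxa)
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(leaves(child) for child in node))
        side = frozenset(taxa) - clade if ref in clade else clade
        if 2 <= len(side) <= n - 2:  # skip trivial (leaf) and empty splits
            found.add(side)
        return clade

    leaves(tree)
    return frozenset(found)

def rf_distance(s1, s2):
    """Robinson-Foulds distance as |symmetric difference| (some conventions halve it)."""
    return len(s1 ^ s2)

def bayes_estimator_among(samples, taxa):
    """Sampled tree minimizing mean RF distance to all samples (restricted search)."""
    split_sets = [splits(t, taxa) for t in samples]
    best = min(range(len(samples)),
               key=lambda i: np.mean([rf_distance(split_sets[i], s) for s in split_sets]))
    return samples[best]

def embed(split_sets):
    """0/1 split-indicator vectors over all splits seen in the samples."""
    index = sorted({s for ss in split_sets for s in ss}, key=sorted)
    return np.array([[1.0 if s in ss else 0.0 for s in index] for ss in split_sets])

taxa = ["A", "B", "C", "D", "E"]
t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "B"), "D", ("C", "E"))
t3 = (("A", "C"), "B", ("D", "E"))
samples = [t1, t1, t2, t3]  # hypothetical posterior sample
print(bayes_estimator_among(samples, taxa))  # (('A', 'B'), 'C', ('D', 'E'))

# Check the mean-vector decomposition on the indicator embedding.
Y = embed([splits(t, taxa) for t in samples])
x, ybar = Y[0], Y.mean(axis=0)
lhs = np.mean(np.sum((x - Y) ** 2, axis=1))
rhs = np.sum((x - ybar) ** 2) + np.mean(np.sum((Y - ybar) ** 2, axis=1))
print(np.isclose(lhs, rhs))  # True
```

Because the second term on the right does not depend on x, a search over tree space only needs to compare candidates against ȳ, which is one reason squared Euclidean distances make the Bayes estimator tractable.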