-
GEDI: Scalable Algorithms for Genotype Error Detection and Imputation
Authors:
Justin Kennedy,
Ion I. Mandoiu,
Bogdan Pasaniuc
Abstract:
Genome-wide association studies generate very large datasets that require scalable analysis algorithms. In this report we describe the GEDI software package, which implements efficient algorithms for performing several common tasks in the analysis of population genotype data, including genotype error detection and correction, imputation of both randomly missing and untyped genotypes, and genotype phasing. Experimental results show that GEDI achieves high accuracy with a runtime scaling linearly with the number of markers and samples. The open source C++ code of GEDI, released under the GNU General Public License, is available for download at http://dna.engr.uconn.edu/software/GEDI/
Submitted 9 November, 2009;
originally announced November 2009.
-
High-Throughput SNP Genotyping by SBE/SBH
Authors:
Ion I. Mandoiu,
Claudia Prajescu
Abstract:
Despite much progress over the past decade, current Single Nucleotide Polymorphism (SNP) genotyping technologies still offer an insufficient degree of multiplexing when required to handle user-selected sets of SNPs. In this paper we propose a new genotyping assay architecture combining multiplexed solution-phase single-base extension (SBE) reactions with sequencing by hybridization (SBH) using universal DNA arrays such as all $k$-mer arrays. In addition to PCR amplification of genomic DNA, SNP genotyping using SBE/SBH assays involves the following steps: (1) Synthesizing primers complementing the genomic sequence immediately preceding SNPs of interest; (2) Hybridizing these primers with the genomic DNA; (3) Extending each primer by a single base using polymerase enzyme and dideoxynucleotides labeled with 4 different fluorescent dyes; and finally (4) Hybridizing extended primers to a universal DNA array and determining the identity of the bases that extend each primer by hybridization pattern analysis. Our contributions include a study of multiplexing algorithms for SBE/SBH genotyping assays and preliminary experimental results showing the achievable tradeoffs between the number of array probes and primer length on one hand and the number of SNPs that can be assayed simultaneously on the other. Simulation results on datasets both randomly generated and extracted from the NCBI dbSNP database suggest that the SBE/SBH architecture provides a flexible and cost-effective alternative to genotyping assays currently used in the industry, enabling genotyping of up to hundreds of thousands of user-specified SNPs per assay.
Submitted 14 December, 2005;
originally announced December 2005.
-
Multicommodity Flow Algorithms for Buffered Global Routing
Authors:
Christoph Albrecht,
Andrew B. Kahng,
Ion I. Mandoiu,
Alexander Zelikovsky
Abstract:
In this paper we describe a new algorithm for buffered global routing according to a prescribed buffer site map. Specifically, we describe a provably good multi-commodity flow based algorithm that finds a global routing minimizing buffer and wire congestion subject to given constraints on routing area (wirelength and number of buffers) and sink delays. Our algorithm allows computing the tradeoff curve between routing area and wire/buffer congestion under any combination of delay and capacity constraints, and simultaneously performs buffer/wire sizing, as well as layer and pin assignment. Experimental results show that near-optimal results are obtained with a practical runtime.
Submitted 6 August, 2005;
originally announced August 2005.
-
Exact and Approximation Algorithms for DNA Tag Set Design
Authors:
Ion I. Mandoiu,
Dragos Trinca
Abstract:
In this paper we propose new solution methods for designing tag sets for use in universal DNA arrays. First, we give integer linear programming formulations for two previous formalizations of the tag set design problem, and show that these formulations can be solved to optimality for instance sizes of practical interest by using general purpose optimization packages. Second, we note the benefits of periodic tags, and establish an interesting connection between the tag design problem and the problem of packing the maximum number of vertex-disjoint directed cycles in a given graph. We show that combining a simple greedy cycle packing algorithm with a previously proposed alphabetic tree search strategy yields an increase of over 40% in the number of tags compared to previous methods.
Submitted 22 March, 2005;
originally announced March 2005.
-
Highly Scalable Algorithms for Robust String Barcoding
Authors:
Bhaskar DasGupta,
Kishori M. Konwar,
Ion I. Mandoiu,
Alex A. Shvartsman
Abstract:
String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds for the problem.
Submitted 14 February, 2005;
originally announced February 2005.
-
Improved Tag Set Design and Multiplexing Algorithms for Universal Arrays
Authors:
Ion I. Mandoiu,
Claudia Prajescu,
Dragos Trinca
Abstract:
In this paper we address two optimization problems arising in the design of genomic assays based on universal tag arrays. First, we address the universal array tag set design problem. For this problem, we extend previous formulations to incorporate antitag-to-antitag hybridization constraints in addition to constraints on antitag-to-tag hybridization specificity, establish a constructive upper bound on the maximum number of tags satisfying the extended constraints, and propose a simple greedy tag selection algorithm. Second, we give methods for improving the multiplexing rate in large-scale genomic assays by combining primer selection with tag assignment. Experimental results on simulated data show that this integrated optimization leads to reductions of up to 50% in the number of required arrays.
Submitted 10 February, 2005;
originally announced February 2005.
-
Approximation Algorithms for Minimum PCR Primer Set Selection with Amplification Length and Uniqueness Constraints
Authors:
K. Konwar,
I. Mandoiu,
A. Russell,
A. Shvartsman
Abstract:
A critical problem in the emerging high-throughput genotyping protocols is to minimize the number of polymerase chain reaction (PCR) primers required to amplify the single nucleotide polymorphism loci of interest. In this paper we study PCR primer set selection with amplification length and uniqueness constraints from both theoretical and practical perspectives. We give a greedy algorithm that achieves a logarithmic approximation factor for the problem of minimizing the number of primers subject to a given upper bound on the length of PCR amplification products. We also give, using randomized rounding, the first non-trivial approximation algorithm for a version of the problem that requires unique amplification of each amplification target. Empirical results on randomly generated test cases as well as test cases extracted from the National Center for Biotechnology Information's genomic databases show that our algorithms are highly scalable and produce better results compared to previous heuristics.
Submitted 27 July, 2004; v1 submitted 28 June, 2004;
originally announced June 2004.
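For readers unfamiliar with the kind of greedy strategy mentioned in the primer selection abstract above, the following is a minimal sketch of the classic greedy set cover heuristic, which is known to achieve a logarithmic approximation factor. It is not the authors' algorithm: it assumes, as a simplification, that each candidate primer "covers" a fixed set of target loci, and it ignores the amplification length and uniqueness constraints studied in the paper. The function name `greedy_cover` and the toy primer and SNP names are hypothetical.

```python
def greedy_cover(targets, candidates):
    """Classic greedy set cover (illustrative only, not the paper's method).

    targets:    iterable of items that must be covered (hypothetical SNP loci)
    candidates: dict mapping a candidate name (hypothetical primer) to the
                set of targets it covers
    Returns the list of chosen candidate names, in selection order.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered targets.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("remaining targets cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy instance: four candidate primers, five target loci.
coverage = {
    "p1": {"snp1", "snp2", "snp3"},
    "p2": {"snp3", "snp4"},
    "p3": {"snp4", "snp5"},
    "p4": {"snp1", "snp5"},
}
selected = greedy_cover({"snp1", "snp2", "snp3", "snp4", "snp5"}, coverage)
print(selected)
```

On this toy instance the heuristic first takes `p1` (three new loci) and then `p3` (the two remaining loci). In a realistic setting a locus would need a suitable pair of primers within the allowed amplification length, which is exactly what this simplified coverage model leaves out.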