Showing 1–12 of 12 results for author: Kingsford, C

  1. arXiv:2311.03592  [pdf, other]

    cs.DS q-bio.GN

    Sketching methods with small window guarantee using minimum decycling sets

    Authors: Guillaume Marçais, Dan DeBlasio, Carl Kingsford

    Abstract: Most sequence sketching methods work by selecting specific $k$-mers from sequences so that the similarity between two sequences can be estimated using only the sketches. Estimating sequence similarity is much faster using sketches than using sequence alignment, hence sketching methods are used to reduce the computational requirements of computational biology software packages. Applications using s…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Code available at https://github.com/Kingsford-Group/mdsscope

  2. arXiv:2305.10577  [pdf, other]

    cs.DM q-bio.GN

    Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants

    Authors: Yutong Qiu, Yihang Shen, Carl Kingsford

    Abstract: The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone p…

    Submitted 8 November, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  3. arXiv:2109.09264  [pdf, other]

    cs.LG stat.ML

    Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection

    Authors: Yihang Shen, Carl Kingsford

    Abstract: Bayesian Optimization (BO) is a method for globally optimizing black-box functions. While BO has been successfully applied to many scenarios, developing effective BO algorithms that scale to functions with high-dimensional domains is still a challenge. Optimizing such functions by vanilla BO is extremely time-consuming. Alternative strategies for high-dimensional BO that are based on the idea of e…

    Submitted 12 February, 2024; v1 submitted 19 September, 2021; originally announced September 2021.

    Comments: This work has already been accepted in AutoML 2023

  4. arXiv:2001.06550  [pdf, other]

    cs.DS q-bio.QM

    Lower density selection schemes via small universal hitting sets with short remaining path length

    Authors: Hongyu Zheng, Carl Kingsford, Guillaume Marçais

    Abstract: Universal hitting sets are sets of words that are unavoidable: every long enough sequence is hit by the set (i.e., it contains a word from the set). There is a tight relationship between universal hitting sets and minimizers schemes, where minimizers schemes with low density (i.e., efficient schemes) correspond to universal hitting sets of small size. Local schemes are a generalization of minimize…

    Submitted 16 January, 2020; originally announced January 2020.

    Comments: 16+7 pages. Accepted to RECOMB 2020

  5. arXiv:1908.02894  [pdf, other]

    cs.LG stat.ML

    How much data is sufficient to learn high-performing algorithms? Generalization guarantees for data-driven algorithm design

    Authors: Maria-Florina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, Tuomas Sandholm, Ellen Vitercik

    Abstract: Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case in…

    Submitted 25 April, 2021; v1 submitted 7 August, 2019; originally announced August 2019.

  6. arXiv:1604.03132  [pdf]

    q-bio.GN cs.DS

    Efficient Index Maintenance Under Dynamic Genome Modification

    Authors: Nitish Gupta, Komal Sanjeev, Tim Wall, Carl Kingsford, Rob Patro

    Abstract: Efficient text indexing data structures have enabled large-scale genomic sequence analysis and are used to help solve problems ranging from assembly to read mapping. However, these data structures typically assume that the underlying reference text is static and will not change over the course of the queries being made. Some progress has been made in exploring how certain text indices, like the su…

    Submitted 11 April, 2016; originally announced April 2016.

    Comments: Paper accepted at RECOMB-Seq 2016

  7. Optimal Seed Solver: Optimizing Seed Selection in Read Mapping

    Authors: Hongyi Xin, Richard Zhu, Sunny Nahar, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan, Onur Mutlu

    Abstract: Motivation: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both o…

    Submitted 26 June, 2015; originally announced June 2015.

    Comments: 10 pages of main text. 6 pages of supplementary materials. Under review by Oxford Bioinformatics

    Journal ref: Bioinformatics, Jun 1;32(11):1632-42, 2016

  8. arXiv:1308.3700  [pdf, other]

    q-bio.GN cs.CE

    Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms

    Authors: Rob Patro, Stephen M. Mount, Carl Kingsford

    Abstract: RNA-seq has rapidly become the de facto technique to measure gene expression. However, the time required for analysis has not kept up with the pace of data generation. Here we introduce Sailfish, a novel computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Sailfish entirely avoids mapping reads, which is a time-consuming step in all current met…

    Submitted 16 August, 2013; originally announced August 2013.

    Comments: 28 pages, 2 main figures, 2 algorithm displays, 5 supplementary figures and 2 supplementary notes. Accompanying software available at http://www.cs.cmu.edu/~ckingsf/software/sailfish

  9. arXiv:1307.7862  [pdf, other]

    q-bio.QM q-bio.GN

    Multiscale Identification of Topological Domains in Chromatin

    Authors: Darya Filippova, Rob Patro, Geet Duggal, Carl Kingsford

    Abstract: Recent chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally-relevant domains may exist at multiple length scales…

    Submitted 30 July, 2013; originally announced July 2013.

    Comments: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

  10. Network Archaeology: Uncovering Ancient Networks from Present-day Interactions

    Authors: Saket Navlakha, Carl Kingsford

    Abstract: Often questions arise about old or extinct networks. What proteins interacted in a long-extinct ancestor species of yeast? Who were the central players in the Last.fm social network 3 years ago? Our ability to answer such questions has been limited by the unavailability of past versions of networks. To overcome these limitations, we propose several algorithms for reconstructing a network's history…

    Submitted 30 August, 2010; originally announced August 2010.

    Comments: 16 pages, 10 figures

    ACM Class: G.2.2; G.3; H.2.8

  11. arXiv:0905.1064  [pdf, other]

    math.CO

    Vertices of degree k in edge-minimal, k-edge-connected graphs

    Authors: Carl Kingsford, Guillaume Marçais

    Abstract: Halin showed that every edge minimal, k-vertex connected graph has a vertex of degree k. In this note, we prove the analogue to Halin's theorem for edge-minimal, k-edge-connected graphs. We show there are two vertices of degree k in every edge-minimal, k-edge-connected graph.

    Submitted 7 May, 2009; originally announced May 2009.

    Comments: 3 pages

    MSC Class: 05C40

  12. arXiv:0905.1053  [pdf, other]

    math.CO

    A synthesis for exactly 3-edge-connected graphs

    Authors: Carl Kingsford, Guillaume Marçais

    Abstract: A multigraph is exactly k-edge-connected if there are exactly k edge-disjoint paths between any pair of vertices. We characterize the class of exactly 3-edge-connected graphs, giving a synthesis involving two operations by which every exactly 3-edge-connected multigraph can be generated. Slightly modified syntheses give the planar exactly 3-edge-connected graphs and the exactly 3-edge-connected…

    Submitted 7 May, 2009; originally announced May 2009.

    Comments: 15 pages, 4 figures. Submitted to FOCS 2009

    MSC Class: 05C40