
Showing 1–3 of 3 results for author: Treangen, T

  1. arXiv:2402.09381

    cs.LG

    GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly

    Authors: Ali Azizpour, Advait Balaji, Todd J. Treangen, Santiago Segarra

    Abstract: Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment. This is particularly true for metagenomic data, where genome dynamics such as horizontal gene transfer, gene duplication, and gene loss/gain complicate accurate genome assembly from metagenomic communities. Detecting repeats is a crucial first step in overcoming these challenges…

    Submitted 14 February, 2024; originally announced February 2024.

  2. arXiv:1910.04358

    q-bio.GN cs.IR

    Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)

    Authors: Gaurav Gupta, Minghao Yan, Benjamin Coleman, Bryce Kille, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava

    Abstract: DNA sequencing, especially of microbial genomes and metagenomes, has been at the core of recent research advances in large-scale comparative genomics. The data deluge has resulted in exponential growth in genomic datasets over the past years and has shown no sign of slowing down. Several recent attempts have been made to tame the computational burden of sequence search on these terabyte and petaby…

    Submitted 30 April, 2022; v1 submitted 10 October, 2019; originally announced October 2019.

    Comments: 9 pages

  3. arXiv:1910.02611

    cs.DS cs.IR

    RAMBO: Repeated And Merged BloOm Filter for Ultra-fast Multiple Set Membership Testing (MSMT) on Large-Scale Data

    Authors: Gaurav Gupta, Minghao Yan, Benjamin Coleman, R. A. Leo Elworth, Tharun Medini, Todd Treangen, Anshumali Shrivastava

    Abstract: Multiple Set Membership Testing (MSMT) is a well-known problem in a variety of search and query applications. Given a dataset of K different sets and a query q, it aims to find all of the sets containing the query. Trivially, an MSMT instance can be reduced to K membership testing instances, each with the same q, leading to O(K) query time with a simple array of Bloom Filters. We propose a data-st…

    Submitted 17 July, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: 14 pages, 5 figures