Showing 1–24 of 24 results for author: Firtina, C

  1. arXiv:2406.19113  [pdf, other]

    cs.AR cs.DC q-bio.GN

    MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

  2. arXiv:2311.12527  [pdf, other]

    cs.AR q-bio.GN q-bio.QM

    MetaStore: High-Performance Metagenomic Analysis via In-Storage Computing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advancements in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases containing information on different species' genomes. Metagenomic analysis suffers from significant data movement overhead due to moving large amo…

    Submitted 21 November, 2023; originally announced November 2023.

  3. arXiv:2311.02029  [pdf]

    q-bio.GN cs.AR q-bio.QM

    MetaTrinity: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation

    Authors: Arvid E. Gollwitzer, Mohammed Alser, Joel Bergtholdt, Joel Lindegger, Maximilian-David Rumpf, Can Firtina, Serghei Mangul, Onur Mutlu

    Abstract: Metagenomics, the study of genome sequences of diverse organisms cohabiting in a shared environment, has experienced significant advancements across various medical and biological fields. Metagenomic analysis is crucial, for instance, in clinical applications such as infectious disease screening and the diagnosis and early detection of diseases such as cancer. A key task in metagenomics is to dete…

    Submitted 16 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

  4. arXiv:2310.16908  [pdf]

    q-bio.GN cs.AR q-bio.QM

    SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences

    Authors: Maximilian-David Rumpf, Mohammed Alser, Arvid E. Gollwitzer, Joel Lindegger, Nour Almadhoun, Can Firtina, Serghei Mangul, Onur Mutlu

    Abstract: Computational complexity is a key limitation of genomic analyses. Thus, over the last 30 years, researchers have proposed numerous fast heuristic methods that provide computational relief. Comparing genomic sequences is one of the most fundamental computational steps in most genomic analyses. Due to its high computational complexity, optimized exact and heuristic algorithms are still being develop…

    Submitted 21 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

  5. arXiv:2310.05037  [pdf, other]

    q-bio.GN q-bio.QM

    RawAlign: Accurate, Fast, and Scalable Raw Nanopore Signal Mapping via Combining Seeding and Alignment

    Authors: Joël Lindegger, Can Firtina, Nika Mansouri Ghiasi, Mohammad Sadrosadati, Mohammed Alser, Onur Mutlu

    Abstract: Nanopore-based sequencers generate a series of raw electrical signal measurements that represent the contents of a biological sequence molecule passing through the sequencer's nanopore. If the raw signal is analyzed in real-time, an irrelevant molecule can be ejected from the nanopore before it is completely sequenced, reducing sequencing time. To meet the low-latency and high-throughput requireme…

    Submitted 8 October, 2023; originally announced October 2023.

  6. arXiv:2310.04366  [pdf, other]

    cs.AR cs.ET q-bio.GN

    Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors

    Authors: Taha Shahroodi, Gagandeep Singh, Mahdi Zahedi, Haiyu Mao, Joel Lindegger, Can Firtina, Stephan Wong, Onur Mutlu, Said Hamdioui

    Abstract: Basecalling, an essential step in many genome analysis studies, relies on large Deep Neural Networks (DNNs) to achieve high accuracy. Unfortunately, these DNNs are computationally slow and inefficient, leading to considerable delays and resource constraints in the sequence analysis process. A Computation-In-Memory (CIM) architecture using memristors can significantly accelerate the performance of…

    Submitted 26 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

  7. arXiv:2309.05771  [pdf, other]

    q-bio.GN q-bio.QM

    RawHash2: Mapping Raw Nanopore Signals Using Hash-Based Seeding and Adaptive Quantization

    Authors: Can Firtina, Melina Soysal, Joël Lindegger, Onur Mutlu

    Abstract: Summary: Raw nanopore signals can be analyzed while they are being generated, a process known as real-time analysis. Real-time analysis of raw signals is essential to utilize the unique features that nanopore sequencing provides, enabling the early stopping of the sequencing of a read or the entire sequencing run based on the analysis. The state-of-the-art mechanism, RawHash, offers the first hash…

    Submitted 1 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

  8. arXiv:2305.00492  [pdf, ps, other]

    cs.AR q-bio.GN

    Accelerating Genome Analysis via Algorithm-Architecture Co-Design

    Authors: Onur Mutlu, Can Firtina

    Abstract: High-throughput sequencing (HTS) technologies have revolutionized the field of genomics, enabling rapid and cost-effective genome analysis for various applications. However, the increasing volume of genomic data generated by HTS technologies presents significant challenges for computational techniques to effectively analyze genomes. To address these challenges, several algorithm-architecture co-de…

    Submitted 31 May, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

    Comments: To appear as an invited special session paper at DAC 2023

  9. RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes

    Authors: Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Onur Mutlu

    Abstract: Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the se…

    Submitted 1 June, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

    Comments: To appear in proceedings of ISMB/ECCB 2023

  10. arXiv:2212.04953  [pdf, other]

    q-bio.GN cs.AI cs.LG

    TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

    Authors: Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu

    Abstract: Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally-inefficient and memory-hungry; bottlenecking the entire genome analysis pipeline. However, for…

    Submitted 14 September, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

  11. Utopia: Fast and Efficient Address Translation via Hybrid Restrictive & Flexible Virtual-to-Physical Address Mappings

    Authors: Konstantinos Kanellopoulos, Rahul Bera, Kosta Stojiljkovic, Nisa Bostanci, Can Firtina, Rachata Ausavarungnirun, Rakesh Kumar, Nastaran Hajinazar, Mohammad Sadrosadati, Nandita Vijaykumar, Onur Mutlu

    Abstract: Conventional virtual memory (VM) frameworks enable a virtual address to flexibly map to any physical address. This flexibility necessitates large data structures to store virtual-to-physical mappings, which leads to high address translation latency and large translation-induced interference in the memory hierarchy. On the other hand, restricting the address mapping so that a virtual address can on…

    Submitted 6 October, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: To appear in 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

    ACM Class: C.0

  12. arXiv:2211.03079  [pdf, other]

    cs.AR cs.DC q-bio.GN

    RUBICON: A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers

    Authors: Gagandeep Singh, Mohammed Alser, Kristof Denolf, Can Firtina, Alireza Khodamoradi, Meryem Banu Cavlak, Henk Corporaal, Onur Mutlu

    Abstract: Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The accuracy and speed of basecalling have critical implications for all later steps in genome analysis. Many researchers adopt complex deep learning-based models to perform basecalling without considering the compute demands…

    Submitted 5 February, 2024; v1 submitted 6 November, 2022; originally announced November 2022.

  13. arXiv:2209.08600  [pdf, other]

    cs.AR cs.DS q-bio.GN

    GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

    Authors: Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu

    Abstract: Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The secon…

    Submitted 17 December, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

    Comments: 17 pages, 13 figures

  14. arXiv:2207.09765  [pdf, other]

    cs.AR cs.AI cs.LG q-bio.GN q-bio.QM

    ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

    Authors: Can Firtina, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie Kim, Taha Shahroodi, Meryem Banu Cavlak, Joel Lindegger, Mohammed Alser, Juan Gómez Luna, Sreenivas Subramoney, Onur Mutlu

    Abstract: Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highl…

    Submitted 21 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ACM TACO

  15. arXiv:2206.01932  [pdf, other]

    cs.AR q-bio.GN

    Demeter: A Fast and Energy-Efficient Food Profiler using Hyperdimensional Computing in Memory

    Authors: Taha Shahroodi, Mahdi Zahedi, Can Firtina, Mohammed Alser, Stephan Wong, Onur Mutlu, Said Hamdioui

    Abstract: Food profiling is an essential step in any food monitoring system needed to prevent health risks and potential frauds in the food industry. Significant improvements in sequencing technologies are pushing food profiling to become the main computational bottleneck. State-of-the-art profilers are unfortunately too costly for food profiling. Our goal is to design a food profiler that solves the main…

    Submitted 24 August, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

  16. arXiv:2205.07957  [pdf]

    q-bio.GN cs.AR q-bio.QM

    Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis

    Authors: Mohammed Alser, Joel Lindegger, Can Firtina, Nour Almadhoun, Haiyu Mao, Gagandeep Singh, Juan Gomez-Luna, Onur Mutlu

    Abstract: We now need more than ever to make genome analysis more intelligent. We need to read, analyze, and interpret our genomes not only quickly, but also accurately and efficiently enough to scale the analysis to population level. There currently exist major computational bottlenecks and inefficiencies throughout the entire genome analysis pipeline, because state-of-the-art genome sequencing technologie…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2008.00961

  17. SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping

    Authors: Damla Senol Cali, Konstantinos Kanellopoulos, Joel Lindegger, Zülal Bingöl, Gurpreet S. Kalsi, Ziyi Zuo, Can Firtina, Meryem Banu Cavlak, Jeremie Kim, Nika Mansouri Ghiasi, Gagandeep Singh, Juan Gómez-Luna, Nour Almadhoun Alserr, Mohammed Alser, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

    Abstract: A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in…

    Submitted 31 May, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: To appear in ISCA'22

  18. arXiv:2203.16261  [pdf]

    q-bio.GN cs.DC cs.SE stat.AP

    Packaging, containerization, and virtualization of computational omics methods: Advances, challenges, and opportunities

    Authors: Mohammed Alser, Sharon Waymost, Ram Ayyala, Brendan Lawlor, Richard J. Abdill, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, Andre M. Ribeiro-dos-Santos, Can Firtina, Nour Almadhoun, Varuni Sarwal, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do Kim, Malak S. Abedalthagafi, Onur Mutlu, Serghei Mangul

    Abstract: Omics software tools have reshaped the landscape of modern biology and become an essential component of biomedical research. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging, virtualization, and containerization are different approaches to satisfy this need by wrapping omics tools in additional softwa…

    Submitted 30 March, 2022; originally announced March 2022.

  19. arXiv:2202.10400  [pdf, other]

    cs.AR cs.DC cs.OS q-bio.GN

    GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

    Authors: Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, Onur Mutlu

    Abstract: Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, many prior works propose various approaches such as filters that select th…

    Submitted 6 April, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: Published at ASPLOS 2022

  20. FastRemap: A Tool for Quickly Remapping Reads between Genome Assemblies

    Authors: Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Can Alkan, Onur Mutlu

    Abstract: A genome read data set can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly-used CrossMap tool. With the explosion of available genomic data sets and references, high-performance remapping tools will be even more important for keeping up with the computation…

    Submitted 4 September, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Comments: FastRemap is open source and all scripts needed to replicate the results in this paper can be found at https://github.com/CMU-SAFARI/FastRemap

    Journal ref: Bioinformatics, Sep 30; 38(19):4633-4635, 2022

  21. BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis

    Authors: Can Firtina, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, Onur Mutlu

    Abstract: Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only e…

    Submitted 23 May, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Published in NARGAB

    Journal ref: NAR Genomics and Bioinformatics, vol. 5, no. 1, p. lqad004, Mar. 2023

  22. arXiv:2009.07692  [pdf, other]

    cs.AR q-bio.GN

    GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

    Authors: Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

    Abstract: Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. A major co…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: To appear in MICRO 2020

  23. arXiv:1912.08735  [pdf, other]

    q-bio.GN cs.CE

    AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

    Authors: Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Mohammed Alser, Nastaran Hajinazar, Can Alkan, Onur Mutlu

    Abstract: As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes, which enable sensitivity in read mapping and downstream analysis such as variant calling. A more sensitive downstream analysis is critical for a better understanding of the genome donor (e.g., health characteristics). Therefore, read sets from sequenced samples should…

    Submitted 21 November, 2022; v1 submitted 18 December, 2019; originally announced December 2019.

  24. Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm

    Authors: Can Firtina, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercument Cicek, Can Alkan, Onur Mutlu

    Abstract: Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates and a large proportion of bps in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis.…

    Submitted 7 March, 2020; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: 9 pages, 1 figure. Accepted in Bioinformatics

    Journal ref: Bioinformatics. 2020 Jun 1;36(12):3669-3679