Search | arXiv e-print repository

Perceptual and technical barriers in sharing and formatting metadata accompanying omics studies

Authors: Yu-Ning Huang, Michael I. Love, Cynthia Flaire Ronkowski, Dhrithi Deshpande, Lynn M. Schriml, Annie Wong-Beringer, Barend Mons, Russell Corbett-Detig, Christopher I Hunter, Jason H. Moore, Lana X. Garmire, T. B. K. Reddy, Winston A. Hide, Atul J. Butte, Mark D. Robinson, Serghei Mangul

Abstract: Metadata, often termed "data about data," is crucial for organizing, understanding, and managing vast omics datasets. It aids in efficient data discovery, integration, and interpretation, enabling users to access, comprehend, and utilize data effectively. Its significance spans the domains of scientific research, facilitating data reproducibility, reusability, and secondary analysis. However, numerous perceptual and technical barriers hinder the sharing of metadata among researchers. These barriers compromise the reliability of research results and hinder integrative meta-analyses of omics studies. This study highlights the key barriers to metadata sharing, including the lack of uniform standards, privacy and legal concerns, limitations in study design, limited incentives, inadequate infrastructure, and the dearth of well-trained personnel for metadata management and reuse. Proposed solutions include emphasizing the promotion of standardization, educational efforts, the role of journals and funding agencies, incentives and rewards, and the improvement of infrastructure. More accurate, reliable, and impactful research outcomes are achievable if the scientific community addresses these barriers.

Submitted 22 November, 2023; originally announced January 2024.

arXiv:2311.02029 [pdf]

MetaTrinity: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation

Authors: Arvid E. Gollwitzer, Mohammed Alser, Joel Bergtholdt, Joel Lindegger, Maximilian-David Rumpf, Can Firtina, Serghei Mangul, Onur Mutlu

Abstract: Metagenomics, the study of genome sequences of diverse organisms cohabiting in a shared environment, has experienced significant advancements across various medical and biological fields. Metagenomic analysis is crucial, for instance, in clinical applications such as infectious disease screening and the diagnosis and early detection of diseases such as cancer. A key task in metagenomics is to determine the species present in a sample and their relative abundances. Currently, the field is dominated by either alignment-based tools, which offer high accuracy but are computationally expensive, or alignment-free tools, which are fast but lack the needed accuracy for many applications. In response to this dichotomy, we introduce MetaTrinity, a tool based on heuristics, to achieve a fundamental improvement in accuracy-runtime tradeoff over existing methods. We benchmark MetaTrinity against two leading metagenomic classifiers, each representing different ends of the performance-accuracy spectrum. On one end, Kraken2, a tool optimized for performance, shows modest accuracy yet a rapid runtime. The other end of the spectrum is governed by Metalign, a tool optimized for accuracy. Our evaluations show that MetaTrinity achieves an accuracy comparable to Metalign while gaining a 4x speedup without any loss in accuracy. This directly equates to a fourfold improvement in runtime-accuracy tradeoff. Compared to Kraken2, MetaTrinity requires a 5x longer runtime yet delivers a 17x improvement in accuracy. This demonstrates a 3.4x enhancement in the accuracy-runtime tradeoff for MetaTrinity. This dual comparison positions MetaTrinity as a broadly applicable solution for metagenomic classification, combining advantages of both ends of the spectrum: speed and accuracy. MetaTrinity is publicly available at https://github.com/CMU-SAFARI/MetaTrinity.
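
The tradeoff figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that "accuracy-runtime tradeoff" means the relative accuracy gain divided by the relative runtime factor; that definition and the function name `tradeoff_gain` are illustrative assumptions, not taken from the paper.

```python
def tradeoff_gain(accuracy_gain: float, runtime_factor: float) -> float:
    """Relative accuracy gain divided by relative runtime (assumed definition).

    Both arguments are ratios against a baseline tool: accuracy_gain = 17.0
    means 17x the baseline accuracy, runtime_factor = 5.0 means 5x the
    baseline runtime.
    """
    if runtime_factor <= 0:
        raise ValueError("runtime_factor must be positive")
    return accuracy_gain / runtime_factor


# Versus Metalign: comparable accuracy (1x) at a quarter of the runtime.
vs_metalign = tradeoff_gain(accuracy_gain=1.0, runtime_factor=1 / 4)

# Versus Kraken2: 17x the accuracy at 5x the runtime.
vs_kraken2 = tradeoff_gain(accuracy_gain=17.0, runtime_factor=5.0)

print(f"vs Metalign: {vs_metalign:.1f}x, vs Kraken2: {vs_kraken2:.1f}x")
```

Under that assumed definition, the two ratios come out at the fourfold and 3.4x figures stated in the abstract.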

Submitted 16 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.16908 [pdf]

SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences

Authors: Maximilian-David Rumpf, Mohammed Alser, Arvid E. Gollwitzer, Joel Lindegger, Nour Almadhoun, Can Firtina, Serghei Mangul, Onur Mutlu

Abstract: Computational complexity is a key limitation of genomic analyses. Thus, over the last 30 years, researchers have proposed numerous fast heuristic methods that provide computational relief. Comparing genomic sequences is one of the most fundamental computational steps in most genomic analyses. Due to its high computational complexity, optimized exact and heuristic algorithms are still being developed. We find that these methods are highly sensitive to the underlying data, its quality, and various hyperparameters. Despite their wide use, no in-depth analysis has been performed, potentially falsely discarding genetic sequences from further analysis and unnecessarily inflating computational costs. We provide the first analysis and benchmark of this heterogeneity. We deliver an actionable overview of the 11 most widely used state-of-the-art methods for comparing genomic sequences. We also inform readers about their advantages and downsides using thorough experimental evaluation and different real datasets from all major manufacturers (i.e., Illumina, ONT, and PacBio). SequenceLab is publicly available at https://github.com/CMU-SAFARI/SequenceLab.

Submitted 21 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2309.16994 [pdf]

A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater

Authors: Viorel Munteanu, Victor Gordeev, Michael Saldana, Eva Aßmann, Justin Maine Su, Nicolae Drabcinski, Oksana Zlenko, Maryna Kit, Felicia Iordachi, Khooshbu Kantibhai Patel, Abdullah Al Nahid, Likhitha Chittampalli, Yidian Xu, Pavel Skums, Shelesh Agrawal, Martin Hölzer, Adam Smith, Alex Zelikovsky, Serghei Mangul

Abstract: In light of the continuous transmission and evolution of SARS-CoV-2 coupled with a significant decline in clinical testing, there is a pressing need for scalable, cost-effective, long-term, passive surveillance tools to effectively monitor viral variants circulating in the population. Wastewater genomic surveillance of SARS-CoV-2 has arrived as an alternative to clinical genomic surveillance, allowing continuous monitoring of the prevalence of viral lineages in communities of various sizes at a fraction of the time, cost, and logistic effort and serving as an early warning system for emerging variants, critical for developed communities and especially for underserved ones. Importantly, lineage prevalence estimates obtained with this approach are not distorted by biases related to clinical testing accessibility and participation. However, the relative performance of bioinformatics methods used to measure relative lineage abundances from wastewater sequencing data is unknown, preventing both the research community and public health authorities from making informed decisions regarding computational tool selection. Here, we perform comprehensive benchmarking of 18 bioinformatics methods for estimating the relative abundance of SARS-CoV-2 (sub)lineages in wastewater by using data from 36 in vitro mixtures of synthetic lineage and sublineage genomes. In addition, we use simulated data from 78 mixtures of lineages and sublineages co-occurring in the clinical setting with proportions mirroring their prevalence ratios observed in real data. Importantly, we investigate how the accuracy of the evaluated methods is impacted by the sequencing technology used, the associated error rate, the read length, and the read depth, as well as by the exposure of the synthetic RNA mixtures to wastewater, with the goal of capturing the effects induced by the wastewater matrix, including RNA fragmentation and degradation.

Submitted 21 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: For correspondence: [email protected]

arXiv:2309.13326 [pdf]

SARS-CoV-2 Wastewater Genomic Surveillance: Approaches, Challenges, and Opportunities

Authors: Viorel Munteanu, Michael Saldana, Dumitru Ciorba, Viorel Bostan, Justin Maine Su, Nadiia Kasianchuk, Nitesh Kumar Sharma, Sergey Knyazev, Victor Gordeev, Eva Aßmann, Andrei Lobiuc, Mihai Covasa, Keith A. Crandall, Wenhao O. Ouyang, Nicholas C. Wu, Christopher Mason, Braden T Tierney, Alexander G Lucaci, Alex Zelikovsky, Fatemeh Mohebbi, Pavel Skums, Cynthia Gibas, Jessica Schlueter, Piotr Rzymski, Helena Solo-Gabriele, et al. (3 additional authors not shown)

Abstract: During the SARS-CoV-2 pandemic, wastewater-based genomic surveillance (WWGS) emerged as an efficient viral surveillance tool that takes into account asymptomatic cases, can identify known and novel mutations, and offers the opportunity to assign known virus lineages based on the detected mutation profiles. WWGS can also hint towards novel or cryptic lineages, but it is difficult to clearly identify and define novel lineages from wastewater (WW) alone. While WWGS has significant advantages in monitoring SARS-CoV-2 viral spread, technical challenges remain, including poor sequencing coverage and quality due to viral RNA degradation. As a result, the viral RNAs in wastewater have low concentrations and are often fragmented, making sequencing difficult. WWGS analysis requires advanced computational tools that are yet to be developed and benchmarked. The existing bioinformatics tools used to analyze wastewater sequencing data are often based on previously developed methods for quantifying the expression of transcripts or viral diversity. Those methods were not developed for wastewater sequencing data specifically, and are not optimized to address unique challenges associated with wastewater. While specialized tools for analysis of wastewater sequencing data have also been developed recently, it remains to be seen how they will perform given the ongoing evolution of SARS-CoV-2 and the decline in testing and patient-based genomic surveillance. Here, we discuss opportunities and challenges associated with WWGS, including sample preparation, sequencing technology, and bioinformatics methods.

Submitted 30 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: V Munteanu and M Saldana contributed equally to this work. M Hölzer, A Smith and S Mangul jointly supervised this work. For correspondence: [email protected]

arXiv:2308.09558 [pdf]

Genomic reproducibility in the bioinformatics era

Authors: Pelin Icer Baykal, Paweł P. Łabaj, Florian Markowetz, Lynn M. Schriml, Daniel J. Stekhoven, Serghei Mangul, Niko Beerenwinkel

Abstract: In biomedical research, validation of a new scientific discovery is tied to the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility still remain imprecise. Here, we argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent genomics results across technical replicates, is key to generating scientific knowledge and enabling medical applications. We first discuss different concepts of reproducibility and then focus on reproducibility in the context of genomics, aiming to establish clear definitions of relevant terms. We then focus on the role of bioinformatics tools and their impact on genomic reproducibility and assess methods of evaluating bioinformatics tools in terms of genomic reproducibility. Lastly, we suggest best practices for enhancing genomic reproducibility, with an emphasis on assessing the performance of bioinformatics tools through rigorous testing across multiple technical replicates.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 10 pages, 2 figures, 2 tables

MSC Class: J.3

arXiv:2203.16261 [pdf]

Packaging, containerization, and virtualization of computational omics methods: Advances, challenges, and opportunities

Authors: Mohammed Alser, Sharon Waymost, Ram Ayyala, Brendan Lawlor, Richard J. Abdill, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, Andre M. Ribeiro-dos-Santos, Can Firtina, Nour Almadhoun, Varuni Sarwal, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do Kim, Malak S. Abedalthagafi, Onur Mutlu, Serghei Mangul

Abstract: Omics software tools have reshaped the landscape of modern biology and become an essential component of biomedical research. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging, virtualization, and containerization are different approaches to satisfy this need by wrapping omics tools in additional software that makes the omics tools easier to install and use. Here, we systematically review practices across prominent packaging, virtualization, and containerization platforms. We outline the challenges, advantages, and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers, and system administrators. We also propose principles to make packaging, virtualization, and containerization of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2104.14005 [pdf]

Unlocking capacities of viral genomics for the COVID-19 pandemic response

Authors: Sergey Knyazev, Karishma Chhugani, Varuni Sarwal, Ram Ayyala, Harman Singh, Smruthi Karthikeyan, Dhrithi Deshpande, Zoia Comarova, Angela Lu, Yuri Porozov, Aiping Wu, Malak Abedalthagafi, Shivashankar Nagaraj, Adam Smith, Pavel Skums, Jason Ladner, Tommy Tsan-Yuk Lam, Nicholas Wu, Alex Zelikovsky, Rob Knight, Keith Crandall, Serghei Mangul

Abstract: More than any other infectious disease epidemic, the COVID-19 pandemic has been characterized by the generation of large volumes of viral genomic data at an incredible pace due to recent advances in high-throughput sequencing technologies, the rapid global spread of SARS-CoV-2, and its persistent threat to public health. However, distinguishing the most epidemiologically relevant information encoded in these vast amounts of data requires substantial effort across the research and public health communities. Studies of SARS-CoV-2 genomes have been critical in tracking the spread of variants and understanding its epidemic dynamics, and may prove crucial for controlling future epidemics and alleviating significant public health burdens. Together, genomic data and bioinformatics methods enable broad-scale investigations of the spread of SARS-CoV-2 at the local, national, and global scales and allow researchers to efficiently track the emergence of novel variants, reconstruct epidemic dynamics, and provide important insights into drug and vaccine development and disease control. Here, we discuss the tremendous opportunities that genomics offers to unlock the effective use of SARS-CoV-2 genomic data for efficient public health surveillance and guiding timely responses to COVID-19.

Submitted 4 June, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

arXiv:2102.01521 [pdf]

doi 10.1128/mSystems.00095-21

Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure

Authors: Halie M. Rando, Adam L. MacLean, Alexandra J. Lee, Ronan Lordan, Sandipan Ray, Vikas Bansal, Ashwin N. Skelly, Elizabeth Sell, John J. Dziak, Lamonica Shinholster, Lucy D'Agostino McGowan, Marouen Ben Guebila, Nils Wellhausen, Sergey Knyazev, Simina M. Boca, Stephen Capone, Yanjun Qi, YoSon Park, Yuchen Sun, David Mai, Joel D. Boerckel, Christian Brueffer, James Brian Byrd, Jeremy P. Kamil, Jinhui Wang, et al. (9 additional authors not shown)

Abstract: The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.

Submitted 3 December, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

arXiv:2010.10402 [pdf]

Diversity in immunogenomics: the value and the challenge

Authors: Kerui Peng, Yana Safonova, Mikhail Shugay, Alice Popejoy, Oscar Rodriguez, Felix Breden, Petter Brodin, Amanda M. Burkhardt, Carlos Bustamante, Van-Mai Cao-Lormeau, Martin M. Corcoran, Darragh Duffy, Macarena Fuentes Guajardo, Ricardo Fujita, Victor Greiff, Vanessa D. Jonsson, Xiao Liu, Lluis Quintana-Murci, Maura Rossetti, Jianming Xie, Gur Yaari, Wei Zhang, Malak S. Abedalthagafi, Khalid O. Adekoya, Rahaman A. Ahmed, et al. (10 additional authors not shown)

Abstract: With the advent of high-throughput sequencing technologies, the fields of immunogenomics and adaptive immune receptor repertoire research are facing both opportunities and challenges. Adaptive immune receptor repertoire sequencing (AIRR-seq) has become an increasingly important tool to characterize T and B cell responses in settings of interest. However, the majority of AIRR-seq studies conducted so far were performed in individuals of European ancestry, restricting the ability to identify variation in human adaptive immune responses across populations and limiting their applications. As AIRR-seq studies depend on the ability to assign VDJ sequence reads to the correct germline gene segments, efforts to characterize the genomic loci that encode adaptive immune receptor genes in different populations are urgently needed. The availability of comprehensive germline gene databases and further applications of AIRR-seq studies to individuals of non-European ancestry will substantially enhance our understanding of human adaptive immune responses, promote the development of effective diagnostics and treatments, and eventually advance precision medicine.

Submitted 1 March, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: 22 pages, 1 table

arXiv:2010.02391 [pdf]

RNA-seq data science: From raw data to effective interpretation

Authors: Dhrithi Deshpande, Karishma Chhugani, Yutong Chang, Aaron Karlsberg, Caitlin Loeffler, Jinyang Zhang, Agata Muszynska, Jeremy Rotman, Laura Tao, Brunilda Balliu, Elizabeth Tseng, Eleazar Eskin, Fangqing Zhao, Pejman Mohammadi, Pawel P Labaj, Serghei Mangul

Abstract: RNA-sequencing (RNA-seq) has become an exemplar technology in modern biology and clinical applications over the past decade. It has gained immense popularity in recent years, driven by continuous efforts of the bioinformatics community to develop accurate and scalable computational tools. RNA-seq is a method of analyzing the RNA content of a sample using modern sequencing platforms. It generates enormous amounts of transcriptomic data in the form of nucleotide sequences, known as reads. RNA-seq analysis enables the probing of genes and corresponding transcripts, which is essential for answering important biological questions, such as detecting novel exons, transcripts, gene expressions, and studying alternative splicing structure. However, obtaining meaningful biological signals from raw data using computational methods is challenging due to the limitations of modern sequencing technologies. The need to address these technological challenges has pushed the rapid development of many novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad population of RNA-seq tools. Our review provides a systemic overview of RNA-seq technology and 235 available RNA-seq tools across various domains published from 2008 to 2020, discussing the interdisciplinary nature of bioinformatics involved in RNA sequencing, analysis, and software development.

Submitted 16 February, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

arXiv:2003.00110 [pdf]

doi 10.1186/s13059-021-02443-7

Technology dictates algorithms: Recent developments in read alignment

Authors: Mohammed Alser, Jeremy Rotman, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D. Singer, Brunilda Balliu, David Koslicki, Pavel Skums, Alex Zelikovsky, Can Alkan, Onur Mutlu, Serghei Mangul

Abstract: Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step of the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read aligners. We separately discuss how longer read lengths produce unique advantages and limitations to read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.

Submitted 9 July, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

Journal ref: Genome Biol. Aug 26;22(1):249, 2021

arXiv:2002.12268 [pdf]

Refining the conference experience for junior scientists in the wake of climate change

Authors: Ruth Johnson, Andrada Fiscutean, Serghei Mangul

Abstract: With the ever-increasing carbon footprint associated with conferences, scientists can learn to refine their conference experiences when they do need to travel. We offer insight on how to optimize the conference experience through attending speaker sessions, giving presentations, and networking.

Submitted 17 June, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

arXiv:2001.05127 [pdf]

Recommendations to enhance rigor and reproducibility in biomedical research

Authors: Jaqueline J. Brito, Jun Li, Jason H. Moore, Casey S. Greene, Nicole A. Nogoy, Lana X. Garmire, Serghei Mangul

Abstract: Computational methods have reshaped the landscape of modern biology. While the biomedical community is increasingly dependent on computational tools, the mechanisms ensuring open data, open software, and reproducibility are variably enforced by academic institutions, funders, and publishers. Publications may present academic software for which essential materials are or become unavailable, such as source code and documentation. Publications that lack such information compromise the role of peer review in evaluating technical strength and scientific contribution. Incomplete ancillary information for an academic software package may bias or limit any subsequent work produced with the tool. We provide eight recommendations across four different domains to improve reproducibility, transparency, and rigor in computational biology - precisely on the main values which should be emphasized in life science curricula. Our recommendations for improving software availability, usability, and archival stability aim to foster a sustainable data science ecosystem in biomedicine and life science research.

Submitted 27 July, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

arXiv:1911.11304 [pdf]

Metagenomics for clinical diagnostics: technologies and informatics

Authors: Caitlin Loeffler, Keylie M. Gibson, Lana Martin, Liz Chang, Jeremy Rotman, Ian V. Toma, Christopher E. Mason, Eleazar Eskin, Joseph P. Zackular, Keith A. Crandall, David Koslicki, Serghei Mangul

Abstract: The human-associated microbiome is closely tied to human health and is of substantial clinical interest. Metagenomics-based tools are emerging for clinical diagnostics, tracking the spread of diseases, and surveillance of potential pathogens. In some cases, these tools are overcoming limitations of traditional clinical approaches. Metagenomics has limitations barring the tools from clinical validation. Once these hurdles are overcome, clinical metagenomics will inform doctors of the best, targeted treatment for their patients and provide early detection of disease. Here we present an overview of metagenomics methods with a discussion of computational challenges and limitations.

Submitted 7 August, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: 75 pages, 7 figures, 2 tables, 4 supplementary tables, review paper

arXiv:1909.12469 [pdf]

Telescope: an interactive tool for managing large scale analysis from mobile devices

Authors: Jaqueline J. Brito, Thiago Mosqueiro, Jeremy Rotman, Victor Xue, Douglas J. Chapski, Juan De la Hoz, Paulo Matias, Lana Martin, Alex Zelikovsky, Matteo Pellegrini, Serghei Mangul

Abstract: In today's world of big data, computational analysis has become a key driver of biomedical research. Recent exponential growth in the volume of available omics data has reshaped the landscape of contemporary biology, creating demand for a continuous feedback loop that seamlessly integrates experimental biology techniques and bioinformatics tools. High-performance computational facilities are capable of processing considerable volumes of data, yet often lack an easy-to-use interface to guide the user in supervising and adjusting bioinformatics analysis in real-time. Here we report the development of Telescope, a novel interactive tool that interfaces with high-performance computational clusters to deliver an intuitive user interface for controlling and monitoring bioinformatics analyses in real-time. Telescope was designed to natively operate with a simple and straightforward interface using Web 2.0 technology compatible with most modern devices (e.g., tablets and personal smartphones). Telescope provides a modern and elegant solution to integrate computational analyses into the experimental environment of biomedical research. Additionally, it allows biomedical researchers to leverage the power of large computational facilities in a user-friendly manner. Telescope is freely available at https://github.com/Mangul-Lab-USC/telescope.

Submitted 5 December, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

Showing 1–16 of 16 results for author: Mangul, S