-
A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater
Authors:
Viorel Munteanu,
Victor Gordeev,
Michael Saldana,
Eva Aßmann,
Justin Maine Su,
Nicolae Drabcinski,
Oksana Zlenko,
Maryna Kit,
Felicia Iordachi,
Khooshbu Kantibhai Patel,
Abdullah Al Nahid,
Likhitha Chittampalli,
Yidian Xu,
Pavel Skums,
Shelesh Agrawal,
Martin Hölzer,
Adam Smith,
Alex Zelikovsky,
Serghei Mangul
Abstract:
In light of the continuous transmission and evolution of SARS-CoV-2 coupled with a significant decline in clinical testing, there is a pressing need for scalable, cost-effective, long-term, passive surveillance tools to effectively monitor viral variants circulating in the population. Wastewater genomic surveillance of SARS-CoV-2 has emerged as an alternative to clinical genomic surveillance, allowing continuous monitoring of the prevalence of viral lineages in communities of various sizes at a fraction of the time, cost, and logistic effort and serving as an early warning system for emerging variants, critical for developed communities and especially for underserved ones. Importantly, lineage prevalence estimates obtained with this approach are not distorted by biases related to clinical testing accessibility and participation. However, the relative performance of bioinformatics methods used to measure relative lineage abundances from wastewater sequencing data is unknown, preventing both the research community and public health authorities from making informed decisions regarding computational tool selection. Here, we perform comprehensive benchmarking of 18 bioinformatics methods for estimating the relative abundance of SARS-CoV-2 (sub)lineages in wastewater by using data from 36 in vitro mixtures of synthetic lineage and sublineage genomes. In addition, we use simulated data from 78 mixtures of lineages and sublineages co-occurring in the clinical setting with proportions mirroring their prevalence ratios observed in real data. Importantly, we investigate how the accuracy of the evaluated methods is impacted by the sequencing technology used, the associated error rate, the read length, and the read depth, as well as by the exposure of the synthetic RNA mixtures to wastewater, with the goal of capturing the effects induced by the wastewater matrix, including RNA fragmentation and degradation.
Submitted 21 January, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
SARS-CoV-2 Wastewater Genomic Surveillance: Approaches, Challenges, and Opportunities
Authors:
Viorel Munteanu,
Michael Saldana,
Dumitru Ciorba,
Viorel Bostan,
Justin Maine Su,
Nadiia Kasianchuk,
Nitesh Kumar Sharma,
Sergey Knyazev,
Victor Gordeev,
Eva Aßmann,
Andrei Lobiuc,
Mihai Covasa,
Keith A. Crandall,
Wenhao O. Ouyang,
Nicholas C. Wu,
Christopher Mason,
Braden T Tierney,
Alexander G Lucaci,
Alex Zelikovsky,
Fatemeh Mohebbi,
Pavel Skums,
Cynthia Gibas,
Jessica Schlueter,
Piotr Rzymski,
Helena Solo-Gabriele
, et al. (3 additional authors not shown)
Abstract:
During the SARS-CoV-2 pandemic, wastewater-based genomic surveillance (WWGS) emerged as an efficient viral surveillance tool that takes into account asymptomatic cases, can identify known and novel mutations, and offers the opportunity to assign known virus lineages based on the detected mutation profiles. WWGS can also hint at novel or cryptic lineages, but it is difficult to clearly identify and define novel lineages from wastewater (WW) alone. While WWGS has significant advantages in monitoring SARS-CoV-2 viral spread, technical challenges remain, including poor sequencing coverage and quality due to viral RNA degradation. As a result, the viral RNAs in wastewater have low concentrations and are often fragmented, making sequencing difficult. WWGS analysis requires advanced computational tools that are yet to be developed and benchmarked. The existing bioinformatics tools used to analyze wastewater sequencing data are often based on previously developed methods for quantifying the expression of transcripts or viral diversity. Those methods were not developed for wastewater sequencing data specifically, and are not optimized to address the unique challenges associated with wastewater. While specialized tools for analysis of wastewater sequencing data have also been developed recently, it remains to be seen how they will perform given the ongoing evolution of SARS-CoV-2 and the decline in testing and patient-based genomic surveillance. Here, we discuss opportunities and challenges associated with WWGS, including sample preparation, sequencing technology, and bioinformatics methods.
Submitted 30 January, 2024; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Antigenic cooperation in Viral Populations: Transformation of Functions of Intra-Host Viral Variants
Authors:
Leonid Bunimovich,
Athulya Ram,
Pavel Skums
Abstract:
In this paper we study intra-host viral adaptation by antigenic cooperation - a mechanism of immune escape that serves as an alternative to the standard mechanism of escape by continuous genomic diversification and helps explain a number of experimental observations associated with the establishment of chronic infections by highly mutable viruses. Within this mechanism, the topology of a cross-immunoreactivity network forces intra-host viral variants to specialize for complementary roles and adapt to the host's immune response as a quasi-social ecosystem. Here we study dynamical changes in immune adaptation caused by evolutionary and epidemiological events. First, we show that the emergence of a viral variant with altered antigenic features may result in a rapid re-arrangement of the viral ecosystem and a change in the roles played by existing viral variants. In particular, it may push the population under immune escape by genomic diversification towards the stable state of adaptation by antigenic cooperation. Next, we study the effect of a viral transmission between two chronically infected hosts, which results in the merging of two intra-host viral populations in the state of stable immune-adapted equilibrium. In this case, we also describe how the newly formed viral population adapts to the host's environment by changing the functions of its members. The results are obtained analytically for minimal cross-immunoreactivity networks and numerically for larger populations.
Submitted 5 April, 2023; v1 submitted 2 May, 2022;
originally announced May 2022.
-
Unlocking capacities of viral genomics for the COVID-19 pandemic response
Authors:
Sergey Knyazev,
Karishma Chhugani,
Varuni Sarwal,
Ram Ayyala,
Harman Singh,
Smruthi Karthikeyan,
Dhrithi Deshpande,
Zoia Comarova,
Angela Lu,
Yuri Porozov,
Aiping Wu,
Malak Abedalthagafi,
Shivashankar Nagaraj,
Adam Smith,
Pavel Skums,
Jason Ladner,
Tommy Tsan-Yuk Lam,
Nicholas Wu,
Alex Zelikovsky,
Rob Knight,
Keith Crandall,
Serghei Mangul
Abstract:
More than any other infectious disease epidemic, the COVID-19 pandemic has been characterized by the generation of large volumes of viral genomic data at an incredible pace due to recent advances in high-throughput sequencing technologies, the rapid global spread of SARS-CoV-2, and its persistent threat to public health. However, distinguishing the most epidemiologically relevant information encoded in these vast amounts of data requires substantial effort across the research and public health communities. Studies of SARS-CoV-2 genomes have been critical in tracking the spread of variants and understanding its epidemic dynamics, and may prove crucial for controlling future epidemics and alleviating significant public health burdens. Together, genomic data and bioinformatics methods enable broad-scale investigations of the spread of SARS-CoV-2 at the local, national, and global scales and allow researchers to efficiently track the emergence of novel variants, reconstruct epidemic dynamics, and provide important insights into drug and vaccine development and disease control. Here, we discuss the tremendous opportunities that genomics offers to unlock the effective use of SARS-CoV-2 genomic data for efficient public health surveillance and for guiding timely responses to COVID-19.
Submitted 4 June, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
Scale-free spanning trees: complexity, bounds and algorithms
Authors:
Yury Orlovich,
Kirill Kukharenko,
Volker Kaibel,
Pavel Skums
Abstract:
We introduce and study the general problem of finding a most "scale-free-like" spanning tree of a connected graph. It is motivated by a particular problem in epidemiology, and may be useful in studies of various dynamical processes in networks. We employ two possible objective functions for this problem and introduce the corresponding algorithmic problems termed $m$-SF and $s$-SF Spanning Tree problems. We prove that those problems are APX- and NP-hard, respectively, even in the classes of cubic, bipartite and split graphs. We study the relations between scale-free spanning tree problems and the max-leaf spanning tree problem, which is the classical algorithmic problem closest to ours. For split graphs, we explicitly describe the structure of optimal spanning trees and graphs with extremal solutions. Finally, we propose two Integer Linear Programming formulations and two fast heuristics for the $s$-SF Spanning Tree problem, and experimentally assess their performance using simulated and real data.
Submitted 27 May, 2020;
originally announced May 2020.
-
Technology dictates algorithms: Recent developments in read alignment
Authors:
Mohammed Alser,
Jeremy Rotman,
Kodi Taraszka,
Huwenbo Shi,
Pelin Icer Baykal,
Harry Taegyun Yang,
Victor Xue,
Sergey Knyazev,
Benjamin D. Singer,
Brunilda Balliu,
David Koslicki,
Pavel Skums,
Alex Zelikovsky,
Can Alkan,
Onur Mutlu,
Serghei Mangul
Abstract:
Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step of the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read aligners. We separately discuss how longer read lengths produce unique advantages and limitations to read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.
Submitted 9 July, 2020; v1 submitted 28 February, 2020;
originally announced March 2020.
-
Graph fractal dimension and structure of fractal networks: a combinatorial perspective
Authors:
Pavel Skums,
Leonid Bunimovich
Abstract:
In this paper we study self-similar and fractal networks from the combinatorial perspective. We establish analogues of topological (Lebesgue) and fractal (Hausdorff) dimensions for graphs and demonstrate that they are naturally related to known graph-theoretical characteristics: rank dimension and product (or Prague or Nešetřil-Rödl) dimension. Our approach reveals how self-similarity and fractality of a network are defined by a pattern of overlaps between densely connected network communities. It allows us to identify fractal graphs, explore the relations between graph fractality, graph colorings and graph Kolmogorov complexity, and analyze the fractality of several classes of graphs and network models, as well as of a number of real-life networks. We demonstrate the application of our framework to evolutionary studies by revealing the growth of self-organization of heterogeneous viral populations over the course of their intra-host evolution, thus suggesting mechanisms of their gradual adaptation to the host's environment. As far as the authors know, the proposed approach is the first theoretical framework for the study of network fractality within the combinatorial paradigm. The obtained results lay a foundation for studying fractal properties of complex networks using combinatorial methods and algorithms.
Submitted 23 December, 2019;
originally announced December 2019.
-
Graph Hausdorff dimension, Kolmogorov complexity and construction of fractal graphs
Authors:
Leonid Bunimovich,
Pavel Skums
Abstract:
In this paper we introduce and study discrete analogues of Lebesgue and Hausdorff dimensions for graphs. It turns out that they are closely related to well-known graph characteristics such as rank dimension and Prague (or Nešetřil-Rödl) dimension. This allows us to formally define fractal graphs and establish fractality of some graph classes. We show how the Hausdorff dimension of graphs is related to their Kolmogorov complexity. We also demonstrate the fruitfulness of this interdisciplinary approach by discovering a novel property of general compact metric spaces using ideas from hypergraph theory and by proving an estimate for the Prague dimension of almost all graphs using methods from algorithmic information theory.
Submitted 20 March, 2019; v1 submitted 15 July, 2016;
originally announced July 2016.
-
Krausz dimension and its generalizations in special graph classes
Authors:
Olga Glebova,
Yury Metelsky,
Pavel Skums
Abstract:
A {\it krausz $(k,m)$-partition} of a graph $G$ is a partition of $G$ into cliques such that any vertex belongs to at most $k$ cliques and any two cliques have at most $m$ vertices in common. The {\it $m$-krausz} dimension $kdim_m(G)$ of the graph $G$ is the minimum number $k$ such that $G$ has a krausz $(k,m)$-partition. The 1-krausz dimension is the known and studied krausz dimension of a graph, $kdim(G)$.
In this paper we prove that the problem $"kdim(G)\leq 3"$ is polynomially solvable for chordal graphs, thus partially solving the problem of P. Hlineny and J. Kratochvil. We show that the problem of finding the $m$-krausz dimension is NP-hard for every $m\geq 1$, even if restricted to (1,2)-colorable graphs, but the problem $"kdim_m(G)\leq k"$ is polynomially solvable for $(\infty,1)$-polar graphs for every fixed $k,m\geq 1$.
Submitted 18 July, 2011;
originally announced July 2011.
-
$H$-product and $H$-threshold graphs
Authors:
Pavel Skums
Abstract:
This paper continues the research of the author and his colleagues on the {\it canonical} decomposition of graphs. The idea of the canonical decomposition is to define a binary operation on the set of graphs and to represent the graph under study as a product of prime elements with respect to this operation. We consider a graph together with an arbitrary partition of its vertex set into $n$ subsets (an $n$-partitioned graph). On the set of $n$-partitioned graphs distinguished up to isomorphism we consider the binary algebraic operation $\circ_H$ ($H$-product of graphs), determined by the digraph $H$. It is proved that every operation $\circ_H$ defines a unique factorization as a product of prime factors. We define $H$-threshold graphs as graphs that can be represented as the product $\circ_{H}$ of one-vertex factors, and the threshold-width of the graph $G$ as the minimum size of $H$ such that $G$ is $H$-threshold. $H$-threshold graphs generalize the classes of threshold graphs and difference graphs and extend their properties. We show that the threshold-width is defined for all graphs, and give a characterization of graphs with fixed threshold-width. We study in detail the graphs with threshold-widths 1 and 2.
Submitted 10 June, 2011; v1 submitted 21 November, 2010;
originally announced November 2010.
-
Reconstruction of p-disconnected graphs
Authors:
Pavel Skums
Abstract:
We prove that the Kelly-Ulam conjecture is true for p-disconnected graphs.
Submitted 25 April, 2008;
originally announced April 2008.