A rigorous benchmarking of methods for SARS-CoV-2 lineage abundance estimation in wastewater

Authors: Viorel Munteanu, Victor Gordeev, Michael Saldana, Eva Aßmann, Justin Maine Su, Nicolae Drabcinski, Oksana Zlenko, Maryna Kit, Felicia Iordachi, Khooshbu Kantibhai Patel, Abdullah Al Nahid, Likhitha Chittampalli, Yidian Xu, Pavel Skums, Shelesh Agrawal, Martin Hölzer, Adam Smith, Alex Zelikovsky, Serghei Mangul

Abstract: In light of the continuous transmission and evolution of SARS-CoV-2 coupled with a significant decline in clinical testing, there is a pressing need for scalable, cost-effective, long-term, passive surveillance tools to effectively monitor viral variants circulating in the population. Wastewater genomic surveillance of SARS-CoV-2 has arrived as an alternative to clinical genomic surveillance, allowing continuous monitoring of the prevalence of viral lineages in communities of various sizes at a fraction of the time, cost, and logistic effort and serving as an early warning system for emerging variants, critical for developed communities and especially for underserved ones. Importantly, lineage prevalence estimates obtained with this approach aren't distorted by biases related to clinical testing accessibility and participation. However, the relative performance of bioinformatics methods used to measure relative lineage abundances from wastewater sequencing data is unknown, preventing both the research community and public health authorities from making informed decisions regarding computational tool selection. Here, we perform comprehensive benchmarking of 18 bioinformatics methods for estimating the relative abundance of SARS-CoV-2 (sub)lineages in wastewater by using data from 36 in vitro mixtures of synthetic lineage and sublineage genomes. In addition, we use simulated data from 78 mixtures of lineages and sublineages co-occurring in the clinical setting with proportions mirroring their prevalence ratios observed in real data. Importantly, we investigate how the accuracy of the evaluated methods is impacted by the sequencing technology used, the associated error rate, the read length, and the read depth, but also by the exposure of the synthetic RNA mixtures to wastewater, with the goal of capturing the effects induced by the wastewater matrix, including RNA fragmentation and degradation.

Submitted 21 January, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: For correspondence: [email protected]

arXiv:2309.13326 [pdf]

SARS-CoV-2 Wastewater Genomic Surveillance: Approaches, Challenges, and Opportunities

Authors: Viorel Munteanu, Michael Saldana, Dumitru Ciorba, Viorel Bostan, Justin Maine Su, Nadiia Kasianchuk, Nitesh Kumar Sharma, Sergey Knyazev, Victor Gordeev, Eva Aßmann, Andrei Lobiuc, Mihai Covasa, Keith A. Crandall, Wenhao O. Ouyang, Nicholas C. Wu, Christopher Mason, Braden T Tierney, Alexander G Lucaci, Alex Zelikovsky, Fatemeh Mohebbi, Pavel Skums, Cynthia Gibas, Jessica Schlueter, Piotr Rzymski, Helena Solo-Gabriele, et al. (3 additional authors not shown)

Abstract: During the SARS-CoV-2 pandemic, wastewater-based genomic surveillance (WWGS) emerged as an efficient viral surveillance tool that takes into account asymptomatic cases and can identify known and novel mutations and offers the opportunity to assign known virus lineages based on the detected mutation profiles. WWGS can also hint towards novel or cryptic lineages, but it is difficult to clearly identify and define novel lineages from wastewater (WW) alone. While WWGS has significant advantages in monitoring SARS-CoV-2 viral spread, technical challenges remain, including poor sequencing coverage and quality due to viral RNA degradation. As a result, the viral RNAs in wastewater have low concentrations and are often fragmented, making sequencing difficult. WWGS analysis requires advanced computational tools that are yet to be developed and benchmarked. The existing bioinformatics tools used to analyze wastewater sequencing data are often based on previously developed methods for quantifying the expression of transcripts or viral diversity. Those methods were not developed for wastewater sequencing data specifically, and are not optimized to address unique challenges associated with wastewater. While specialized tools for analysis of wastewater sequencing data have also been developed recently, it remains to be seen how they will perform given the ongoing evolution of SARS-CoV-2 and the decline in testing and patient-based genomic surveillance. Here, we discuss opportunities and challenges associated with WWGS, including sample preparation, sequencing technology, and bioinformatics methods.

Submitted 30 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: V Munteanu and M Saldana contributed equally to this work. M Hölzer, A Smith and S Mangul jointly supervised this work. For correspondence: [email protected]

arXiv:2205.01014 [pdf, other]

Antigenic cooperation in Viral Populations: Transformation of Functions of Intra-Host Viral Variants

Authors: Leonid Bunimovich, Athulya Ram, Pavel Skums

Abstract: In this paper we study intra-host viral adaptation by antigenic cooperation - a mechanism of immune escape that serves as an alternative to the standard mechanism of escape by continuous genomic diversification and allows one to explain a number of experimental observations associated with the establishment of chronic infections by highly mutable viruses. Within this mechanism, the topology of a cross-immunoreactivity network forces intra-host viral variants to specialize for complementary roles and adapt to the host's immune response as a quasi-social ecosystem. Here we study dynamical changes in immune adaptation caused by evolutionary and epidemiological events. First, we show that the emergence of a viral variant with altered antigenic features may result in a rapid re-arrangement of the viral ecosystem and a change in the roles played by existing viral variants. In particular, it may push the population under immune escape by genomic diversification towards the stable state of adaptation by antigenic cooperation. Next, we study the effect of a viral transmission between two chronically infected hosts, which results in merging of two intra-host viral populations in the state of stable immune-adapted equilibrium. In this case, we also describe how the newly formed viral population adapts to the host's environment by changing the functions of its members. The results are obtained analytically for minimal cross-immunoreactivity networks and numerically for larger populations.

Submitted 5 April, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: 39 pages (including Appendix), 21 images

arXiv:2104.14005 [pdf]

Unlocking capacities of viral genomics for the COVID-19 pandemic response

Authors: Sergey Knyazev, Karishma Chhugani, Varuni Sarwal, Ram Ayyala, Harman Singh, Smruthi Karthikeyan, Dhrithi Deshpande, Zoia Comarova, Angela Lu, Yuri Porozov, Ai** Wu, Malak Abedalthagafi, Shivashankar Nagaraj, Adam Smith, Pavel Skums, Jason Ladner, Tommy Tsan-Yuk Lam, Nicholas Wu, Alex Zelikovsky, Rob Knight, Keith Crandall, Serghei Mangul

Abstract: More than any other infectious disease epidemic, the COVID-19 pandemic has been characterized by the generation of large volumes of viral genomic data at an incredible pace due to recent advances in high-throughput sequencing technologies, the rapid global spread of SARS-CoV-2, and its persistent threat to public health. However, distinguishing the most epidemiologically relevant information encoded in these vast amounts of data requires substantial effort across the research and public health communities. Studies of SARS-CoV-2 genomes have been critical in tracking the spread of variants and understanding its epidemic dynamics, and may prove crucial for controlling future epidemics and alleviating significant public health burdens. Together, genomic data and bioinformatics methods enable broad-scale investigations of the spread of SARS-CoV-2 at the local, national, and global scales and allow researchers to efficiently track the emergence of novel variants, reconstruct epidemic dynamics, and provide important insights into drug and vaccine development and disease control. Here, we discuss the tremendous opportunities that genomics offers to unlock the effective use of SARS-CoV-2 genomic data for efficient public health surveillance and guiding timely responses to COVID-19.

Submitted 4 June, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

arXiv:2005.13703 [pdf, other]

doi 10.1089/cmb.2020.0500

Scale-free spanning trees: complexity, bounds and algorithms

Authors: Yury Orlovich, Kirill Kukharenko, Volker Kaibel, Pavel Skums

Abstract: We introduce and study the general problem of finding a most "scale-free-like" spanning tree of a connected graph. It is motivated by a particular problem in epidemiology, and may be useful in studies of various dynamical processes in networks. We employ two possible objective functions for this problem and introduce the corresponding algorithmic problems termed $m$-SF and $s$-SF Spanning Tree problems. We prove that those problems are APX- and NP-hard, respectively, even in the classes of cubic, bipartite and split graphs. We study the relations between scale-free spanning tree problems and the max-leaf spanning tree problem, which is the classical algorithmic problem closest to ours. For split graphs, we explicitly describe the structure of optimal spanning trees and graphs with extremal solutions. Finally, we propose two Integer Linear Programming formulations and two fast heuristics for the $s$-SF Spanning Tree problem, and experimentally assess their performance using simulated and real data.

Submitted 27 May, 2020; originally announced May 2020.

MSC Class: 05C05 (Primary); 92D30; 90C10(Secondary) ACM Class: G.2.1; G.2.2

arXiv:2003.00110 [pdf]

doi 10.1186/s13059-021-02443-7

Technology dictates algorithms: Recent developments in read alignment

Authors: Mohammed Alser, Jeremy Rotman, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D. Singer, Brunilda Balliu, David Koslicki, Pavel Skums, Alex Zelikovsky, Can Alkan, Onur Mutlu, Serghei Mangul

Abstract: Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of genomic data in the form of nucleotide sequences or reads. Aligning reads onto reference genomes enables the identification of individual-specific genetic variants and is an essential step of the majority of genomic analysis pipelines. Aligned reads are essential for answering important biological questions, such as detecting mutations driving various human diseases and complex traits as well as identifying species present in metagenomic samples. The read alignment problem is extremely challenging due to the large size of analyzed datasets and numerous technological limitations of sequencing platforms, and researchers have developed novel bioinformatics algorithms to tackle these difficulties. Importantly, computational algorithms have evolved and diversified in accordance with technological advances, leading to today's diverse array of bioinformatics tools. Our review provides a survey of algorithmic foundations and methodologies across 107 alignment methods published between 1988 and 2020, for both short and long reads. We provide rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read aligners. We separately discuss how longer read lengths produce unique advantages and limitations to read alignment techniques. We also discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology, including whole transcriptome, adaptive immune repertoire, and human microbiome studies.

Submitted 9 July, 2020; v1 submitted 28 February, 2020; originally announced March 2020.

Journal ref: Genome Biol. 2021 Aug 26;22(1):249

arXiv:1912.11385 [pdf, other]

Graph fractal dimension and structure of fractal networks: a combinatorial perspective

Authors: Pavel Skums, Leonid Bunimovich

Abstract: In this paper we study self-similar and fractal networks from the combinatorial perspective. We establish analogues of topological (Lebesgue) and fractal (Hausdorff) dimensions for graphs and demonstrate that they are naturally related to known graph-theoretical characteristics: rank dimension and product (or Prague or Nešetřil-Rödl) dimension. Our approach reveals how self-similarity and fractality of a network are defined by a pattern of overlaps between densely connected network communities. It allows us to identify fractal graphs, explore the relations between graph fractality, graph colorings and graph Kolmogorov complexity, and analyze the fractality of several classes of graphs and network models, as well as of a number of real-life networks. We demonstrate the application of our framework to evolutionary studies by revealing the growth of self-organization of heterogeneous viral populations over the course of their intra-host evolution, thus suggesting mechanisms of their gradual adaptation to the host's environment. As far as the authors know, the proposed approach is the first theoretical framework for study of network fractality within the combinatorial paradigm. The obtained results lay a foundation for studying fractal properties of complex networks using combinatorial methods and algorithms.

Submitted 23 December, 2019; originally announced December 2019.

Comments: arXiv admin note: text overlap with arXiv:1607.04703

arXiv:1607.04703 [pdf, other]

Graph Hausdorff dimension, Kolmogorov complexity and construction of fractal graphs

Authors: Leonid Bunimovich, Pavel Skums

Abstract: In this paper we introduce and study discrete analogues of Lebesgue and Hausdorff dimensions for graphs. It turns out that they are closely related to well-known graph characteristics such as rank dimension and Prague (or Nešetřil-Rödl) dimension. This allows us to formally define fractal graphs and establish fractality of some graph classes. We show how the Hausdorff dimension of graphs is related to their Kolmogorov complexity. We also demonstrate the fruitfulness of this interdisciplinary approach by discovering a novel property of general compact metric spaces using ideas from hypergraph theory and by proving an estimate for the Prague dimension of almost all graphs using methods from algorithmic information theory.

Submitted 20 March, 2019; v1 submitted 15 July, 2016; originally announced July 2016.

MSC Class: 05C10 (Primary) 05C62; 54F45 (Secondary)

arXiv:1107.3597 [pdf, ps, other]

Krausz dimension and its generalizations in special graph classes

Authors: Olga Glebova, Yury Metelsky, Pavel Skums

Abstract: A {\it krausz $(k,m)$-partition} of a graph $G$ is the partition of $G$ into cliques, such that any vertex belongs to at most $k$ cliques and any two cliques have at most $m$ vertices in common. The {\it $m$-krausz} dimension $kdim_m(G)$ of the graph $G$ is the minimum number $k$ such that $G$ has a krausz $(k,m)$-partition. The 1-krausz dimension is the known and well-studied krausz dimension $kdim(G)$ of a graph. In this paper we prove that the problem $"kdim(G)\leq 3"$ is polynomially solvable for chordal graphs, thus partially solving the problem of P. Hlineny and J. Kratochvil. We show that the problem of finding the $m$-krausz dimension is NP-hard for every $m\geq 1$, even if restricted to (1,2)-colorable graphs, but the problem $"kdim_m(G)\leq k"$ is polynomially solvable for $(\infty,1)$-polar graphs for every fixed $k,m\geq 1$.

Submitted 18 July, 2011; originally announced July 2011.
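
The definition of a krausz $(k,m)$-partition quoted in the abstract above can be illustrated with a small checker. This is only a sketch and is not taken from the paper: the function name `is_krausz_partition` and the edge-list input format are hypothetical, and it assumes that "partition of $G$ into cliques" means a family of cliques in which every edge of $G$ lies inside at least one clique.

```python
from itertools import combinations


def is_krausz_partition(edges, cliques, k, m):
    """Check the krausz (k, m)-partition conditions for a family of vertex sets.

    edges   -- iterable of 2-element vertex pairs describing the graph G
    cliques -- iterable of vertex collections, the candidate family
    """
    edge_set = {frozenset(e) for e in edges}
    clique_sets = [frozenset(c) for c in cliques]

    # Every listed vertex set must induce a clique in G.
    for c in clique_sets:
        for u, v in combinations(c, 2):
            if frozenset((u, v)) not in edge_set:
                return False

    # Assumption (not stated in the abstract): every edge is covered by some clique.
    for e in edge_set:
        if not any(e <= c for c in clique_sets):
            return False

    # Any vertex belongs to at most k cliques.
    vertices = {v for e in edge_set for v in e}
    for v in vertices:
        if sum(v in c for c in clique_sets) > k:
            return False

    # Any two cliques have at most m vertices in common.
    for a, b in combinations(clique_sets, 2):
        if len(a & b) > m:
            return False

    return True


# A triangle 1-2-3 with a pendant edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
by_triangle = [{1, 2, 3}, {3, 4}]            # vertex 3 lies in two cliques
by_edges = [{1, 2}, {2, 3}, {1, 3}, {3, 4}]  # vertex 3 lies in three cliques
```

Under these assumptions, `by_triangle` satisfies the conditions for $k=2$, $m=1$ but not for $k=1$, while `by_edges` needs $k=3$ because vertex 3 lies in three of its cliques.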

arXiv:1011.4726 [pdf, ps, other]

doi 10.1016/j.disc.2013.07.003

$H$-product and $H$-threshold graphs

Authors: Pavel Skums

Abstract: This paper is the continuation of the research of the author and his colleagues on the {\it canonical} decomposition of graphs. The idea of the canonical decomposition is to define a binary operation on the set of graphs and to represent the graph under study as a product of prime elements with respect to this operation. We consider the graph together with an arbitrary partition of its vertex set into $n$ subsets ($n$-partitioned graph). On the set of $n$-partitioned graphs distinguished up to isomorphism we consider the binary algebraic operation $\circ_H$ ($H$-product of graphs), determined by the digraph $H$. It is proved that every operation $\circ_H$ defines the unique factorization as a product of prime factors. We define $H$-threshold graphs as graphs that can be represented as the product $\circ_{H}$ of one-vertex factors, and the threshold-width of the graph $G$ as the minimum size of $H$ such that $G$ is $H$-threshold. $H$-threshold graphs generalize the classes of threshold graphs and difference graphs and extend their properties. We show that the threshold-width is defined for all graphs, and give the characterization of graphs with fixed threshold-width. We study in detail the graphs with threshold-widths 1 and 2.

Submitted 10 June, 2011; v1 submitted 21 November, 2010; originally announced November 2010.

MSC Class: 05C60; 05C75; 05C76

arXiv:0804.4093 [pdf, ps, other]

Reconstruction of p-disconnected graphs

Authors: Pavel Skums

Abstract: We prove that the Kelly-Ulam conjecture is true for p-disconnected graphs.

Submitted 25 April, 2008; originally announced April 2008.

Comments: 7 pages

MSC Class: 05C60

Showing 1–11 of 11 results for author: Skums, P