-
An Investigation of Hepatitis B Virus Genome using Markov Models
Authors:
Khadijeh Jahanian,
Elnaz Shalbafian,
Morteza Saberi,
Roohallah Alizadehsani,
Iman Dehzangi
Abstract:
The human genome encodes a family of editing enzymes known as APOBEC3 (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3). Several family members, such as APOBEC3G, APOBEC3F, and APOBEC3H haplotype II, exhibit activity against viruses such as HIV. These enzymes induce C-to-U mutations in the negative strand of viral genomes, resulting in multiple G-to-A changes in the positive strand, commonly referred to as 'hypermutation.' Mutations catalyzed by these enzymes are sequence context-dependent in the HIV genome; for instance, APOBEC3G preferentially mutates G within GG, TGG, and TGGG contexts, while other members mutate G within GA, TGA, and TGAA contexts. However, whether these enzymes show the same context dependence in HBV has not been explored. In this study, our objective is to identify the mutational footprint of APOBEC3 enzymes in the HBV genome. To achieve this, we employ a multivariable data analytics technique to investigate motif preferences and potential sequence hierarchies of mutation by APOBEC3 enzymes, using full-genome HBV sequences from a diverse range of naturally infected patients. This approach allows us to distinguish between normal and hypermutated sequences based on the representation of mono- to tetra-nucleotide motifs. Additionally, we aim to identify motifs associated with hypermutation induced by different APOBEC3 enzymes in HBV genomes. Our analyses reveal that either APOBEC3 enzymes are not active against HBV, or the G-to-A mutations they induce in the HBV genome are not sequence context-dependent.
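The abstract does not spell out how motif representation or context-dependent G-to-A counting is computed. As a rough illustration only, the Python sketch below shows two common textbook approaches. The function names, the maximal-order Markov expectation formula, and the choice of the downstream reference base as the mutation context are all assumptions made for this example, not the authors' method.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers that contain only A, C, G or T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def markov_representation(seq, word):
    """Hypothetical score: observed/expected ratio of `word` (length >= 3)
    under a Markov model of order k-2, with the assumed expectation
    E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1])."""
    k = len(word)
    if k < 3:
        raise ValueError("this sketch only scores words of length >= 3")
    word = word.upper()
    observed = kmer_counts(seq, k)[word]
    shorter = kmer_counts(seq, k - 1)
    left, right = shorter[word[:-1]], shorter[word[1:]]
    middle = kmer_counts(seq, k - 2)[word[1:-1]]
    if left == 0 or right == 0 or middle == 0:
        return None  # expected count is zero or undefined
    return observed / (left * right / middle)


def g_to_a_by_context(reference, query):
    """Count G->A changes between two aligned, equal-length sequences,
    keyed by the downstream reference base (an assumed context rule):
    'GG' (APOBEC3G-like), 'GA' (APOBEC3F-like) or 'other'."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    ref, qry = reference.upper(), query.upper()
    result = Counter()
    # The last position has no downstream base, so it is skipped.
    for i in range(len(ref) - 1):
        if ref[i] == "G" and qry[i] == "A":
            nxt = ref[i + 1]
            result["G" + nxt if nxt in ("G", "A") else "other"] += 1
    return result
```

For example, comparing a reference `GGGATGAC` with a query `AGAATGAC` counts one G-to-A change in a GG context and one in a GA context. A ratio above 1 from `markov_representation` suggests the motif is over-represented relative to the assumed Markov expectation; a real analysis would also need careful alignment handling and statistical testing.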
Submitted 11 November, 2023;
originally announced November 2023.
-
Revolutionizing Genomics with Reinforcement Learning Techniques
Authors:
Mohsen Karami,
Roohallah Alizadehsani,
Khadijeh Jahanian,
Ahmadreza Argha,
Iman Dehzangi,
Hamid Alinejad-Rokny
Abstract:
In recent years, Reinforcement Learning (RL) has emerged as a powerful tool for solving a wide range of problems in areas including decision-making and genomics. The exponential growth of raw genomic data over the past two decades has exceeded the capacity of manual analysis, leading to growing interest in automatic data analysis and processing. RL algorithms can learn from experience with minimal human supervision, making them well-suited for genomic data analysis and interpretation. One of the key benefits of using RL is the reduced cost of collecting labeled training data, which supervised learning requires. While numerous studies have examined the applications of Machine Learning (ML) in genomics, this survey focuses exclusively on the use of RL in various genomics research fields, including gene regulatory networks (GRNs), genome assembly, and sequence alignment. We present a comprehensive technical overview of existing studies on the application of RL in genomics, highlighting the strengths and limitations of these approaches. We then discuss potential research directions worthy of future exploration, including the development of more sophisticated reward functions (since RL depends heavily on the accuracy of the reward function), the integration of RL with other machine learning techniques, and the application of RL to new and emerging areas in genomics research. Finally, we present our findings and conclude by summarizing the current state of the field and the future outlook for RL in genomics.
Submitted 28 August, 2023; v1 submitted 26 February, 2023;
originally announced February 2023.
-
A Critical Review of the Impact of Candidate Copy Number Variants on Autism Spectrum Disorders
Authors:
Seyedeh Sedigheh Abedini,
Shiva Akhavan,
Julian Heng,
Roohallah Alizadehsani,
Iman Dehzangi,
Denis C. Bauer,
Hamid Rokny
Abstract:
Autism spectrum disorder (ASD) is a heterogeneous neurodevelopmental disorder (NDD) caused by genetic, epigenetic, and environmental factors. Recent advances in genomic analysis have uncovered numerous candidate genes with common and/or rare mutations that increase susceptibility to ASD. In addition, there is increasing evidence that copy number variations (CNVs), single nucleotide polymorphisms (SNPs), and unusual de novo variants negatively affect neurodevelopmental pathways in various ways. The overall rate of copy number variants found in patients with autism is 10%-20%, of which 3%-7% can be detected cytogenetically. Although the role of submicroscopic CNVs in ASD has been studied recently, their association with genomic loci and genes has not been properly investigated. In this review, we focus on 47 ASD-associated CNV regions and their related genes. We identify 1,632 protein-coding genes and long non-coding RNAs (lncRNAs) within these regions, of which 552 are significantly expressed in the brain. Using a list of ASD-associated genes from SFARI, we detect 17 regions containing at least one known ASD-associated protein-coding gene. Of the remaining 30 regions, we identify 24 that contain at least one protein-coding gene with brain-enriched expression and a nervous system phenotype in mouse mutants, and one lncRNA with both brain-enriched expression and upregulation during iPSC-to-neuron differentiation. Our analyses highlight the diversity of genetic lesions in CNV regions that contribute to ASD and provide new genetic evidence that lncRNA genes may contribute to the etiology of ASD. In addition, the discovered CNVs will be a valuable resource for diagnostic facilities, therapeutic strategies, and research on variant prioritization.
Submitted 6 March, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.