-
LitSumm: Large language models for literature summarisation of non-coding RNAs
Authors:
Andrew Green,
Carlos Ribas,
Nancy Ontiveros-Palacios,
Sam Griffiths-Jones,
Anton I. Petrov,
Alex Bateman,
Blake Sweeney
Abstract:
Motivation: Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have the resources to scale to the whole relevant literature, and all have to prioritise their efforts.
Results: In this work, we take a first step towards alleviating the lack of curator time in RNA science by using large language models (LLMs) to generate literature summaries for non-coding RNAs. We demonstrate that high-quality, factually accurate summaries with accurate references can be generated automatically from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment of a subset of summaries rated the majority as being of extremely high quality. We also applied the most commonly used automated evaluation approaches and found that they do not correlate with human assessment. Finally, we applied our tool to a selection of over 4,600 ncRNAs and made the generated summaries available via the RNAcentral resource. We conclude that automated literature summarisation is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied.
Availability: Code used to produce these summaries can be found here: https://github.com/RNAcentral/litscan-summarization and the dataset of contexts and summaries can be found here: https://huggingface.co/datasets/RNAcentral/litsumm-v1. Summaries are also displayed on the RNA report pages in RNAcentral (https://rnacentral.org/).
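The abstract refers to "a chain of prompts and checks" without specifying the checks. As an illustration only, the Python sketch below shows one possible automated reference check, under the hypothetical assumption (not taken from the LitSumm code) that the model is asked to cite its sources as bracketed PMC identifiers such as [PMC1234567]. A summary fails the check if it cites nothing or cites an identifier that was not among the supplied contexts; the function name check_references is illustrative.

```python
import re

# Hypothetical citation format: sources are cited as bracketed PMCIDs, e.g. [PMC1234567].
CITATION = re.compile(r"\[(PMC\d+)\]")


def check_references(summary: str, context_ids: set[str]) -> tuple[bool, list[str]]:
    """Return (ok, unknown), where unknown lists cited IDs absent from the contexts."""
    cited = CITATION.findall(summary)
    unknown = sorted({c for c in cited if c not in context_ids})
    # Fail if the summary cites nothing, or cites IDs that were never supplied.
    ok = bool(cited) and not unknown
    return ok, unknown


if __name__ == "__main__":
    contexts = {"PMC111", "PMC222"}
    print(check_references("X regulates Y [PMC111].", contexts))  # (True, [])
    print(check_references("X binds Z [PMC999].", contexts))  # (False, ['PMC999'])
```

In a prompt chain, a failed check could trigger a re-prompt that asks the model to correct its citations; whether and how LitSumm does this is not stated in the abstract.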
Submitted 19 April, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Conservation and losses of avian non-coding RNA loci
Authors:
Paul P. Gardner,
Mario Fasold,
Sarah W. Burge,
Maria Ninova,
Jana Hertel,
Stephanie Kehr,
Tammy E. Steeves,
Sam Griffiths-Jones,
Peter F. Stadler
Abstract:
Here we present the results of a large-scale bioinformatic annotation of non-coding RNA loci in 48 avian genomes. Our approach uses probabilistic models of hand-curated families from the Rfam database to infer conserved RNA families within each avian genome. We supplement these annotations with predictions from the tRNA annotation tool tRNAscan-SE and with microRNAs from miRBase. We show that a number of lncRNA-associated loci are conserved between birds and mammals, including several intriguing cases where the reported mammalian lncRNA function is not conserved in birds. We also demonstrate extensive conservation of classical ncRNAs (e.g., tRNAs) and more recently discovered ncRNAs (e.g., snoRNAs and miRNAs) in birds. Furthermore, we describe numerous "losses" of several RNA families, and attribute these to genuine loss, divergence or missing data. In particular, we show that many of these losses are due to the challenges associated with assembling avian microchromosomes. These combined results illustrate the utility of applying homology-based methods for annotating novel vertebrate genomes.
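Identifying candidate "losses" starts from a presence/absence table of family hits per genome. The minimal Python sketch below assumes that per-genome hits (for example from searches with Rfam models) have already been reduced to sets of family names; the function name candidate_losses, the min_present threshold and the example labels are hypothetical. It only flags absences: as the abstract notes, deciding between genuine loss, divergence and missing data requires further analysis.

```python
def candidate_losses(
    hits: dict[str, set[str]], genomes: list[str], min_present: int = 2
) -> dict[str, list[str]]:
    """Map each family seen in >= min_present genomes to the genomes lacking it."""
    families = set().union(*hits.values())
    result: dict[str, list[str]] = {}
    for family in sorted(families):
        absent = [g for g in genomes if family not in hits.get(g, set())]
        n_present = len(genomes) - len(absent)
        if n_present >= min_present and absent:
            # Each absence is only a candidate: it may reflect genuine loss,
            # divergence beyond detection, or missing assembly data.
            result[family] = absent
    return result


if __name__ == "__main__":
    # Hypothetical genomes and family labels, for illustration only.
    hits = {
        "genome1": {"famA", "famB", "famC"},
        "genome2": {"famA", "famB"},
        "genome3": {"famA"},
    }
    print(candidate_losses(hits, ["genome1", "genome2", "genome3"]))  # {'famB': ['genome3']}
```

Requiring a family to be present in several genomes before reporting absences avoids treating lineage-specific or spurious single hits as losses elsewhere.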
Submitted 27 June, 2014;
originally announced June 2014.
-
Clusters of microRNAs emerge by new hairpins in existing transcripts
Authors:
Antonio Marco,
Maria Ninova,
Matthew Ronshaugen,
Sam Griffiths-Jones
Abstract:
Genetic linkage may result in the expression of multiple products from a polycistronic transcript, under the control of a single promoter. In animals, protein-coding polycistronic transcripts are rare. However, microRNAs are frequently clustered in the genomes of animals, and these clusters are often transcribed as a single unit. The evolution of microRNA clusters has been the subject of much speculation, and a selective advantage of clusters of functionally related microRNAs is often proposed. However, the origin of microRNA clusters has not so far been explored. Here we study the evolution of microRNA clusters in Drosophila melanogaster. We observed that the majority of microRNA clusters arose by the de novo formation of new microRNA-like hairpins in existing microRNA transcripts. Some clusters also emerged by tandem duplication of a single microRNA. Comparative genomics shows that these clusters are unlikely to split or undergo rearrangements. We did not find any instances of clusters appearing by rearrangement of pre-existing microRNA genes. We propose a model for microRNA cluster evolution in which selection on one of the microRNAs in the cluster interferes with the evolution of the other linked microRNAs. Our analysis suggests that the study of microRNAs and small RNAs must consider linkage associations.
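The abstract does not state how a microRNA cluster is defined. A common working definition groups loci on the same chromosome and strand that lie within a maximum distance of one another; the Python sketch below implements that grouping, assuming a 10 kb gap threshold (an assumption, not a value from the paper). The Locus record, the function name cluster_loci and the example coordinates are hypothetical. The sketch only finds clusters; inferring whether a cluster arose by de novo hairpin formation or tandem duplication requires the comparative analysis described above.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    """A hypothetical record for one microRNA locus."""
    name: str
    chrom: str
    strand: str
    start: int
    end: int


def cluster_loci(loci: list[Locus], max_gap: int = 10_000) -> list[list[str]]:
    """Group same-chromosome, same-strand loci separated by at most max_gap bp."""
    groups: list[list[Locus]] = []
    for locus in sorted(loci, key=lambda x: (x.chrom, x.strand, x.start)):
        if groups:
            current = groups[-1]
            same_unit = (current[-1].chrom == locus.chrom
                         and current[-1].strand == locus.strand)
            # Use the furthest end seen so far, in case an earlier locus extends further.
            group_end = max(member.end for member in current)
            if same_unit and locus.start - group_end <= max_gap:
                current.append(locus)
                continue
        groups.append([locus])
    # Singletons are not clusters; keep only groups of two or more loci.
    return [[member.name for member in g] for g in groups if len(g) > 1]


if __name__ == "__main__":
    loci = [
        Locus("mir-a", "2L", "+", 100, 180),
        Locus("mir-b", "2L", "+", 500, 580),
        Locus("mir-c", "2L", "+", 50_000, 50_080),
        Locus("mir-d", "3R", "-", 1_000, 1_080),
        Locus("mir-e", "3R", "-", 5_000, 5_080),
    ]
    print(cluster_loci(loci))  # [['mir-a', 'mir-b'], ['mir-d', 'mir-e']]
```

Restricting clusters to a single strand reflects the idea of co-transcription from one polycistronic unit; a looser definition could ignore strand.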
Submitted 18 June, 2013; v1 submitted 9 April, 2013;
originally announced April 2013.