-
SuperPos-Prompt: Enhancing Soft Prompt Tuning of Language Models with Superposition of Multi Token Embeddings
Authors:
MohammadAli SadraeiJavaeri,
Ehsaneddin Asgari,
Alice Carolyn McHardy,
Hamid Reza Rabiee
Abstract:
Soft prompt tuning techniques have recently gained traction as an effective strategy for the parameter-efficient tuning of pretrained language models, particularly minimizing the required adjustment of model parameters. Despite their growing use, achieving optimal tuning with soft prompts, especially for smaller datasets, remains a substantial challenge. This study makes two contributions in this domain: (i) we introduce SuperPos-Prompt, a new reparameterization technique employing the superposition of multiple pretrained vocabulary embeddings to improve the learning of soft prompts. Our experiments across several GLUE and SuperGLUE benchmarks consistently highlight SuperPos-Prompt's superiority over Residual Prompt tuning, exhibiting an average score increase of $+6.4$ in T5-Small and $+5.0$ in T5-Base along with a faster convergence. Remarkably, SuperPos-Prompt occasionally outperforms even full fine-tuning methods. (ii) Additionally, we demonstrate enhanced performance and rapid convergence by omitting dropouts from the frozen network, yielding consistent improvements across various scenarios and tuning methods.
Submitted 7 June, 2024;
originally announced June 2024.
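As a rough illustration of the superposition idea described in the abstract above, the following minimal sketch builds one soft-prompt vector as a weighted sum of vocabulary embeddings. It is based only on the abstract's wording and is not the authors' implementation: the function name `superpose`, the toy embedding values, and the mixing weights are all hypothetical.

```python
def superpose(vocab_embeddings, weights):
    """Return one soft-prompt vector as a weighted sum of vocabulary embeddings."""
    dim = len(vocab_embeddings[0])
    prompt = [0.0] * dim
    for w, emb in zip(weights, vocab_embeddings):
        for d in range(dim):
            prompt[d] += w * emb[d]
    return prompt


# Toy stand-in for pretrained embeddings: 4 tokens, dimension 3 (made-up values).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

# Hypothetical mixing weights; during tuning these would be learned parameters.
toy_weights = [0.5, 0.25, 0.0, 0.25]

prompt_vector = superpose(toy_embeddings, toy_weights)
print(prompt_vector)
```

In this reading, the prompt vector starts from pretrained embeddings and is steered by the mixing weights instead of being learned from a random initialization.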
-
Genetic recombination as DNA repair
Authors:
Dmitri Parkhomchuk,
Alice C. McHardy,
Alexey Shadrin
Abstract:
Maintenance of sexual reproduction and genetic recombination imposes physiological costs when compared to parthenogenetic reproduction, most prominently: for maintaining the corresponding (molecular) machinery, for finding a mating partner, and through the decreased fraction of females in a population, which decreases the reproductive capacity. Based on principles from information theory, we have previously developed a new population genetic model, and applying it in simulations, we have recently hypothesized that all species maintain the maximum genomic complexity that is required by their niche and allowed by their mutation rate and selection intensity. Applying this idea to the complexity overhead of recombination maintenance, its costs must be more than compensated by an additional capacity for complexity in recombining populations. Here, we show a simple mechanism, where recombination helps to maintain larger biases of allele frequencies in a population, so the advantageous alleles can have increased frequency. This allows recombining populations to maintain higher fitness and phenotypic efficiency in comparison with asexual populations with the same parameters. Random mating alone already significantly increases the ability to maintain genomic and phenotypic complexity. Sexual selection provides additional capacity for this complexity. The model can be considered as a unifying synthesis of previous hypotheses about the roles of recombination in Muller's ratchet, mutation purging and Red Queen dynamics, because the introduction of recombination both increases population frequencies of beneficial alleles and decreases detrimental ones. In addition, we suggest simple explanations for niche-dependent prevalence of transient asexuality and the exceptional asexual lineage of Bdelloid rotifers.
Submitted 23 May, 2016;
originally announced May 2016.
-
Snowball: Strain aware gene assembly of Metagenomes
Authors:
I. Gregor,
A. Schönhuth,
A. C. McHardy
Abstract:
Gene assembly is an important step in functional analysis of shotgun metagenomic data. Nonetheless, strain aware assembly remains a challenging task, as current assembly tools often fail to distinguish among strain variants or require closely related reference genomes of the studied species to be available. We have developed Snowball, a novel strain aware and reference-free gene assembler for shotgun metagenomic data. It uses profile hidden Markov models (HMMs) of gene domains of interest to guide the assembly. Our assembler performs gene assembly of individual gene domains based on read overlaps and error correction using read quality scores at the same time, which results in very low per-base error rates. The software runs on a user-defined number of processor cores in parallel, runs on a standard laptop and is freely available for installation under Linux or OS X at: https://github.com/algbioi/snowball/wiki
Submitted 13 October, 2015;
originally announced October 2015.
-
PhyloPythiaS+: A self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes
Authors:
I. Gregor,
J. Dröge,
M. Schirmer,
C. Quince,
A. C. McHardy
Abstract:
Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. For communities of up to medium diversity, e.g. excluding environments such as soil, this is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community from which they originate. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for the recovery of species bins from an individual metagenome sample is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in a composition-based taxonomic metagenome classifier and identifies the 'training' sequences using marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area may not have. With these challenges in mind, we have developed PhyloPythiaS+, a successor to our previously described method PhyloPythia(S). The newly developed + component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated k-mer counting 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows Gb-sized metagenomes to be analyzed with inexpensive hardware, and species- or genus-level bins to be recovered with low error rates in a fully automated fashion.
Submitted 27 June, 2014;
originally announced June 2014.
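For readers unfamiliar with the k-mer counting mentioned in the abstract above, the following minimal sketch shows the basic operation: tallying every overlapping substring of length k in a sequence. This is a plain baseline for illustration only, not the accelerated algorithm of PhyloPythiaS+, whose details the abstract does not give; the function name `count_kmers` and the example sequence are hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy example sequence (made up for illustration).
counts = count_kmers("ACGTACGT", 4)
print(counts)
```

Composition-based classifiers typically use such k-mer counts as the feature vector of a sequence, which is why the speed of this step affects overall run time.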
-
Taxator-tk: Fast and Precise Taxonomic Assignment of Metagenomes by Approximating Evolutionary Neighborhoods
Authors:
J. Dröge,
I. Gregor,
A. C. McHardy
Abstract:
Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows us to identify the sequenced community members and to reconstruct taxonomic bins with sequence data for the individual taxa. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignments by fast approximate determination of evolutionary neighbors from sequence similarities. Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with ten CPU cores and microbial RefSeq as the genomic reference data.
Submitted 3 April, 2014;
originally announced April 2014.