Showing 1–2 of 2 results for author: Kopiczko, D J

Search v0.5.6 released 2020-02-24

arXiv:2405.14862 [pdf, other]

cs.CL

Bitune: Bidirectional Instruction-Tuning

Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

Abstract: We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt, to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, for which we apply parameter-efficient finetuning techniques. These causal and bidirectional features are then combined into a weighted average with trainable coefficients, which is subsequently used to generate new tokens. We demonstrate significant improvements in zero-shot performance on commonsense reasoning, arithmetic, and language understanding tasks, while extensive ablation studies validate the role of each component and demonstrate the method's agnosticism to different PEFT techniques.

Submitted 23 May, 2024; originally announced May 2024.
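
The mixing step described in the Bitune abstract (causal and bidirectional features of the prompt combined into a weighted average with a trainable coefficient) can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the single-head attention with identity projections, the scalar coefficient `theta`, and all function names are assumptions made for illustration, and the two separate parameter sets mentioned in the abstract are not modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(X, causal):
    """Toy single-head self-attention over prompt features X (list of vectors).

    Hypothetical simplification: queries, keys, and values are all X itself.
    With causal=True, token i attends to tokens 0..i; otherwise to all tokens.
    """
    n, dim = len(X), len(X[0])
    out = []
    for i in range(n):
        limit = i + 1 if causal else n
        scores = [sum(a * b for a, b in zip(X[i], X[j])) for j in range(limit)]
        weights = softmax(scores)
        out.append([sum(weights[j] * X[j][k] for j in range(limit)) for k in range(dim)])
    return out

def mix(causal_feats, bidir_feats, theta):
    """Weighted average of the two feature sets; theta stands in for a trainable coefficient."""
    return [
        [(1.0 - theta) * c + theta * b for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(causal_feats, bidir_feats)
    ]

prompt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy prompt-token features
causal_feats = attention(prompt, causal=True)
bidir_feats = attention(prompt, causal=False)
mixed = mix(causal_feats, bidir_feats, theta=0.5)
```

With `theta = 0` the mixed features reduce to the purely causal ones, so in this toy setup the model's original behavior is one point in the family the coefficient interpolates over.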

arXiv:2310.11454 [pdf, other]

cs.CL

VeRA: Vector-based Random Matrix Adaptation

Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

Abstract: Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameters compared to LoRA, yet maintains the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks and on image classification tasks, and show its application in instruction-tuning of 7B and 13B language models.

Submitted 16 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted at ICLR 2024, website: https://dkopi.github.io/vera
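
The idea summarized in the VeRA abstract (one pair of frozen low-rank matrices shared across layers, with only small per-layer scaling vectors trained) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the class and helper names, the toy dimensions, the Gaussian initialization of the shared matrices, and the initial values of the scaling vectors (`d` at 0.1, `b` at zero) are all choices made here for the example.

```python
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def random_matrix(rows, cols, rng):
    """Random Gaussian matrix (assumed initialization for this sketch)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

class VeRALayer:
    """Computes W0 x + b * (B (d * (A x))) with A, B frozen and shared."""

    def __init__(self, W0, A, B, d_init=0.1):
        self.W0 = W0                 # frozen pretrained weight, shape (out, in)
        self.A = A                   # frozen shared random matrix, shape (r, in)
        self.B = B                   # frozen shared random matrix, shape (out, r)
        self.d = [d_init] * len(A)   # trainable scaling vector, length r
        self.b = [0.0] * len(B)      # trainable scaling vector, length out

    def forward(self, x):
        base = matvec(self.W0, x)
        low = [di * v for di, v in zip(self.d, matvec(self.A, x))]
        update = [bi * v for bi, v in zip(self.b, matvec(self.B, low))]
        return [p + u for p, u in zip(base, update)]

    def trainable_params(self):
        return len(self.d) + len(self.b)

def lora_trainable_params(d_in, d_out, r):
    """Trainable parameters of one LoRA adapter: A is (r, in), B is (out, r)."""
    return r * (d_in + d_out)

rng = random.Random(0)
d_in, d_out, r = 8, 8, 4
A = random_matrix(r, d_in, rng)      # shared by every layer
B = random_matrix(d_out, r, rng)     # shared by every layer
layers = [VeRALayer(random_matrix(d_out, d_in, rng), A, B) for _ in range(3)]

x = [1.0] * d_in
y = layers[0].forward(x)
```

In this toy configuration each layer trains `r + d_out = 12` values, against `r * (d_in + d_out) = 64` for a LoRA adapter of the same rank, and because `b` starts at zero the layer initially reproduces the frozen weight's output.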
