Showing 1–2 of 2 results for author: Kopiczko, D J

  1. arXiv:2405.14862  [pdf, other]

    cs.CL

    Bitune: Bidirectional Instruction-Tuning

    Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

    Abstract: We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt, to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, for which we apply parameter-efficient finetuning…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2310.11454  [pdf, other]

    cs.CL

    VeRA: Vector-based Random Matrix Adaptation

    Authors: Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

    Abstract: Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameter…

    Submitted 16 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024, website: https://dkopi.github.io/vera