Showing 1–1 of 1 results for author: Schmied, E

  1. arXiv:2311.04046  [pdf, other]

    cs.LG cs.CL

    Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features

    Authors: Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, Bogdan-Ionut Cirstea

    Abstract: Many capable large language models (LLMs) are developed via self-supervised pre-training followed by a reinforcement-learning fine-tuning phase, often based on human or AI feedback. During this stage, models may be guided by their inductive biases to rely on simpler features which may be easier to extract, at a cost to robustness and generalisation. We investigate whether principles governing indu…

    Submitted 7 November, 2023; originally announced November 2023.