Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features
Authors:
Diogo Cruz,
Edoardo Pona,
Alex Holness-Tofts,
Elias Schmied,
Víctor Abia Alonso,
Charlie Griffin,
Bogdan-Ionut Cirstea
Abstract:
Many capable large language models (LLMs) are developed via self-supervised pre-training followed by a reinforcement-learning fine-tuning phase, often based on human or AI feedback. During this stage, models may be guided by their inductive biases to rely on simpler features which may be easier to extract, at a cost to robustness and generalisation. We investigate whether principles governing inductive biases in the supervised fine-tuning of LLMs also apply when the fine-tuning process uses reinforcement learning. Following Lovering et al. (2021), we test two hypotheses: that features more $\textit{extractable}$ after pre-training are more likely to be utilised by the final policy, and that the evidence for/against a feature predicts whether it will be utilised. Through controlled experiments on synthetic and natural language tasks, we find statistically significant correlations which constitute strong evidence for these hypotheses.
Submitted 7 November, 2023;
originally announced November 2023.
Discovery of TOI-1260d and the characterisation of the multi-planet system
Authors:
Kristine W. F. Lam,
J. Cabrera,
M. J. Hooton,
Y. Alibert,
A. Bonfanti,
M. Beck,
A. Deline,
H.-G. Florén,
A. E. Simon,
L. Fossati,
C. M. Persson,
M. Fridlund,
S. Salmon,
S. Hoyer,
H. P. Osborn,
T. G. Wilson,
I. Y. Georgieva,
Gr. Nowak,
R. Luque,
J. A. Egger,
V. Adibekyan,
R. Alonso,
G. Anglada Escudé,
T. Bárczy,
D. Barrado,
S. C. C. Barros,
et al. (61 additional authors not shown)
Abstract:
We report the discovery of a third planet transiting the star TOI-1260, previously known to host two transiting sub-Neptune planets with orbital periods of 3.127 and 7.493 days, respectively. The planetary nature of the third transiting object, which has a 16.6-day orbit, is supported by ground-based follow-up observations, including time-series photometry, high-angular-resolution images, spectroscopy, and archival imagery. Precise photometric monitoring with CHEOPS allows us to improve the constraints on the parameters of the system, improving our knowledge of the planets' composition. The improved radii of TOI-1260b and TOI-1260c are $2.36 \pm 0.06 \rm R_{\oplus}$ and $2.82 \pm 0.08 \rm R_{\oplus}$, respectively, while the newly discovered third planet has a radius of $3.09 \pm 0.09 \rm R_{\oplus}$. The radius uncertainties are around 3\%, allowing a precise interpretation of the interior structure of the three planets. Our planet interior composition model suggests that all three planets in the TOI-1260 system contain some fraction of gas. The innermost planet, TOI-1260b, has most likely lost all of its primordial hydrogen-dominated envelope. Planets c and d are also likely to have experienced significant atmospheric loss through escape, but to a lesser extent than planet b.
Submitted 8 December, 2022;
originally announced December 2022.