Showing 1–6 of 6 results for author: Kalajdzievski, D

  1. arXiv:2401.05605  [pdf, other]

    cs.CL cs.LG

    Scaling Laws for Forgetting When Fine-Tuning Large Language Models

    Authors: Damjan Kalajdzievski

    Abstract: We study and quantify the problem of forgetting when fine-tuning pre-trained large language models (LLMs) on a downstream task. We find that parameter-efficient fine-tuning (PEFT) strategies, such as Low-Rank Adapters (LoRA), still suffer from catastrophic forgetting. In particular, we identify a strong inverse linear relationship between the fine-tuning performance and the amount of forgetting wh…

    Submitted 10 January, 2024; originally announced January 2024.

    ACM Class: I.2.7

  2. arXiv:2312.03732  [pdf, other]

    cs.CL cs.LG

    A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

    Authors: Damjan Kalajdzievski

    Abstract: As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This…

    Submitted 27 November, 2023; originally announced December 2023.

    ACM Class: I.2.7

  3. arXiv:2211.16607  [pdf, other]

    cs.LG eess.SP

    Transfer Entropy Bottleneck: Learning Sequence to Sequence Information Transfer

    Authors: Damjan Kalajdzievski, Ximeng Mao, Pascal Fortier-Poisson, Guillaume Lajoie, Blake Richards

    Abstract: When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challeng…

    Submitted 8 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: 41 pages, 26 figures

    Journal ref: Transactions on Machine Learning Research (TMLR), 2023

  4. arXiv:2012.02306  [pdf, ps, other]

    math.LO

    The ultrafilter and almost disjointness numbers

    Authors: Osvaldo Guzman, Damjan Kalajdzievski

    Abstract: We prove that every MAD family can be destroyed by a proper forcing that preserves $P$-points. With this result, we prove that it is consistent that $ω_{1}=\mathfrak{u}<\mathfrak{a}$, solving a nearly 20-year-old problem of Shelah and a problem of Brendle. We will also present a simple proof of a result of Blass and Shelah that the inequality $\mathfrak{u}<\mathfrak{s}$ is consistent.

    Submitted 4 June, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    MSC Class: 03E17; 03E35

  5. arXiv:1711.11148  [pdf, other]

    math.LO

    Forcing and Construction Schemes

    Authors: Damjan Kalajdzievski, Fulgencio Lopez

    Abstract: We investigate forcing and independence questions relating to construction schemes. We show that adding $κ\geqω_1$ Cohen reals adds a capturing construction scheme. We study the weaker structure of $n$-capturing construction schemes and show that it is consistent to have $n$-capturing construction schemes but no $(n+1)$-capturing construction schemes. We also study the relation of $n$-capturing wi…

    Submitted 19 January, 2018; v1 submitted 29 November, 2017; originally announced November 2017.

    Comments: 12 pages, submitted to Acta Math. Hung

    MSC Class: 03E05; 03E35; 03E65

  6. arXiv:1205.5819  [pdf, other]

    stat.ML cs.LG

    Measurability Aspects of the Compactness Theorem for Sample Compression Schemes

    Authors: Damjan Kalajdzievski

    Abstract: It was proved in 1998 by Ben-David and Litman that a concept space has a sample compression scheme of size d if and only if every finite subspace has a sample compression scheme of size d. In the compactness theorem, measurability of the hypotheses of the created sample compression scheme is not guaranteed; at the same time measurability of the hypotheses is a necessary condition for learnability.…

    Submitted 17 July, 2012; v1 submitted 25 May, 2012; originally announced May 2012.

    Comments: LaTeX 2e, 64 pages, 1 figure. This is an M.Sc. thesis defended on July 4th, 2012 at the University of Ottawa, Canada, under the supervision of Dr. V. Pestov, and with examiners Dr. J. Levy and Dr. S. Zilles