-
Exploring Design Choices for Building Language-Specific LLMs
Authors:
Atula Tejaswi,
Nilesh Gupta,
Eunsol Choi
Abstract:
Despite rapid progress in large language models (LLMs), their performance on the vast majority of languages remains unsatisfactory. In this paper, we study building language-specific LLMs by adapting monolingual and multilingual LLMs. We conduct systematic experiments on how design choices (base model selection, vocabulary extension, and continued fine-tuning) impact the adapted LLM, both in terms of efficiency (how many tokens are needed to encode the same amount of information) and end task performance. We find that (1) the initial performance before the adaptation is not always indicative of the final performance; (2) efficiency can easily be improved with simple vocabulary extension and continued fine-tuning in most LLMs we study; and (3) the optimal adaptation method is highly language-dependent, and the simplest approach works well across various experimental settings. Adapting English-centric models can yield better results than adapting multilingual models despite their worse initial performance on low-resource languages. Together, our work lays foundations for efficiently building language-specific LLMs by adapting existing LLMs.
Submitted 20 June, 2024;
originally announced June 2024.
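The abstract above defines efficiency as the number of tokens needed to encode the same amount of information. A minimal sketch of that idea follows, assuming a toy greedy longest-match tokenizer and a made-up romanized sample sentence; `make_tokenizer`, `tokens_per_word`, and both vocabularies are hypothetical illustrations and are not the paper's tokenizers or metric definition.

```python
def make_tokenizer(vocab):
    """Build a toy greedy longest-match tokenizer (hypothetical, for illustration).

    Any substring not covered by `vocab` falls back to single characters.
    """
    def tokenize(text):
        tokens = []
        for word in text.split():
            i = 0
            while i < len(word):
                # Try the longest candidate first; a single character always matches.
                for j in range(len(word), i, -1):
                    if word[i:j] in vocab or j == i + 1:
                        tokens.append(word[i:j])
                        i = j
                        break
        return tokens
    return tokenize


def tokens_per_word(tokenize, texts):
    """Average number of tokens per whitespace-separated word (lower is more efficient)."""
    total_tokens = sum(len(tokenize(t)) for t in texts)
    total_words = sum(len(t.split()) for t in texts)
    return total_tokens / total_words


# Made-up sample text and vocabularies (assumptions, not data from the paper).
texts = ["namaste duniya namaste"]
base_tokenizer = make_tokenizer(set())                        # character-level fallback only
extended_tokenizer = make_tokenizer({"nama", "ste", "duni"})  # after a toy vocabulary extension

base_cost = tokens_per_word(base_tokenizer, texts)
extended_cost = tokens_per_word(extended_tokenizer, texts)
print(f"base: {base_cost:.2f} tokens/word, extended: {extended_cost:.2f} tokens/word")
```

In this toy setting, adding a few subword entries lowers the tokens-per-word ratio, which is the kind of efficiency gain the abstract attributes to vocabulary extension.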
-
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
Authors:
Vijay Lingam,
Atula Tejaswi,
Aditya Vavre,
Aneesh Shetty,
Gautham Krishna Gudur,
Joydeep Ghosh,
Alex Dimakis,
Eunsol Choi,
Aleksandar Bojchevski,
Sujay Sanghavi
Abstract:
Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights \(W\) and inject learnable matrices \(\Delta W\). These \(\Delta W\) matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on \(\Delta W\) depends on the specific weight matrix \(W\). Specifically, SVFT updates \(W\) as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.
Submitted 29 May, 2024;
originally announced May 2024.
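The update described in the abstract above, a sparse combination of outer products of the singular vectors of \(W\), can be sketched in NumPy as \(\Delta W = U M V^\top\) with a sparse coefficient matrix \(M\). This is a minimal sketch under stated assumptions: the banded sparsity pattern, the helper names `banded_mask` and `svft_delta`, and the random initialization are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def banded_mask(k, bandwidth):
    """Boolean k x k mask selecting entries within `bandwidth` of the diagonal.

    The banded pattern is an assumed example of a sparsity pattern.
    """
    idx = np.arange(k)
    return np.abs(idx[:, None] - idx[None, :]) <= bandwidth


def svft_delta(W, coeffs, mask):
    """Return Delta W = U @ M @ Vt, where M holds `coeffs` at the masked positions.

    U and Vt are the (frozen) singular vectors of W; only `coeffs` would be trained.
    Entry M[i, j] scales the outer product of U[:, i] and Vt[j].
    """
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    M = np.zeros(mask.shape)
    M[mask] = coeffs
    return U @ M @ Vt


rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))                 # stand-in for a pre-trained weight matrix
mask = banded_mask(min(W.shape), bandwidth=1)   # which coefficients are trainable
coeffs = 0.01 * rng.standard_normal(mask.sum())
delta_W = svft_delta(W, coeffs, mask)
W_adapted = W + delta_W

print("trainable coefficients:", int(mask.sum()), "of", W.size, "weights")
```

Widening `bandwidth` adds coefficients, which mirrors the abstract's point that expressivity is controlled through the number of coefficients.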
-
Effect of direct reaction channels on deep sub-barrier fusion in asymmetric systems
Authors:
Md. Moin Shaikh,
S. Nath,
J. Gehlot,
Tathagata Banerjee,
Ish Mukul,
R. Dubey,
A. Shamlath,
P. V. Laveen,
M. Shareef,
A. Jhingan,
N. Madhavan,
Tapan Rajbongshi,
P. Jisha,
G. Naga Jyothi,
A. Tejaswi,
Rudra N. Sahoo,
Anjali Rani
Abstract:
A steeper fall of the fusion excitation function, compared to the predictions of coupled-channels models, at energies below the lowest barrier between the reaction partners, is termed deep sub-barrier fusion hindrance. This phenomenon has been observed in many symmetric and nearly-symmetric systems. Different physical origins of the hindrance have been proposed. This work aims to study the probable effects of direct reactions on deep sub-barrier fusion cross sections. Fusion (evaporation residue) cross sections have been measured for the system $^{19}$F+$^{181}$Ta, from above the barrier down to the energies where fusion hindrance is expected to come into play. A coupled-channels calculation with a standard Woods-Saxon potential gives a fair description of the fusion excitation function down to energies $\simeq 14\%$ below the barrier for the present system. This is in contrast with the observation of increasing fusion hindrance in asymmetric reactions induced by increasingly heavier projectiles, \textit{viz.} $^{6,7}$Li, $^{11}$B, $^{12}$C and $^{16}$O. The asymmetric reactions, which have not shown any signature of fusion hindrance within the measured energy range, are found to be induced by projectiles with lower $\alpha$ break-up threshold, compared to the reactions which have shown signatures of fusion hindrance. In addition, most of the $Q$-values for light-particle pick-up channels are negative for the reactions which have exhibited strong signatures of fusion hindrance, \textit{viz.} $^{12}$C+$^{198}$Pt and $^{16}$O+$^{204,208}$Pb. Thus, break-up of the projectile and particle transfer channels with positive $Q$-values seem to compensate for the hindrance in fusion deep below the barrier. Inclusion of break-up and transfer channels within the framework of coupled-channels calculation would be of interest.
Submitted 25 May, 2018; v1 submitted 13 March, 2018;
originally announced March 2018.