Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu

Abstract: We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of nearly linear algorithms by controlling the LoRA update computation term by term, assuming the Strong Exponential Time Hypothesis (SETH). For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $\mathbf{X}$, pretrained weights $\mathbf{W^\star}$, and adapter matrices $\alpha \mathbf{B} \mathbf{A} / r$. Specifically, we derive a shared upper bound threshold for such norms and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $\mathbf{W}_V$ and $\mathbf{W}_Q$) and full adaptations (e.g., $\mathbf{W}_Q$, $\mathbf{W}_V$, and $\mathbf{W}_K$) of weights in attention heads.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2403.03391 [pdf, other]

CoRMF: Criticality-Ordered Recurrent Mean Field Ising Solver

Authors: Zhenyu Pan, Ammar Gilani, En-Jui Kuo, Zhuo Liu

Abstract: We propose an RNN-based efficient Ising model solver, the Criticality-ordered Recurrent Mean Field (CoRMF), for forward Ising problems. At its core, a criticality-ordered spin sequence of an $N$-spin Ising model is introduced by sorting mission-critical edges with a greedy algorithm, such that an autoregressive mean-field factorization can be utilized and optimized with Recurrent Neural Networks (RNNs). Our method has two notable characteristics: (i) by leveraging the approximated tree structure of the underlying Ising graph, the newly obtained criticality order enables the unification of variational mean-field and RNN, allowing the generally intractable Ising model to be efficiently probed with probabilistic inference; (ii) it is well-modularized and model-independent while at the same time expressive enough, and hence fully applicable to any forward Ising inference problem with minimal effort. Computationally, by using a variance-reduced Monte Carlo gradient estimator, CoRMF solves the Ising problems in a self-training fashion without data/evidence, and the inference tasks can be executed by directly sampling from the RNN. Theoretically, we establish a provably tighter error bound than naive mean-field by using the matrix cut decomposition machinery. Numerically, we demonstrate the utility of this framework on a series of Ising datasets.

Submitted 7 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2305.14558 [pdf, other]

Statistical causal inference methods for observational research in PER: a primer

Authors: Vidushi Adlakha, Eric Kuo

Abstract: Recent critiques of Physics Education Research (PER) studies have revoiced the critical issues that arise when drawing causal inferences from observational data where no intervention is present. In response to a call for a "causal reasoning primer", this paper discusses some of the fundamental issues underlying statistical causal inference. In reviewing these issues, we discuss well-established causal inference methods commonly applied in other fields and their application to PER. Using simulated data sets, we illustrate (i) why analysis for causal inference should control for confounders but not for mediators and colliders and (ii) that multiple proposed causal models can fit a highly correlated data set. Finally, we discuss how these causal inference methods can be used to represent and explain existing issues in quantitative PER. Throughout, we discuss a central issue: quantitative results from observational studies cannot support a researcher's proposed causal model over other alternative models. To address this issue, we propose an explicit role for observational studies in PER that draw statistical causal inferences: proposing future intervention studies and predicting their outcomes. Mirroring the broader way in which theory motivates experiments in physics, observational studies in PER can make quantitative predictions of the causal effects of interventions, and future intervention studies can test those predictions directly.

Submitted 23 May, 2023; originally announced May 2023.

Showing 1–3 of 3 results for author: Kuo, E