Search | arXiv e-print repository

QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning

Authors: Hossein Rajabzadeh, Mojtaba Valipour, Tianshu Zhu, Marzieh Tahaei, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

Abstract: Finetuning large language models requires huge GPU memory, restricting the choice to acquire larger models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this issue, finding the efficient LoRA rank is still challenging. Moreover, QLoRA is trained on a pre-defined rank and, therefore, cannot be reconfigured for its lower ranks without requiring further fine-tuning steps. This paper proposes QDyLoRA (Quantized Dynamic Low-Rank Adaptation), an efficient quantization approach for dynamic low-rank adaptation. Motivated by Dynamic LoRA, QDyLoRA is able to efficiently finetune LLMs on a set of pre-defined LoRA ranks. QDyLoRA enables fine-tuning Falcon-40b for ranks 1 to 64 on a single 32 GB V100 GPU through one round of fine-tuning. Experimental results show that QDyLoRA is competitive with QLoRA and outperforms it when employing its optimal rank.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Best Paper Award AAAI EIW Workshop

arXiv:2309.08922 [pdf, other]

Multimodal Multi-Hop Question Answering Through a Conversation Between Tools and Efficiently Finetuned Large Language Models

Authors: Hossein Rajabzadeh, Suyuchen Wang, Hyock Ju Kwon, Bang Liu

Abstract: We employ a tool-interacting divide-and-conquer strategy enabling large language models (LLMs) to answer complex multimodal multi-hop questions. In particular, we harness the power of large language models to divide a given multimodal multi-hop question into unimodal single-hop sub-questions to be answered by the appropriate tool from a predefined set of tools. After all corresponding tools provide the LLM with their answers, the LLM generates the next relevant unimodal single-hop question. To increase the reasoning ability of LLMs, we prompt ChatGPT to generate a tool-interacting divide-and-conquer dataset. This dataset is then used to efficiently finetune the corresponding LLM. To assess the effectiveness of this approach, we conduct an evaluation on two recently introduced complex question-answering datasets. The experimental analysis demonstrates substantial improvements over existing state-of-the-art solutions, indicating the efficacy and generality of our strategy.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.05175 [pdf, other]

Nondegeneracy of the spectrum of the twisted cocycle for interval exchange transformations

Authors: Hesam Rajabzadeh, Pedram Safaee

Abstract: We prove the positivity of the top Lyapunov exponent of the twisted (spectral) cocycle, associated with IETs, with respect to a family of natural invariant measures. The proof relies on relating the top exponent to limits of exponents along families of affine invariant submanifolds of genus tending to infinity. Applications include an observation about a conjecture of Kontsevich and Zorich, a discrepancy estimate, and a formula for the lower local dimension of spectral measures.

Submitted 10 September, 2023; originally announced September 2023.

MSC Class: 37A25; 37E05; 30F60; 37D25

arXiv:2309.00255 [pdf, other]

SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks

Authors: Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi

Abstract: Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user/task-specific models. There are solutions in the literature to deal with single dynamic or many-in-one models instead of many individual networks; however, they suffer from significant drops in performance, lack of generalization across different model architectures or different dimensions (e.g. depth, width, attention blocks), heavy model search requirements during training, and training a limited number of sub-models. To address these limitations, we propose SortedNet, a generalized and scalable training solution to harness the inherent modularity of DNNs. Thanks to a generalized nested architecture (which we refer to as the sorted architecture in this paper) with shared parameters and its novel update scheme combining random sub-model sampling and a new gradient accumulation mechanism, SortedNet enables the training of sub-models simultaneously along with the training of the main model (without any significant extra training or inference overhead), simplifies dynamic model selection, customizes deployment during inference, and reduces the model storage requirement significantly. The versatility and scalability of SortedNet are validated through various architectures and tasks, including LLaMA, BERT, RoBERTa (NLP tasks), ResNet and MobileNet (image classification), demonstrating its superiority over existing dynamic training methods. For example, we introduce a novel adaptive self-speculative approach based on sorted training to accelerate large language model decoding. Moreover, SortedNet is able to train 160 sub-models at once, achieving at least 96% of the original model's performance.

Submitted 1 June, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

arXiv:2102.09259 [pdf, other]

Stable local dynamics: expansion, quasi-conformality and ergodicity

Authors: Abbas Fakhari, Meysam Nassiri, Hesam Rajabzadeh

Abstract: In this paper we study stable ergodicity of the action of groups of diffeomorphisms on smooth manifolds. The existence of such actions is known only on one-dimensional manifolds. The aim of this paper is to introduce a geometric method to overcome this restriction and to construct higher-dimensional examples. In particular, we show that every closed manifold admits stably ergodic finitely generated group actions by diffeomorphisms of class $C^{1+α}$. We also prove the stable ergodicity of certain algebraic actions, including the natural action of a generic pair of matrices near the identity on a sphere of arbitrary dimension. These are consequences of the quasi-conformal blender, a local and stable mechanism/phenomenon introduced in this paper, which encapsulates our method to prove stable local ergodicity by providing quasi-conformal orbits with fine controlled geometry. The quasi-conformal blender is developed in the context of pseudo-semigroup actions of locally defined smooth diffeomorphisms. This allows for applications in different settings, including smooth foliations of arbitrary codimension.

Submitted 2 March, 2022; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: 37 pages, 5 figures

arXiv:1803.01562 [pdf]

Local Distance Metric Learning for Nearest Neighbor Algorithm

Authors: Hossein Rajabzadeh, Mansoor Zolghadri Jahromi, Mohammad Sadegh Zare, Mostafa Fakhrahmad

Abstract: Distance metric learning is a successful way to enhance the performance of the nearest neighbor classifier. In most cases, however, the distribution of data does not obey a regular form and may change in different parts of the feature space. Accordingly, this paper proposes a novel local distance metric learning method, namely Local Mahalanobis Distance Learning (LMDL), in order to enhance the performance of the nearest neighbor classifier. LMDL considers the neighborhood influence and learns multiple distance metrics for a reduced set of input samples. The samples in the reduced set are called prototypes, which try to preserve local discriminative information as much as possible. The proposed LMDL can be kernelized very easily, which is highly desirable in the case of highly nonlinear data. The quality as well as the efficiency of the proposed method are assessed through a set of different experiments on various datasets, and the obtained results show that LMDL as well as its kernelized version is superior to the other related state-of-the-art methods.

Submitted 15 March, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

Comments: 13 pages

Showing 1–6 of 6 results for author: Rajabzadeh, H