Search | arXiv e-print repository

Computational Limits of Low-Rank Adaptation (LoRA) for Transformer-Based Models

Authors: Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo, Zhao Song, Han Liu

Abstract: We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior and (ii) prove the existence of nearly linear algorithms by controlling the LoRA update computation term by term, assuming the Strong Exponential Time Hypothesis (SETH). For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $\mathbf{X}$, pretrained weights $\mathbf{W^\star}$, and adapter matrices $\alpha \mathbf{B} \mathbf{A} / r$. Specifically, we derive a shared upper bound threshold for such norms and show that efficient (sub-quadratic) approximation algorithms of LoRA exist only below this threshold. For the latter, we prove the existence of nearly linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial (e.g., only $\mathbf{W}_V$ and $\mathbf{W}_Q$) and full adaptations (e.g., $\mathbf{W}_Q$, $\mathbf{W}_V$, and $\mathbf{W}_K$) of weights in attention heads.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2404.07323 [pdf, other]

Surrogate modeling for probability distribution estimation: uniform or adaptive design?

Authors: Maijia Su, Ziqi Wang, Oreste Salvatore Bursi, Marco Broccardo

Abstract: The active learning (AL) technique, one of the state-of-the-art methods for constructing surrogate models, has shown high accuracy and efficiency in forward uncertainty quantification (UQ) analysis. This paper provides a comprehensive study on AL-based global surrogates for computing the full distribution function, i.e., the cumulative distribution function (CDF) and the complementary CDF (CCDF). To this end, we investigate the three essential components for building surrogates, i.e., types of surrogate models, enrichment methods for experimental designs, and stopping criteria. For each component, we choose several representative methods and study their desirable configurations. In addition, we devise a uniform design (i.e., space-filling design) as a baseline for measuring the improvement of using AL. Combining all the representative methods, a total of 1,920 UQ analyses are carried out to solve 16 benchmark examples. The performance of the selected strategies is evaluated based on accuracy and efficiency. In the context of full distribution estimation, this study concludes that (i) AL techniques cannot provide a systematic improvement compared with uniform designs, (ii) the recommended surrogate modeling methods depend on the features of the problems (especially the local nonlinearity), target accuracy, and computational budget.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2401.03747 [pdf, other]

The Importance of Corner Frequency in Site-Based Stochastic Ground Motion Models

Authors: Maijia Su, Mayssa Dabaghi, Marco Broccardo

Abstract: Synthetic ground motions (GMs) play a fundamental role in both deterministic and probabilistic seismic engineering assessments. This paper shows that the family of filtered and modulated white noise stochastic GM models overlooks a key parameter -- the high-pass filter's corner frequency, $f_c$. In the simulated motions, this causes significant distortions in the long-period range of the linear-response spectra and in the linear-response spectral correlations. To address this, we incorporate $f_c$ as an explicitly fitted parameter in a site-based stochastic model. We optimize $f_c$ by individually matching the long-period linear-response spectrum (i.e., $Sa(T)$ for $T \geq 1$s) of synthetic GMs with that of each recorded GM. We show that by fitting $f_c$ the resulting stochastically simulated GMs can precisely capture the spectral amplitudes, variability (i.e., variances of $\log(Sa(T))$), and the correlation structure (i.e., correlation of $\log(Sa(T))$ between distinct periods $T_1$ and $T_2$) of recorded GMs. To quantify the impact of $f_c$, a sensitivity analysis is conducted through linear regression. This regression relates the logarithmic linear-response spectrum ($\log(Sa(T))$) to seven GM parameters, including the optimized $f_c$. The results indicate that the variance of $f_c$ observed in natural GMs, along with its correlation with the other GM parameters, accounts for 26\% of the spectral variability in long periods. Neglecting either the $f_c$ variance or $f_c$ correlation typically results in an important overestimation of the linear-response spectral correlation.

Submitted 8 January, 2024; originally announced January 2024.

Comments: 16 pages, 10 figures

arXiv:2309.09872 [pdf, other]

Moment-assisted Subsampling based Maximum Likelihood Estimator

Authors: Miaomiao Su, Qihua Wang, Ruoyu Wang

Abstract: This paper proposes a moment-assisted subsampling method which can improve the estimation efficiency of existing subsampling estimators. The motivation behind this approach stems from the fact that sample moments can be efficiently computed even if the sample size of the whole data set is huge. Through the generalized method of moments, this method incorporates informative sample moments of the whole data into the subsampling estimator. The moment-assisted estimator is asymptotically normal and has a smaller asymptotic variance compared to the corresponding estimator without incorporating sample moments of the whole data. The asymptotic variance of the moment-assisted estimator depends on the specific sample moments incorporated. Under the uniform subsampling probability, we derive the optimal moment that minimizes the resulting asymptotic variance in terms of Loewner order. Moreover, the moment-assisted subsampling estimator can be rapidly computed through one-step linear approximation. Simulation studies and a real data analysis were conducted to compare the proposed method with existing subsampling methods. Numerical results show that the moment-assisted subsampling method performs competitively across different settings. This suggests that incorporating the sample moments of the whole data can enhance existing subsampling technique.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2304.06292 [pdf, ps, other]

Improved Naive Bayes with Mislabeled Data

Authors: Qianhan Zeng, Yingqiu Zhu, Xuening Zhu, Feifei Wang, Weichen Zhao, Shuning Sun, Meng Su, Hansheng Wang

Abstract: Labeling mistakes are frequently encountered in real-world applications. If not treated well, the labeling mistakes can deteriorate the classification performances of a model seriously. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and free of subjective judgements on the correct and incorrect labels. By specifying the generating mechanism of incorrect labels, we optimize the corresponding log-likelihood function iteratively by using an EM algorithm. Our simulation and experiment results show that the improved Naive Bayes method greatly improves the performances of the Naive Bayes method with mislabeled data.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2112.01215 [pdf]

Adaptive Group Collaborative Artificial Bee Colony Algorithm

Authors: Haiquan Wang, Hans-Dietrich Haasis, Panpan Du, Xiaobin Xu, Menghao Su, Shengjun Wen, Wenxuan Yue, Shanshan Zhang

Abstract: As an effective algorithm for solving complex optimization problems, artificial bee colony (ABC) algorithm has shown to be competitive, but the same as other population-based algorithms, it is poor at balancing the abilities of global searching in the whole solution space (named as exploration) and quick searching in local solution space which is defined as exploitation. For improving the performance of ABC, an adaptive group collaborative ABC (AgABC) algorithm is introduced where the population in different phases is divided to specific groups and different search strategies with different abilities are assigned to the members in groups, and the member or strategy which obtains the best solution will be employed for further searching. Experimental results on benchmark functions show that the proposed algorithm with dynamic mechanism is superior to other algorithms in searching accuracy and stability. Furthermore, numerical experiments show that the proposed method can generate the optimal solution for the complex scheduling problem.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2106.02475 [pdf, ps, other]

Distributed nonparametric regression imputation for missing response problems with large-scale data

Authors: Ruoyu Wang, Miaomiao Su, Qihua Wang

Abstract: Nonparametric regression imputation is commonly used in missing data analysis. However, it suffers from the "curse of dimension". The problem can be alleviated by the explosive sample size in the era of big data, while the large-scale data size presents some challenges on the storage of data and the calculation of estimators. These challenges make the classical nonparametric regression imputation methods no longer applicable. This motivates us to develop two distributed nonparametric regression imputation methods. One is based on kernel smoothing and the other on the sieve method. The kernel-based distributed imputation method has extremely low communication cost and the sieve-based distributed imputation method can accommodate more local machines. To illustrate the proposed imputation methods, response mean estimation is considered. Two distributed nonparametric regression imputation estimators are proposed for the response mean, which are proved to be asymptotically normal with asymptotic variances achieving the semiparametric efficiency bound. The proposed methods are evaluated through simulation studies and are illustrated by a real data analysis.

Submitted 8 January, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

Journal ref: Journal of Machine Learning Research, 2023

arXiv:2012.05677 [pdf, ps, other]

A Convex Programming Solution Based Debiased Estimator for Quantile with Missing Response and High-dimensional Covariables

Authors: Miaomiao Su, Qihua Wang

Abstract: This paper is concerned with the estimating problem of response quantile with high dimensional covariates when response is missing at random. Some existing methods define root-n consistent estimators for the response quantile. But these methods require correct specifications of both the conditional distribution of response given covariates and the selection probability function. In this paper, a debiased method is proposed by solving a convex programming. The estimator obtained by the proposed method is asymptotically normal given a correctly specified parametric model for the condition distribution function, without the requirement to specify and estimate the selection probability function. Moreover, the proposed estimator is asymptotically more efficient than the existing estimators. The proposed method is evaluated by a simulation study and is illustrated by a real data example.

Submitted 22 June, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

arXiv:1812.01713 [pdf, other]

FineFool: Fine Object Contour Attack via Attention

Authors: **yin Chen, Haibin Zheng, Hui Xiong, Mengmeng Su

Abstract: Machine learning models have been shown vulnerable to adversarial attacks launched by adversarial examples which are carefully crafted by attacker to defeat classifiers. Deep learning models cannot escape the attack either. Most of adversarial attack methods are focused on success rate or perturbations size, while we are more interested in the relationship between adversarial perturbation and the image itself. In this paper, we put forward a novel adversarial attack based on contour, named FineFool. Finefool not only has better attack performance compared with other state-of-art white-box attacks in aspect of higher attack success rate and smaller perturbation, but also capable of visualization the optimal adversarial perturbation via attention on object contour. To the best of our knowledge, Finefool is for the first time combines the critical feature of the original clean image with the optimal perturbations in a visible manner. Inspired by the correlations between adversarial perturbations and object contour, slighter perturbations is produced via focusing on object contour features, which is more imperceptible and difficult to be defended, especially network add-on defense methods with the trade-off between perturbations filtering and contour feature loss. Compared with existing state-of-art attacks, extensive experiments are conducted to show that Finefool is capable of efficient attack against defensive deep models.

Submitted 1 December, 2018; originally announced December 2018.

Showing 1–9 of 9 results for author: Su, M