-
Towards Scalable Automated Alignment of LLMs: A Survey
Authors:
Boxi Cao,
Keming Lu,
Xinyu Lu,
Jiawei Chen,
Mengjie Ren,
Hao Xiang,
Peilin Liu,
Yaojie Lu,
Ben He,
Xianpei Han,
Le Sun,
Hongyu Lin,
Bowen Yu
Abstract:
Alignment is the most critical step in building large language models (LLMs) that meet human needs. As LLMs develop rapidly and gradually surpass human capabilities, traditional alignment methods based on human annotation are increasingly unable to meet scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and new technical approaches. In this paper, we systematically review recently emerging methods of automated alignment, exploring how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into four major categories based on the sources of alignment signals, and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and, starting from the fundamental role of alignment, discuss the essential factors that make automated alignment technologies feasible and effective.
Submitted 3 June, 2024;
originally announced June 2024.
-
SARMA: Scalable Low-Rank High-Dimensional Autoregressive Moving Averages via Tensor Decomposition
Authors:
Feiqing Huang,
Kexin Lu,
Yao Zheng
Abstract:
Existing models for high-dimensional time series are overwhelmingly developed within the finite-order vector autoregressive (VAR) framework, whereas the more flexible vector autoregressive moving averages (VARMA) have received much less attention. This paper introduces a high-dimensional model for capturing VARMA dynamics, namely the Scalable ARMA (SARMA) model, by combining novel reparameterization and tensor decomposition techniques. To ensure identifiability and computational tractability, we first consider a reparameterization of the VARMA model and find that, interestingly, this amounts to a Tucker-low-rank structure for the AR coefficient tensor along the temporal dimension. Motivated by this finding, we further consider Tucker decomposition across the response and predictor dimensions of the AR coefficient tensor, enabling factor extraction across variables and time lags. Additionally, we impose sparsity assumptions on the factor loadings to achieve automatic variable selection and greater estimation efficiency. For the proposed model, we develop both rank-constrained and sparsity-inducing estimators. Algorithms and model selection methods are also provided. Simulation studies and empirical examples confirm the validity of our theory and the advantages of our approaches over existing competitors.
Submitted 1 May, 2024;
originally announced May 2024.
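SARMA starts from the infinite-order AR representation of a VARMA model. The sketch below is a scalar illustration only and does not reproduce the paper's multivariate reparameterization. It expands an ARMA(1,1) model y_t = phi*y_{t-1} + e_t + theta*e_{t-1} into its AR(∞) weights by power-series division. It then checks those weights against the closed form a_j = (phi + theta)(-theta)^(j-1). The example shows how two parameters can govern infinitely many AR coefficients that decay geometrically in the lag. The function names are illustrative.

```python
def arma11_ar_weights(phi, theta, n_lags):
    """AR(infinity) weights a_1..a_n of y_t = phi*y_{t-1} + e_t + theta*e_{t-1}.

    Expands pi(B) = (1 - phi*B) / (1 + theta*B) as a power series in the
    backshift operator B, so that e_t = y_t - sum_j a_j * y_{t-j}.
    """
    numer = [1.0, -phi] + [0.0] * max(n_lags - 1, 0)  # coefficients of 1 - phi*B
    coeffs = [1.0]  # c_0 of pi(B)
    for k in range(1, n_lags + 1):
        # (1 + theta*B) * pi(B) = numer(B)  =>  c_k = numer_k - theta * c_{k-1}
        coeffs.append(numer[k] - theta * coeffs[k - 1])
    # pi(B) = 1 - sum_j a_j B^j, hence a_j = -c_j
    return [-c for c in coeffs[1:]]


def arma11_ar_weights_closed_form(phi, theta, n_lags):
    """Closed form a_j = (phi + theta) * (-theta)**(j - 1)."""
    return [(phi + theta) * (-theta) ** (j - 1) for j in range(1, n_lags + 1)]


# Two parameters pin down every lag coefficient.
weights = arma11_ar_weights(0.5, 0.3, 6)
```

In the multivariate case the coefficients become matrices stacked along a lag dimension. That is where the tensor view in the abstract comes in.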
-
Supervised Factor Modeling for High-Dimensional Linear Time Series
Authors:
Feiqing Huang,
Kexin Lu,
Guodong Li
Abstract:
Motivated by Tucker tensor decomposition, this paper imposes low-rank structures on the column and row spaces of the coefficient matrices in a multivariate infinite-order vector autoregression (VAR). This leads to a supervised factor model in which factor modeling is conducted on responses and predictors simultaneously. Interestingly, the stationarity condition implies an intrinsic weak group sparsity mechanism in infinite-order VARs, and hence a rank-constrained group Lasso estimator is considered for high-dimensional linear time series. Its non-asymptotic properties are discussed carefully by balancing the estimation, approximation and truncation errors. Moreover, an alternating gradient descent algorithm with thresholding is designed to compute high-dimensional estimates, and its theoretical justifications, including statistical and convergence analyses, are also provided. The theoretical and computational properties of the proposed methodology are verified by simulation experiments, and its advantages over existing methods are demonstrated on two real examples.
Submitted 30 November, 2023;
originally announced December 2023.
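The rank-constrained group Lasso above is fitted by alternating gradient descent with thresholding. Group-Lasso penalties are commonly handled with the group soft-thresholding (proximal) operator, sketched below in plain Python. It is an assumption that the paper's thresholding step takes exactly this form. Treating each group as, for example, the coefficients of one lag is also an illustrative assumption.

```python
import math


def group_soft_threshold(groups, lam):
    """Proximal operator of lam * sum_g ||v_g||_2 (the group Lasso penalty).

    Each group v_g is shrunk toward zero by lam in Euclidean norm and is set
    exactly to zero when ||v_g||_2 <= lam, which is what produces group sparsity.
    """
    out = []
    for v in groups:
        norm = math.sqrt(sum(x * x for x in v))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in v])
    return out


# A large group is shrunk; a small group is zeroed out entirely.
shrunk = group_soft_threshold([[3.0, 4.0], [0.3, 0.4]], lam=1.0)
```

Zeroing whole groups, rather than individual entries, lets a sparsity pattern drop entire blocks of coefficients at once.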
-
HAR-Ito models and high-dimensional HAR modeling for high-frequency data
Authors:
Huiling Yuan,
Kexin Lu,
Yifeng Guo,
Guodong Li
Abstract:
Modeling the realized volatilities of high-frequency data is an important task in finance and economics, and the heterogeneous autoregressive (HAR) model, arguably the most popular model, has dominated applications in this area. However, this model suffers from three drawbacks: (i) its heterogeneous volatility components are linear combinations of daily realized volatilities with fixed weights, which limits its flexibility for different types of assets; (ii) the high-frequency probabilistic structure of this model, as well as of many other HAR-type models in the literature, is still unknown; and (iii) there is no high-dimensional inference tool for HAR modeling, although real applications commonly involve many assets. To overcome these drawbacks, this paper proposes a multilinear low-rank HAR model using tensor techniques, in which a data-driven method is adopted to select the heterogeneous components automatically. In addition, HAR-Itô models are introduced to interpret the corresponding high-frequency dynamics, as well as those of other HAR-type models. Moreover, non-asymptotic properties of the high-dimensional HAR modeling are established, and a projected gradient descent algorithm with theoretical justifications is proposed to compute the estimates. The theoretical and computational properties of the proposed method are verified by simulation studies, and the necessity of the data-driven method for heterogeneous components is illustrated in a real data analysis.
Submitted 6 March, 2023;
originally announced March 2023.
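For reference, the conventional HAR specification regresses next-day realized volatility on daily, weekly and monthly components. Each component is an equal-weight average of past daily values. The sketch assumes the common 1/5/22-day windows, which the abstract does not specify, and only builds the design matrix. These fixed-weight components are the ones the proposed model replaces with data-driven ones.

```python
def har_regressors(rv, weekly=5, monthly=22):
    """Build (daily, weekly, monthly) HAR components and next-day targets.

    rv is a list of daily realized volatilities. Row t uses rv[t], the mean of
    the last `weekly` values and the mean of the last `monthly` values; the
    target is rv[t + 1].
    """
    rows, targets = [], []
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week = sum(rv[t - weekly + 1 : t + 1]) / weekly    # fixed equal weights
        month = sum(rv[t - monthly + 1 : t + 1]) / monthly  # fixed equal weights
        rows.append((daily, week, month))
        targets.append(rv[t + 1])
    return rows, targets
```

An ordinary least-squares fit of `targets` on `rows` plus an intercept gives the classic HAR estimates.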
-
Calibration for multivariate Lévy-driven Ornstein-Uhlenbeck processes with applications to weak subordination
Authors:
Kevin W. Lu
Abstract:
Consider a multivariate Lévy-driven Ornstein-Uhlenbeck process where the stationary distribution or background driving Lévy process is from a parametric family. We derive the likelihood function assuming that the innovation term is absolutely continuous. Two examples are studied in detail: the processes where either the stationary distribution or the background driving Lévy process is given by a weak variance alpha-gamma process, which is a multivariate generalisation of the variance gamma process created using weak subordination. In the former case, we give an explicit representation of the background driving Lévy process. This leads to an innovation term that is a mixture of discrete and continuous components, which allows exact simulation of the process and calls for a separate likelihood function. In the latter case, we show that the innovation term is absolutely continuous. The results of a simulation study demonstrate that maximum likelihood, numerically computed using Fourier inversion, can be applied to accurately estimate the parameters in both cases.
Submitted 31 August, 2021; v1 submitted 29 November, 2020;
originally announced November 2020.
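The abstract notes that maximum likelihood is computed numerically via Fourier inversion. The sketch below shows the basic inversion step, f(x) = (1/π) ∫_0^∞ Re[e^{-iux} φ(u)] du, approximated with a midpoint rule on a truncated range. `u_max` and `n` are illustrative tuning choices. The check uses a Gaussian characteristic function, where the density is known. The characteristic functions of the processes studied in the paper are not reproduced here. The formula requires φ to be absolutely integrable.

```python
import cmath
import math


def density_by_fourier_inversion(char_fn, x, u_max=50.0, n=20000):
    """Approximate f(x) = (1/pi) * integral_0^inf Re[exp(-i*u*x) * phi(u)] du.

    Valid when the characteristic function phi is absolutely integrable.
    The integral is truncated at u_max and evaluated with the midpoint rule.
    """
    h = u_max / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += (cmath.exp(-1j * u * x) * char_fn(u)).real
    return total * h / math.pi


# Gaussian N(mu, sd^2) check: phi(u) = exp(i*mu*u - sd^2 * u^2 / 2).
mu, sd = 0.3, 1.2
gauss_cf = lambda u: cmath.exp(1j * mu * u - 0.5 * (sd * u) ** 2)
approx = density_by_fourier_inversion(gauss_cf, 1.0)
```

A log-likelihood would sum `math.log` of such density evaluations over the observed innovations, with φ supplied by the chosen model.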
-
High-throughput relation extraction algorithm development associating knowledge articles and electronic health records
Authors:
Yucong Lin,
Keming Lu,
Yulin Chen,
Chuan Hong,
Sheng Yu
Abstract:
Objective: Medical relations are the core components of medical knowledge graphs, which are needed for healthcare artificial intelligence. However, the expert annotation required by conventional algorithm development processes creates a major bottleneck for mining new relations. In this paper, we present Hi-RES, a framework for high-throughput relation extraction algorithm development. We also show that combining knowledge articles with electronic health records (EHRs) significantly increases classification accuracy. Methods: We use relation triplets obtained from structured databases and semistructured webpages to label sentences from target corpora as positive training samples. Two methods are also provided for creating improved negative samples by combining positive samples with naïve negative samples. We propose a common model that summarizes sentence information using large-scale pretrained language models and multi-instance attention, and then joins it with concept embeddings trained on the EHRs for relation prediction. Results: We apply the Hi-RES framework to develop classification algorithms for disorder-disorder relations and disorder-location relations. Millions of sentences are created as training data. Using either pretrained language models or EHR-based embeddings alone yields considerable accuracy gains over previous models. Combining them further increases the accuracy substantially, to 0.947 and 0.998 for the two sets of relations, respectively, which is 10-17 percentage points higher than previous models. Conclusion: Hi-RES is an efficient framework for high-throughput, accurate relation extraction algorithm development.
Submitted 7 September, 2020;
originally announced September 2020.
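The Methods section describes labeling sentences with relation triplets from structured sources. Below is a minimal sketch of that distant-supervision step. It assumes plain case-insensitive substring matching of entity names, a simplification; the paper's matching and concept normalization are not described in the abstract. Matched sentences are grouped into per-triplet bags, the unit that multi-instance attention operates on. The relation names and sentences are toy examples.

```python
from collections import defaultdict


def build_bags(sentences, triplets):
    """Group sentences into bags keyed by (head, relation, tail).

    A sentence joins a bag when it mentions both the head and tail entity
    (case-insensitive substring match, a deliberate simplification).
    """
    bags = defaultdict(list)
    for sent in sentences:
        text = sent.lower()
        for head, relation, tail in triplets:
            if head.lower() in text and tail.lower() in text:
                bags[(head, relation, tail)].append(sent)
    return dict(bags)


# Hypothetical relation names and sentences, for illustration only.
bags = build_bags(
    ["Asthma is often comorbid with eczema.", "Eczema mainly affects the skin."],
    [("asthma", "comorbid_with", "eczema"), ("eczema", "located_in", "skin")],
)
```

Entity pairs drawn from the same corpus that match no known triplet would be the natural source of the naïve negative samples mentioned in the abstract.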
-
Adaptive Online Planning for Continual Lifelong Learning
Authors:
Kevin Lu,
Igor Mordatch,
Pieter Abbeel
Abstract:
We study learning control in an online reset-free lifelong learning scenario, where mistakes can compound catastrophically into the future and the underlying dynamics of the environment may change. Traditional model-free policy learning methods have achieved success on difficult tasks thanks to their broad flexibility, but they struggle in this setting. They can trigger failure modes early in their lifetimes that are difficult to recover from, and their performance degrades as the dynamics change. On the other hand, model-based planning methods learn and adapt quickly, but require prohibitive levels of computational resources. We present a new algorithm, Adaptive Online Planning (AOP), that achieves strong performance in this setting by combining model-based planning with model-free learning. By approximating the uncertainty of the model-free components and the planner performance, AOP calls upon more extensive planning only when necessary. This reduces computation time while still adapting gracefully to unpredictable changes in the world, even when traditional RL fails.
Submitted 27 June, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
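AOP plans more only when necessary. That gating idea can be illustrated with a hypothetical rule that uses two signals. Disagreement within an ensemble of value estimates serves as the uncertainty signal. The recent improvement obtained from planning serves as the planner-performance signal. The function, its thresholds and its budgets are invented for illustration and are not AOP's actual criteria.

```python
from statistics import pstdev


def planning_budget(ensemble_values, recent_plan_gain,
                    std_threshold=0.1, gain_threshold=0.05,
                    cheap_iters=1, full_iters=8):
    """Hypothetical rule: spend full planning effort only when it seems needed.

    ensemble_values: value estimates of the current state from an ensemble.
    recent_plan_gain: how much extra planning recently improved the plan.
    """
    uncertain = pstdev(ensemble_values) > std_threshold  # ensemble disagrees
    planner_helps = recent_plan_gain > gain_threshold    # planning still pays off
    return full_iters if (uncertain or planner_helps) else cheap_iters
```

In a familiar, stable state the rule falls back to a cheap update. A dynamics change that makes the ensemble disagree triggers full planning again.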
-
FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation
Authors:
Yanfu Yan,
Ke Lu,
Jian Xue,
Pengcheng Gao,
Jiayi Lyu
Abstract:
Facial expression analysis based on machine learning requires a large amount of well-annotated data to reflect the different changes in facial motion. Publicly available datasets help accelerate research in this area by providing benchmark resources. However, to the best of our knowledge, all of these datasets are limited to rough annotations of action units, including only their absence, presence, or a five-level intensity according to the Facial Action Coding System (FACS). To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D Facial Animation. One hundred and twenty-two participants, including children, young adults and elderly people, were recorded in real-world conditions. In addition, 99,356 frames were manually labeled using Expression Quantitative Tool, developed by us. The labels quantify 9 symmetrical FACS action units, 10 asymmetrical (unilateral) FACS action units, 2 symmetrical FACS action descriptors and 2 asymmetrical FACS action descriptors. Each action unit or action descriptor is annotated with a floating-point number between 0 and 1. To provide a baseline for future research, a benchmark for the regression of action unit values based on Convolutional Neural Networks is presented. We also demonstrate the potential of the FEAFA dataset for 3D facial animation. Almost all state-of-the-art facial animation algorithms rely on 3D face reconstruction. We hence propose a novel method that drives virtual characters based only on action unit value regression from 2D video frames of source actors.
Submitted 2 April, 2019;
originally announced April 2019.
-
Adversarial Examples on Graph Data: Deep Insights into Attack and Defense
Authors:
Huijun Wu,
Chen Wang,
Yuriy Tyshetskiy,
Andrew Docherty,
Kai Lu,
Liming Zhu
Abstract:
Graph deep learning models, such as graph convolutional networks (GCNs), achieve remarkable performance on tasks involving graph data. Like other types of deep models, graph deep learning models are often vulnerable to adversarial attacks. However, compared with non-graph data, graph data bring unique challenges and opportunities for adversarial attacks and defenses: discrete features, graph connections and different definitions of imperceptible perturbations. In this paper, we propose both attack and defense techniques. For attack, we show that the discreteness problem can easily be resolved by introducing integrated gradients. Integrated gradients can accurately reflect the effect of perturbing certain features or edges while still benefiting from parallel computation. For defense, we observe that graphs adversarially manipulated by targeted attacks differ statistically from normal graphs. Based on this observation, we propose a defense approach that inspects the graph and recovers potential adversarial perturbations. Our experiments on a number of datasets show the effectiveness of the proposed methods.
Submitted 22 May, 2019; v1 submitted 4 March, 2019;
originally announced March 2019.
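Integrated gradients attribute a model output F to each input feature via IG_i = (x_i - x'_i) ∫_0^1 ∂F/∂x_i(x' + α(x - x')) dα for a baseline x'. The sketch approximates the integral with a midpoint Riemann sum. A toy function with a hand-written gradient stands in for a GCN's gradient with respect to node features or adjacency entries. That substitution is an assumption; no graph model is included. The attributions should approximately sum to F(x) - F(x'), the completeness property.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann approximation of integrated gradients.

    grad_fn(point) must return dF/dx at `point` as a list.
    IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a*(x - x')) da
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        a = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


# Toy model F(x) = x0**2 + 3*x1 with gradient (2*x0, 3).
toy_grad = lambda p: [2.0 * p[0], 3.0]
attributions = integrated_gradients(toy_grad, [2.0, 1.0], [0.0, 0.0])
```

For binary graph inputs such as edges, the baseline would typically be the all-zero or all-one configuration. The path then moves continuously between discrete states, which is how the approach sidesteps discreteness.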
-
Calibration for Weak Variance-Alpha-Gamma Processes
Authors:
Boris Buchmann,
Kevin W. Lu,
Dilip B. Madan
Abstract:
The weak variance-alpha-gamma process is a multivariate Lévy process constructed by weakly subordinating Brownian motion, possibly with correlated components, with an alpha-gamma subordinator. It generalises the variance-alpha-gamma process of Semeraro, which is constructed by traditional subordination. We compare three calibration methods for the weak variance-alpha-gamma process: method of moments, maximum likelihood estimation (MLE) and digital moment estimation (DME). We derive a condition for Fourier invertibility needed to apply MLE. Our simulations show that MLE produces a better fit when this condition holds, while DME produces a better fit when it is violated. On an S&P500-FTSE100 data set, we also find that the weak variance-alpha-gamma process exhibits a wider range of dependence than the variance-alpha-gamma process and produces a significantly better fit. DME produces the best fit in this situation.
Submitted 27 July, 2018; v1 submitted 26 January, 2018;
originally announced January 2018.
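For context, the univariate variance gamma process is built by traditional subordination. Brownian motion with drift θ and volatility σ is run on a gamma clock G with E[G(t)] = t and Var[G(t)] = νt. The Monte Carlo sketch below draws X(t) = θG(t) + σW(G(t)) using Python's `random.gammavariate(shape, scale)`. The weak, multivariate, alpha-gamma construction studied in the paper is not reproduced, and the parameter values are illustrative. Under this construction E[X(t)] = θt and Var[X(t)] = (σ² + θ²ν)t. Those two moments give a quick check for method-of-moments style calibration.

```python
import math
import random


def simulate_vg(theta, sigma, nu, t=1.0, n_paths=20000, seed=0):
    """Draw samples of X(t) for X(t) = theta*G(t) + sigma*W(G(t)).

    G is a gamma subordinator with E[G(t)] = t and Var[G(t)] = nu * t,
    i.e. G(t) ~ Gamma(shape=t/nu, scale=nu).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_paths):
        g = rng.gammavariate(t / nu, nu)  # random business time
        # Conditional on G(t) = g, W(g) is normal with variance g.
        draws.append(theta * g + sigma * math.sqrt(g) * rng.gauss(0.0, 1.0))
    return draws


samples = simulate_vg(theta=0.2, sigma=0.3, nu=0.5)
```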
-
Two-Way Partial AUC and Its Properties
Authors:
Hanfang Yang,
Kun Lu,
Xiang Lyu,
Feifang Hu
Abstract:
When evaluating the performance of a diagnostic test, it is important to control both the True Positive Rate (TPR) and the False Positive Rate (FPR). In the literature, most researchers use the partial area under the ROC curve (pAUC) with restrictions on FPR to assess a binary classification system; this is known as the FPR pAUC. It could be artificially designed to measure the area controlled by TPR and FPR, but it is often conceptually and practically misleading. This paper provides a new and intuitive method, named two-way pAUC, which focuses directly on the partial area under the ROC curve with both horizontal and vertical restrictions. We propose a nonparametric estimator of two-way pAUC, establish its asymptotic normality, and compare measures using the bootstrap. Further, to evaluate possible covariate effects on two-way pAUC, a regression analysis framework is constructed and its theoretical properties are established. Simulations and a real-data application support our methods.
Submitted 20 June, 2017; v1 submitted 2 August, 2015;
originally announced August 2015.
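One way to compute a two-way pAUC from data is to integrate the empirical ROC curve over FPR ∈ [0, max_fpr], keeping only the part above TPR = min_tpr. In symbols, that is ∫_0^{max_fpr} max(ROC(t) - min_tpr, 0) dt. The sketch below computes this exactly for the step-function ROC whose full area equals the Mann-Whitney AUC. It is a plain empirical estimate under our own parameter names, not necessarily the paper's estimator, and it counts ties as non-exceedances.

```python
def two_way_pauc(pos_scores, neg_scores, max_fpr, min_tpr):
    """Empirical ROC area restricted to FPR <= max_fpr and TPR >= min_tpr.

    The empirical ROC is a step function: on the FPR interval [k/n, (k+1)/n)
    its height is the fraction of positives scoring above the (k+1)-th largest
    negative score. Ties between a positive and a negative count as misses.
    """
    m, n = len(pos_scores), len(neg_scores)
    area = 0.0
    for k, y in enumerate(sorted(neg_scores, reverse=True)):
        left, right = k / n, (k + 1) / n
        width = max(0.0, min(right, max_fpr) - left)  # part inside FPR limit
        if width == 0.0:
            break  # all remaining intervals lie beyond max_fpr
        tpr = sum(1 for x in pos_scores if x > y) / m
        area += width * max(tpr - min_tpr, 0.0)  # keep only the part above min_tpr
    return area


pos, neg = [2.0, 4.0], [1.0, 3.0]
full_auc = two_way_pauc(pos, neg, max_fpr=1.0, min_tpr=0.0)   # Mann-Whitney AUC
restricted = two_way_pauc(pos, neg, max_fpr=1.0, min_tpr=0.5)
```

With `max_fpr=1` and `min_tpr=0` the function reduces to the ordinary AUC. A bootstrap comparison would resample `pos` and `neg` and recompute it.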
-
Multiview Hessian regularized logistic regression for action recognition
Authors:
W. Liu,
H. Liu,
D. Tao,
Y. Wang,
Ke Lu
Abstract:
With the rapid development of social media sharing, people increasingly need to manage growing volumes of multimedia data. Examples include large-scale video classification and annotation, especially organizing videos that contain human activities. Recently, manifold regularized semi-supervised learning (SSL) has emerged as a promising paradigm for semiautomatic video classification. It explores the intrinsic data probability distribution and then improves generalization using only a small number of labeled examples. In addition, human action videos often have multimodal content and different representations. To tackle these problems, in this paper we propose multiview Hessian regularized logistic regression (mHLR) for human action recognition. Compared with existing work, the advantages of mHLR are threefold: (1) mHLR combines multiple Hessian regularizers, each obtained from a particular representation of the instances, to better exploit local geometry; (2) mHLR naturally handles multi-view instances with multiple representations; and (3) mHLR employs a smooth loss function and can therefore be optimized effectively. We conduct extensive experiments on the unstructured social activity attribute (USAA) dataset, and the results demonstrate the effectiveness of the proposed multiview Hessian regularized logistic regression for human action recognition.
Submitted 2 March, 2014;
originally announced March 2014.
-
Large-Scale Paralleled Sparse Principal Component Analysis
Authors:
W. Liu,
H. Zhang,
D. Tao,
Y. Wang,
K. Lu
Abstract:
Principal component analysis (PCA) is a statistical technique commonly used in multivariate data analysis. However, PCA can be difficult to interpret and explain, since the principal components (PCs) are linear combinations of the original variables. Sparse PCA (SPCA) aims to balance statistical fidelity and interpretability by approximating sparse PCs whose projections capture the maximal variance of the original data. In this paper we present an efficient parallel method for SPCA using graphics processing units (GPUs), which can process large blocks of data in parallel. Specifically, we construct parallel GPU implementations of the four optimization formulations of the generalized power method for SPCA (GP-SPCA), one of the most efficient and effective SPCA approaches. The parallel GPU implementation of GP-SPCA (using CUBLAS) is up to eleven times faster than the corresponding CPU implementation (using CBLAS), and up to 107 times faster than a MATLAB implementation. Extensive comparative experiments on several real-world datasets confirm that SPCA offers a practical advantage.
Submitted 20 December, 2013;
originally announced December 2013.
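As a reference point for what is being parallelized, the sketch below is a sequential, pure-Python version of the single-unit ℓ1 formulation of the generalized power method. This is one of the four GP-SPCA formulations, following Journée et al. (2010). Each iteration is dominated by matrix-vector products, which is the kind of work a CUBLAS implementation would offload to the GPU. It is not the paper's GPU code, and the initialization and stopping rule are our own choices.

```python
import math


def gpower_l1_single_unit(A, gamma, max_iter=200, tol=1e-10):
    """Single-unit l1-penalized generalized power method for sparse PCA.

    A is a p x n data matrix given as a list of p rows; column i holds
    variable i. Returns a unit-norm sparse loading vector of length n
    (all zeros if gamma is so large that no variable survives).
    """
    p, n = len(A), len(A[0])
    cols = [[A[r][i] for r in range(p)] for i in range(n)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def shrink(x):
        # sign(a_i^T x) * max(|a_i^T x| - gamma, 0) for every column a_i
        return [math.copysign(max(abs(s) - gamma, 0.0), s)
                for s in (dot(c, x) for c in cols)]

    start = max(cols, key=lambda c: dot(c, c))  # column with the largest norm
    norm = math.sqrt(dot(start, start))
    x = [v / norm for v in start]
    for _ in range(max_iter):
        t = shrink(x)
        # x <- sum_i t_i * a_i, then normalize (a matrix-vector product)
        new = [sum(t[i] * cols[i][r] for i in range(n)) for r in range(p)]
        norm = math.sqrt(dot(new, new))
        if norm == 0.0:
            return [0.0] * n
        new = [v / norm for v in new]
        converged = max(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    z = shrink(x)
    znorm = math.sqrt(dot(z, z))
    return [v / znorm for v in z] if znorm > 0 else [0.0] * n
```

Larger `gamma` yields sparser loadings. In this formulation, variable i drops out whenever |a_i^T x| ≤ gamma at the final iterate.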