Search | arXiv e-print repository

Towards Scalable Automated Alignment of LLMs: A Survey

Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approach… ▽ More Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2006.10533 [pdf]

Endpoints for randomized controlled clinical trials for COVID-19 treatments

Authors: Lori E Dodd, Dean Follmann, **g Wang, Franz Koenig, Lisa L Korn, Christian Schoergenhofer, Michael Proschan, Sally Hunsberger, Tyler Bonnett, Mat Makowski, Drifa Belhadi, Yeming Wang, Bin Cao, France Mentre, Thomas Jaki

Abstract: Introduction: Endpoint choice for randomized controlled trials of treatments for COVID-19 is complex. A new disease brings many uncertainties, but trials must start rapidly. COVID-19 is heterogeneous, ranging from mild disease that improves within days to critical disease that can last weeks and can end in death. While improvement in mortality would provide unquestionable evidence about clinical s… ▽ More Introduction: Endpoint choice for randomized controlled trials of treatments for COVID-19 is complex. A new disease brings many uncertainties, but trials must start rapidly. COVID-19 is heterogeneous, ranging from mild disease that improves within days to critical disease that can last weeks and can end in death. While improvement in mortality would provide unquestionable evidence about clinical significance of a treatment, sample sizes for a study evaluating mortality are large and may be impractical. Furthermore, patient states in between "cure" and "death" represent meaningful distinctions. Clinical severity scores have been proposed as an alternative. However, the appropriate summary measure for severity scores has been the subject of debate, particularly in relating to the uncertainty about the time-course of COVID-19. Outcomes measured at fixed time-points may risk missing the time of clinical benefit. An endpoint such as time-to-improvement (or recovery), avoids the timing problem. However, some have argued that power losses will result from reducing the ordinal scale to a binary state of "recovered" vs "not recovered." Methods: We evaluate statistical power for possible trial endpoints for COVID-19 treatment trials using simulation models and data from two recent COVID-19 treatment trials. Results: Power for fixed-time point methods depends heavily on the time selected for evaluation. Time-to-improvement (or recovery) analyses do not specify a time-point. Time-to-event approaches have reasonable statistical power, even when compared to a fixed time-point method evaluated at the optimal time. Discussion: Time-to-event analyses methods have advantages in the COVID-19 setting, unless the optimal time for evaluating treatment effect is known in advance. Even when the optimal time is known, a time-to-event approach may increase power for interim analyses. △ Less

Submitted 9 June, 2020; originally announced June 2020.

arXiv:1811.05072 [pdf, other]

Private Model Compression via Knowledge Distillation

Authors: Ji Wang, Weidong Bao, Lichao Sun, Xiaomin Zhu, Bokai Cao, Philip S. Yu

Abstract: The soaring demand for intelligent mobile applications calls for deploying powerful deep neural networks (DNNs) on mobile devices. However, the outstanding performance of DNNs notoriously relies on increasingly complex models, which in turn is associated with an increase in computational expense far surpassing mobile devices' capacity. What is worse, app service providers need to collect and utili… ▽ More The soaring demand for intelligent mobile applications calls for deploying powerful deep neural networks (DNNs) on mobile devices. However, the outstanding performance of DNNs notoriously relies on increasingly complex models, which in turn is associated with an increase in computational expense far surpassing mobile devices' capacity. What is worse, app service providers need to collect and utilize a large volume of users' data, which contain sensitive information, to build the sophisticated DNN models. Directly deploying these models on public mobile devices presents prohibitive privacy risk. To benefit from the on-device deep learning without the capacity and privacy concerns, we design a private model compression framework RONA. Following the knowledge distillation paradigm, we jointly use hint learning, distillation learning, and self learning to train a compact and fast neural network. The knowledge distilled from the cumbersome model is adaptively bounded and carefully perturbed to enforce differential privacy. We further propose an elegant query sample selection method to reduce the number of queries and control the privacy loss. A series of empirical evaluations as well as the implementation on an Android mobile device show that RONA can not only compress cumbersome models efficiently but also provide a strong privacy guarantee. For example, on SVHN, when a meaningful $(9.83,10^{-6})$-differential privacy is guaranteed, the compact model trained by RONA can obtain 20$\times$ compression ratio and 19$\times$ speed-up with merely 0.97% accuracy loss. △ Less

Submitted 12 November, 2018; originally announced November 2018.

Comments: Conference version accepted by AAAI'19

arXiv:1809.04110 [pdf, other]

Joint Embedding of Meta-Path and Meta-Graph for Heterogeneous Information Networks

Authors: Lichao Sun, Lifang He, Zhipeng Huang, Bokai Cao, Congying Xia, Xiaokai Wei, Philip S. Yu

Abstract: Meta-graph is currently the most powerful tool for similarity search on heterogeneous information networks,where a meta-graph is a composition of meta-paths that captures the complex structural information. However, current relevance computing based on meta-graph only considers the complex structural information, but ignores its embedded meta-paths information. To address this problem, we proposeM… ▽ More Meta-graph is currently the most powerful tool for similarity search on heterogeneous information networks,where a meta-graph is a composition of meta-paths that captures the complex structural information. However, current relevance computing based on meta-graph only considers the complex structural information, but ignores its embedded meta-paths information. To address this problem, we proposeMEta-GrAph-based network embedding models, called MEGA and MEGA++, respectively. The MEGA model uses normalized relevance or similarity measures that are derived from a meta-graph and its embedded meta-paths between nodes simultaneously, and then leverages tensor decomposition method to perform node embedding. The MEGA++ further facilitates the use of coupled tensor-matrix decomposition method to obtain a joint embedding for nodes, which simultaneously considers the hidden relations of all meta information of a meta-graph.Extensive experiments on two real datasets demonstrate thatMEGA and MEGA++ are more effective than state-of-the-art approaches. △ Less

Submitted 11 September, 2018; originally announced September 2018.

Comments: accepted by ICBK 18

arXiv:1809.03428 [pdf, other]

Not Just Privacy: Improving Performance of Private Deep Learning in Mobile Cloud

Authors: Ji Wang, Jianguo Zhang, Weidong Bao, Xiaomin Zhu, Bokai Cao, Philip S. Yu

Abstract: The increasing demand for on-device deep learning services calls for a highly efficient manner to deploy deep neural networks (DNNs) on mobile devices with limited capacity. The cloud-based solution is a promising approach to enabling deep learning applications on mobile devices where the large portions of a DNN are offloaded to the cloud. However, revealing data to the cloud leads to potential pr… ▽ More The increasing demand for on-device deep learning services calls for a highly efficient manner to deploy deep neural networks (DNNs) on mobile devices with limited capacity. The cloud-based solution is a promising approach to enabling deep learning applications on mobile devices where the large portions of a DNN are offloaded to the cloud. However, revealing data to the cloud leads to potential privacy risk. To benefit from the cloud data center without the privacy risk, we design, evaluate, and implement a cloud-based framework ARDEN which partitions the DNN across mobile devices and cloud data centers. A simple data transformation is performed on the mobile device, while the resource-hungry training and the complex inference rely on the cloud data center. To protect the sensitive information, a lightweight privacy-preserving mechanism consisting of arbitrary data nullification and random noise addition is introduced, which provides strong privacy guarantee. A rigorous privacy budget analysis is given. Nonetheless, the private perturbation to the original data inevitably has a negative impact on the performance of further inference on the cloud side. To mitigate this influence, we propose a noisy training method to enhance the cloud-side network robustness to perturbed data. Through the sophisticated design, ARDEN can not only preserve privacy but also improve the inference performance. To validate the proposed ARDEN, a series of experiments based on three image datasets and a real mobile application are conducted. The experimental results demonstrate the effectiveness of ARDEN. Finally, we implement ARDEN on a demo system to verify its practicality. △ Less

Submitted 5 January, 2019; v1 submitted 10 September, 2018; originally announced September 2018.

Comments: Conference version accepted by KDD'18

arXiv:1806.07703 [pdf, other]

Multi-View Multi-Graph Embedding for Brain Network Clustering Analysis

Authors: Ye Liu, Lifang He, Bokai Cao, Philip S. Yu, Ann B. Ragin, Alex D. Leow

Abstract: Network analysis of human brain connectivity is critically important for understanding brain function and disease states. Embedding a brain network as a whole graph instance into a meaningful low-dimensional representation can be used to investigate disease mechanisms and inform therapeutic interventions. Moreover, by exploiting information from multiple neuroimaging modalities or views, we are ab… ▽ More Network analysis of human brain connectivity is critically important for understanding brain function and disease states. Embedding a brain network as a whole graph instance into a meaningful low-dimensional representation can be used to investigate disease mechanisms and inform therapeutic interventions. Moreover, by exploiting information from multiple neuroimaging modalities or views, we are able to obtain an embedding that is more useful than the embedding learned from an individual view. Therefore, multi-view multi-graph embedding becomes a crucial task. Currently, only a few studies have been devoted to this topic, and most of them focus on the vector-based strategy which will cause structural information contained in the original graphs lost. As a novel attempt to tackle this problem, we propose Multi-view Multi-graph Embedding (M2E) by stacking multi-graphs into multiple partially-symmetric tensors and using tensor techniques to simultaneously leverage the dependencies and correlations among multi-view and multi-graph brain networks. Extensive experiments on real HIV and bipolar disorder brain network datasets demonstrate the superior performance of M2E on clustering brain networks by leveraging the multi-view multi-graph interactions. △ Less

Submitted 19 June, 2018; originally announced June 2018.

arXiv:1803.08978 [pdf, other]

Broad Learning for Healthcare

Authors: Bokai Cao

Abstract: A broad spectrum of data from different modalities are generated in the healthcare domain every day, including scalar data (e.g., clinical measures collected at hospitals), tensor data (e.g., neuroimages analyzed by research institutes), graph data (e.g., brain connectivity networks), and sequence data (e.g., digital footprints recorded on smart sensors). Capability for modeling information from t… ▽ More A broad spectrum of data from different modalities are generated in the healthcare domain every day, including scalar data (e.g., clinical measures collected at hospitals), tensor data (e.g., neuroimages analyzed by research institutes), graph data (e.g., brain connectivity networks), and sequence data (e.g., digital footprints recorded on smart sensors). Capability for modeling information from these heterogeneous data sources is potentially transformative for investigating disease mechanisms and for informing therapeutic interventions. Our works in this thesis attempt to facilitate healthcare applications in the setting of broad learning which focuses on fusing heterogeneous data sources for a variety of synergistic knowledge discovery and machine learning tasks. We are generally interested in computer-aided diagnosis, precision medicine, and mobile health by creating accurate user profiles which include important biomarkers, brain connectivity patterns, and latent representations. In particular, our works involve four different data mining problems with application to the healthcare domain: multi-view feature selection, subgraph pattern mining, brain network embedding, and multi-view sequence prediction. △ Less

Submitted 23 March, 2018; originally announced March 2018.

Comments: PhD Thesis, University of Illinois at Chicago, March 2018

arXiv:1508.04554 [pdf, other]

doi 10.1109/ICDM.2015.50

Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification

Authors: Bokai Cao, Xiangnan Kong, **gyuan Zhang, Philip S. Yu, Ann B. Ragin

Abstract: Mining discriminative subgraph patterns from graph data has attracted great interest in recent years. It has a wide variety of applications in disease diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the graph representation alone. However, in many real-world applications, the side information is available along with the graph data. For example, for neurological disorder i… ▽ More Mining discriminative subgraph patterns from graph data has attracted great interest in recent years. It has a wide variety of applications in disease diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the graph representation alone. However, in many real-world applications, the side information is available along with the graph data. For example, for neurological disorder identification, in addition to the brain networks derived from neuroimaging data, hundreds of clinical, immunologic, serologic and cognitive measures may also be documented for each subject. These measures compose multiple side views encoding a tremendous amount of supplemental information for diagnostic purposes, yet are often ignored. In this paper, we study the problem of discriminative subgraph selection using multiple side views and propose a novel solution to find an optimal set of subgraph features for graph classification by exploring a plurality of side views. We derive a feature evaluation criterion, named gSide, to estimate the usefulness of subgraph patterns based upon side views. Then we develop a branch-and-bound algorithm, called gMSV, to efficiently search for optimal subgraph features by integrating the subgraph mining process and the procedure of discriminative feature selection. Empirical studies on graph classification tasks for neurological disorders using brain networks demonstrate that subgraph patterns selected by the multi-side-view guided subgraph selection approach can effectively boost graph classification performances and are relevant to disease diagnosis. △ Less

Submitted 19 August, 2015; originally announced August 2015.

Comments: in Proceedings of IEEE International Conference on Data Mining (ICDM) 2015

arXiv:1508.01023 [pdf, other]

A review of heterogeneous data mining for brain disorders

Authors: Bokai Cao, Xiangnan Kong, Philip S. Yu

Abstract: With rapid advances in neuroimaging techniques, the research on brain disorder identification has become an emerging area in the data mining community. Brain disorder data poses many unique challenges for data mining research. For example, the raw data generated by neuroimaging experiments is in tensor representations, with typical characteristics of high dimensionality, structural complexity and… ▽ More With rapid advances in neuroimaging techniques, the research on brain disorder identification has become an emerging area in the data mining community. Brain disorder data poses many unique challenges for data mining research. For example, the raw data generated by neuroimaging experiments is in tensor representations, with typical characteristics of high dimensionality, structural complexity and nonlinear separability. Furthermore, brain connectivity networks can be constructed from the tensor data, embedding subtle interactions between brain regions. Other clinical measures are usually available reflecting the disease status from different perspectives. It is expected that integrating complementary information in the tensor data and the brain network data, and incorporating other clinical parameters will be potentially transformative for investigating disease mechanisms and for informing therapeutic interventions. Many research efforts have been devoted to this area. They have achieved great success in various applications, such as tensor-based modeling, subgraph pattern mining, multi-view feature analysis. In this paper, we review some recent data mining methods that are used for analyzing brain disorders. △ Less

Submitted 5 August, 2015; originally announced August 2015.

arXiv:1506.01110 [pdf, other]

doi 10.1145/2835776.2835777

Multi-View Factorization Machines

Authors: Bokai Cao, Hucheng Zhou, Guoqiang Li, Philip S. Yu

Abstract: For a learning task, data can usually be collected from different sources or be represented from multiple views. For example, laboratory results from different medical examinations are available for disease diagnosis, and each of them can only reflect the health state of a person from a particular aspect/view. Therefore, different views provide complementary information for learning tasks. An effe… ▽ More For a learning task, data can usually be collected from different sources or be represented from multiple views. For example, laboratory results from different medical examinations are available for disease diagnosis, and each of them can only reflect the health state of a person from a particular aspect/view. Therefore, different views provide complementary information for learning tasks. An effective integration of the multi-view information is expected to facilitate the learning performance. In this paper, we propose a general predictor, named multi-view machines (MVMs), that can effectively include all the possible interactions between features from multiple views. A joint factorization is embedded for the full-order interaction parameters which allows parameter estimation under sparsity. Moreover, MVMs can work in conjunction with different loss functions for a variety of machine learning tasks. A stochastic gradient descent method is presented to learn the MVM model. We further illustrate the advantages of MVMs through comparison with other methods for multi-view classification, including support vector machines (SVMs), support tensor machines (STMs) and factorization machines (FMs). △ Less

Submitted 23 March, 2018; v1 submitted 2 June, 2015; originally announced June 2015.

Comments: WSDM 2016

ACM Class: H.2.8

arXiv:1305.4433 [pdf, other]

Meta Path-Based Collective Classification in Heterogeneous Information Networks

Authors: Xiangnan Kong, Bokai Cao, Philip S. Yu, Ying Ding, David J. Wild

Abstract: Collective classification has been intensively studied due to its impact in many important applications, such as web mining, bioinformatics and citation analysis. Collective classification approaches exploit the dependencies of a group of linked objects whose class labels are correlated and need to be predicted simultaneously. In this paper, we focus on studying the collective classification probl… ▽ More Collective classification has been intensively studied due to its impact in many important applications, such as web mining, bioinformatics and citation analysis. Collective classification approaches exploit the dependencies of a group of linked objects whose class labels are correlated and need to be predicted simultaneously. In this paper, we focus on studying the collective classification problem in heterogeneous networks, which involves multiple types of data objects interconnected by multiple types of links. Intuitively, two objects are correlated if they are linked by many paths in the network. However, most existing approaches measure the dependencies among objects through directly links or indirect links without considering the different semantic meanings behind different paths. In this paper, we study the collective classification problem taht is defined among the same type of objects in heterogenous networks. Moreover, by considering different linkage paths in the network, one can capture the subtlety of different types of dependencies among objects. We introduce the concept of meta-path based dependencies among objects, where a meta path is a path consisting a certain sequence of linke types. We show that the quality of collective classification results strongly depends upon the meta paths used. To accommodate the large network size, a novel solution, called HCC (meta-path based Heterogenous Collective Classification), is developed to effectively assign labels to a group of instances that are interconnected through different meta-paths. The proposed HCC model can capture different types of dependencies among objects with respect to different meta paths. Empirical studies on real-world networks demonstrate that effectiveness of the proposed meta path-based collective classification approach. △ Less

Submitted 20 May, 2013; originally announced May 2013.

Showing 1–11 of 11 results for author: Cao, B