-
Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
Authors:
Xinglin Pan Wenxiang Lin,
Shaohuai Shi,
Xiaowen Chu,
Weinong Sun,
Bo Li
Abstract:
Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, th…
▽ More
Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, the training efficiency is hindered by communication costs introduced by these parallel paradigms. To address this limitation, we propose Parm, a system that accelerates MP+EP+ESP training by designing two dedicated schedules for placing communication tasks. The proposed schedules eliminate redundant computations and communications and enable overlaps between intra-node and inter-node communications, ultimately reducing the overall training time. As the two schedules are not mutually exclusive, we provide comprehensive theoretical analyses and derive an automatic and accurate solution to determine which schedule should be applied in different scenarios. Experimental results on an 8-GPU server and a 32-GPU cluster demonstrate that Parm outperforms the state-of-the-art MoE training system, DeepSpeed-MoE, achieving 1.13$\times$ to 5.77$\times$ speedup on 1296 manually configured MoE layers and approximately 3$\times$ improvement on two real-world MoE models based on BERT and GPT-2.
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
An Empirical Study of Unit Test Generation with Large Language Models
Authors:
Lin Yang,
Chen Yang,
Shutao Gao,
Wei**g Wang,
Bo Wang,
Qihao Zhu,
Xiao Chu,
Jianyi Zhou,
Guangtai Liang,
Qianxiang Wang,
Junjie Chen
Abstract:
Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting s…
▽ More
Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting strategies, leaving the capabilities of advanced open-source LLMs with various prompting settings unexplored. Particularly, open-source LLMs offer advantages in data privacy protection and have demonstrated superior performance in some tasks. Moreover, effective prompting is crucial for maximizing LLMs' capabilities. In this paper, we conduct the first empirical study to fill this gap, based on 17 Java projects, five widely-used open-source LLMs with different structures and parameter sizes, and comprehensive evaluation metrics. Our findings highlight the significant influence of various prompt factors, show the performance of open-source LLMs compared to the commercial GPT-4 and the traditional Evosuite, and identify limitations in LLM-based unit test generation. We then derive a series of implications from our study to guide future research and practical use of LLM-based unit test generation.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Generating and Evolving Reward Functions for Highway Driving with Large Language Models
Authors:
Xu Han,
Qiannan Yang,
Xianda Chen,
Xiaowen Chu,
Meixin Zhu
Abstract:
Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in a…
▽ More
Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving. This framework utilizes the coding capabilities of LLMs, proven in other areas, to generate and evolve reward functions for highway scenarios. The framework starts with instructing LLMs to create an initial reward function code based on the driving environment and task descriptions. This code is then refined through iterative cycles involving RL training and LLMs' reflection, which benefits from their ability to review and improve the output. We have also developed a specific prompt template to improve LLMs' understanding of complex driving simulations, ensuring the generation of effective and error-free code. Our experiments in a highway driving simulator across three traffic configurations show that our method surpasses expert handcrafted reward functions, achieving a 22% higher average success rate. This not only indicates safer driving but also suggests significant gains in development productivity.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models
Authors:
Peijie Dong,
Lujun Li,
Zhenheng Tang,
Xiang Liu,
Xinglin Pan,
Qiang Wang,
Xiaowen Chu
Abstract:
Despite the remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. How…
▽ More
Despite the remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. However, these metrics require the involvement of human experts and tedious trial and error. To efficiently identify superior pruning metrics, we develop an automatic framework for searching symbolic pruning metrics using genetic programming. In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover the potential symbolic pruning metric. We propose an opposing operation simplification strategy to increase the diversity of the population. In this way, Pruner-Zero allows auto-generation of symbolic pruning metrics. Based on the searched results, we explore the correlation between pruning metrics and performance after pruning and summarize some principles. Extensive experiments on LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that our Pruner-Zero obtains superior performance than SOTA post-training pruning methods. Code at: \url{https://github.com/pprp/Pruner-Zero}.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories
Authors:
Tianlong Wang,
Xianfeng Jiao,
Yifan He,
Zhongzhi Chen,
Yinghao Zhu,
Xu Chu,
Junyi Gao,
Yasha Wang,
Liantao Ma
Abstract:
Recent studies have indicated that Large Language Models (LLMs) harbor an inherent understanding of truthfulness, yet often fail to express fully and generate false statements. This gap between "knowing" and "telling" poses a challenge for ensuring the truthfulness of generated content. To address this, we introduce Adaptive Activation Steering (ACT), a tuning-free method that adaptively shift LLM…
▽ More
Recent studies have indicated that Large Language Models (LLMs) harbor an inherent understanding of truthfulness, yet often fail to express fully and generate false statements. This gap between "knowing" and "telling" poses a challenge for ensuring the truthfulness of generated content. To address this, we introduce Adaptive Activation Steering (ACT), a tuning-free method that adaptively shift LLM's activations in "truthful" direction during inference. ACT addresses diverse categories of hallucinations by utilizing diverse steering vectors and adjusting the steering intensity adaptively. Applied as an add-on across various models, ACT significantly improves truthfulness in LLaMA ($\uparrow$ 142\%), LLaMA2 ($\uparrow$ 24\%), Alpaca ($\uparrow$ 36\%), Vicuna ($\uparrow$ 28\%), and LLaMA2-Chat ($\uparrow$ 19\%). Furthermore, we verify ACT's scalability across larger models (13B, 33B, 65B), underscoring the adaptability of ACT to large-scale language models.
△ Less
Submitted 26 May, 2024;
originally announced June 2024.
-
SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches
Authors:
Renzhi Wu,
Pramod Chunduri,
Dristi J Shah,
Ashmitha Julius Aravind,
Ali Payani,
Xu Chu,
Joy Arulraj,
Kexin Rong
Abstract:
In this paper, we will present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajec…
▽ More
In this paper, we will present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajectory similarity, SketchQL achieves zero-shot video moments retrieval by performing similarity searches over the video to identify clips that are the most similar to the visual query. In this demonstration, we introduce the graphic user interface of SketchQL and detail its functionalities and interaction mechanisms. We also demonstrate the end-to-end usage of SketchQL from query composition to video moments retrieval using real-world scenarios.
△ Less
Submitted 30 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
World Models for General Surgical Gras**
Authors:
Hongbin Lin,
Bin Li,
Chun Wai Wong,
Juan Rojas,
Xiangyu Chu,
Kwok Wai Samuel Au
Abstract:
Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements due to mainly relying on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor poli…
▽ More
Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements due to mainly relying on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor policy for surgical gras**, enhancing both generality and robustness. In particular, a novel method is proposed to estimate the values and uncertainties of depth pixels for a rigid-link object's inaccurate region based on the empirical prior of the object's size; both depth and mask images of task objects are encoded to a single compact 3-channel image (size: 64x64x3) by dynamically zooming in the mask regions, minimizing the information loss. The learned controller's effectiveness is extensively evaluated in simulation and in a real robot. Our learned visuomotor policy handles: i) unseen objects, including 5 types of target gras** objects and a robot gripper, in unstructured real-world surgery environments, and ii) disturbances in perception and control. Note that we are the first work to achieve a unified surgical control system that grasps diverse surgical objects using different robot grippers on real robots in complex surgery scenes (average success rate: 69%). Our system also demonstrates significant robustness across 6 conditions including background variation, target disturbance, camera pose variation, kinematic control error, image noise, and re-gras** after the gripped target object drops from the gripper. Videos and codes can be found on our project page: https://linhongbin.github.io/gas/.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Authors:
Shengyuan Ye,
Jiangsu Du,
Liekang Zeng,
Wenzhong Ou,
Xiaowen Chu,
Yutong Lu,
Xu Chen
Abstract:
Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recogniz…
▽ More
Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage our observation that many edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy introduces a novel hybrid model parallelism to orchestrate collaborative inference, along with a heterogeneity-aware parallelism planning for fully exploiting the resource potential. Furthermore, Galaxy devises a tile-based fine-grained overlap** of communication and computation to mitigate the impact of tensor synchronizations on inference latency under bandwidth-constrained edge environments. Extensive evaluation based on prototype implementation demonstrates that Galaxy remarkably outperforms state-of-the-art approaches under various edge environment setups, achieving up to 2.5x end-to-end latency reduction.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus
Authors:
Zhaochen Liu,
Limeng Qiao,
Xiangxiang Chu,
Tingting Jiang
Abstract:
Aiming to predict the complete shapes of partially occluded objects, amodal segmentation is an important step towards visual intelligence. With crucial significance, practical prior knowledge derives from sufficient training, while limited amodal annotations pose challenges to achieve better performance. To tackle this problem, utilizing the mighty priors accumulated in the foundation model, we pr…
▽ More
Aiming to predict the complete shapes of partially occluded objects, amodal segmentation is an important step towards visual intelligence. With crucial significance, practical prior knowledge derives from sufficient training, while limited amodal annotations pose challenges to achieve better performance. To tackle this problem, utilizing the mighty priors accumulated in the foundation model, we propose the first SAM-based amodal segmentation approach, PLUG. Methodologically, a novel framework with hierarchical focus is presented to better adapt the task characteristics and unleash the potential capabilities of SAM. In the region level, due to the association and division in visible and occluded areas, inmodal and amodal regions are assigned as the focuses of distinct branches to avoid mutual disturbance. In the point level, we introduce the concept of uncertainty to explicitly assist the model in identifying and focusing on ambiguous points. Guided by the uncertainty map, a computation-economic point loss is applied to improve the accuracy of predicted boundaries. Experiments are conducted on several prominent datasets, and the results show that our proposed method outperforms existing methods with large margins. Even with fewer total parameters, our method still exhibits remarkable advantages.
△ Less
Submitted 3 June, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
-
AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning
Authors:
Di Qiu,
Xinyang Lin,
Kaiye Wang,
Xiangxiang Chu,
Pengfei Yan
Abstract:
With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scena…
▽ More
With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scenarios. In this paper, we propose a simple yet effective federated face recognition framework called AdaFedFR, by devising an adaptive inter-class representation learning algorithm to enhance the generalization of the generic face model and the efficiency of federated training under strict privacy-preservation. In particular, our work delicately utilizes feature representations of public identities as learnable negative knowledge to optimize the local objective within the feature space, which further encourages the local model to learn powerful representations and optimize personalized models for clients. Experimental results demonstrate that our method outperforms previous approaches on several prevalent face recognition benchmarks within less than 3 communication rounds, which shows communication-friendly and great efficiency.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction
Authors:
Zhihao Yu,
Xu Chu,
Yujie **,
Yasha Wang,
Junfeng Zhao
Abstract:
Electronic health record (EHR) data has emerged as a valuable resource for analyzing patient health status. However, the prevalence of missing data in EHR poses significant challenges to existing methods, leading to spurious correlations and suboptimal predictions. While various imputation techniques have been developed to address this issue, they often obsess unnecessary details and may introduce…
▽ More
Electronic health record (EHR) data has emerged as a valuable resource for analyzing patient health status. However, the prevalence of missing data in EHR poses significant challenges to existing methods, leading to spurious correlations and suboptimal predictions. While various imputation techniques have been developed to address this issue, they often obsess unnecessary details and may introduce additional noise when making clinical predictions. To tackle this problem, we propose SMART, a Self-Supervised Missing-Aware RepresenTation Learning approach for patient health status prediction, which encodes missing information via elaborated attentions and learns to impute missing values through a novel self-supervised pre-training approach that reconstructs missing data representations in the latent space. By adopting missing-aware attentions and focusing on learning higher-order representations, SMART promotes better generalization and robustness to missing data. We validate the effectiveness of SMART through extensive experiments on six EHR tasks, demonstrating its superiority over state-of-the-art methods.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration
Authors:
Yifan Duan,
Xinran Zhang,
Guoliang You,
Yilong Wu,
Xingchen Li,
Yao Li,
Xiaomeng Chu,
Jie Peng,
Yu Zhang,
Jianmin Ji,
Yanyong Zhang
Abstract:
Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-…
▽ More
Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-to-fine framework, we first design a spherical descriptor TERRA for 3-DoF rotation initialization with no prior knowledge. To further optimize, we present JEEP for the joint estimation of extrinsic and pose, integrating geometric and motion information to overcome factors affecting the point cloud registration. Finally, the LiDAR poses optimized by the hierarchical optimization module are input to time synchronization module to produce the ultimate calibration results, including the time offset. To verify the effectiveness, we conduct extensive experiments on eight datasets, where 16 diverse types of LiDARs in total and dozens of calibration tasks are tested. In the challenging tasks, the calibration errors can still be controlled within 5cm and 1° with a high success rate.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
Authors:
Yaqi Wu,
Zhihao Fan,
Xiaofeng Chu,
Jimmy S. Ren,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangcheng Zhou,
Ruicheng Feng,
Yuekun Dai,
Peiqing Yang,
Chen Change Loy,
Senyan Xu,
Zhi**g Sun,
Jiaying Zhu,
Yurui Zhu,
Xueyang Fu,
Zheng-Jun Zha,
Jun Cao,
Cheng Li,
Shu Chen,
Liang Ma,
Shiyang Zhou,
Hai** Zeng,
Kai Feng
, et al. (24 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…
▽ More
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Authors:
Dayou Du,
Gu Gong,
Xiaowen Chu
Abstract:
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the necessity of algorithm-hardware co-design specific…
▽ More
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the necessity of algorithm-hardware co-design specific to ViTs, aiming to optimize their performance by tailoring both the algorithmic structure and the underlying hardware accelerator to each other's strengths. Model quantization, by converting high-precision numbers to lower-precision, reduces the computational demands and memory needs of ViTs, allowing the creation of hardware specifically optimized for these quantized algorithms, boosting efficiency. This article provides a comprehensive survey of ViTs quantization and its hardware acceleration. We first delve into the unique architectural attributes of ViTs and their runtime characteristics. Subsequently, we examine the fundamental principles of model quantization, followed by a comparative analysis of the state-of-the-art quantization techniques for ViTs. Additionally, we explore the hardware acceleration of quantized ViTs, highlighting the importance of hardware-friendly algorithm design. In conclusion, this article will discuss ongoing challenges and future research paths. We consistently maintain the related open-source materials at https://github.com/DD-DuDa/awesome-vit-quantization-acceleration.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
LoRA Dropout as a Sparsity Regularizer for Overfitting Control
Authors:
Yang Lin,
Xinyu Ma,
Xu Chu,
Yujie **,
Zhibang Yang,
Yasha Wang,
Hong Mei
Abstract:
Parameter-efficient fine-tuning methods, represented by LoRA, play an essential role in adapting large-scale pre-trained models to downstream tasks. However, fine-tuning LoRA-series models also faces the risk of overfitting on the training dataset, and yet there's still a lack of theoretical guidance and practical mechanism to control overfitting on LoRA-based PEFT methods. In this paper, we propo…
▽ More
Parameter-efficient fine-tuning methods, represented by LoRA, play an essential role in adapting large-scale pre-trained models to downstream tasks. However, fine-tuning LoRA-series models also faces the risk of overfitting on the training dataset, and yet there's still a lack of theoretical guidance and practical mechanism to control overfitting on LoRA-based PEFT methods. In this paper, we propose a LoRA Dropout mechanism for the LoRA-based methods by introducing random noises to the learnable low-rank matrices and increasing parameter sparsity. We then demonstrate the theoretical mechanism of our LoRA Dropout mechanism from the perspective of sparsity regularization by providing a generalization error bound under this framework. Theoretical results show that appropriate sparsity would help tighten the gap between empirical and generalization risks and thereby control overfitting. Furthermore, based on the LoRA Dropout framework, we introduce a test-time ensemble strategy and provide theoretical evidence demonstrating that the ensemble method can further compress the error bound, and lead to better performance during inference time. Extensive experiments on various NLP tasks provide practical validations of the effectiveness of our LoRA Dropout framework in improving model accuracy and calibration.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation
Authors:
Xinyu Ma,
Xu Chu,
Zhibang Yang,
Yang Lin,
Xin Gao,
Junfeng Zhao
Abstract:
With the increasingly powerful performances and enormous scales of pretrained models, promoting parameter efficiency in fine-tuning has become a crucial need for effective and efficient adaptation to various downstream tasks. One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT), which rigorously preserves the angular distances within the parameter space to preserve the pr…
▽ More
With the increasingly powerful performances and enormous scales of pretrained models, promoting parameter efficiency in fine-tuning has become a crucial need for effective and efficient adaptation to various downstream tasks. One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT), which rigorously preserves the angular distances within the parameter space to preserve the pretrained knowledge. Despite the empirical effectiveness, OFT still suffers low parameter efficiency at $\mathcal{O}(d^2)$ and limited capability of downstream adaptation. Inspired by Givens rotation, in this paper, we proposed quasi-Givens Orthogonal Fine-Tuning (qGOFT) to address the problems. We first use $\mathcal{O}(d)$ Givens rotations to accomplish arbitrary orthogonal transformation in $SO(d)$ with provable equivalence, reducing parameter complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$. Then we introduce flexible norm and relative angular adjustments under soft orthogonality regularization to enhance the adaptation capability of downstream semantic deviations. Extensive experiments on various tasks and pretrained models validate the effectiveness of our methods.
△ Less
Submitted 6 June, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
Authors:
Yu** Tang,
Peijie Dong,
Zhenheng Tang,
Xiaowen Chu,
Junwei Liang
Abstract:
Combining CNNs or ViTs, with RNNs for spatiotemporal forecasting, has yielded unparalleled results in predicting temporal and spatial dynamics. However, modeling extensive global information remains a formidable challenge; CNNs are limited by their narrow receptive fields, and ViTs struggle with the intensive computational demands of their attention mechanisms. The emergence of recent Mamba-based…
▽ More
Combining CNNs or ViTs, with RNNs for spatiotemporal forecasting, has yielded unparalleled results in predicting temporal and spatial dynamics. However, modeling extensive global information remains a formidable challenge; CNNs are limited by their narrow receptive fields, and ViTs struggle with the intensive computational demands of their attention mechanisms. The emergence of recent Mamba-based architectures has been met with enthusiasm for their exceptional long-sequence modeling capabilities, surpassing established vision models in efficiency and accuracy, which motivates us to develop an innovative architecture tailored for spatiotemporal forecasting. In this paper, we propose the VMRNN cell, a new recurrent unit that integrates the strengths of Vision Mamba blocks with LSTM. We construct a network centered on VMRNN cells to tackle spatiotemporal prediction tasks effectively. Our extensive evaluations show that our proposed approach secures competitive results on a variety of tasks while maintaining a smaller model size. Our code is available at https://github.com/yyyu**tang/VMRNN-PyTorch.
△ Less
Submitted 29 June, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Exploring the Potential of Large Language Models in Graph Generation
Authors:
Yang Yao,
Xin Wang,
Zeyang Zhang,
Yijian Qin,
Ziwei Zhang,
Xu Chu,
Yuekui Yang,
Wenwu Zhu,
Hong Mei
Abstract:
Large language models (LLMs) have achieved great success in many fields, and recent works have studied exploring LLMs for graph discriminative tasks such as node classification. However, the abilities of LLMs for graph generation remain unexplored in the literature. Graph generation requires the LLM to generate graphs with given properties, which has valuable real-world applications such as drug d…
▽ More
Large language models (LLMs) have achieved great success in many fields, and recent works have studied exploring LLMs for graph discriminative tasks such as node classification. However, the abilities of LLMs for graph generation remain unexplored in the literature. Graph generation requires the LLM to generate graphs with given properties, which has valuable real-world applications such as drug discovery, while tends to be more challenging. In this paper, we propose LLM4GraphGen to explore the ability of LLMs for graph generation with systematical task designs and extensive experiments. Specifically, we propose several tasks tailored with comprehensive experiments to address key questions regarding LLMs' understanding of different graph structure rules, their ability to capture structural type distributions, and their utilization of domain knowledge for property-based graph generation. Our evaluations demonstrate that LLMs, particularly GPT-4, exhibit preliminary abilities in graph generation tasks, including rule-based and distribution-based generation. We also observe that popular prompting methods, such as few-shot and chain-of-thought prompting, do not consistently enhance performance. Besides, LLMs show potential in generating molecules with specific properties. These findings may serve as foundations for designing good LLMs based models for graph generation and provide valuable insights and further research.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge
Authors:
Yuhe Liu,
Mengxue Kang,
Zengchang Qin,
Xiangxiang Chu
Abstract:
Large text-to-image models have achieved astonishing performance in synthesizing diverse and high-quality images guided by texts. With detail-oriented conditioning control, even finer-grained spatial control can be achieved. However, some generated images still appear unreasonable, even with plentiful object features and a harmonious style. In this paper, we delve into the underlying causes and fi…
▽ More
Large text-to-image models have achieved astonishing performance in synthesizing diverse and high-quality images guided by texts. With detail-oriented conditioning control, even finer-grained spatial control can be achieved. However, some generated images still appear unreasonable, even with plentiful object features and a harmonious style. In this paper, we delve into the underlying causes and find that deep-level logical information, serving as common-sense knowledge, plays a significant role in understanding and processing images. Nonetheless, almost all models have neglected the importance of logical relations in images, resulting in poor performance in this aspect. Following this observation, we propose LogicalDefender, which combines images with the logical knowledge already summarized by humans in text. This encourages models to learn logical knowledge faster and better, and concurrently, extracts the widely applicable logical knowledge from both images and human knowledge. Experiments show that our model has achieved better logical performance, and the extracted logical knowledge can be effectively applied to other scenarios.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Authors:
Honghao Chen,
Xiangxiang Chu,
Yongjian Ren,
Xin Zhao,
Kaiqi Huang
Abstract:
Recently, some large kernel convnets strike back with appealing performance and efficiency. However, given the square complexity of convolution, scaling up kernels can bring about an enormous amount of parameters and the proliferated parameters can induce severe optimization problem. Due to these issues, current CNNs compromise to scale up to 51x51 in the form of stripe convolution (i.e., 51x5 + 5…
▽ More
Recently, some large kernel convnets strike back with appealing performance and efficiency. However, given the square complexity of convolution, scaling up kernels can bring about an enormous amount of parameters and the proliferated parameters can induce severe optimization problem. Due to these issues, current CNNs compromise to scale up to 51x51 in the form of stripe convolution (i.e., 51x5 + 5x51) and start to saturate as the kernel size continues growing. In this paper, we delve into addressing these vital issues and explore whether we can continue scaling up kernels for more performance gains. Inspired by human vision, we propose a human-like peripheral convolution that efficiently reduces over 90% parameter count of dense grid convolution through parameter sharing, and manage to scale up kernel size to extremely large. Our peripheral convolution behaves highly similar to human, reducing the complexity of convolution from O(K^2) to O(logK) without backfiring performance. Built on this, we propose Parameter-efficient Large Kernel Network (PeLK). Our PeLK outperforms modern vision Transformers and ConvNet architectures like Swin, ConvNeXt, RepLKNet and SLaK on various vision tasks including ImageNet classification, semantic segmentation on ADE20K and object detection on MS COCO. For the first time, we successfully scale up the kernel size of CNNs to an unprecedented 101x101 and demonstrate consistent improvements.
△ Less
Submitted 15 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Authors:
Xiangxiang Chu,
Jianlin Su,
Bo Zhang,
Chunhua Shen
Abstract:
Large language models are built on top of a transformer-based architecture to process textual inputs. For example, the LLaMA stands out among many open-source implementations. Can the same transformer be used to process 2D images? In this paper, we answer this question by unveiling a LLaMA-like vision transformer in plain and pyramid forms, termed VisionLLaMA, which is tailored for this purpose. V…
▽ More
Large language models are built on top of a transformer-based architecture to process textual inputs. For example, the LLaMA stands out among many open-source implementations. Can the same transformer be used to process 2D images? In this paper, we answer this question by unveiling a LLaMA-like vision transformer in plain and pyramid forms, termed VisionLLaMA, which is tailored for this purpose. VisionLLaMA is a unified and generic modelling framework for solving most vision tasks. We extensively evaluate its effectiveness using typical pre-training paradigms in a good portion of downstream tasks of image perception and especially image generation. In many cases, VisionLLaMA have exhibited substantial gains over the previous state-of-the-art vision transformers. We believe that VisionLLaMA can serve as a strong new baseline model for vision generation and understanding. Our code will be released at https://github.com/Meituan-AutoML/VisionLLaMA.
△ Less
Submitted 1 March, 2024;
originally announced March 2024.
-
Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
Authors:
Weile Luo,
Ruibo Fan,
Zeyu Li,
Dayou Du,
Qiang Wang,
Xiaowen Chu
Abstract:
Graphics processing units (GPUs) are continually evolving to cater to the computational demands of contemporary general-purpose workloads, particularly those driven by artificial intelligence (AI) utilizing deep learning techniques. A substantial body of studies have been dedicated to dissecting the microarchitectural metrics characterizing diverse GPU generations, which helps researchers understa…
▽ More
Graphics processing units (GPUs) are continually evolving to cater to the computational demands of contemporary general-purpose workloads, particularly those driven by artificial intelligence (AI) utilizing deep learning techniques. A substantial body of studies have been dedicated to dissecting the microarchitectural metrics characterizing diverse GPU generations, which helps researchers understand the hardware details and leverage them to optimize the GPU programs. However, the latest Hopper GPUs present a set of novel attributes, including new tensor cores supporting FP8, DPX, and distributed shared memory. Their details still remain mysterious in terms of performance and operational characteristics. In this research, we propose an extensive benchmarking study focused on the Hopper GPU. The objective is to unveil its microarchitectural intricacies through an examination of the new instruction-set architecture (ISA) of Nvidia GPUs and the utilization of new CUDA APIs. Our approach involves two main aspects. Firstly, we conduct conventional latency and throughput comparison benchmarks across the three most recent GPU architectures, namely Hopper, Ada, and Ampere. Secondly, we delve into a comprehensive discussion and benchmarking of the latest Hopper features, encompassing the Hopper DPX dynamic programming (DP) instruction set, distributed shared memory, and the availability of FP8 tensor cores. The microbenchmarking results we present offer a deeper understanding of the novel GPU AI function units and programming features introduced by the Hopper architecture. This newfound understanding is expected to greatly facilitate software optimization and modeling efforts for GPU architectures. To the best of our knowledge, this study makes the first attempt to demystify the tensor core performance and programming instruction sets unique to Hopper GPUs.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Real-time High-resolution View Synthesis of Complex Scenes with Explicit 3D Visibility Reasoning
Authors:
Tiansong Zhou,
Yebin Liu,
Xuangeng Chu,
Chengkun Cao,
Changyin Zhou,
Fei Yu,
Yu Li
Abstract:
Rendering photo-realistic novel-view images of complex scenes has been a long-standing challenge in computer graphics. In recent years, great research progress has been made on enhancing rendering quality and accelerating rendering speed in the realm of view synthesis. However, when rendering complex dynamic scenes with sparse views, the rendering quality remains limited due to occlusion problems.…
▽ More
Rendering photo-realistic novel-view images of complex scenes has been a long-standing challenge in computer graphics. In recent years, great research progress has been made on enhancing rendering quality and accelerating rendering speed in the realm of view synthesis. However, when rendering complex dynamic scenes with sparse views, the rendering quality remains limited due to occlusion problems. Besides, for rendering high-resolution images on dynamic scenes, the rendering speed is still far from real-time. In this work, we propose a generalizable view synthesis method that can render high-resolution novel-view images of complex static and dynamic scenes in real-time from sparse views. To address the occlusion problems arising from the sparsity of input views and the complexity of captured scenes, we introduce an explicit 3D visibility reasoning approach that can efficiently estimate the visibility of sampled 3D points to the input views. The proposed visibility reasoning approach is fully differentiable and can gracefully fit inside the volume rendering pipeline, allowing us to train our networks with only multi-view images as supervision while refining geometry and texture simultaneously. Besides, each module in our pipeline is carefully designed to bypass the time-consuming MLP querying process and enhance the rendering quality of high-resolution images, enabling us to render high-resolution novel-view images in real-time.Experimental results show that our method outperforms previous view synthesis methods in both rendering quality and speed, particularly when dealing with complex dynamic scenes with sparse views.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems
Authors:
Shengyao Zhuang,
Bevan Koopman,
Xiaoran Chu,
Guido Zuccon
Abstract:
The introduction of Vec2Text, a technique for inverting text embeddings, has raised serious privacy concerns within dense retrieval systems utilizing text embeddings, including those provided by OpenAI and Cohere. This threat comes from the ability for a malicious attacker with access to text embeddings to reconstruct the original text.
In this paper, we investigate various aspects of embedding…
▽ More
The introduction of Vec2Text, a technique for inverting text embeddings, has raised serious privacy concerns within dense retrieval systems utilizing text embeddings, including those provided by OpenAI and Cohere. This threat comes from the ability for a malicious attacker with access to text embeddings to reconstruct the original text.
In this paper, we investigate various aspects of embedding models that could influence the recoverability of text using Vec2Text. Our exploration involves factors such as distance metrics, pooling functions, bottleneck pre-training, training with noise addition, embedding quantization, and embedding dimensions -- aspects not previously addressed in the original Vec2Text paper. Through a thorough analysis of these factors, our aim is to gain a deeper understanding of the critical elements impacting the trade-offs between text recoverability and retrieval effectiveness in dense retrieval systems. This analysis provides valuable insights for practitioners involved in designing privacy-aware dense retrieval systems. Additionally, we propose a straightforward fix for embedding transformation that ensures equal ranking effectiveness while mitigating the risk of text recoverability.
Furthermore, we extend the application of Vec2Text to the separate task of corpus poisoning, where, theoretically, Vec2Text presents a more potent threat compared to previous attack methods. Notably, Vec2Text does not require access to the dense retriever's model parameters and can efficiently generate numerous adversarial passages.
In summary, this study highlights the potential threat posed by Vec2Text to existing dense retrieval systems, while also presenting effective methods to patch and strengthen such systems against such risks.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Deformable Object Manipulation With Constraints Using Path Set Planning and Tracking
Authors:
**g Huang,
Xiangyu Chu,
Xin Ma,
Kwok Wai Samuel Au
Abstract:
In robotic deformable object manipulation (DOM) applications, constraints arise commonly from environments and task-specific requirements. Enabling DOM with constraints is therefore crucial for its deployment in practice. However, dealing with constraints turns out to be challenging due to many inherent factors such as inaccessible deformation models of deformable objects (DOs) and varying environ…
▽ More
In robotic deformable object manipulation (DOM) applications, constraints arise commonly from environments and task-specific requirements. Enabling DOM with constraints is therefore crucial for its deployment in practice. However, dealing with constraints turns out to be challenging due to many inherent factors such as inaccessible deformation models of deformable objects (DOs) and varying environmental setups. This article presents a systematic manipulation framework for DOM subject to constraints by proposing a novel path set planning and tracking scheme. First, constrained DOM tasks are formulated into a versatile optimization formalism which enables dynamic constraint imposition. Because of the lack of the local optimization objective and high state dimensionality, the formulated problem is not analytically solvable. To address this, planning of the path set, which collects paths of DO feedback points, is proposed subsequently to offer feasible path and motion references for DO in constrained setups. Both theoretical analyses and computationally efficient algorithmic implementation of path set planning are discussed. Lastly, a control architecture combining path set tracking and constraint handling is designed for task execution. The effectiveness of our methods is validated in a variety of DOM tasks with constrained experimental settings.
△ Less
Submitted 17 February, 2024;
originally announced February 2024.
-
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Authors:
Dayou Du,
Yijia Zhang,
Shijie Cao,
Jiaqi Guo,
Ting Cao,
Xiaowen Chu,
Ningyi Xu
Abstract:
The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD)…
▽ More
The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit). Specifically, BitDistiller first incorporates a tailored asymmetric quantization and clip** technique to maximally preserve the fidelity of quantized weights, and then proposes a novel Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective, which is employed in a self-distillation manner to enable faster convergence and superior model performance. Empirical evaluations demonstrate that BitDistiller significantly surpasses existing methods in both 3-bit and 2-bit configurations on general language understanding and complex reasoning benchmarks. Notably, BitDistiller is shown to be more cost-effective, demanding fewer data and training resources. The code is available at https://github.com/DD-DuDa/BitDistiller.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
FedImpro: Measuring and Improving Client Update in Federated Learning
Authors:
Zhenheng Tang,
Yonggang Zhang,
Shaohuai Shi,
Xinmei Tian,
Tongliang Liu,
Bo Han,
Xiaowen Chu
Abstract:
Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the distribution of data differs across clients. To address this issue, advanced research primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved…
▽ More
Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the distribution of data differs across clients. To address this issue, advanced research primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved local models. First, we analyze the generalization contribution of local training and conclude that this generalization contribution is bounded by the conditional Wasserstein distance between the data distribution of different clients. Then, we propose FedImpro, to construct similar conditional distributions for local training. Specifically, FedImpro decouples the model into high-level and low-level components, and trains the high-level portion on reconstructed feature distributions. This approach enhances the generalization contribution and reduces the dissimilarity of gradients in FL. Experimental results show that FedImpro can help FL defend against data heterogeneity and enhance the generalization performance of the model.
△ Less
Submitted 14 March, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Authors:
Xiangxiang Chu,
Limeng Qiao,
Xinyu Zhang,
Shuang Xu,
Fei Wei,
Yang Yang,
Xiaofei Sun,
Yiming Hu,
Xinyang Lin,
Bo Zhang,
Chunhua Shen
Abstract:
We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM b…
▽ More
We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM .
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
ParZC: Parametric Zero-Cost Proxies for Efficient NAS
Authors:
Peijie Dong,
Lujun Li,
Xinglin Pan,
Zimian Wei,
Xiang Liu,
Qiang Wang,
Xiaowen Chu
Abstract:
Recent advancements in Zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies in various NAS benchmarks. Several studies propose the automated design of zero-cost proxies to achieve SOTA performance but require tedious searching progress. Furthermore, we identify a critical issue with current zero-cost proxies: they aggregate node-wise zero-cost statistics without c…
▽ More
Recent advancements in Zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies in various NAS benchmarks. Several studies propose the automated design of zero-cost proxies to achieve SOTA performance but require tedious searching progress. Furthermore, we identify a critical issue with current zero-cost proxies: they aggregate node-wise zero-cost statistics without considering the fact that not all nodes in a neural network equally impact performance estimation. Our observations reveal that node-wise zero-cost statistics significantly vary in their contributions to performance, with each node exhibiting a degree of uncertainty. Based on this insight, we introduce a novel method called Parametric Zero-Cost Proxies (ParZC) framework to enhance the adaptability of zero-cost proxies through parameterization. To address the node indiscrimination, we propose a Mixer Architecture with Bayesian Network (MABN) to explore the node-wise zero-cost statistics and estimate node-specific uncertainty. Moreover, we propose DiffKendall as a loss function to directly optimize Kendall's Tau coefficient in a differentiable manner so that our ParZC can better handle the discrepancies in ranking architectures. Comprehensive experiments on NAS-Bench-101, 201, and NDS demonstrate the superiority of our proposed ParZC compared to existing zero-shot NAS methods. Additionally, we demonstrate the versatility and adaptability of ParZC by transferring it to the Vision Transformer search space.
△ Less
Submitted 3 February, 2024;
originally announced February 2024.
-
BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems
Authors:
Yuxin Wang,
Yuhan Chen,
Zeyu Li,
Xueze Kang,
Zhenheng Tang,
Xin He,
Rui Guo,
Xin Wang,
Qiang Wang,
Amelie Chi Zhou,
Xiaowen Chu
Abstract:
Serving systems for Large Language Models (LLMs) are often optimized to improve quality of service (QoS) and throughput. However, due to the lack of open-sourced LLM serving workloads, these systems are frequently evaluated under unrealistic workload assumptions. Consequently, performance may degrade when these systems are deployed in real-world scenarios. This work presents BurstGPT, an LLM servi…
▽ More
Serving systems for Large Language Models (LLMs) are often optimized to improve quality of service (QoS) and throughput. However, due to the lack of open-sourced LLM serving workloads, these systems are frequently evaluated under unrealistic workload assumptions. Consequently, performance may degrade when these systems are deployed in real-world scenarios. This work presents BurstGPT, an LLM serving workload with 5.29 million traces from regional Azure OpenAI GPT services over 121 days. BurstGPT captures realistic LLM serving characteristics through detailed tracing of: (1) Concurrency of requests: It traces burstiness variations of requests in Azure OpenAI GPT services, revealing diversified concurrency patterns in different services and model types. (2) Response Lengths of requests: It traces the auto-regressive serving processes of GPT models, showing statistical relations between requests and their responses. (3) Failures of requests: It traces failures of conversation and API services, showing intensive resource needs and limited resource availability of such services in Azure. Details of the characteristics can serve multiple purposes in LLM serving optimizations, such as system evaluation and trace provisioning. In our demo evaluation with BurstGPT, we observe that frequent variations in BurstGPT reveal declines in efficiency, stability, or reliability in realistic LLM serving. We identify that the generalization of KV cache management and request scheduling optimization is not guaranteed for different workloads, especially when systems are poorly optimized for unrealistic workloads. We have made the dataset publicly available to encourage further research at https://github.com/HPMLL/BurstGPT.
△ Less
Submitted 17 June, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction
Authors:
Weibin Liao,
Yinghao Zhu,
Zixiang Wang,
Xu Chu,
Yasha Wang,
Liantao Ma
Abstract:
Analyzing the health status of patients based on Electronic Health Records (EHR) is a fundamental research problem in medical informatics. The presence of extensive missing values in EHR makes it challenging for deep neural networks to directly model the patient's health status based on EHR. Existing deep learning training protocols require the use of statistical information or imputation models t…
▽ More
Analyzing the health status of patients based on Electronic Health Records (EHR) is a fundamental research problem in medical informatics. The presence of extensive missing values in EHR makes it challenging for deep neural networks to directly model the patient's health status based on EHR. Existing deep learning training protocols require the use of statistical information or imputation models to reconstruct missing values; however, the protocols inject non-realistic data into downstream EHR analysis models, significantly limiting model performance. This paper introduces Learnable Prompt as Pseudo Imputation (PAI) as a new training protocol. PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values, resulting in a significant performance improvement for all EHR analysis models. Additionally, our experiments show that PAI exhibits higher robustness in situations of data insufficiency and high missing rates. More importantly, in a real-world application involving cross-institutional data with zero-shot evaluation, PAI demonstrates stronger model generalization capabilities for non-overlap** features.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection
Authors:
Sifan Zhou,
Liang Li,
Xinyu Zhang,
Bo Zhang,
Shipeng Bai,
Miao Sun,
Ziyu Zhao,
Xiaobo Lu,
Xiangxiang Chu
Abstract:
Due to highly constrained computing power and memory, deploying 3D lidar-based detectors on edge devices equipped in autonomous vehicles and robots poses a crucial challenge. Being a convenient and straightforward model compression approach, Post-Training Quantization (PTQ) has been widely adopted in 2D vision tasks. However, applying it directly to 3D lidar-based tasks inevitably leads to perform…
▽ More
Due to highly constrained computing power and memory, deploying 3D lidar-based detectors on edge devices equipped in autonomous vehicles and robots poses a crucial challenge. Being a convenient and straightforward model compression approach, Post-Training Quantization (PTQ) has been widely adopted in 2D vision tasks. However, applying it directly to 3D lidar-based tasks inevitably leads to performance degradation. As a remedy, we propose an effective PTQ method called LiDAR-PTQ, which is particularly curated for 3D lidar detection (both SPConv-based and SPConv-free). Our LiDAR-PTQ features three main components, \textbf{(1)} a sparsity-based calibration method to determine the initialization of quantization parameters, \textbf{(2)} a Task-guided Global Positive Loss (TGPL) to reduce the disparity between the final predictions before and after quantization, \textbf{(3)} an adaptive rounding-to-nearest operation to minimize the layerwise reconstruction error. Extensive experiments demonstrate that our LiDAR-PTQ can achieve state-of-the-art quantization performance when applied to CenterPoint (both Pillar-based and Voxel-based). To our knowledge, for the very first time in lidar-based 3D detection tasks, the PTQ INT8 model's accuracy is almost the same as the FP32 model while enjoying $3\times$ inference speedup. Moreover, our LiDAR-PTQ is cost-effective being $30\times$ faster than the quantization-aware training method. Code will be released at \url{https://github.com/StiphyJay/LiDAR-PTQ}.
△ Less
Submitted 28 January, 2024;
originally announced January 2024.
-
GPAvatar: Generalizable and Precise Head Avatar from Image(s)
Authors:
Xuangeng Chu,
Yu Li,
Ailing Zeng,
Tianyu Yang,
Lijian Lin,
Yunfei Liu,
Tatsuya Harada
Abstract:
Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community. The fundamental objective of this field is to faithfully recreate the head avatar and precisely control expressions and postures. Existing methods, categorized into 2D-based war**, mesh-based, and neural re…
▽ More
Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community. The fundamental objective of this field is to faithfully recreate the head avatar and precisely control expressions and postures. Existing methods, categorized into 2D-based war**, mesh-based, and neural rendering approaches, present challenges in maintaining multi-view consistency, incorporating non-facial information, and generalizing to new identities. In this paper, we propose a framework named GPAvatar that reconstructs 3D head avatars from one or several images in a single forward pass. The key idea of this work is to introduce a dynamic point-based expression field driven by a point cloud to precisely and effectively capture expressions. Furthermore, we use a Multi Tri-planes Attention (MTA) fusion module in the tri-planes canonical field to leverage information from multiple input images. The proposed method achieves faithful identity reconstruction, precise expression control, and multi-view consistency, demonstrating promising results for free-viewpoint rendering and novel view synthesis.
△ Less
Submitted 18 January, 2024;
originally announced January 2024.
-
Infinite-Horizon Graph Filters: Leveraging Power Series to Enhance Sparse Information Aggregation
Authors:
Ruizhe Zhang,
Xinke Jiang,
Yuchen Fang,
Jiayuan Luo,
Yongxin Xu,
Yichen Zhu,
Xu Chu,
Junfeng Zhao,
Yasha Wang
Abstract:
Graph Neural Networks (GNNs) have shown considerable effectiveness in a variety of graph learning tasks, particularly those based on the message-passing approach in recent years. However, their performance is often constrained by a limited receptive field, a challenge that becomes more acute in the presence of sparse graphs. In light of the power series, which possesses infinite expansion capabili…
▽ More
Graph Neural Networks (GNNs) have shown considerable effectiveness in a variety of graph learning tasks, particularly those based on the message-passing approach in recent years. However, their performance is often constrained by a limited receptive field, a challenge that becomes more acute in the presence of sparse graphs. In light of the power series, which possesses infinite expansion capabilities, we propose a novel Graph Power Filter Neural Network (GPFN) that enhances node classification by employing a power series graph filter to augment the receptive field. Concretely, our GPFN designs a new way to build a graph filter with an infinite receptive field based on the convergence power series, which can be analyzed in the spectral and spatial domains. Besides, we theoretically prove that our GPFN is a general framework that can integrate any power series and capture long-range dependencies. Finally, experimental results on three datasets demonstrate the superiority of our GPFN over state-of-the-art baselines.
△ Less
Submitted 18 April, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Conditional Flood Fill Method in Logic Synthesis
Authors:
Shitian Yang,
Junyue Jiang,
Yilai Liang,
Xiaoyang Chu
Abstract:
In the field of Electronic Design Automation (EDA), logic synthesis plays a pivotal role in optimizing hardware resources. Traditional logic synthesis algorithms, such as the Quine-McCluskey method, face challenges in scalability and efficiency, particularly for higher-dimension problems. This paper introduces a novel heuristic algorithm based on Conditional Flood Fill Method aimed at addressing t…
▽ More
In the field of Electronic Design Automation (EDA), logic synthesis plays a pivotal role in optimizing hardware resources. Traditional logic synthesis algorithms, such as the Quine-McCluskey method, face challenges in scalability and efficiency, particularly for higher-dimension problems. This paper introduces a novel heuristic algorithm based on Conditional Flood Fill Method aimed at addressing these limitations. Our method employs count-based adjacent element handling and introduces nine new theorems to guide the logic synthesis process. Experimental results validate the efficacy of our approach, showing significant improvements in computational efficiency and scalability compared to existing algorithms. The algorithm holds potential for future advancements in circuit development and Boolean function optimization.
△ Less
Submitted 7 December, 2023;
originally announced January 2024.
-
Imputation with Inter-Series Information from Prototypes for Irregular Sampled Time Series
Authors:
Zhihao Yu,
Xu Chu,
Liantao Ma,
Yasha Wang,
Wenwu Zhu
Abstract:
Irregularly sampled time series are ubiquitous, presenting significant challenges for analysis due to missing values. Despite existing methods address imputation, they predominantly focus on leveraging intra-series information, neglecting the potential benefits that inter-series information could provide, such as reducing uncertainty and memorization effect. To bridge this gap, we propose PRIME, a…
▽ More
Irregularly sampled time series are ubiquitous, presenting significant challenges for analysis due to missing values. Despite existing methods address imputation, they predominantly focus on leveraging intra-series information, neglecting the potential benefits that inter-series information could provide, such as reducing uncertainty and memorization effect. To bridge this gap, we propose PRIME, a Prototype Recurrent Imputation ModEl, which integrates both intra-series and inter-series information for imputing missing values in irregularly sampled time series. Our framework comprises a prototype memory module for learning inter-series information, a bidirectional gated recurrent unit utilizing prototype information for imputation, and an attentive prototypical refinement module for adjusting imputations. We conducted extensive experiments on three datasets, and the results underscore PRIME's superiority over the state-of-the-art models by up to 26% relative improvement on mean square error.
△ Less
Submitted 14 January, 2024;
originally announced January 2024.
-
On the Burstiness of Distributed Machine Learning Traffic
Authors:
Natchanon Luangsomboon,
Fahimeh Fazel,
Jörg Liebeherr,
Ashkan Sobhani,
Shichao Guan,
Xingjun Chu
Abstract:
Traffic from distributed training of machine learning (ML) models makes up a large and growing fraction of the traffic mix in enterprise data centers. While work on distributed ML abounds, the network traffic generated by distributed ML has received little attention. Using measurements on a testbed network, we investigate the traffic characteristics generated by the training of the ResNet-50 neura…
▽ More
Traffic from distributed training of machine learning (ML) models makes up a large and growing fraction of the traffic mix in enterprise data centers. While work on distributed ML abounds, the network traffic generated by distributed ML has received little attention. Using measurements on a testbed network, we investigate the traffic characteristics generated by the training of the ResNet-50 neural network with an emphasis on studying its short-term burstiness. For the latter we propose metrics that quantify traffic burstiness at different time scales. Our analysis reveals that distributed ML traffic exhibits a very high degree of burstiness on short time scales, exceeding a 60:1 peak-to-mean ratio on time intervals as long as 5~ms. We observe that training software orchestrates transmissions in such a way that burst transmissions from different sources within the same application do not result in congestion and packet losses. An extrapolation of the measurement data to multiple applications underscores the challenges of distributed ML traffic for congestion and flow control algorithms.
△ Less
Submitted 30 December, 2023;
originally announced January 2024.
-
SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation
Authors:
Zhengze Xu,
Dongyue Wu,
Changqian Yu,
Xiangxiang Chu,
Nong Sang,
Changxin Gao
Abstract:
Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context. However, the additional branch incurs undesirable computational overhead and slows inference speed. To eliminate this dilemma, we propose SCTNet, a single branch CNN with transformer semantic information for real-time segmentation. SCTNet enjoys the rich semantic representa…
▽ More
Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context. However, the additional branch incurs undesirable computational overhead and slows inference speed. To eliminate this dilemma, we propose SCTNet, a single branch CNN with transformer semantic information for real-time segmentation. SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of lightweight single branch CNN. SCTNet utilizes a transformer as the training-only semantic branch considering its superb ability to extract long-range context. With the help of the proposed transformer-like CNN block CFBlock and the semantic information alignment module, SCTNet could capture the rich semantic information from the transformer branch in training. During the inference, only the single branch CNN needs to be deployed. We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves the new state-of-the-art performance. The code and model is available at https://github.com/xzz777/SCTNet
△ Less
Submitted 15 January, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Authors:
Xiangxiang Chu,
Limeng Qiao,
Xinyang Lin,
Shuang Xu,
Yang Yang,
Yiming Hu,
Fei Wei,
Xinyu Zhang,
Bo Zhang,
Xiaolin Wei,
Chunhua Shen
Abstract:
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices. It is an amalgamation of a myriad of architectural designs and techniques that are mobile-oriented, which comprises a set of language models at the scale of 1.4B and 2.7B parameters, trained from scratch, a multimodal vision model that is pre-trained in the CLIP fashion, cross-modality int…
▽ More
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices. It is an amalgamation of a myriad of architectural designs and techniques that are mobile-oriented, which comprises a set of language models at the scale of 1.4B and 2.7B parameters, trained from scratch, a multimodal vision model that is pre-trained in the CLIP fashion, cross-modality interaction via an efficient projector. We evaluate MobileVLM on several typical VLM benchmarks. Our models demonstrate on par performance compared with a few much larger models. More importantly, we measure the inference speed on both a Qualcomm Snapdragon 888 CPU and an NVIDIA Jeston Orin GPU, and we obtain state-of-the-art performance of 21.5 tokens and 65.3 tokens per second, respectively. Our code will be made available at: https://github.com/Meituan-AutoML/MobileVLM.
△ Less
Submitted 29 December, 2023; v1 submitted 28 December, 2023;
originally announced December 2023.
-
HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses
Authors:
Xinke Jiang,
Ruizhe Zhang,
Yongxin Xu,
Rihong Qiu,
Yue Fang,
Zhiyuan Wang,
**yi Tang,
Hongxin Ding,
Xu Chu,
Junfeng Zhao,
Yasha Wang
Abstract:
In this paper, we investigate the retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization. To this end, we develop a Hypothesis Knowledge Graph Enhanced (Hy…
▽ More
In this paper, we investigate the retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization. To this end, we develop a Hypothesis Knowledge Graph Enhanced (HyKGE) framework, which leverages LLMs' powerful reasoning capacity to compensate for the incompleteness of user queries, optimizes the interaction process with LLMs, and provides diverse retrieved knowledge. Specifically, HyKGE explores the zero-shot capability and the rich knowledge of LLMs with Hypothesis Outputs to extend feasible exploration directions in the KGs, as well as the carefully curated prompt to enhance the density and efficiency of LLMs' responses. Furthermore, we introduce the HO Fragment Granularity-aware Rerank Module to filter out noise while ensuring the balance between diversity and relevance in retrieved knowledge. Experiments on two Chinese medical multiple-choice question datasets and one Chinese open-domain medical Q&A dataset with two LLM turbos demonstrate the superiority of HyKGE in terms of accuracy and explainability.
△ Less
Submitted 19 April, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Open Domain Knowledge Extraction for Knowledge Graphs
Authors:
Kun Qian,
Anton Belyi,
Fei Wu,
Samira Khorshidi,
Azadeh Nikfarjam,
Rahul Khot,
Yisi Sang,
Katherine Luna,
Xianqi Chu,
Eric Choi,
Yash Govind,
Chloe Seivwright,
Yiwen Sun,
Ahmed Fakhry,
Theo Rekatsinas,
Ihab Ilyas,
Xiaoguang Qi,
Yunyao Li
Abstract:
The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from ope…
▽ More
The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latency. We reflect on the challenges and design decisions made and share lessons learned when building and deploying ODKE to grow an industry-scale open domain knowledge graph.
△ Less
Submitted 30 October, 2023;
originally announced December 2023.
-
CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning
Authors:
Qingsong Yan,
Qiang Wang,
Kaiyong Zhao,
Jie Chen,
Bo Li,
Xiaowen Chu,
Fei Deng
Abstract:
Neural Radiance Fields (NeRF) have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. H…
▽ More
Neural Radiance Fields (NeRF) have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. However, these methods work for forward-looking scenes with slight motions and fail to tackle the rotation scenario in practice. To overcome this limitation, we propose a novel \underline{c}amera parameter \underline{f}ree neural radiance field (CF-NeRF), which incrementally reconstructs 3D representations and recovers the camera parameters inspired by incremental structure from motion (SfM). Given a sequence of images, CF-NeRF estimates the camera parameters of images one by one and reconstructs the scene through initialization, implicit localization, and implicit optimization. To evaluate our method, we use a challenging real-world dataset NeRFBuster which provides 12 scenes under complex trajectories. Results demonstrate that CF-NeRF is robust to camera rotation and achieves state-of-the-art results without providing prior information and constraints.
△ Less
Submitted 14 December, 2023;
originally announced December 2023.
-
Lenna: Language Enhanced Reasoning Detection Assistant
Authors:
Fei Wei,
Xinyu Zhang,
Ailing Zhang,
Bo Zhang,
Xiangxiang Chu
Abstract:
With the fast-paced development of multimodal large language models (MLLMs), we can now converse with AI systems in natural languages to understand images. However, the reasoning power and world knowledge embedded in the large language models have been much less investigated and exploited for image perception tasks. In this paper, we propose Lenna, a language-enhanced reasoning detection assistant…
▽ More
With the fast-paced development of multimodal large language models (MLLMs), we can now converse with AI systems in natural languages to understand images. However, the reasoning power and world knowledge embedded in the large language models have been much less investigated and exploited for image perception tasks. In this paper, we propose Lenna, a language-enhanced reasoning detection assistant, which utilizes the robust multimodal feature representation of MLLMs, while preserving location information for detection. This is achieved by incorporating an additional <DET> token in the MLLM vocabulary that is free of explicit semantic context but serves as a prompt for the detector to identify the corresponding position. To evaluate the reasoning capability of Lenna, we construct a ReasonDet dataset to measure its performance on reasoning-based detection. Remarkably, Lenna demonstrates outstanding performance on ReasonDet and comes with significantly low training costs. It also incurs minimal transferring overhead when extended to other tasks. Our code and model will be available at https://git.io/Lenna.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
RobustCalib: Robust Lidar-Camera Extrinsic Calibration with Consistency Learning
Authors:
Shuang Xu,
Sifan Zhou,
Zhi Tian,
Jizhou Ma,
Qiong Nie,
Xiangxiang Chu
Abstract:
Current traditional methods for LiDAR-camera extrinsics estimation depend on offline targets and human efforts, while learning-based approaches resort to iterative refinement for calibration results, posing constraints on their generalization and application in on-board systems. In this paper, we propose a novel approach to address the extrinsic calibration problem in a robust, automatic, and sing…
▽ More
Current traditional methods for LiDAR-camera extrinsics estimation depend on offline targets and human efforts, while learning-based approaches resort to iterative refinement for calibration results, posing constraints on their generalization and application in on-board systems. In this paper, we propose a novel approach to address the extrinsic calibration problem in a robust, automatic, and single-shot manner. Instead of directly optimizing extrinsics, we leverage the consistency learning between LiDAR and camera to implement implicit re-calibartion. Specially, we introduce an appearance-consistency loss and a geometric-consistency loss to minimizing the inconsitency between the attrbutes (e.g., intensity and depth) of projected LiDAR points and the predicted ones. This design not only enhances adaptability to various scenarios but also enables a simple and efficient formulation during inference. We conduct comprehensive experiments on different datasets, and the results demonstrate that our method achieves accurate and robust performance. To promote further research and development in this area, we will release our model and code.
△ Less
Submitted 2 December, 2023;
originally announced December 2023.
-
Masked Autoencoders Are Robust Neural Architecture Search Learners
Authors:
Yiming Hu,
Xiangxiang Chu,
Bo Zhang
Abstract:
Neural Architecture Search (NAS) currently relies heavily on labeled data, which is both expensive and time-consuming to acquire. In this paper, we propose a novel NAS framework based on Masked Autoencoders (MAE) that eliminates the need for labeled data during the search process. By replacing the supervised learning objective with an image reconstruction task, our approach enables the robust disc…
▽ More
Neural Architecture Search (NAS) currently relies heavily on labeled data, which is both expensive and time-consuming to acquire. In this paper, we propose a novel NAS framework based on Masked Autoencoders (MAE) that eliminates the need for labeled data during the search process. By replacing the supervised learning objective with an image reconstruction task, our approach enables the robust discovery of network architectures without compromising performance and generalization ability. Additionally, we address the problem of performance collapse encountered in the widely-used Differentiable Architecture Search (DARTS) method in the unsupervised paradigm by introducing a multi-scale decoder. Through extensive experiments conducted on various search spaces and datasets, we demonstrate the effectiveness and robustness of the proposed method, providing empirical evidence of its superiority over baseline approaches.
△ Less
Submitted 26 March, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
A Speed Odyssey for Deployable Quantization of LLMs
Authors:
Qingyuan Li,
Ran Meng,
Yiduo Li,
Bo Zhang,
Liang Li,
Yifan Lu,
Xiangxiang Chu,
Yerui Sun,
Yuchen Xie
Abstract:
The large language model era urges faster and less costly inference. Prior model compression works on LLMs tend to undertake a software-centric approach primarily focused on the simulated quantization performance. By neglecting the feasibility of deployment, these approaches are typically disabled in real practice. They used to drastically push down the quantization bit range for a reduced computa…
▽ More
The large language model era urges faster and less costly inference. Prior model compression works on LLMs tend to undertake a software-centric approach primarily focused on the simulated quantization performance. By neglecting the feasibility of deployment, these approaches are typically disabled in real practice. They used to drastically push down the quantization bit range for a reduced computation which might not be supported by the mainstream hardware, or involve sophisticated algorithms that introduce extra computation or memory access overhead. We argue that pursuing a hardware-centric approach in the construction of quantization algorithms is crucial. In this regard, we are driven to build our compression method on top of hardware awareness, eliminating impractical algorithm choices while maximizing the benefit of hardware acceleration. Our method, OdysseyLLM, comes with a novel W4A8 kernel implementation called FastGEMM and a combined recipe of quantization strategies. Extensive experiments manifest the superiority of our W4A8 method which brings the actual speed boosting up to \textbf{4$\times$} compared to Hugging Face FP16 inference and \textbf{2.23$\times$} vs. the state-of-the-art inference engine TensorRT-LLM in FP16, and \textbf{1.45$\times$} vs. TensorRT-LLM in INT8, yet without substantially harming the performance.
△ Less
Submitted 15 November, 2023;
originally announced November 2023.
-
On the IRS Deployment in Smart Factories Considering Blockage Effects: Collocated or Distributed?
Authors:
Yixin Zhang,
Saeed R. Khosravirad,
Xiaoli Chu,
Mikko A. Uusitalo
Abstract:
In this article, we study the collocated and distributed deployment of intelligent reflecting surfaces (IRS) for a fixed total number of IRS elements to support enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) services inside a factory. We build a channel model that incorporates the line-of-sight (LOS) probability and power loss of each transmission path, and p…
▽ More
In this article, we study the collocated and distributed deployment of intelligent reflecting surfaces (IRS) for a fixed total number of IRS elements to support enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) services inside a factory. We build a channel model that incorporates the line-of-sight (LOS) probability and power loss of each transmission path, and propose three metrics, namely, the expected received signal-to-noise ratio (SNR), expected finite-blocklength (FB) capacity, and expected outage probability, where the expectation is taken over the probability distributions of interior blockages and channel fading. The expected received SNR and expected FB capacity for extremely high blockage densities are derived in closed-form as functions of the amount and height of IRSs and the density, size, and penetration loss of blockages, which are verified by Monte Carlo simulations. Results show that deploying IRSs vertically higher leads to higher expected received SNR and expected FB capacity. By analysing the average/minimum/maximum of the three metrics versus the number of IRSs, we find that for high blockage densities, both eMBB and URLLC services benefit from distributed deployment; and for low blockage densities, URLLC services benefit from distributed deployment while eMBB services see limited difference between collocated and distributed deployment.
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
Bootstrap** Robotic Skill Learning With Intuitive Teleoperation: Initial Feasibility Study
Authors:
Xiangyu Chu,
Yunxi Tang,
Lam Him Kwok,
Yuanpei Cai,
Kwok Wai Samuel Au
Abstract:
Robotic skill learning has been increasingly studied but the demonstration collections are more challenging compared to collecting images/videos in computer vision and texts in natural language processing. This paper presents a skill learning paradigm by using intuitive teleoperation devices to generate high-quality human demonstrations efficiently for robotic skill learning in a data-driven manne…
▽ More
Robotic skill learning has been increasingly studied but the demonstration collections are more challenging compared to collecting images/videos in computer vision and texts in natural language processing. This paper presents a skill learning paradigm by using intuitive teleoperation devices to generate high-quality human demonstrations efficiently for robotic skill learning in a data-driven manner. By using a reliable teleoperation interface, the da Vinci Research Kit (dVRK) master, a system called dVRK-Simulator-for-Demonstration (dS4D) is proposed in this paper. Various manipulation tasks show the system's effectiveness and advantages in efficiency compared to other interfaces. Using the collected data for policy learning has been investigated, which verifies the initial feasibility. We believe the proposed paradigm can facilitate robot learning driven by high-quality demonstrations and efficiency while generating them.
△ Less
Submitted 11 November, 2023;
originally announced November 2023.
-
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Authors:
Longteng Zhang,
Xiang Liu,
Zeyu Li,
Xinglin Pan,
Peijie Dong,
Ruibo Fan,
Rui Guo,
Xin Wang,
Qiong Luo,
Shaohuai Shi,
Xiaowen Chu
Abstract:
Large Language Models (LLMs) have seen great advance in both academia and industry, and their popularity results in numerous open-source frameworks and techniques in accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive as it requires considerable computing resources and memory, hence many efficient approaches have been developed for improving system…
▽ More
Large Language Models (LLMs) have seen great advance in both academia and industry, and their popularity results in numerous open-source frameworks and techniques in accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive as it requires considerable computing resources and memory, hence many efficient approaches have been developed for improving system pipelines as well as operators. However, the runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we aim to benchmark the performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs in different sizes , i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B) on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmark and findings help better understand different optimization techniques, training and inference frameworks, together with hardware platforms in choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses discover potential opportunities for future work to further optimize the runtime performance of LLMs.
△ Less
Submitted 1 December, 2023; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining
Authors:
Yuxin Wang,
Shaohuai Shi,
Xin He,
Zhenheng Tang,
Xinglin Pan,
Yang Zheng,
Xiaoyu Wu,
Amelie Chi Zhou,
Bingsheng He,
Xiaowen Chu
Abstract:
Extensive system scales (i.e. thousands of GPU/TPUs) and prolonged training periods (i.e. months of pretraining) significantly escalate the probability of failures when training large language models (LLMs). Thus, efficient and reliable fault-tolerance methods are in urgent need. Checkpointing is the primary fault-tolerance method to periodically save parameter snapshots from GPU memory to disks v…
▽ More
Extensive system scales (i.e. thousands of GPU/TPUs) and prolonged training periods (i.e. months of pretraining) significantly escalate the probability of failures when training large language models (LLMs). Thus, efficient and reliable fault-tolerance methods are in urgent need. Checkpointing is the primary fault-tolerance method to periodically save parameter snapshots from GPU memory to disks via CPU memory. In this paper, we identify the frequency of existing checkpoint-based fault-tolerance being significantly limited by the storage I/O overheads, which results in hefty re-training costs on restarting from the nearest checkpoint. In response to this gap, we introduce an in-memory fault-tolerance framework for large-scale LLM pretraining. The framework boosts the efficiency and reliability of fault tolerance from three aspects: (1) Reduced Data Transfer and I/O: By asynchronously caching parameters, i.e., sharded model parameters, optimizer states, and RNG states, to CPU volatile memory, Our framework significantly reduces communication costs and bypasses checkpoint I/O. (2) Enhanced System Reliability: Our framework enhances parameter protection with a two-layer hierarchy: snapshot management processes (SMPs) safeguard against software failures, together with Erasure Coding (EC) protecting against node failures. This double-layered protection greatly improves the survival probability of the parameters compared to existing checkpointing methods. (3) Improved Snapshotting Frequency: Our framework achieves more frequent snapshotting compared with asynchronous checkpointing optimizations under the same saving time budget, which improves the fault tolerance efficiency. Empirical results demonstrate that Our framework minimizes the overhead of fault tolerance of LLM pretraining by effectively leveraging redundant CPU resources.
△ Less
Submitted 19 October, 2023;
originally announced October 2023.