-
WeShap: Weak Supervision Source Evaluation with Shapley Values
Authors:
Naiqing Guan,
Nick Koudas
Abstract:
Efficient data annotation stands as a significant bottleneck in training contemporary machine learning models. The Programmatic Weak Supervision (PWS) pipeline presents a solution by utilizing multiple weak supervision sources to automatically label data, thereby expediting the annotation process. Given the varied contributions of these weak supervision sources to the accuracy of PWS, it is impera…
▽ More
Efficient data annotation stands as a significant bottleneck in training contemporary machine learning models. The Programmatic Weak Supervision (PWS) pipeline presents a solution by utilizing multiple weak supervision sources to automatically label data, thereby expediting the annotation process. Given the varied contributions of these weak supervision sources to the accuracy of PWS, it is imperative to employ a robust and efficient metric for their evaluation. This is crucial not only for understanding the behavior and performance of the PWS pipeline but also for facilitating corrective measures.
In our study, we introduce WeShap values as an evaluation metric, which quantifies the average contribution of weak supervision sources within a proxy PWS pipeline, leveraging the theoretical underpinnings of Shapley values. We demonstrate efficient computation of WeShap values using dynamic programming, achieving quadratic computational complexity relative to the number of weak supervision sources.
Our experiments demonstrate the versatility of WeShap values across various applications, including the identification of beneficial or detrimental labeling functions, refinement of the PWS pipeline, and rectification of mislabeled data. Furthermore, WeShap values aid in comprehending the behavior of the PWS pipeline and scrutinizing specific instances of mislabeled data. Although initially derived from a specific proxy PWS pipeline, we empirically demonstrate the generalizability of WeShap values to other PWS pipeline configurations.
Our findings indicate a noteworthy average improvement of 4.8 points in downstream model accuracy through the revision of the PWS pipeline compared to previous state-of-the-art methods, underscoring the efficacy of WeShap values in enhancing data quality for training machine learning models.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Grade Like a Human: Rethinking Automated Assessment with Large Language Models
Authors:
Wen**g Xie,
Juxin Niu,
Chun Jason Xue,
Nan Guan
Abstract:
While large language models (LLMs) have been used for automated grading, they have not yet achieved the same level of performance as humans, especially when it comes to grading complex questions. Existing research on this topic focuses on a particular step in the grading procedure: grading using predefined rubrics. However, grading is a multifaceted procedure that encompasses other crucial steps,…
▽ More
While large language models (LLMs) have been used for automated grading, they have not yet achieved the same level of performance as humans, especially when it comes to grading complex questions. Existing research on this topic focuses on a particular step in the grading procedure: grading using predefined rubrics. However, grading is a multifaceted procedure that encompasses other crucial steps, such as grading rubrics design and post-grading review. There has been a lack of systematic research exploring the potential of LLMs to enhance the entire grading~process.
In this paper, we propose an LLM-based grading system that addresses the entire grading procedure, including the following key components: 1) Develo** grading rubrics that not only consider the questions but also the student answers, which can more accurately reflect students' performance. 2) Under the guidance of grading rubrics, providing accurate and consistent scores for each student, along with customized feedback. 3) Conducting post-grading review to better ensure accuracy and fairness. Additionally, we collected a new dataset named OS from a university operating system course and conducted extensive experiments on both our new dataset and the widely used Mohler dataset. Experiments demonstrate the effectiveness of our proposed approach, providing some new insights for develo** automated grading systems based on LLMs.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction
Authors:
Zikang Zhou,
Haibo Hu,
Xinhong Chen,
Jian** Wang,
Nan Guan,
Kui Wu,
Yung-Hui Li,
Yu-Kai Huang,
Chun Jason Xue
Abstract:
Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to lo…
▽ More
Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future," treating each time step as the "current" one, resulting in a simpler, more parameter- and data-efficient design that scales seamlessly with data and computation. Additionally, we introduce the Next-Patch Prediction Paradigm (NP3), which enables models to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. BehaviorGPT ranks first across several metrics on the Waymo Sim Agents Benchmark, demonstrating its exceptional performance in multi-agent and agent-map interactions. We outperformed state-of-the-art models with a realism score of 0.741 and improved the minADE metric to 1.540, with an approximately 91.6% reduction in model parameters.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference
Authors:
Lianming Huang,
Shangyu Wu,
Yufei Cui,
Ying Xiong,
Xue Liu,
Tei-Wei Kuo,
Nan Guan,
Chun Jason Xue
Abstract:
Deploying large language model inference remains challenging due to their high computational overhead. Early exiting accelerates model inference by adaptively reducing the number of inference layers. Existing methods require training internal classifiers to determine whether to exit at each intermediate layer. However, such classifier-based early exiting frameworks require significant effort to de…
▽ More
Deploying large language model inference remains challenging due to their high computational overhead. Early exiting accelerates model inference by adaptively reducing the number of inference layers. Existing methods require training internal classifiers to determine whether to exit at each intermediate layer. However, such classifier-based early exiting frameworks require significant effort to design and train the classifiers. To address these limitations, this paper proposes RAEE, a training-free Retrieval-Augmented Early Exiting framework for efficient inference. First, this paper demonstrates that the early exiting problem can be modeled as a distribution prediction problem, where the distribution is approximated using similar data's existing information. Next, the paper details the process of collecting existing information to build the retrieval database. Finally, based on the pre-built retrieval database, RAEE leverages the retrieved similar data's exiting information to guide the backbone model to exit at the layer, which is predicted by the approximated distribution. Experimental results demonstrate that the proposed RAEE can significantly accelerate inference. RAEE also achieves state-of-the-art zero-shot performance on 8 classification tasks.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
IHC Matters: Incorporating IHC analysis to H&E Whole Slide Image Analysis for Improved Cancer Grading via Two-stage Multimodal Bilinear Pooling Fusion
Authors:
Jun Wang,
Yu Mao,
Yufei Cui,
Nan Guan,
Chun Jason Xue
Abstract:
Immunohistochemistry (IHC) plays a crucial role in pathology as it detects the over-expression of protein in tissue samples. However, there are still fewer machine learning model studies on IHC's impact on accurate cancer grading. We discovered that IHC and H\&E possess distinct advantages and disadvantages while possessing certain complementary qualities. Building on this observation, we develope…
▽ More
Immunohistochemistry (IHC) plays a crucial role in pathology as it detects the over-expression of protein in tissue samples. However, there are still fewer machine learning model studies on IHC's impact on accurate cancer grading. We discovered that IHC and H\&E possess distinct advantages and disadvantages while possessing certain complementary qualities. Building on this observation, we developed a two-stage multi-modal bilinear model with a feature pooling module. This model aims to maximize the potential of both IHC and HE's feature representation, resulting in improved performance compared to their individual use. Our experiments demonstrate that incorporating IHC data into machine learning models, alongside H\&E stained images, leads to superior predictive results for cancer grading. The proposed framework achieves an impressive ACC higher of 0.953 on the public dataset BCI.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Impedance Matching: Enabling an RL-Based Running Jump in a Quadruped Robot
Authors:
Neil Guan,
Shangqun Yu,
Shifan Zhu,
Donghyun Kim
Abstract:
Replicating the remarkable athleticism seen in animals has long been a challenge in robotics control. Although Reinforcement Learning (RL) has demonstrated significant progress in dynamic legged locomotion control, the substantial sim-to-real gap often hinders the real-world demonstration of truly dynamic movements. We propose a new framework to mitigate this gap through frequency-domain analysis-…
▽ More
Replicating the remarkable athleticism seen in animals has long been a challenge in robotics control. Although Reinforcement Learning (RL) has demonstrated significant progress in dynamic legged locomotion control, the substantial sim-to-real gap often hinders the real-world demonstration of truly dynamic movements. We propose a new framework to mitigate this gap through frequency-domain analysis-based impedance matching between simulated and real robots. Our framework offers a structured guideline for parameter selection and the range for dynamics randomization in simulation, thus facilitating a safe sim-to-real transfer. The learned policy using our framework enabled jumps across distances of 55 cm and heights of 38 cm. The results are, to the best of our knowledge, one of the highest and longest running jumps demonstrated by an RL-based control policy in a real quadruped robot. Note that the achieved jum** height is approximately 85% of that obtained from a state-of-the-art trajectory optimization method, which can be seen as the physical limit for the given robot hardware. In addition, our control policy accomplished stable walking at speeds up to 2 m/s in the forward and backward directions, and 1 m/s in the sideway direction.
△ Less
Submitted 29 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Pre-processing matters: A segment search method for WSI classification
Authors:
Jun Wang,
Yufei Cui,
Yu Mao,
Nan Guan,
Chun Jason Xue
Abstract:
Pre-processing for whole slide images can affect classification performance both in the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach f…
▽ More
Pre-processing for whole slide images can affect classification performance both in the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach for fast parameter tuning to enhance inference performance on single-domain data. Our method demonstrates significant performance improvements in accuracy, which raise accuracy from 0.512 to 0.847 in a single domain. We further extend our insight into training performance in multi-domain data by employing a novel Bayesian optimization to search optimal pre-processing parameters, resulting in a high AUC of 0.967. We highlight that better pre-processing for WSI can contribute to further accuracy improvement in the histology area.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
On the Compressibility of Quantized Large Language Models
Authors:
Yu Mao,
Weilan Wang,
Hongchao Du,
Nan Guan,
Chun Jason Xue
Abstract:
Deploying Large Language Models (LLMs) on edge or mobile devices offers significant benefits, such as enhanced data privacy and real-time processing capabilities. However, it also faces critical challenges due to the substantial memory requirement of LLMs. Quantization is an effective way of reducing the model size while maintaining good performance. However, even after quantization, LLMs may stil…
▽ More
Deploying Large Language Models (LLMs) on edge or mobile devices offers significant benefits, such as enhanced data privacy and real-time processing capabilities. However, it also faces critical challenges due to the substantial memory requirement of LLMs. Quantization is an effective way of reducing the model size while maintaining good performance. However, even after quantization, LLMs may still be too big to fit entirely into the limited memory of edge or mobile devices and have to be partially loaded from the storage to complete the inference. In this case, the I/O latency of model loading becomes the bottleneck of the LLM inference latency. In this work, we take a preliminary step of studying applying data compression techniques to reduce data movement and thus speed up the inference of quantized LLM on memory-constrained devices. In particular, we discussed the compressibility of quantized LLMs, the trade-off between the compressibility and performance of quantized LLMs, and opportunities to optimize both of them jointly.
△ Less
Submitted 5 May, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
ActiveDP: Bridging Active Learning and Data Programming
Authors:
Naiqing Guan,
Nick Koudas
Abstract:
Modern machine learning models require large labelled datasets to achieve good performance, but manually labelling large datasets is expensive and time-consuming. The data programming paradigm enables users to label large datasets efficiently but produces noisy labels, which deteriorates the downstream model's performance. The active learning paradigm, on the other hand, can acquire accurate label…
▽ More
Modern machine learning models require large labelled datasets to achieve good performance, but manually labelling large datasets is expensive and time-consuming. The data programming paradigm enables users to label large datasets efficiently but produces noisy labels, which deteriorates the downstream model's performance. The active learning paradigm, on the other hand, can acquire accurate labels but only for a small fraction of instances. In this paper, we propose ActiveDP, an interactive framework bridging active learning and data programming together to generate labels with both high accuracy and coverage, combining the strengths of both paradigms. Experiments show that ActiveDP outperforms previous weak supervision and active learning approaches and consistently performs well under different labelling budgets.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Can Large Language Models Design Accurate Label Functions?
Authors:
Naiqing Guan,
Kaiwen Chen,
Nick Koudas
Abstract:
Programmatic weak supervision methodologies facilitate the expedited labeling of extensive datasets through the use of label functions (LFs) that encapsulate heuristic data sources. Nonetheless, the creation of precise LFs necessitates domain expertise and substantial endeavors. Recent advances in pre-trained language models (PLMs) have exhibited substantial potential across diverse tasks. However…
▽ More
Programmatic weak supervision methodologies facilitate the expedited labeling of extensive datasets through the use of label functions (LFs) that encapsulate heuristic data sources. Nonetheless, the creation of precise LFs necessitates domain expertise and substantial endeavors. Recent advances in pre-trained language models (PLMs) have exhibited substantial potential across diverse tasks. However, the capacity of PLMs to autonomously formulate accurate LFs remains an underexplored domain. In this research, we address this gap by introducing DataSculpt, an interactive framework that harnesses PLMs for the automated generation of LFs. Within DataSculpt, we incorporate an array of prompting techniques, instance selection strategies, and LF filtration methods to explore the expansive design landscape. Ultimately, we conduct a thorough assessment of DataSculpt's performance on 12 real-world datasets, encompassing a range of tasks. This evaluation unveils both the strengths and limitations of contemporary PLMs in LF design.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
Multi-Path Bound for DAG Tasks
Authors:
Qingqiang He,
Nan Guan,
Shuai Zhao,
Mingsong Lv
Abstract:
This paper studies the response time bound of a DAG (directed acyclic graph) task. Recently, the idea of using multiple paths to bound the response time of a DAG task, instead of using a single longest path in previous results, was proposed and leads to the so-called multi-path bound. Multi-path bounds can greatly reduce the response time bound and significantly improve the schedulability of DAG t…
▽ More
This paper studies the response time bound of a DAG (directed acyclic graph) task. Recently, the idea of using multiple paths to bound the response time of a DAG task, instead of using a single longest path in previous results, was proposed and leads to the so-called multi-path bound. Multi-path bounds can greatly reduce the response time bound and significantly improve the schedulability of DAG tasks. This paper derives a new multi-path bound and proposes an optimal algorithm to compute this bound. We further present a systematic analysis on the dominance and the sustainability of three existing multi-path bounds and the proposed multi-path bound. Our bound theoretically dominates and empirically outperforms all existing multi-path bounds. What's more, the proposed bound is the only multi-path bound that is proved to be self-sustainable.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
MVP: Meta Visual Prompt Tuning for Few-Shot Remote Sensing Image Scene Classification
Authors:
Junjie Zhu,
Yiying Li,
Chun** Qiu,
Ke Yang,
Naiyang Guan,
Xiaodong Yi
Abstract:
Vision Transformer (ViT) models have recently emerged as powerful and versatile models for various visual tasks. Recently, a work called PMF has achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models. However, PMF employs full fine-tuning for learning the downstream tasks, leading to significant overfitting and storage issues, especially in t…
▽ More
Vision Transformer (ViT) models have recently emerged as powerful and versatile models for various visual tasks. Recently, a work called PMF has achieved promising results in few-shot image classification by utilizing pre-trained vision transformer models. However, PMF employs full fine-tuning for learning the downstream tasks, leading to significant overfitting and storage issues, especially in the remote sensing domain. In order to tackle these issues, we turn to the recently proposed parameter-efficient tuning methods, such as VPT, which updates only the newly added prompt parameters while kee** the pre-trained backbone frozen. Inspired by VPT, we propose the Meta Visual Prompt Tuning (MVP) method. Specifically, we integrate the VPT method into the meta-learning framework and tailor it to the remote sensing domain, resulting in an efficient framework for Few-Shot Remote Sensing Scene Classification (FS-RSSC). Furthermore, we introduce a novel data augmentation strategy based on patch embedding recombination to enhance the representation and diversity of scenes for classification purposes. Experiment results on the FS-RSSC benchmark demonstrate the superior performance of the proposed MVP over existing methods in various settings, such as various-way-various-shot, various-way-one-shot, and cross-domain adaptation.
△ Less
Submitted 17 September, 2023;
originally announced September 2023.
-
Timely Fusion of Surround Radar/Lidar for Object Detection in Autonomous Driving Systems
Authors:
Wen**g Xie,
Tao Hu,
Neiwen Ling,
Guoliang Xing,
Chun Jason Xue,
Nan Guan
Abstract:
Fusing Radar and Lidar sensor data can fully utilize their complementary advantages and provide more accurate reconstruction of the surrounding for autonomous driving systems. Surround Radar/Lidar can provide 360-degree view sampling with the minimal cost, which are promising sensing hardware solutions for autonomous driving systems. However, due to the intrinsic physical constraints, the rotating…
▽ More
Fusing Radar and Lidar sensor data can fully utilize their complementary advantages and provide more accurate reconstruction of the surrounding for autonomous driving systems. Surround Radar/Lidar can provide 360-degree view sampling with the minimal cost, which are promising sensing hardware solutions for autonomous driving systems. However, due to the intrinsic physical constraints, the rotating speed of surround Radar, and thus the frequency to generate Radar data frames, is much lower than surround Lidar. Existing Radar/Lidar fusion methods have to work at the low frequency of surround Radar, which cannot meet the high responsiveness requirement of autonomous driving systems.This paper develops techniques to fuse surround Radar/Lidar with working frequency only limited by the faster surround Lidar instead of the slower surround Radar, based on the state-of-the-art object detection model MVDNet. The basic idea of our approach is simple: we let MVDNet work with temporally unaligned data from Radar/Lidar, so that fusion can take place at any time when a new Lidar data frame arrives, instead of waiting for the slow Radar data frame. However, directly applying MVDNet to temporally unaligned Radar/Lidar data greatly degrades its object detection accuracy. The key information revealed in this paper is that we can achieve high output frequency with little accuracy loss by enhancing the training procedure to explore the temporal redundancy in MVDNet so that it can tolerate the temporal unalignment of input data. We explore several different ways of training enhancement and compare them quantitatively with experiments.
△ Less
Submitted 27 May, 2024; v1 submitted 9 September, 2023;
originally announced September 2023.
-
Longer Is Shorter: Making Long Paths to Improve the Worst-Case Response Time of DAG Tasks
Authors:
Qingqiang He,
Nan Guan,
Mingsong Lv
Abstract:
DAG (directed acyclic graph) tasks are widely used to model parallel real-time workload. The real-time performance of a DAG task not only depends on its total workload, but also its graph structure. Intuitively, with the same total workload, a DAG task with looser precedence constraints tends to have better real-time performance in terms of worst-case response time. However, this paper shows that…
▽ More
DAG (directed acyclic graph) tasks are widely used to model parallel real-time workload. The real-time performance of a DAG task not only depends on its total workload, but also its graph structure. Intuitively, with the same total workload, a DAG task with looser precedence constraints tends to have better real-time performance in terms of worst-case response time. However, this paper shows that actually we can shorten the worst-case response time of a DAG task by carefully adding new edges and constructing longer paths. We develop techniques based on the state-of-the-art DAG response time analysis techniques to properly add new edges so that the worst-case response time bound guaranteed by formal analysis can be significantly reduced. Experiments under different parameter settings demonstrate the effectiveness of the proposed techniques.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU
Authors:
Zhihe Zhao,
Neiwen Ling,
Nan Guan,
Guoliang Xing
Abstract:
Many applications such as autonomous driving and augmented reality, require the concurrent running of multiple deep neural networks (DNN) that poses different levels of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardwa…
▽ More
Many applications such as autonomous driving and augmented reality, require the concurrent running of multiple deep neural networks (DNN) that poses different levels of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardware-level resource management mechanisms for avoiding resource contention. Therefore, we propose Miriam, a contention-aware task coordination framework for multi-DNN inference on edge GPU. Miriam consolidates two main components, an elastic-kernel generator, and a runtime dynamic kernel coordinator, to support mixed critical DNN inference. To evaluate Miriam, we build a new DNN inference benchmark based on CUDA with diverse representative DNN workloads. Experiments on two edge GPU platforms show that Miriam can increase system throughput by 92% while only incurring less than 10\% latency overhead for critical tasks, compared to state of art baselines.
△ Less
Submitted 10 July, 2023;
originally announced July 2023.
-
LPN: Language-guided Prototypical Network for few-shot classification
Authors:
Kaihui Cheng,
Chule Yang,
Xiao Liu,
Naiyang Guan,
Zhiyuan Wang
Abstract:
Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the accessible data, recent methods explore suitable measures for the similarity between the query and support images and better high-dimensional features with meta-training and pre-training strategies. However, the potential of multi-modality information has barely been explored, which may bring promisi…
▽ More
Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the accessible data, recent methods explore suitable measures for the similarity between the query and support images and better high-dimensional features with meta-training and pre-training strategies. However, the potential of multi-modality information has barely been explored, which may bring promising improvement for few-shot classification. In this paper, we propose a Language-guided Prototypical Network (LPN) for few-shot classification, which leverages the complementarity of vision and language modalities via two parallel branches to improve the classifier. Concretely, to introduce language modality with limited samples in the visual task, we leverage a pre-trained text encoder to extract class-level text features directly from class names while processing images with a conventional image encoder. Then, we introduce a language-guided decoder to obtain text features corresponding to each image by aligning class-level features with visual features. Additionally, we utilize class-level features and prototypes to build a refined prototypical head, which generates robust prototypes in the text branch for follow-up measurement. Furthermore, we leverage the class-level features to align the visual features, capturing more class-relevant visual features. Finally, we aggregate the visual and text logits to calibrate the deviation of a single modality, enhancing the overall performance. Extensive experiments demonstrate the competitiveness of LPN against state-of-the-art methods on benchmark datasets.
△ Less
Submitted 21 October, 2023; v1 submitted 4 July, 2023;
originally announced July 2023.
-
PVP: Pre-trained Visual Parameter-Efficient Tuning
Authors:
Zhao Song,
Ke Yang,
Naiyang Guan,
Junjie Zhu,
Peng Qiao,
Qingyong Hu
Abstract:
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks. However, it is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs. Recently, Parameter-Efficient Tuning (PETuning) techniques, e.g., Visual Prompt Tuning (VPT) and Low-Rank Adaptation (LoRA), have significantly redu…
▽ More
Large-scale pre-trained transformers have demonstrated remarkable success in various computer vision tasks. However, it is still highly challenging to fully fine-tune these models for downstream tasks due to their high computational and storage costs. Recently, Parameter-Efficient Tuning (PETuning) techniques, e.g., Visual Prompt Tuning (VPT) and Low-Rank Adaptation (LoRA), have significantly reduced the computation and storage cost by inserting lightweight prompt modules into the pre-trained models and tuning these prompt modules with a small number of trainable parameters, while kee** the transformer backbone frozen. Although only a few parameters need to be adjusted, most PETuning methods still require a significant amount of downstream task training data to achieve good results. The performance is inadequate on low-data regimes, especially when there are only one or two examples per class. To this end, we first empirically identify the poor performance is mainly due to the inappropriate way of initializing prompt modules, which has also been verified in the pre-trained language models. Next, we propose a Pre-trained Visual Parameter-efficient (PVP) Tuning framework, which pre-trains the parameter-efficient tuning modules first and then leverages the pre-trained modules along with the pre-trained transformer backbone to perform parameter-efficient tuning on downstream tasks. Experiment results on five Fine-Grained Visual Classification (FGVC) and VTAB-1k datasets demonstrate that our proposed method significantly outperforms state-of-the-art PETuning methods.
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
Estimating Regression Predictive Distributions with Sample Networks
Authors:
Ali Harakeh,
Jordan Hu,
Naiqing Guan,
Steven L. Waslander,
Liam Paull
Abstract:
Estimating the uncertainty in deep neural network predictions is crucial for many real-world applications. A common approach to model uncertainty is to choose a parametric distribution and fit the data to it using maximum likelihood estimation. The chosen parametric form can be a poor fit to the data-generating distribution, resulting in unreliable uncertainty estimates. In this work, we propose S…
▽ More
Estimating the uncertainty in deep neural network predictions is crucial for many real-world applications. A common approach to model uncertainty is to choose a parametric distribution and fit the data to it using maximum likelihood estimation. The chosen parametric form can be a poor fit to the data-generating distribution, resulting in unreliable uncertainty estimates. In this work, we propose SampleNet, a flexible and scalable architecture for modeling uncertainty that avoids specifying a parametric form on the output distribution. SampleNets do so by defining an empirical distribution using samples that are learned with the Energy Score and regularized with the Sinkhorn Divergence. SampleNets are shown to be able to well-fit a wide range of distributions and to outperform baselines on large-scale real-world regression tasks.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
Bounding the Response Time of DAG Tasks Using Long Paths
Authors:
Qingqiang He,
Nan Guan,
Mingsong Lv,
Xu Jiang,
Wanli Chang
Abstract:
In 1969, Graham developed a well-known response time bound for a DAG task using the total workload and the longest path of the DAG, which has been widely applied to solve many scheduling and analysis problems of DAG-based task systems. This paper presents a new response time bound for a DAG task using the total workload and the lengths of multiple long paths of the DAG, instead of the longest path…
▽ More
In 1969, Graham developed a well-known response time bound for a DAG task using the total workload and the longest path of the DAG, which has been widely applied to solve many scheduling and analysis problems of DAG-based task systems. This paper presents a new response time bound for a DAG task using the total workload and the lengths of multiple long paths of the DAG, instead of the longest path in Graham's bound. Our new bound theoretically dominates and empirically outperforms Graham's bound. We further extend the proposed approach to multi-DAG task systems. Our schedulability test theoretically dominates federated scheduling and outperforms the state-of-the-art by a considerable margin.
△ Less
Submitted 17 November, 2022; v1 submitted 16 November, 2022;
originally announced November 2022.
-
Enhanced compound-protein binding affinity prediction by representing protein multimodal information via a coevolutionary strategy
Authors:
Binjie Guo,
Hanyu Zheng,
Haohan Jiang,
Xiaodan Li,
Naiyu Guan,
Yanming Zuo,
Yicheng Zhang,
Hengfu Yang,
Xuhua Wang
Abstract:
Due to the lack of a method to efficiently represent the multimodal information of a protein, including its structure and sequence information, predicting compound-protein binding affinity (CPA) still suffers from low accuracy when applying machine learning methods. To overcome this limitation, in a novel end-to-end architecture (named FeatNN), we develop a coevolutionary strategy to jointly repre…
▽ More
Due to the lack of a method to efficiently represent the multimodal information of a protein, including its structure and sequence information, predicting compound-protein binding affinity (CPA) still suffers from low accuracy when applying machine learning methods. To overcome this limitation, in a novel end-to-end architecture (named FeatNN), we develop a coevolutionary strategy to jointly represent the structure and sequence features of proteins and ultimately optimize the mathematical models for predicting CPA. Furthermore, from the perspective of data-driven approach, we proposed a rational method that can utilize both high- and low-quality databases to optimize the accuracy and generalization ability of FeatNN in CPA prediction tasks. Notably, we visually interpret the feature interaction process between sequence and structure in the rationally designed architecture. As a result, FeatNN considerably outperforms the state-of-the-art (SOTA) baseline in virtual drug screening tasks, indicating the feasibility of this approach for practical use. FeatNN provides an outstanding method for higher CPA prediction accuracy and better generalization ability by efficiently representing multimodal information of proteins via a coevolutionary strategy.
△ Less
Submitted 23 November, 2022; v1 submitted 29 March, 2022;
originally announced April 2022.
-
Moses: Efficient Exploitation of Cross-device Transferable Features for Tensor Program Optimization
Authors:
Zhihe Zhao,
Xian Shuai,
Yang Bai,
Neiwen Ling,
Nan Guan,
Zhenyu Yan,
Guoliang Xing
Abstract:
Achieving efficient execution of machine learning models has attracted significant attention recently. To generate tensor programs efficiently, a key component of DNN compilers is the cost model that can predict the performance of each configuration on specific devices. However, due to the rapid emergence of hardware platforms, it is increasingly labor-intensive to train domain-specific predictors…
▽ More
Achieving efficient execution of machine learning models has attracted significant attention recently. To generate tensor programs efficiently, a key component of DNN compilers is the cost model that can predict the performance of each configuration on specific devices. However, due to the rapid emergence of hardware platforms, it is increasingly labor-intensive to train domain-specific predictors for every new platform. Besides, current design of cost models cannot provide transferable features between different hardware accelerators efficiently and effectively. In this paper, we propose Moses, a simple and efficient design based on the lottery ticket hypothesis, which fully takes advantage of the features transferable to the target device via domain adaptation. Compared with state-of-the-art approaches, Moses achieves up to 1.53X efficiency gain in the search stage and 1.41X inference speedup on challenging DNN benchmarks.
△ Less
Submitted 14 January, 2022;
originally announced January 2022.
-
TENET: A Framework for Modeling Tensor Dataflow Based on Relation-centric Notation
Authors:
Liqiang Lu,
Naiqing Guan,
Yuyue Wang,
Liancheng Jia,
Zizhang Luo,
Jieming Yin,
Jason Cong,
Yun Liang
Abstract:
Accelerating tensor applications on spatial architectures provides high performance and energy-efficiency, but requires accurate performance models for evaluating various dataflow alternatives. Such modeling relies on the notation of tensor dataflow and the formulation of performance metrics. Recent proposed compute-centric and data-centric notations describe the dataflow using imperative directiv…
▽ More
Accelerating tensor applications on spatial architectures provides high performance and energy-efficiency, but requires accurate performance models for evaluating various dataflow alternatives. Such modeling relies on the notation of tensor dataflow and the formulation of performance metrics. Recent proposed compute-centric and data-centric notations describe the dataflow using imperative directives. However, these two notations are less expressive and thus lead to limited optimization opportunities and inaccurate performance models.
In this paper, we propose a framework TENET that models hardware dataflow of tensor applications. We start by introducing a relation-centric notation, which formally describes the hardware dataflow for tensor computation. The relation-centric notation specifies the hardware dataflow, PE interconnection, and data assignment in a uniform manner using relations. The relation-centric notation is more expressive than the compute-centric and data-centric notations by using more sophisticated affine transformations. Another advantage of relation-centric notation is that it inherently supports accurate metrics estimation, including data reuse, bandwidth, latency, and energy. TENET computes each performance metric by counting the relations using integer set structures and operators. Overall, TENET achieves 37.4\% and 51.4\% latency reduction for CONV and GEMM kernels compared with the state-of-the-art data-centric notation by identifying more sophisticated hardware dataflows.
△ Less
Submitted 5 May, 2021;
originally announced May 2021.
-
Schedulability Bounds for Parallel Real-Time Tasks under Global Rate-Monotonic Scheduling
Authors:
Xu Jiang,
Nan Guan,
Maolin Yang,
Yue Tang,
Wang Yi
Abstract:
Schedulability bounds not only serve as efficient tests to decide schedulability of real-time task systems but also reveal insights about the worst-case performance of scheduling algorithms. Different from sequential real-time task systems for which utilization is a suitable metric to develop schedulability bounds, schedulability of parallel real-time tasks depends on not only utilization but also…
▽ More
Schedulability bounds not only serve as efficient tests to decide schedulability of real-time task systems but also reveal insights about the worst-case performance of scheduling algorithms. Different from sequential real-time task systems for which utilization is a suitable metric to develop schedulability bounds, schedulability of parallel real-time tasks depends on not only utilization but also the workload graph structure of tasks, which can be well represented by the tensity metric. In this paper, we develop new analysis techniques for parallel real-time task systems under Global Rate-Monotonic (G-RM) scheduling and obtain new results on schedulability bounds based on these two metrics: utilization and tensity. First, we develop the first utilization-tensity bound for G-RM. Second, we improve the capacity augmentation bound of G-RM from the best known value 3.73 to 3.18. These schedulability bounds not only provide theoretical insights about real-time performance of G-RM, but also serve as highly efficient schedulability tests, which are particularly suitable to design scenarios in which detailed task graph structures are unknown or may change at run-time. Experiments with randomly generated task sets show that our new results consistently outperform the state-of-the-art with a significant margin under different parameter settings.
△ Less
Submitted 13 November, 2020;
originally announced November 2020.
-
DPCP-p: A Distributed Locking Protocol for Parallel Real-Time Tasks
Authors:
Maolin Yang,
Zewei Chen,
Xu Jiang,
Nan Guan,
Hang Lei
Abstract:
Real-time scheduling and locking protocols are fundamental facilities to construct time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs mutually exclusive access to shared resources. This paper for the first time studies the distributed synchronization framework of parallel real-time tasks, where both tasks and global resources ar…
▽ More
Real-time scheduling and locking protocols are fundamental facilities to construct time-critical systems. For parallel real-time tasks, predictable locking protocols are required when concurrent sub-jobs mutually exclusive access to shared resources. This paper for the first time studies the distributed synchronization framework of parallel real-time tasks, where both tasks and global resources are partitioned to designated processors, and requests to each global resource are conducted on the processor on which the resource is partitioned. We extend the Distributed Priority Ceiling Protocol (DPCP) for parallel tasks under federated scheduling, with which we proved that a request can be blocked by at most one lower-priority request. We develop task and resource partitioning heuristics and propose analysis techniques to safely bound the task response times. Numerical evaluation (with heavy tasks on 8-, 16-, and 32-core processors) indicates that the proposed methods improve the schedulability significantly compared to the state-of-the-art locking protocols under federated scheduling.
△ Less
Submitted 1 July, 2020;
originally announced July 2020.
-
On the Analysis of Parallel Real-Time Tasks with Spin Locks
Authors:
Xu Jiang,
Nan Guan,
He Du,
Weichen Liu,
Wang Yi
Abstract:
Locking protocol is an essential component in resource management of real-time systems, which coordinates mutually exclusive accesses to shared resources from different tasks. Although the design and analysis of locking protocols have been intensively studied for sequential real-time tasks, there has been little work on this topic for parallel real-time tasks. In this paper, we study the analysis…
▽ More
Locking protocol is an essential component in resource management of real-time systems, which coordinates mutually exclusive accesses to shared resources from different tasks. Although the design and analysis of locking protocols have been intensively studied for sequential real-time tasks, there has been little work on this topic for parallel real-time tasks. In this paper, we study the analysis of parallel real-time tasks using spin locks to protect accesses to shared resources in three commonly used request serving orders (unordered, FIFO-order and priority-order). A remarkable feature making our analysis method more accurate is to systematically analyze the blocking time which may delay a task's finishing time, where the impact to the total workload and the longest path length is jointly considered, rather than analyzing them separately and counting all blocking time as the workload that delays a task's finishing time, as commonly assumed in the state-of-the-art.
△ Less
Submitted 18 March, 2020;
originally announced March 2020.
-
Truncated Cauchy Non-negative Matrix Factorization
Authors:
Naiyang Guan,
Tongliang Liu,
Yangmuzi Zhang,
Dacheng Tao,
Larry S. Davis
Abstract:
Non-negative matrix factorization (NMF) minimizes the Euclidean distance between the data matrix and its low rank approximation, and it fails when applied to corrupted data because the loss function is sensitive to outliers. In this paper, we propose a Truncated CauchyNMF loss that handle outliers by truncating large errors, and develop a Truncated CauchyNMF to robustly learn the subspace on noisy…
▽ More
Non-negative matrix factorization (NMF) minimizes the Euclidean distance between the data matrix and its low rank approximation, and it fails when applied to corrupted data because the loss function is sensitive to outliers. In this paper, we propose a Truncated CauchyNMF loss that handle outliers by truncating large errors, and develop a Truncated CauchyNMF to robustly learn the subspace on noisy datasets contaminated by outliers. We theoretically analyze the robustness of Truncated CauchyNMF comparing with the competing models and theoretically prove that Truncated CauchyNMF has a generalization bound which converges at a rate of order $O(\sqrt{{\ln n}/{n}})$, where $n$ is the sample size. We evaluate Truncated CauchyNMF by image clustering on both simulated and real datasets. The experimental results on the datasets containing gross corruptions validate the effectiveness and robustness of Truncated CauchyNMF for learning robust subspaces.
△ Less
Submitted 2 June, 2019;
originally announced June 2019.
-
Response Time Bounds for Typed DAG Parallel Tasks on Heterogeneous Multi-cores
Authors:
Meiling Han,
Nan Guan,
**ghao Sun,
Qingqiang He,
Qingxu Deng,
Weichen Liu
Abstract:
Heterogeneous multi-cores utilize the strength of different architectures for executing particular types of workload, and usually offer higher performance and energy efficiency. In this paper, we study the worst-case response time (WCRT) analysis of \emph{typed} scheduling of parallel DAG tasks on heterogeneous multi-cores, where the workload of each vertex in the DAG is only allowed to execute on…
▽ More
Heterogeneous multi-cores utilize the strength of different architectures for executing particular types of workload, and usually offer higher performance and energy efficiency. In this paper, we study the worst-case response time (WCRT) analysis of \emph{typed} scheduling of parallel DAG tasks on heterogeneous multi-cores, where the workload of each vertex in the DAG is only allowed to execute on a particular type of cores. The only known WCRT bound for this problem is grossly pessimistic and suffers the \emph{non-self-sustainability} problem. In this paper, we propose two new WCRT bounds. The first new bound has the same time complexity as the existing bound, but is more precise and solves its \emph{non-self-sustainability} problem. The second new bound explores more detailed task graph structure information to greatly improve the precision, but is computationally more expensive. We prove that the problem of computing the second bound is strongly NP-hard if the number of types in the system is a variable, and develop an efficient algorithm which has polynomial time complexity if the number of types is a constant. Experiments with randomly generated workload show that our proposed new methods are significantly more precise than the existing bound while having good scalability.
△ Less
Submitted 7 August, 2018;
originally announced September 2018.
-
Utilization-Based Scheduling of Flexible Mixed-Criticality Real-Time Tasks
Authors:
Gang Chen,
Nan Guan,
Di Liu,
Qingqiang He,
Kai Huang,
Todor Stefanov,
Wang Yi
Abstract:
Mixed-criticality models are an emerging paradigm for the design of real-time systems because of their significantly improved resource efficiency. However, formal mixed-criticality models have traditionally been characterized by two impractical assumptions: once \textit{any} high-criticality task overruns, \textit{all} low-criticality tasks are suspended and \textit{all other} high-criticality tas…
▽ More
Mixed-criticality models are an emerging paradigm for the design of real-time systems because of their significantly improved resource efficiency. However, formal mixed-criticality models have traditionally been characterized by two impractical assumptions: once \textit{any} high-criticality task overruns, \textit{all} low-criticality tasks are suspended and \textit{all other} high-criticality tasks are assumed to exhibit high-criticality behaviors at the same time. In this paper, we propose a more realistic mixed-criticality model, called the flexible mixed-criticality (FMC) model, in which these two issues are addressed in a combined manner. In this new model, only the overrun task itself is assumed to exhibit high-criticality behavior, while other high-criticality tasks remain in the same mode as before. The guaranteed service levels of low-criticality tasks are gracefully degraded with the overruns of high-criticality tasks. We derive a utilization-based technique to analyze the schedulability of this new mixed-criticality model under EDF-VD scheduling. During runtime, the proposed test condition serves an important criterion for dynamic service level tuning, by means of which the maximum available execution budget for low-criticality tasks can be directly determined with minimal overhead while guaranteeing mixed-criticality schedulability. Experiments demonstrate the effectiveness of the FMC scheme compared with state-of-the-art techniques.
△ Less
Submitted 29 September, 2017;
originally announced November 2017.
-
Semi-Federated Scheduling of Parallel Real-Time Tasks on Multiprocessors
Authors:
Xu Jiang,
Nan Guan,
Xiang Long,
Wang Yi
Abstract:
Federated scheduling is a promising approach to schedule parallel real-time tasks on multi-cores, where each heavy task exclusively executes on a number of dedicated processors, while light tasks are treated as sequential sporadic tasks and share the remaining processors. However, federated scheduling suffers resource waste since a heavy task with processing capacity requirement $x + ε$ (where…
▽ More
Federated scheduling is a promising approach to schedule parallel real-time tasks on multi-cores, where each heavy task exclusively executes on a number of dedicated processors, while light tasks are treated as sequential sporadic tasks and share the remaining processors. However, federated scheduling suffers resource waste since a heavy task with processing capacity requirement $x + ε$ (where $x$ is an integer and $0 < ε< 1$) needs $x + 1$ dedicated processors. In the extreme case, almost half of the processing capacity is wasted. In this paper we propose the semi-federate scheduling approach, which only grants $x$ dedicated processors to a heavy task with processing capacity requirement $x + ε$, and schedules the remaining $ε$ part together with light tasks on shared processors. Experiments with randomly generated task sets show the semi-federated scheduling approach significantly outperforms not only federated scheduling, but also all existing approaches for scheduling parallel real-time tasks on multi-cores.
△ Less
Submitted 9 May, 2017;
originally announced May 2017.
-
EDF-VD Scheduling of Mixed-Criticality Systems with Degraded Quality Guarantees
Authors:
Di Liu,
Jelena Spasic,
Gang Chen,
Nan Guan,
Songran Liu,
Todor Stefanov,
Wang Yi
Abstract:
This paper studies real-time scheduling of mixed-criticality systems where low-criticality tasks are still guaranteed some service in the high-criticality mode, with reduced execution budgets. First, we present a utilization-based schedulability test for such systems under EDF-VD scheduling. Second, we quantify the suboptimality of EDF-VD (with our test condition) in terms of speedup factors. In g…
▽ More
This paper studies real-time scheduling of mixed-criticality systems where low-criticality tasks are still guaranteed some service in the high-criticality mode, with reduced execution budgets. First, we present a utilization-based schedulability test for such systems under EDF-VD scheduling. Second, we quantify the suboptimality of EDF-VD (with our test condition) in terms of speedup factors. In general, the speedup factor is a function with respect to the ratio between the amount of resource required by different types of tasks in different criticality modes, and reaches 4/3 in the worst case. Furthermore, we show that the proposed utilization-based schedulability test and speedup factor results apply to the elastic mixed-criticality model as well. Experiments show effectiveness of our proposed method and confirm the theoretical suboptimality results.
△ Less
Submitted 4 May, 2016;
originally announced May 2016.
-
MahNMF: Manhattan Non-negative Matrix Factorization
Authors:
Naiyang Guan,
Dacheng Tao,
Zhigang Luo,
John Shawe-Taylor
Abstract:
Non-negative matrix factorization (NMF) approximates a non-negative matrix $X$ by a product of two non-negative low-rank factor matrices $W$ and $H$. NMF and its extensions minimize either the Kullback-Leibler divergence or the Euclidean distance between $X$ and $W^T H$ to model the Poisson noise or the Gaussian noise. In practice, when the noise distribution is heavy tailed, they cannot perform w…
▽ More
Non-negative matrix factorization (NMF) approximates a non-negative matrix $X$ by a product of two non-negative low-rank factor matrices $W$ and $H$. NMF and its extensions minimize either the Kullback-Leibler divergence or the Euclidean distance between $X$ and $W^T H$ to model the Poisson noise or the Gaussian noise. In practice, when the noise distribution is heavy tailed, they cannot perform well. This paper presents Manhattan NMF (MahNMF) which minimizes the Manhattan distance between $X$ and $W^T H$ for modeling the heavy tailed Laplacian noise. Similar to sparse and low-rank matrix decompositions, MahNMF robustly estimates the low-rank part and the sparse part of a non-negative matrix and thus performs effectively when data are contaminated by outliers. We extend MahNMF for various practical applications by develo** box-constrained MahNMF, manifold regularized MahNMF, group sparse MahNMF, elastic net inducing MahNMF, and symmetric MahNMF. The major contribution of this paper lies in two fast optimization algorithms for MahNMF and its extensions: the rank-one residual iteration (RRI) method and Nesterov's smoothing method. In particular, by approximating the residual matrix by the outer product of one row of W and one row of $H$ in MahNMF, we develop an RRI method to iteratively update each variable of $W$ and $H$ in a closed form solution. Although RRI is efficient for small scale MahNMF and some of its extensions, it is neither scalable to large scale matrices nor flexible enough to optimize all MahNMF extensions. Since the objective functions of MahNMF and its extensions are neither convex nor smooth, we apply Nesterov's smoothing method to recursively optimize one factor matrix with another matrix fixed. By setting the smoothing parameter inversely proportional to the iteration number, we improve the approximation accuracy iteratively for both MahNMF and its extensions.
△ Less
Submitted 14 July, 2012;
originally announced July 2012.
-
Viscosity and dilepton production of a chemically equilibrating quark-gluon plasma at finite baryon density
Authors:
N. N. Guan,
Z. J. He,
J. L. Long,
X. Z. Cai,
Y. G. Ma,
J. W. Li,
W. Q. Shen
Abstract:
By considering the effect of shear viscosity we have investigated the evolution of a chemically equilibrating quark-gluon plasma at finite baryon density. Based on the evolution of the system we have performed a complete calculation for the dilepton production from the following processes: $q\bar{q}{\to}l\bar{l}$, $q\bar{q}{\to}gl\bar{l}$, Compton-like scattering ($qg{\to}ql\bar{l}$,…
▽ More
By considering the effect of shear viscosity we have investigated the evolution of a chemically equilibrating quark-gluon plasma at finite baryon density. Based on the evolution of the system we have performed a complete calculation for the dilepton production from the following processes: $q\bar{q}{\to}l\bar{l}$, $q\bar{q}{\to}gl\bar{l}$, Compton-like scattering ($qg{\to}ql\bar{l}$, $\bar{q}g{\to}{\bar{q}}l\bar{l}$), gluon fusion $g\bar{g}{\to}c\bar{c}$, annihilation $q\bar{q}{\to}c\bar{c}$ as well as the multiple scattering of quarks. We have found that quark-antiquark annihilation, Compton-like scatterring, gluon fusion, and multiple scattering of quarks give important contributions. Moreover, we have also found that the dilepton yield is an increasing function of the initial quark chemical potential, and the increase of the quark phase lifetime because of the viscosity also obviously raises the dilepton yield.
△ Less
Submitted 2 September, 2009;
originally announced September 2009.