Skip to main content

Showing 1–50 of 67 results for author: Dai, G

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00945  [pdf, other

    cs.LG

    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster in… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.14909  [pdf, other

    cs.LG cs.AI cs.CL

    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

    Authors: Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring thei… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 10 pages

    ACM Class: I.2.7

  3. arXiv:2406.14629  [pdf, other

    cs.CL cs.AI

    Can LLMs Learn by Teaching? A Preliminary Study

    Authors: Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching not only improves students but also improves teachers. We ask: Can LLMs also learn by teaching (LbT)? If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  4. arXiv:2406.14373  [pdf, other

    cs.AI cs.CL cs.CY cs.HC cs.MA

    Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory

    Authors: Gordon Dai, Weijia Zhang, **han Li, Siqi Yang, Chidera Onochie lbe, Srihas Rao, Arthur Caetano, Misha Sra

    Abstract: The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.12718  [pdf, other

    cs.CV cs.AI cs.CL

    AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

    Authors: Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Guang Dai, ** Chen, Shijian Lu

    Abstract: Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of objec… ▽ More

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2406.08552  [pdf, other

    cs.CV

    DiTFastAttn: Attention Compression for Diffusion Transformer Models

    Authors: Zhihang Yuan, Pu Lu, Hanling Zhang, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to self-attention's quadratic complexity. We propose DiTFastAttn, a novel post-training compression method to alleviate DiT's computational bottleneck. We identify three key redundancies in the attention computation during DiT inference: 1. spatial redundancy, where many attention heads focus on… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.02540  [pdf, other

    cs.CV

    ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

    Authors: Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

    Abstract: Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an ef… ▽ More

    Submitted 30 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project Page: https://a-suozhang.xyz/viditq.github.io/

  8. arXiv:2405.19012  [pdf, other

    cs.AI

    Implicit Neural Image Field for Biological Microscopy Image Compression

    Authors: Gaole Dai, Cheng-Ching Tseng, Qingpo Wuwu, Rongyu Zhang, Shaokang Wang, Ming Lu, Tiejun Huang, Yu Zhou, Ali Ata Tuz, Matthias Gunzer, Jianxu Chen, Shanghang Zhang

    Abstract: The rapid pace of innovation in biological microscopy imaging has led to large images, putting pressure on data storage and impeding efficient sharing, management, and visualization. This necessitates the development of efficient compression solutions. Traditional CODEC methods struggle to adapt to the diverse bioimaging data and often suffer from sub-optimal compression. In this study, we propose… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  9. arXiv:2405.17873  [pdf, other

    cs.CV cs.AI

    MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

    Authors: Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduces the inference time by reducing the denoising steps. However, their memory consumptions are still excessive. The Post Training Quantiz… ▽ More

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project Page: https://a-suozhang.xyz/mixdq.github.io/

  10. arXiv:2405.17761  [pdf, other

    cs.LG math.OC

    Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

    Authors: Hao Di, Haishan Ye, Yueling Zhang, Xiangyu Chang, Guang Dai, Ivor W. Tsang

    Abstract: Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  11. arXiv:2405.16486  [pdf, other

    cs.CV cs.AI

    Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

    Authors: Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

    Abstract: Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system'… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  12. arXiv:2405.16256  [pdf, other

    cs.DC cs.AI

    HetHub: A Heterogeneous distributed hybrid training system for large-scale models

    Authors: Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang

    Abstract: The development of large-scale models relies on a vast number of computing resources. For example, the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs for its training. It is a challenge to build a large-scale cluster with a type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a cluster is an effective way to solve the problem of insufficient homogeneous GP… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  13. arXiv:2405.14224  [pdf, other

    cs.CV

    DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

    Authors: Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

    Abstract: Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic to the number of tokens, leading to significant challenges when dealing with high-resolution images. In this work, we propose Diffusion Mamba (DiM), which combines the efficiency of Mamba, a sequence model based… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  14. arXiv:2405.02538  [pdf, other

    cs.CV

    AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

    Authors: Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

    Abstract: Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multip… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

  15. arXiv:2404.15305  [pdf, other

    eess.SP cs.LG

    ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay

    Authors: Hyungjun Yoon, Jaehyun Kwak, Biniyam Aschalew Tolera, Gaole Dai, Mo Li, Taesik Gong, Kimin Lee, Sung-Ju Lee

    Abstract: Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine… ▽ More

    Submitted 29 March, 2024; originally announced April 2024.

  16. arXiv:2404.14294  [pdf, other

    cs.CL cs.AI

    A Survey on Efficient Inference for Large Language Models

    Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-** Zhang, Yuhan Dong, Yu Wang

    Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards develo** techniques aimed at enhancing the efficiency of LLM inference. This p… ▽ More

    Submitted 8 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  17. arXiv:2404.06710  [pdf, other

    cs.CV cs.AI

    SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

    Authors: Gaole Dai, Zhenyu Wang, Qinwen Xu, Ming Lu, Wen Chen, Boxin Shi, Shanghang Zhang, Tiejun Huang

    Abstract: One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, Conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information… ▽ More

    Submitted 12 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  18. arXiv:2404.02241  [pdf, other

    cs.CV

    Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Diffusion Models (DM) and Consistency Models (CM) are two types of popular generative models with good generation quality on various tasks. When training DM and CM, intermediate weight checkpoints are not fully utilized and only the last converged checkpoint is used. In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by pro… ▽ More

    Submitted 7 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  19. arXiv:2404.00878  [pdf, other

    cs.CV

    TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

    Authors: Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong Liu, **gdong Wang

    Abstract: Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, the clothing identity uncontrollability and training inefficiency of existing diffusion-based methods, which struggle to maintain the identity even with full parameter training, are significant limitations that hinder the wide… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

  20. arXiv:2403.19235  [pdf, other

    cs.CV

    DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

    Authors: Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, **gdong Wang

    Abstract: While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with a nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context. Existing personalization methods either require time-consuming optimization or learning additional encoders, ad… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  21. arXiv:2403.16379  [pdf, other

    cs.CV

    FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

    Authors: Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: In recent years, there has been significant progress in the development of text-to-image generative models. Evaluating the quality of the generative models is one essential step in the development process. Unfortunately, the evaluation process could consume a significant amount of computational resources, making the required periodic evaluation of model performance (e.g., monitoring training progr… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: The paper is accepted by CVPR 2024

  22. arXiv:2403.05105  [pdf, other

    cs.CV cs.AI cs.MM

    Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

    Authors: Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, **gdong Wang

    Abstract: Collecting well-matched multimedia datasets is crucial for training cross-modal retrieval models. However, in real-world scenarios, massive multimodal data are harvested from the Internet, which inevitably contains Partially Mismatched Pairs (PMPs). Undoubtedly, such semantical irrelevant data will remarkably harm the cross-modal retrieval performance. Previous efforts tend to mitigate this proble… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  23. arXiv:2402.18158  [pdf, other

    cs.CL cs.AI

    Evaluating Quantized Large Language Models

    Authors: Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs). Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs. To meet the requirements of both high efficiency and performance across diverse scenarios, a comprehensive evaluation of quantized LLMs is essential to guide the selection o… ▽ More

    Submitted 6 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  24. arXiv:2402.15173  [pdf, other

    cs.LG

    Second-Order Fine-Tuning without Pain for LLMs:A Hessian Informed Zeroth-Order Optimizer

    Authors: Yanjun Zhao, Sizhe Dang, Haishan Ye, Guang Dai, Yi Qian, Ivor W. Tsang

    Abstract: Fine-tuning large language models (LLMs) with classic first-order optimizers entails prohibitive GPU memory due to the backpropagation process. Recent works have turned to zeroth-order optimizers for fine-tuning, which save substantial memory by using two forward passes. However, these optimizers are plagued by the heterogeneity of parameter curvatures across different dimensions. In this work, we… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  25. arXiv:2402.05136  [pdf, other

    cs.CL

    LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

    Authors: Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

    Abstract: State-of-the-art large language models (LLMs) are now claiming remarkable supported context lengths of 256k or even more. In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation. This paper introduces LV-Eval, a challenging long-context benchmark with five le… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

  26. arXiv:2401.11649  [pdf, other

    cs.CV

    M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

    Authors: Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, **gdong Wang, Yong Liu

    Abstract: Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient FineTuning (PEFT), has captured substantial attraction in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models' generalization capabilities during transfer. In this paper… ▽ More

    Submitted 21 January, 2024; originally announced January 2024.

    Journal ref: AAAI2024

  27. arXiv:2401.10039  [pdf, other

    cs.CV

    GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition

    Authors: Guangzhao Dai, Xiangbo Shu, Wenhao Wu, Rui Yan, Jiachao Zhang

    Abstract: Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR). Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of vision and linguistic knowledge. We prop… ▽ More

    Submitted 11 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  28. arXiv:2401.03868  [pdf, other

    cs.AR cs.AI

    FlightLLM: Efficient Large Language Model Inference with a Complete Map** Flow on FPGAs

    Authors: Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, Hongyi Wang, Wenheng Ma, Hanbo Sun, Shiyao Li, Zixiao Huang, Yadong Dai, **tao Li, Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, Yu Wang

    Abstract: Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLM's computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerato… ▽ More

    Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted to FPGA'24

  29. arXiv:2312.16478  [pdf, other

    cs.LG

    Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

    Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Xiaojun Chang, **gdong Wang

    Abstract: Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. Recently, to alleviate expensive data collection, co-occurring pairs from the Internet are automatically harvested for training. However, it inevitably includes mismatched pairs, \ie, noisy correspondences, undermining supervision reliability and degrading performance. Current methods leverage deep ne… ▽ More

    Submitted 27 December, 2023; originally announced December 2023.

  30. arXiv:2312.02226  [pdf, other

    cs.CV

    Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

    Authors: Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, **gdong Wang

    Abstract: Exploring open-vocabulary video action recognition is a promising venture, which aims to recognize previously unseen actions within any arbitrary set of categories. Existing methods typically adapt pretrained image-text models to the video domain, capitalizing on their inherent strengths in generalization. A common thread among such methods is the augmentation of visual embeddings with temporal in… ▽ More

    Submitted 3 December, 2023; originally announced December 2023.

  31. arXiv:2311.16442  [pdf, other

    cs.LG cs.DC

    Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization

    Authors: **hao Li, Jiaming Xu, Shiyao Li, Shan Huang, Jun Liu, Yaoxiu Lian, Guohao Dai

    Abstract: Large language models (LLMs) have demonstrated impressive abilities in various domains while the inference cost is expensive. Many previous studies exploit quantization methods to reduce LLM inference cost by reducing latency and memory consumption. Applying 2-bit single-precision weight quantization brings >3% accuracy loss, so the state-of-the-art methods use mixed-precision methods for LLMs (e.… ▽ More

    Submitted 1 July, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  32. arXiv:2311.12862  [pdf, other

    cs.DC cs.CV cs.LG cs.PF

    TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

    Authors: Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

    Abstract: Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular, specialized high-performance kernels are required. Existing GPU libraries offer two dataflow types for sparse convolution. The gather-GEMM-scatter dataflow is easy to i… ▽ More

    Submitted 25 October, 2023; originally announced November 2023.

    Comments: MICRO 2023; Haotian Tang and Shang Yang contributed equally to this project

  33. arXiv:2311.01686  [pdf, other

    cs.CV cs.LG

    Disentangled Representation Learning with Transmitted Information Bottleneck

    Authors: Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Jihong Wang, Xiaojun Chang, **gdong Wang, Qinghua Zheng

    Abstract: Encoding only the task-related information from the raw data, \ie, disentangled representation learning, can greatly contribute to the robustness and generalizability of models. Although significant advances have been made by regularizing the information in representations with information theory, two major challenges remain: 1) the representation compression inevitably leads to performance drop;… ▽ More

    Submitted 2 November, 2023; originally announced November 2023.

  34. arXiv:2311.01282  [pdf, other

    cs.LG cs.CL

    FlashDecoding++: Faster Large Language Model Inference on GPUs

    Authors: Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang

    Abstract: As the Large Language Model (LLM) becomes increasingly important in various domains. However, the following challenges still remain unsolved in accelerating LLM inference: (1) Synchronized partial softmax update. The softmax operation requires a synchronized update operation among each partial softmax result, leading to ~20% overheads for the attention computation in LLMs. (2) Under-utilized compu… ▽ More

    Submitted 5 January, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  35. arXiv:2310.06218  [pdf, other

    cs.LG cs.AI

    SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration

    Authors: **gyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, Yong Liu

    Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1$\times$N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space… ▽ More

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 14 pages, 4 figures, Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  36. arXiv:2309.11125  [pdf, other

    cs.CV

    PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement

    Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, **gdong Wang

    Abstract: Dominant Person Search methods aim to localize and recognize query persons in a unified network, which jointly optimizes two sub-tasks, \ie, pedestrian detection and Re-IDentification (ReID). Despite significant progress, current methods face two primary challenges: 1) the pedestrian candidates learned within detectors are suboptimal for the ReID task. 2) the potential for collaboration between tw… ▽ More

    Submitted 13 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

  37. arXiv:2308.10547  [pdf, other

    math.OC cs.LG eess.SY

    Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold

    Authors: Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, Yong Liu

    Abstract: The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, there is little study for those in distributed scenarios.… ▽ More

    Submitted 12 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Journal ref: International Conference on Learning Representations, 2024

  38. arXiv:2308.10156  [pdf, other

    cs.CV cs.AI

    SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

    Authors: Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, **gdong Wang

    Abstract: Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls. In contrast, Layout-to-Image (L2I) generation, aiming to generate realistic and complex scene images from user-specified layouts, has risen to prominence. However, existing methods transform layout information into tokens or RGB images for co… ▽ More

    Submitted 13 March, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted to AAAI 2024

    Journal ref: 38th AAAI Conference on Artificial Intelligence (AAAI2024), Vancouver, BC, Canada, 2024

  39. arXiv:2308.09346  [pdf, other

    cs.CV

    Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

    Authors: Jiazheng Xing, Mengmeng Wang, Yudi Ruan, Bofan Chen, Yaowei Guo, Boyu Mu, Guang Dai, **gdong Wang, Yong Liu

    Abstract: Class prototype construction and matching are core aspects of few-shot action recognition. Previous methods mainly focus on designing spatiotemporal relation modeling modules or complex temporal alignment algorithms. Despite the promising results, they ignored the value of class prototype construction and matching, leading to unsatisfactory performance in recognizing similar categories in every ta… ▽ More

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023

  40. arXiv:2308.01532  [pdf, other

    cs.CV

    Multimodal Adaptation of CLIP for Few-Shot Action Recognition

    Authors: Jiazheng Xing, Mengmeng Wang, Xiaojun Hou, Guang Dai, **gdong Wang, Yong Liu

    Abstract: Applying large-scale pre-trained visual models like CLIP to few-shot action recognition tasks can benefit performance and efficiency. Utilizing the "pre-training, fine-tuning" paradigm makes it possible to avoid training a network from scratch, which can be time-consuming and resource-intensive. However, this method has two drawbacks. First, limited labeled samples for few-shot action recognition… ▽ More

    Submitted 3 August, 2023; originally announced August 2023.

  41. arXiv:2307.15773  [pdf, other

    cs.LG stat.ML

    Seeking the Yield Barrier: High-Dimensional SRAM Evaluation Through Optimal Manifold

    Authors: Yanfang Liu, Guohao Dai, Wei W. Xing

    Abstract: Being able to efficiently obtain an accurate estimate of the failure probability of SRAM components has become a central issue as model circuits shrink their scale to submicrometer with advanced technology nodes. In this work, we revisit the classic norm minimization method. We then generalize it with infinite components and derive the novel optimal manifold concept, which bridges the surrogate-ba… ▽ More

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 2023 60th ACM/IEEE Design Automation Conference(DAC)

    MSC Class: 68U07 ACM Class: J.6

  42. arXiv:2307.08209  [pdf, other

    cs.CV

    Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

    Authors: Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redu… ▽ More

    Submitted 8 August, 2023; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV2023

  43. arXiv:2305.19982  [pdf, other

    cs.LG cs.AI

    Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

    Authors: Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu

    Abstract: Running out of GPU memory has become a main bottleneck for large-scale DNN training. How to reduce the memory footprint during training has received intensive research attention. We find that previous gradient accumulation reduces activation memory but fails to be compatible with gradient memory reduction due to a contradiction between preserving gradients and releasing gradients. To address this… ▽ More

    Submitted 31 May, 2023; originally announced May 2023.

  44. arXiv:2305.16823   

    cs.LG cs.AI

    HUB: Guiding Learned Optimizers with Continuous Prompt Tuning

    Authors: Gaole Dai, Wei Wu, Ziyu Wang, Jie Fu, Shanghang Zhang, Tiejun Huang

    Abstract: Learned optimizers are a crucial component of meta-learning. Recent advancements in scalable learned optimizers have demonstrated their superior performance over hand-designed optimizers in various tasks. However, certain characteristics of these models, such as an unstable learning curve, limited ability to handle unseen tasks and network architectures, difficult-to-control behaviours, and poor p… ▽ More

    Submitted 31 May, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Some table information is not accurate, author information not correct inside the pdf

  45. arXiv:2303.14736  [pdf, other

    cs.CV

    Disentangling Writer and Character Styles for Handwriting Generation

    Authors: Gang Dai, Yifan Zhang, Qingfeng Wang, Qing Du, Zhuliang Yu, Zhuoman Liu, Shuang** Huang

    Abstract: Training machines to synthesize diverse handwritings is an intriguing task. Recently, RNN-based methods have been proposed to generate stylized online Chinese characters. However, these methods mainly focus on capturing a person's overall writing style, neglecting subtle style inconsistencies between characters written by the same person. For example, while a person's handwriting typically exhibit… ▽ More

    Submitted 31 March, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

    Comments: accepted by CVPR 2023. Source code: https://github.com/dailenson/SDT

  46. arXiv:2303.07012  [pdf, other

    cs.CV cs.AI

    AGTGAN: Unpaired Image Translation for Photographic Ancient Character Generation

    Authors: Hongxiang Huang, Daihui Yang, Gang Dai, Zhen Han, Yuyi Wang, Kin-Man Lam, Fan Yang, Shuang** Huang, Yongge Liu, Mengchao He

    Abstract: The study of ancient writings has great value for archaeology and philology. Essential forms of material are photographic characters, but manual photographic character recognition is extremely time-consuming and expertise-dependent. Automatic classification is therefore greatly desired. However, the current performance is limited due to the lack of annotated data. Data generation is an inexpensive… ▽ More

    Submitted 13 March, 2023; originally announced March 2023.

  47. arXiv:2302.03390  [pdf, other

    cs.LG cs.IT cs.NE

    Learning Discretized Neural Networks under Ricci Flow

    Authors: Jun Chen, Hanwen Chen, Mengmeng Wang, Guang Dai, Ivor W. Tsang, Yong Liu

    Abstract: In this paper, we study Discretized Neural Networks (DNNs) composed of low-precision weights and activations, which suffer from either infinite or zero gradients due to the non-differentiable discrete function during training. Most training-based DNNs in such scenarios employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the use of STE i… ▽ More

    Submitted 4 January, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

  48. High-Dimensional Yield Estimation using Shrinkage Deep Features and Maximization of Integral Entropy Reduction

    Authors: Shuo Yin, Guohao Dai, Wei W. Xing

    Abstract: Despite the fast advances in high-sigma yield analysis with the help of machine learning techniques in the past decade, one of the main challenges, the curse of dimensionality, which is inevitable when dealing with modern large-scale circuits, remains unsolved. To resolve this challenge, we propose an absolute shrinkage deep kernel learning, ASDK, which automatically identifies the dominant proces… ▽ More

    Submitted 5 December, 2022; originally announced December 2022.

    MSC Class: 68U07 ACM Class: J.6

    Journal ref: ASPDAC 2023, January, Tokyo, Japan

  49. arXiv:2211.10995  [pdf, other

    cs.CV cs.AI

    Distinctive Self-Similar Object Detection

    Authors: Zeyu Shangguan, Bocheng Hu, Guohua Dai, Yuyu Liu, Darun Tang, Xingqun Jiang

    Abstract: Deep learning-based object detection has demonstrated a significant presence in the practical applications of artificial intelligence. However, objects such as fire and smoke, pose challenges to object detection because of their non-solid and various shapes, and consequently difficult to truly meet requirements in practical fire prevention and control. In this paper, we propose that the distinctiv… ▽ More

    Submitted 25 August, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

  50. arXiv:2209.02882  [pdf, other

    cs.DC cs.PL

    Sgap: Towards Efficient Sparse Tensor Algebra Compilation for GPU

    Authors: Genghan Zhang, Yuetong Zhao, Yanting Tao, Zhongming Yu, Guohao Dai, Sitao Huang, Yuan Wen, Pavlos Petoumenos, Yu Wang

    Abstract: Sparse compiler is a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Though GPU provides various reduction semantics that can better utilize the parallel computing and memory bandwidth capacity, the central question is: how to elevate the flexible reduction semantics to sparse compilati… ▽ More

    Submitted 9 January, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

    Comments: 23 pages, 10 figures