Showing 1–50 of 243 results for author: Chu, X

  1. arXiv:2407.00599  [pdf, other]

    cs.DC cs.LG

    Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules

    Authors: Xinglin Pan, Wenxiang Lin, Shaohuai Shi, Xiaowen Chu, Weinong Sun, Bo Li

    Abstract: Sparsely-activated Mixture-of-Expert (MoE) layers have found practical applications in enlarging the model size of large-scale foundation models, with only a sub-linear increase in computation demands. Despite the wide adoption of hybrid parallel paradigms like model parallelism, expert parallelism, and expert-sharding parallelism (i.e., MP+EP+ESP) to support MoE model training on GPU clusters, th…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.18181  [pdf, ps, other]

    cs.SE

    An Empirical Study of Unit Test Generation with Large Language Models

    Authors: Lin Yang, Chen Yang, Shutao Gao, Wei**g Wang, Bo Wang, Qihao Zhu, Xiao Chu, Jianyi Zhou, Guangtai Liang, Qianxiang Wang, Junjie Chen

    Abstract: Unit testing is an essential activity in software development for verifying the correctness of software components. However, manually writing unit tests is challenging and time-consuming. The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation. Existing research primarily focuses on closed-source LLMs (e.g., ChatGPT and CodeX) with fixed prompting s…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.10540  [pdf, other]

    cs.AI cs.NE cs.RO

    Generating and Evolving Reward Functions for Highway Driving with Large Language Models

    Authors: Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu

    Abstract: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  4. arXiv:2406.02924  [pdf, other]

    cs.LG cs.CL cs.NE

    Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

    Authors: Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

    Abstract: Despite the remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. How…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML2024, 29 pages, 4 figures

  5. arXiv:2406.00034  [pdf, other]

    cs.CL cs.AI

    Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories

    Authors: Tianlong Wang, Xianfeng Jiao, Yifan He, Zhongzhi Chen, Yinghao Zhu, Xu Chu, Junyi Gao, Yasha Wang, Liantao Ma

    Abstract: Recent studies have indicated that Large Language Models (LLMs) harbor an inherent understanding of truthfulness, yet often fail to express fully and generate false statements. This gap between "knowing" and "telling" poses a challenge for ensuring the truthfulness of generated content. To address this, we introduce Adaptive Activation Steering (ACT), a tuning-free method that adaptively shift LLM…

    Submitted 26 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.17811

  6. arXiv:2405.18334  [pdf, other]

    cs.DB cs.CV cs.LG

    SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches

    Authors: Renzhi Wu, Pramod Chunduri, Dristi J Shah, Ashmitha Julius Aravind, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong

    Abstract: In this paper, we will present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajec…

    Submitted 30 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Journal ref: Published on International Conference on Very Large Databases 2024

  7. arXiv:2405.17940  [pdf, other]

    cs.RO cs.AI

    World Models for General Surgical Grasping

    Authors: Hongbin Lin, Bin Li, Chun Wai Wong, Juan Rojas, Xiangyu Chu, Kwok Wai Samuel Au

    Abstract: Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements due to mainly relying on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor poli…

    Submitted 28 May, 2024; originally announced May 2024.

    Journal ref: Robotics: Science and Systems 2024

  8. arXiv:2405.17245  [pdf, other]

    cs.DC cs.AI cs.LG cs.NI

    Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference

    Authors: Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen

    Abstract: Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recogniz…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE International Conference on Computer Communications 2024

  9. arXiv:2405.16094  [pdf, other]

    cs.CV

    PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus

    Authors: Zhaochen Liu, Limeng Qiao, Xiangxiang Chu, Tingting Jiang

    Abstract: Aiming to predict the complete shapes of partially occluded objects, amodal segmentation is an important step towards visual intelligence. With crucial significance, practical prior knowledge derives from sufficient training, while limited amodal annotations pose challenges to achieve better performance. To tackle this problem, utilizing the mighty priors accumulated in the foundation model, we pr…

    Submitted 3 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  10. arXiv:2405.13467  [pdf, other]

    cs.CV

    AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning

    Authors: Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan

    Abstract: With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfying performance and additional communication costs, limiting their applicability in real-world scena…

    Submitted 22 May, 2024; originally announced May 2024.

  11. arXiv:2405.09039  [pdf, other]

    cs.LG

    SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction

    Authors: Zhihao Yu, Xu Chu, Yujie **, Yasha Wang, Junfeng Zhao

    Abstract: Electronic health record (EHR) data has emerged as a valuable resource for analyzing patient health status. However, the prevalence of missing data in EHR poses significant challenges to existing methods, leading to spurious correlations and suboptimal predictions. While various imputation techniques have been developed to address this issue, they often obsess unnecessary details and may introduce…

    Submitted 14 May, 2024; originally announced May 2024.

  12. arXiv:2405.05589  [pdf, other]

    cs.RO

    Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration

    Authors: Yifan Duan, Xinran Zhang, Guoliang You, Yilong Wu, Xingchen Li, Yao Li, Xiaomeng Chu, Jie Peng, Yu Zhang, Jianmin Ji, Yanyong Zhang

    Abstract: Autonomous systems often employ multiple LiDARs to leverage the integrated advantages, enhancing perception and robustness. The most critical prerequisite under this setting is the estimating the extrinsic between each LiDAR, i.e., calibration. Despite the exciting progress in multi-LiDAR calibration efforts, a universal, sensor-agnostic calibration method remains elusive. According to the coarse-…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 19 pages, 19 figures

  13. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Hai** Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  14. arXiv:2405.00314  [pdf, other]

    cs.LG cs.AI cs.AR cs.CV cs.PF

    Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey

    Authors: Dayou Du, Gu Gong, Xiaowen Chu

    Abstract: Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the necessity of algorithm-hardware co-design specific…

    Submitted 1 May, 2024; originally announced May 2024.

  15. arXiv:2404.09610  [pdf, other]

    cs.LG cs.AI

    LoRA Dropout as a Sparsity Regularizer for Overfitting Control

    Authors: Yang Lin, Xinyu Ma, Xu Chu, Yujie **, Zhibang Yang, Yasha Wang, Hong Mei

    Abstract: Parameter-efficient fine-tuning methods, represented by LoRA, play an essential role in adapting large-scale pre-trained models to downstream tasks. However, fine-tuning LoRA-series models also faces the risk of overfitting on the training dataset, and yet there's still a lack of theoretical guidance and practical mechanism to control overfitting on LoRA-based PEFT methods. In this paper, we propo…

    Submitted 15 April, 2024; originally announced April 2024.

  16. arXiv:2404.04316  [pdf, other]

    cs.LG cs.AI cs.CL

    Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation

    Authors: Xinyu Ma, Xu Chu, Zhibang Yang, Yang Lin, Xin Gao, Junfeng Zhao

    Abstract: With the increasingly powerful performances and enormous scales of pretrained models, promoting parameter efficiency in fine-tuning has become a crucial need for effective and efficient adaptation to various downstream tasks. One representative line of fine-tuning methods is Orthogonal Fine-tuning (OFT), which rigorously preserves the angular distances within the parameter space to preserve the pr…

    Submitted 6 June, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Appeared at ICML 2024

  17. arXiv:2403.16536  [pdf, other]

    cs.CV

    VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting

    Authors: Yu** Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang

    Abstract: Combining CNNs or ViTs, with RNNs for spatiotemporal forecasting, has yielded unparalleled results in predicting temporal and spatial dynamics. However, modeling extensive global information remains a formidable challenge; CNNs are limited by their narrow receptive fields, and ViTs struggle with the intensive computational demands of their attention mechanisms. The emergence of recent Mamba-based…

    Submitted 29 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR2024 Precognition Workshop

  18. arXiv:2403.14358  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Exploring the Potential of Large Language Models in Graph Generation

    Authors: Yang Yao, Xin Wang, Zeyang Zhang, Yijian Qin, Ziwei Zhang, Xu Chu, Yuekui Yang, Wenwu Zhu, Hong Mei

    Abstract: Large language models (LLMs) have achieved great success in many fields, and recent works have studied exploring LLMs for graph discriminative tasks such as node classification. However, the abilities of LLMs for graph generation remain unexplored in the literature. Graph generation requires the LLM to generate graphs with given properties, which has valuable real-world applications such as drug d…

    Submitted 21 March, 2024; originally announced March 2024.

  19. arXiv:2403.11570  [pdf, other]

    cs.CV

    LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge

    Authors: Yuhe Liu, Mengxue Kang, Zengchang Qin, Xiangxiang Chu

    Abstract: Large text-to-image models have achieved astonishing performance in synthesizing diverse and high-quality images guided by texts. With detail-oriented conditioning control, even finer-grained spatial control can be achieved. However, some generated images still appear unreasonable, even with plentiful object features and a harmonious style. In this paper, we delve into the underlying causes and fi…

    Submitted 18 March, 2024; originally announced March 2024.

  20. arXiv:2403.07589  [pdf, other]

    cs.CV

    PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

    Authors: Honghao Chen, Xiangxiang Chu, Yongjian Ren, Xin Zhao, Kaiqi Huang

    Abstract: Recently, some large kernel convnets strike back with appealing performance and efficiency. However, given the square complexity of convolution, scaling up kernels can bring about an enormous amount of parameters and the proliferated parameters can induce severe optimization problem. Due to these issues, current CNNs compromise to scale up to 51x51 in the form of stripe convolution (i.e., 51x5 + 5…

    Submitted 15 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; Modification for Fig.1(b); Add Acknowledgements

  21. arXiv:2403.00522  [pdf, other]

    cs.CV

    VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

    Authors: Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen

    Abstract: Large language models are built on top of a transformer-based architecture to process textual inputs. For example, the LLaMA stands out among many open-source implementations. Can the same transformer be used to process 2D images? In this paper, we answer this question by unveiling a LLaMA-like vision transformer in plain and pyramid forms, termed VisionLLaMA, which is tailored for this purpose. V…

    Submitted 1 March, 2024; originally announced March 2024.

  22. arXiv:2402.13499  [pdf, other]

    cs.AR

    Benchmarking and Dissecting the Nvidia Hopper GPU Architecture

    Authors: Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Qiang Wang, Xiaowen Chu

    Abstract: Graphics processing units (GPUs) are continually evolving to cater to the computational demands of contemporary general-purpose workloads, particularly those driven by artificial intelligence (AI) utilizing deep learning techniques. A substantial body of studies have been dedicated to dissecting the microarchitectural metrics characterizing diverse GPU generations, which helps researchers understa…

    Submitted 20 February, 2024; originally announced February 2024.

  23. arXiv:2402.12886  [pdf, other]

    cs.GR

    Real-time High-resolution View Synthesis of Complex Scenes with Explicit 3D Visibility Reasoning

    Authors: Tiansong Zhou, Yebin Liu, Xuangeng Chu, Chengkun Cao, Changyin Zhou, Fei Yu, Yu Li

    Abstract: Rendering photo-realistic novel-view images of complex scenes has been a long-standing challenge in computer graphics. In recent years, great research progress has been made on enhancing rendering quality and accelerating rendering speed in the realm of view synthesis. However, when rendering complex dynamic scenes with sparse views, the rendering quality remains limited due to occlusion problems.…

    Submitted 20 February, 2024; originally announced February 2024.

  24. arXiv:2402.12784  [pdf, other]

    cs.IR cs.CL

    Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems

    Authors: Shengyao Zhuang, Bevan Koopman, Xiaoran Chu, Guido Zuccon

    Abstract: The introduction of Vec2Text, a technique for inverting text embeddings, has raised serious privacy concerns within dense retrieval systems utilizing text embeddings, including those provided by OpenAI and Cohere. This threat comes from the ability for a malicious attacker with access to text embeddings to reconstruct the original text. In this paper, we investigate various aspects of embedding…

    Submitted 20 February, 2024; originally announced February 2024.

  25. Deformable Object Manipulation With Constraints Using Path Set Planning and Tracking

    Authors: **g Huang, Xiangyu Chu, Xin Ma, Kwok Wai Samuel Au

    Abstract: In robotic deformable object manipulation (DOM) applications, constraints arise commonly from environments and task-specific requirements. Enabling DOM with constraints is therefore crucial for its deployment in practice. However, dealing with constraints turns out to be challenging due to many inherent factors such as inaccessible deformation models of deformable objects (DOs) and varying environ…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 20 pages, 25 figures, journal

    Journal ref: IEEE Transactions on Robotics, 2023

  26. arXiv:2402.10631  [pdf, other]

    cs.CL

    BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

    Authors: Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

    Abstract: The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD)…

    Submitted 16 February, 2024; originally announced February 2024.

  27. arXiv:2402.07011  [pdf, other]

    cs.LG cs.AI cs.DC

    FedImpro: Measuring and Improving Client Update in Federated Learning

    Authors: Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu

    Abstract: Federated Learning (FL) models often experience client drift caused by heterogeneous data, where the distribution of data differs across clients. To address this issue, advanced research primarily focuses on manipulating the existing gradients to achieve more consistent client models. In this paper, we present an alternative perspective on client drift and aim to mitigate it by generating improved…

    Submitted 14 March, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  28. arXiv:2402.03766  [pdf, other]

    cs.CV cs.AI

    MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

    Authors: Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

    Abstract: We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM b…

    Submitted 6 February, 2024; originally announced February 2024.

  29. arXiv:2402.02105  [pdf, other]

    cs.CV

    ParZC: Parametric Zero-Cost Proxies for Efficient NAS

    Authors: Peijie Dong, Lujun Li, Xinglin Pan, Zimian Wei, Xiang Liu, Qiang Wang, Xiaowen Chu

    Abstract: Recent advancements in Zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies in various NAS benchmarks. Several studies propose the automated design of zero-cost proxies to achieve SOTA performance but require tedious searching progress. Furthermore, we identify a critical issue with current zero-cost proxies: they aggregate node-wise zero-cost statistics without c…

    Submitted 3 February, 2024; originally announced February 2024.

  30. arXiv:2401.17644  [pdf, other]

    cs.DC cs.PF

    BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems

    Authors: Yuxin Wang, Yuhan Chen, Zeyu Li, Xueze Kang, Zhenheng Tang, Xin He, Rui Guo, Xin Wang, Qiang Wang, Amelie Chi Zhou, Xiaowen Chu

    Abstract: Serving systems for Large Language Models (LLMs) are often optimized to improve quality of service (QoS) and throughput. However, due to the lack of open-sourced LLM serving workloads, these systems are frequently evaluated under unrealistic workload assumptions. Consequently, performance may degrade when these systems are deployed in real-world scenarios. This work presents BurstGPT, an LLM servi…

    Submitted 17 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  31. arXiv:2401.16796  [pdf, other]

    cs.LG

    Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction

    Authors: Weibin Liao, Yinghao Zhu, Zixiang Wang, Xu Chu, Yasha Wang, Liantao Ma

    Abstract: Analyzing the health status of patients based on Electronic Health Records (EHR) is a fundamental research problem in medical informatics. The presence of extensive missing values in EHR makes it challenging for deep neural networks to directly model the patient's health status based on EHR. Existing deep learning training protocols require the use of statistical information or imputation models t…

    Submitted 30 January, 2024; originally announced January 2024.

  32. arXiv:2401.15865  [pdf, other]

    cs.CV

    LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection

    Authors: Sifan Zhou, Liang Li, Xinyu Zhang, Bo Zhang, Shipeng Bai, Miao Sun, Ziyu Zhao, Xiaobo Lu, Xiangxiang Chu

    Abstract: Due to highly constrained computing power and memory, deploying 3D lidar-based detectors on edge devices equipped in autonomous vehicles and robots poses a crucial challenge. Being a convenient and straightforward model compression approach, Post-Training Quantization (PTQ) has been widely adopted in 2D vision tasks. However, applying it directly to 3D lidar-based tasks inevitably leads to perform…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted in ICLR 2024

  33. arXiv:2401.10215  [pdf, other]

    cs.CV

    GPAvatar: Generalizable and Precise Head Avatar from Image(s)

    Authors: Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada

    Abstract: Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community. The fundamental objective of this field is to faithfully recreate the head avatar and precisely control expressions and postures. Existing methods, categorized into 2D-based warping, mesh-based, and neural re…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: ICLR 2024, code is available at https://github.com/xg-chu/GPAvatar

  34. arXiv:2401.09943  [pdf, other]

    cs.LG cs.SI

    Infinite-Horizon Graph Filters: Leveraging Power Series to Enhance Sparse Information Aggregation

    Authors: Ruizhe Zhang, Xinke Jiang, Yuchen Fang, Jiayuan Luo, Yongxin Xu, Yichen Zhu, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: Graph Neural Networks (GNNs) have shown considerable effectiveness in a variety of graph learning tasks, particularly those based on the message-passing approach in recent years. However, their performance is often constrained by a limited receptive field, a challenge that becomes more acute in the presence of sparse graphs. In light of the power series, which possesses infinite expansion capabili…

    Submitted 18 April, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: version 2

  35. arXiv:2401.08625  [pdf, other]

    cs.AR

    Conditional Flood Fill Method in Logic Synthesis

    Authors: Shitian Yang, Junyue Jiang, Yilai Liang, Xiaoyang Chu

    Abstract: In the field of Electronic Design Automation (EDA), logic synthesis plays a pivotal role in optimizing hardware resources. Traditional logic synthesis algorithms, such as the Quine-McCluskey method, face challenges in scalability and efficiency, particularly for higher-dimension problems. This paper introduces a novel heuristic algorithm based on Conditional Flood Fill Method aimed at addressing t…

    Submitted 7 December, 2023; originally announced January 2024.

  36. arXiv:2401.07249  [pdf, other]

    cs.LG

    Imputation with Inter-Series Information from Prototypes for Irregular Sampled Time Series

    Authors: Zhihao Yu, Xu Chu, Liantao Ma, Yasha Wang, Wenwu Zhu

    Abstract: Irregularly sampled time series are ubiquitous, presenting significant challenges for analysis due to missing values. Despite existing methods address imputation, they predominantly focus on leveraging intra-series information, neglecting the potential benefits that inter-series information could provide, such as reducing uncertainty and memorization effect. To bridge this gap, we propose PRIME, a…

    Submitted 14 January, 2024; originally announced January 2024.

  37. arXiv:2401.00329  [pdf, other]

    cs.LG cs.NI

    On the Burstiness of Distributed Machine Learning Traffic

    Authors: Natchanon Luangsomboon, Fahimeh Fazel, Jörg Liebeherr, Ashkan Sobhani, Shichao Guan, Xingjun Chu

    Abstract: Traffic from distributed training of machine learning (ML) models makes up a large and growing fraction of the traffic mix in enterprise data centers. While work on distributed ML abounds, the network traffic generated by distributed ML has received little attention. Using measurements on a testbed network, we investigate the traffic characteristics generated by the training of the ResNet-50 neura…

    Submitted 30 December, 2023; originally announced January 2024.

    ACM Class: C.2.0; C.4

  38. arXiv:2312.17071  [pdf, other]

    cs.CV

    SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation

    Authors: Zhengze Xu, Dongyue Wu, Changqian Yu, Xiangxiang Chu, Nong Sang, Changxin Gao

    Abstract: Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context. However, the additional branch incurs undesirable computational overhead and slows inference speed. To eliminate this dilemma, we propose SCTNet, a single branch CNN with transformer semantic information for real-time segmentation. SCTNet enjoys the rich semantic representa…

    Submitted 15 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024; typos corrected; code and models have been released at https://github.com/xzz777/SCTNet

  39. arXiv:2312.16886  [pdf, other]

    cs.CV

    MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices

    Authors: Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen

    Abstract: We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices. It is an amalgamation of a myriad of architectural designs and techniques that are mobile-oriented, which comprises a set of language models at the scale of 1.4B and 2.7B parameters, trained from scratch, a multimodal vision model that is pre-trained in the CLIP fashion, cross-modality int…

    Submitted 29 December, 2023; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Tech Report

  40. arXiv:2312.15883  [pdf, other]

    cs.CL cs.AI

    HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses

    Authors: Xinke Jiang, Ruizhe Zhang, Yongxin Xu, Rihong Qiu, Yue Fang, Zhiyuan Wang, **yi Tang, Hongxin Ding, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: In this paper, we investigate the retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization. To this end, we develop a Hypothesis Knowledge Graph Enhanced (Hy…

    Submitted 19 April, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: version 2

  41. arXiv:2312.09424  [pdf, other]

    cs.CL cs.AI

    Open Domain Knowledge Extraction for Knowledge Graphs

    Authors: Kun Qian, Anton Belyi, Fei Wu, Samira Khorshidi, Azadeh Nikfarjam, Rahul Khot, Yisi Sang, Katherine Luna, Xianqi Chu, Eric Choi, Yash Govind, Chloe Seivwright, Yiwen Sun, Ahmed Fakhry, Theo Rekatsinas, Ihab Ilyas, Xiaoguang Qi, Yunyao Li

    Abstract: The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from ope…

    Submitted 30 October, 2023; originally announced December 2023.

    Comments: 7 pages, 7 figures, 5 tables, preprint technical report, no code or data is released

    MSC Class: 68T30 (primary) ACM Class: F.4.1; I.2.4

  42. arXiv:2312.08760  [pdf, other]

    cs.CV

    CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning

    Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng

    Abstract: Neural Radiance Fields (NeRF) have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. H…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted at the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI24)

  43. arXiv:2312.02433  [pdf, other]

    cs.CV

    Lenna: Language Enhanced Reasoning Detection Assistant

    Authors: Fei Wei, Xinyu Zhang, Ailing Zhang, Bo Zhang, Xiangxiang Chu

    Abstract: With the fast-paced development of multimodal large language models (MLLMs), we can now converse with AI systems in natural languages to understand images. However, the reasoning power and world knowledge embedded in the large language models have been much less investigated and exploited for image perception tasks. In this paper, we propose Lenna, a language-enhanced reasoning detection assistant…

    Submitted 4 December, 2023; originally announced December 2023.

  44. arXiv:2312.01085  [pdf, other]

    cs.CV

    RobustCalib: Robust Lidar-Camera Extrinsic Calibration with Consistency Learning

    Authors: Shuang Xu, Sifan Zhou, Zhi Tian, Jizhou Ma, Qiong Nie, Xiangxiang Chu

    Abstract: Current traditional methods for LiDAR-camera extrinsics estimation depend on offline targets and human efforts, while learning-based approaches resort to iterative refinement for calibration results, posing constraints on their generalization and application in on-board systems. In this paper, we propose a novel approach to address the extrinsic calibration problem in a robust, automatic, and sing…

    Submitted 2 December, 2023; originally announced December 2023.

  45. arXiv:2311.12086  [pdf, other]

    cs.LG cs.NE

    Masked Autoencoders Are Robust Neural Architecture Search Learners

    Authors: Yiming Hu, Xiangxiang Chu, Bo Zhang

    Abstract: Neural Architecture Search (NAS) currently relies heavily on labeled data, which is both expensive and time-consuming to acquire. In this paper, we propose a novel NAS framework based on Masked Autoencoders (MAE) that eliminates the need for labeled data during the search process. By replacing the supervised learning objective with an image reconstruction task, our approach enables the robust disc…

    Submitted 26 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  46. arXiv:2311.09550  [pdf, other]

    cs.LG cs.CL

    A Speed Odyssey for Deployable Quantization of LLMs

    Authors: Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Liang Li, Yifan Lu, Xiangxiang Chu, Yerui Sun, Yuchen Xie

    Abstract: The large language model era urges faster and less costly inference. Prior model compression works on LLMs tend to undertake a software-centric approach primarily focused on the simulated quantization performance. By neglecting the feasibility of deployment, these approaches are typically disabled in real practice. They used to drastically push down the quantization bit range for a reduced computa…

    Submitted 15 November, 2023; originally announced November 2023.

  47. arXiv:2311.07843  [pdf, ps, other]

    cs.IT eess.SP

    On the IRS Deployment in Smart Factories Considering Blockage Effects: Collocated or Distributed?

    Authors: Yixin Zhang, Saeed R. Khosravirad, Xiaoli Chu, Mikko A. Uusitalo

    Abstract: In this article, we study the collocated and distributed deployment of intelligent reflecting surfaces (IRS) for a fixed total number of IRS elements to support enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) services inside a factory. We build a channel model that incorporates the line-of-sight (LOS) probability and power loss of each transmission path, and p…

    Submitted 13 November, 2023; originally announced November 2023.

  48. arXiv:2311.06543  [pdf, other]

    cs.RO

    Bootstrapping Robotic Skill Learning With Intuitive Teleoperation: Initial Feasibility Study

    Authors: Xiangyu Chu, Yunxi Tang, Lam Him Kwok, Yuanpei Cai, Kwok Wai Samuel Au

    Abstract: Robotic skill learning has been increasingly studied but the demonstration collections are more challenging compared to collecting images/videos in computer vision and texts in natural language processing. This paper presents a skill learning paradigm by using intuitive teleoperation devices to generate high-quality human demonstrations efficiently for robotic skill learning in a data-driven manne…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 10 pages, 4 figures, accepted by ISER2023

  49. arXiv:2311.03687  [pdf, other]

    cs.PF cs.CL cs.LG

    Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

    Authors: Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

    Abstract: Large Language Models (LLMs) have seen great advance in both academia and industry, and their popularity results in numerous open-source frameworks and techniques in accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive as it requires considerable computing resources and memory, hence many efficient approaches have been developed for improving system…

    Submitted 1 December, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  50. arXiv:2310.12670  [pdf, other]

    cs.DC cs.PF

    Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining

    Authors: Yuxin Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Xinglin Pan, Yang Zheng, Xiaoyu Wu, Amelie Chi Zhou, Bingsheng He, Xiaowen Chu

    Abstract: Extensive system scales (i.e. thousands of GPU/TPUs) and prolonged training periods (i.e. months of pretraining) significantly escalate the probability of failures when training large language models (LLMs). Thus, efficient and reliable fault-tolerance methods are in urgent need. Checkpointing is the primary fault-tolerance method to periodically save parameter snapshots from GPU memory to disks v…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Fault Tolerance, Checkpoint Optimization, Large Language Model, 3D parallelism