Showing 1–50 of 3,758 results for author: Chen, X

Searching in archive cs.
  1. arXiv:2407.01445  [pdf, other]

    cs.LG cs.CV

    FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

    Authors: Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang

    Abstract: Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstra…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 23 pages

  2. arXiv:2407.00952  [pdf, other]

    cs.LG cs.CL cs.DC

    SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao

    Abstract: The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is the depletion of high-quality public datasets within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm recently h…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  3. arXiv:2407.00674  [pdf, other]

    cs.MA cs.GR cs.RO

    Emergent Crowd Grouping via Heuristic Self-Organization

    Authors: Xiao-Cheng Liao, Wei-Neng Chen, Xiang-Ling Chen, Yi Mei

    Abstract: Modeling crowds has many important applications in games and computer animation. Inspired by the emergent following effect in real-life crowd scenarios, in this work, we develop a method for implicitly grouping moving agents. We achieve this by analyzing local information around each agent and rotating its preferred velocity accordingly. Each agent could automatically form an implicit group with i…

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2407.00578  [pdf, other]

    cs.RO

    UniQuad: A Unified and Versatile Quadrotor Platform Series for UAV Research and Application

    Authors: Yichen Zhang, Xinyi Chen, Peize Liu, Junzhe Wang, Hetai Zou, Shaojie Shen

    Abstract: As quadrotors take on an increasingly diverse range of roles, researchers often need to develop new hardware platforms tailored for specific tasks, introducing significant engineering overhead. In this article, we introduce the UniQuad series, a unified and versatile quadrotor platform series that offers high flexibility to adapt to a wide range of common tasks, excellent customizability for advan…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA-X40)

  5. arXiv:2407.00577  [pdf, other]

    cs.RO

    FALCON: Fast Autonomous Aerial Exploration using Coverage Path Guidance

    Authors: Yichen Zhang, Xinyi Chen, Chen Feng, Boyu Zhou, Shaojie Shen

    Abstract: This paper introduces FALCON, a novel Fast Autonomous expLoration framework using COverage path guidaNce, which aims at setting a new performance benchmark in the field of autonomous aerial exploration. Despite recent advancements in the domain, existing exploration planners often suffer from inefficiencies such as frequent revisitations of previously explored regions. FALCON effectively harnesses…

    Submitted 29 June, 2024; originally announced July 2024.

  6. arXiv:2407.00382  [pdf, other]

    math.NA cs.AI cs.LG

    UM2N: Towards Universal Mesh Movement Networks

    Authors: Mingrui Zhang, Chunyang Wang, Stephan Kramer, Joseph G. Wallwork, Siyi Li, Jiancheng Liu, Xiang Chen, Matthew D. Piggott

    Abstract: Solving complex Partial Differential Equations (PDEs) accurately and efficiently is an essential and challenging problem in all scientific and engineering disciplines. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without increasing the overall mesh degree of freedom count. Conventional sophisticated mesh movement methods are extremely expensive and…

    Submitted 29 June, 2024; originally announced July 2024.

  7. arXiv:2406.20076  [pdf, other]

    cs.CV

    EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

    Authors: Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

    Abstract: Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Preprint

  8. arXiv:2406.19999  [pdf, other]

    cs.CL

    The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models

    Authors: Xinyi Chen, Baohao Liao, Jirui Qi, Panagiotis Eustratiadis, Christof Monz, Arianna Bisazza, Maarten de Rijke

    Abstract: Following multiple instructions is a crucial ability for large language models (LLMs). Evaluating this ability comes with significant challenges: (i) limited coherence between multiple instructions, (ii) positional bias where the order of instructions affects model performance, and (iii) a lack of objectively verifiable tasks. To address these issues, we introduce a benchmark designed to evaluate…

    Submitted 28 June, 2024; originally announced June 2024.

  9. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou , et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi…

    Submitted 28 June, 2024; originally announced June 2024.

  10. arXiv:2406.19613  [pdf, other]

    cs.DC

    Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing

    Authors: Rui Li, Tao Ouyang, Liekang Zeng, Guocheng Liao, Zhi Zhou, Xu Chen

    Abstract: Collaborative Edge Computing (CEC) is an emerging paradigm that collaborates heterogeneous edge devices as a resource pool to compute DNN inference tasks in proximity such as edge video analytics. Nevertheless, as the key knob to improve network utility in CEC, existing works mainly focus on the workload routing strategies among edge devices with the aim of minimizing the routing cost, remaining a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE/ACM TRANSACTIONS ON NETWORKING (ToN)

  11. arXiv:2406.19070  [pdf, other]

    cs.CV

    FAGhead: Fully Animate Gaussian Head from Monocular Videos

    Authors: Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan

    Abstract: High-fidelity reconstruction of 3D human avatars has a wild application in visual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicit the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Repre…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  12. arXiv:2406.18849  [pdf, other]

    cs.CV

    Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

    Authors: Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen

    Abstract: Currently many benchmarks have been proposed to evaluate the perception ability of the Large Vision-Language Models (LVLMs). However, most benchmarks conduct questions by selecting images from existing datasets, resulting in the potential data leakage. Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and…

    Submitted 26 June, 2024; originally announced June 2024.

  13. arXiv:2406.18485  [pdf, other]

    cs.DC

    LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism

    Authors: Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu

    Abstract: Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency issues. We propose LoongTrain, a novel system to efficiently train LLMs with long sequences at scale. The core of LoongTrain is the 2D-Attention mecha…

    Submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.18079  [pdf, other]

    cs.CV eess.IV

    MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal

    Authors: Yiguo Jiang, Xuhang Chen, Chi-Man Pun, Shuqiang Wang, Wei Feng

    Abstract: When light is scattered or reflected accidentally in the lens, flare artifacts may appear in the captured photos, affecting the photos' visual quality. The main challenge in flare removal is to eliminate various flare artifacts while preserving the original content of the image. To address this challenge, we propose a lightweight Multi-Frequency Deflare Network (MFDNet) based on the Laplacian Pyra…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by The Visual Computer journal

  15. Real-time Neural Woven Fabric Rendering

    Authors: Xiang Chen, Lu Wang, Beibei Wang

    Abstract: Woven fabrics are widely used in applications of realistic rendering, where real-time capability is also essential. However, rendering realistic woven fabrics in real time is challenging due to their complex structure and optical appearance, which cause aliasing and noise without many samples. The core of this issue is a multi-scale representation of the fabric shading model, which allows for a fa…

    Submitted 4 May, 2024; originally announced June 2024.

    Comments: Accepted by SIGGRAPH 2024 Conference Proceedings

  16. arXiv:2406.17565  [pdf, other]

    cs.DC

    MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool

    Authors: Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan

    Abstract: Large language model (LLM) serving has transformed from stateless to stateful systems, utilizing techniques like context caching and disaggregated inference. These optimizations extend the lifespan and domain of the KV cache, necessitating a new architectural approach. We present MemServe, a unified system that integrates both inter-request and intra-request optimizations. MemServe introduces MemP…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  17. arXiv:2406.17413  [pdf, other]

    cs.CV

    Depth-Guided Semi-Supervised Instance Segmentation

    Authors: Xin Chen, Jie Hu, Xiawu Zheng, Jianghang Lin, Liujuan Cao, Rongrong Ji

    Abstract: Semi-Supervised Instance Segmentation (SSIS) aims to leverage an amount of unlabeled data during training. Previous frameworks primarily utilized the RGB information of unlabeled images to generate pseudo-labels. However, such a mechanism often introduces unstable noise, as a single instance can display multiple RGB values. To overcome this limitation, we introduce a Depth-Guided (DG) SSIS framewo…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 4 tables

  18. arXiv:2406.17115  [pdf, other]

    cs.CV cs.AI

    Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models

    Authors: Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen

    Abstract: Despite the rapid progress and outstanding performance of Large Vision-Language Models (LVLMs) in recent years, LVLMs have been plagued by the issue of hallucination, i.e., LVLMs tend to generate responses that are inconsistent with the corresponding visual inputs. To evaluate the degree of hallucination in LVLMs, previous works have proposed a series of benchmarks featuring different types of tas…

    Submitted 24 June, 2024; originally announced June 2024.

  19. arXiv:2406.16978  [pdf, other]

    cs.LG cs.AI cs.RO

    MetaFollower: Adaptable Personalized Autonomous Car Following

    Authors: Xianda Chen, Kehua Chen, Meixin Zhu, Hao, Yang, Shaojie Shen, Xuesong Wang, Yinhai Wang

    Abstract: Car-following (CF) modeling, a fundamental component in microscopic traffic simulation, has attracted increasing interest of researchers in the past decades. In this study, we propose an adaptable personalized car-following framework -MetaFollower, by leveraging the power of meta-learning. Specifically, we first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from v…

    Submitted 23 June, 2024; originally announced June 2024.

  20. arXiv:2406.16942  [pdf, other]

    eess.IV cs.AI cs.CV

    Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

    Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

    Abstract: Inability to express the confidence level and detect unseen classes has limited the clinical implementation of artificial intelligence in the real-world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). In the internal test set, FMUE achieved a higher F1 score of 96.76% than two state-of-the-art algorithms, RE…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE

  21. arXiv:2406.16370  [pdf, other]

    cs.RO

    An Active Search Strategy with Multiple Unmanned Aerial Systems for Multiple Targets

    Authors: Chuanxiang Gao, Xinyi Wang, Xi Chen, Ben M. Chen

    Abstract: The challenge of efficient target searching in vast natural environments has driven the need for advanced multi-UAV active search strategies. This paper introduces a novel method in which global and local information is adeptly merged to avoid issues such as myopia and redundant back-and-forth movements. In addition, a trajectory generation method is used to ensure the search pattern within contin…

    Submitted 24 June, 2024; originally announced June 2024.

  22. arXiv:2406.16330  [pdf, other]

    cs.CL cs.AI

    Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging

    Authors: Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Xi Chen, Cunhang Fan, Zhao Lv, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui

    Abstract: While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach…

    Submitted 24 June, 2024; originally announced June 2024.

  23. arXiv:2406.16279  [pdf, other]

    cs.CV

    SegNet4D: Effective and Efficient 4D LiDAR Semantic Segmentation in Autonomous Driving Environments

    Authors: Neng Wang, Ruibin Guo, Chenghao Shi, Hui Zhang, Huimin Lu, Zhiqiang Zheng, Xieyuanli Chen

    Abstract: 4D LiDAR semantic segmentation, also referred to as multi-scan semantic segmentation, plays a crucial role in enhancing the environmental understanding capabilities of autonomous vehicles. It entails identifying the semantic category of each point in the LiDAR scan and distinguishing whether it is dynamic, a critical aspect in downstream tasks such as path planning and autonomous navigation. Exist…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  24. arXiv:2406.16087  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS…

    Submitted 23 June, 2024; originally announced June 2024.

  25. arXiv:2406.15752  [pdf, other]

    eess.AS cs.AI cs.CL

    TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

    Authors: Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen

    Abstract: Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, T…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  26. arXiv:2406.15704  [pdf, other]

    cs.CV

    video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required b…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

  27. arXiv:2406.15501  [pdf]

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still…

    Submitted 19 June, 2024; originally announced June 2024.

  28. arXiv:2406.14910  [pdf, ps, other]

    cs.LG cs.DC math.OC

    Towards Dynamic Resource Allocation and Client Scheduling in Hierarchical Federated Learning: A Two-Phase Deep Reinforcement Learning Approach

    Authors: Xiaojing Chen, Zhenyuan Li, Wei Ni, Xin Wang, Shunqing Zhang, Yanzan Sun, Shugong Xu, Qingqi Pei

    Abstract: Federated learning (FL) is a viable technique to train a shared machine learning model without sharing data. Hierarchical FL (HFL) system has yet to be studied regrading its multiple levels of energy, computation, communication, and client scheduling, especially when it comes to clients relying on energy harvesting to power their operations. This paper presents a new two-phase deep deterministic p…

    Submitted 21 June, 2024; originally announced June 2024.

  29. arXiv:2406.14896  [pdf, other]

    eess.IV cs.CV

    SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation

    Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Mohammad Farazi, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important facto…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper to 2024 MICCAI

  30. arXiv:2406.14863  [pdf, other]

    cs.CR cs.AR

    Older and Wiser: The Marriage of Device Aging and Intellectual Property Protection of Deep Neural Networks

    Authors: Ning Lin, Shaocong Wang, Yue Zhang, Yangu He, Kwunhang Wong, Arindam Basu, Dashan Shang, Xiaoming Chen, Zhongrui Wang

    Abstract: Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference 2024

  31. arXiv:2406.14194  [pdf, other]

    cs.CV cs.AI

    VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

    Authors: Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

    Abstract: The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by the outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and nar…

    Submitted 20 June, 2024; originally announced June 2024.

  32. arXiv:2406.14189  [pdf, other]

    cs.CL

    In Tree Structure Should Sentence Be Generated

    Authors: Yaguang Li, Xin Chen

    Abstract: Generative models reliant on sequential autoregression have been at the forefront of language generation for an extensive period, particularly following the introduction of widely acclaimed transformers. Despite its excellent performance, there are always some issues that we face today. For example, problems such as hallucinations and getting trapped in a logic loop may occur. To enhance the perfo…

    Submitted 20 June, 2024; originally announced June 2024.

  33. arXiv:2406.12910  [pdf]

    cs.LG cs.AI cs.NE physics.chem-ph q-bio.BM

    Human-level molecular optimization driven by mol-gene evolution

    Authors: Jiebin Fang, Churu Mao, Yuchen Zhu, Xiaoming Chen, Chang-Yu Hsieh, Zhongjun Ma

    Abstract: De novo molecule generation allows the search for more drug-like hits across a vast chemical space. However, lead optimization is still required, and the process of optimizing molecular structures faces the challenge of balancing structural novelty with pharmacological properties. This study introduces the Deep Genetic Molecular Modification Algorithm (DGMM), which brings structure modification to…

    Submitted 12 June, 2024; originally announced June 2024.

  34. arXiv:2406.12435  [pdf, other]

    cs.LG cs.AI cs.DC

    Federated Learning with Limited Node Labels

    Authors: Bisheng Tang, Xiaojun Chen, Shaopu Wang, Yuexin Xuan, Zhendong Zhao

    Abstract: Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-…

    Submitted 18 June, 2024; originally announced June 2024.

  35. arXiv:2406.12255  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

    Authors: Lijie Hu, Liang Liu, Shu Yang, Xin Chen, Hongru Xiao, Mengdi Li, Pan Zhou, Muhammad Asif Ali, Di Wang

    Abstract: Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models (LLMs). While some studies focus on improving CoT accuracy through methods like retrieval enhancement, yet a rigorous explanation for why CoT achieves such success remains unclear. In this paper, we analyze CoT methods under two different settings by asking the following questions: (1…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 21 pages

  36. arXiv:2406.12199  [pdf, other]

    cs.LG cs.AI

    Time Series Modeling for Heart Rate Prediction: From ARIMA to Transformers

    Authors: Haowei Ni, Shuchen Meng, Xieming Geng, Panfeng Li, Zhuoying Li, Xupeng Chen, Xiaotong Wang, Shiyao Zhang

    Abstract: Cardiovascular disease (CVD) is a leading cause of death globally, necessitating precise forecasting models for monitoring vital signs like heart rate, blood pressure, and ECG. Traditional models, such as ARIMA and Prophet, are limited by their need for manual parameter tuning and challenges in handling noisy, sparse, and highly variable medical data. This study investigates advanced deep learning…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 6th International Conference on Electronic Engineering and Informatics

  37. arXiv:2406.11920  [pdf, other]

    cs.LG cs.AI

    Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking

    Authors: Xi Chen, Chuan Qin, Chuyu Fang, Chao Wang, Chen Zhu, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong

    Abstract: In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promotin…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  38. arXiv:2406.11891  [pdf, other]

    cs.SI cs.AI cs.LG

    Towards Adaptive Neighborhood for Advancing Temporal Interaction Graph Modeling

    Authors: Siwei Zhang, Xi Chen, Yun Xiong, Xixi Wu, Yao Zhang, Yongrui Fu, Yinglong Zhao, Jiawei Zhang

    Abstract: Temporal Graph Networks (TGNs) have demonstrated their remarkable performance in modeling temporal interaction graphs. These works can generate temporal node representations by encoding the surrounding neighborhoods for the target node. However, an inherent limitation of existing TGNs is their reliance on fixed, hand-crafted rules for neighborhood encoding, overlooking the necessity for an adaptiv…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: KDD'2024 Research Track Paper

  39. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  40. arXiv:2406.11364  [pdf, other]

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  41. arXiv:2406.11303  [pdf, other]

    cs.CV cs.AI cs.CL

    VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

    Authors: Yunxin Li, Xinyu Chen, Baotian Hu, Longyue Wang, Haoyuan Shi, Min Zhang

    Abstract: Despite significant breakthroughs in video analysis driven by the rapid development of large multimodal models (LMMs), there remains a lack of a versatile evaluation benchmark to comprehensively assess these models' performance in video understanding and reasoning. To address this, we present VideoVista, a video QA benchmark that integrates challenges across diverse content categories, durations,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 38 pages, 44 figures

  42. arXiv:2406.11257  [pdf, other]

    cs.LG

    ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

    Authors: Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang

    Abstract: Large language models (LLM) have recently attracted significant attention in the field of artificial intelligence. However, the training process of these models poses significant challenges in terms of computational and storage capacities, thus compressing checkpoints has become an urgent problem. In this paper, we propose a novel Extreme Checkpoint Compression (ExCP) framework, which significantl…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: ICML 2024 oral

  43. arXiv:2406.11227  [pdf, ps, other]

    cs.DB cs.AI

    Compound Schema Registry

    Authors: Silvery D. Fu, Xuewei Chen

    Abstract: Schema evolution is critical in managing database systems to ensure compatibility across different data versions. A schema registry typically addresses the challenges of schema evolution in real-time data streaming by managing, validating, and ensuring schema compatibility. However, current schema registries struggle with complex syntactic alterations like field renaming or type changes, which oft…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 2 pages, compound ai system workshop 2024

  44. arXiv:2406.10833  [pdf, other]

    cs.CL

    A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

    Authors: Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han

    Abstract: In many scientific fields, large language models (LLMs) have revolutionized the way with which text and other modalities of data (e.g., molecules and proteins) are dealt, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one to two fields or a single modality. In this paper,…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 33 pages (GitHub: https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models)

  45. arXiv:2406.10831  [pdf, other]

    cs.NI cs.AI cs.DC

    Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

    Authors: Weiheng Tang, Jingyi Li, Lin Chen, Xu Chen

    Abstract: Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional intermediate layer between the master and workers in the original distributed learning systems, potentially leading to more severe straggler effect. Recently, coding the…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: The paper has been accepted by IEEE Transactions on Communications

  46. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  47. arXiv:2406.10540  [pdf, other]

    cs.AI cs.NE cs.RO

    Generating and Evolving Reward Functions for Highway Driving with Large Language Models

    Authors: Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu

    Abstract: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in a…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  48. arXiv:2406.10163  [pdf, other]

    cs.CV cs.AI

    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

    Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

    Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

  49. arXiv:2406.10130  [pdf, other]

    cs.CL

    The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

    Authors: Yan Liu, Yu Liu, Xiaokang Chen, Pin-Yu Chen, Daoguang Zan, Min-Yen Kan, Tsung-Yi Ho

    Abstract: Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases, which may cause negative social impacts or even bring catastrophic results in application. Previous works on this problem mainly focused on using black-box methods such as probing to detect and quantify social biases in PLMs by observing model outputs. As a result, previous debiasing me…

    Submitted 14 June, 2024; originally announced June 2024.

  50. arXiv:2406.09867  [pdf, other]

    cs.CV

    Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox

    Authors: Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen

    Abstract: Most existing out-of-distribution (OOD) detection benchmarks classify samples with novel labels as the OOD data. However, some marginal OOD samples actually have close semantic contents to the in-distribution (ID) sample, which makes determining the OOD sample a Sorites Paradox. In this paper, we construct a benchmark named Incremental Shift OOD (IS-OOD) to address the issue, in which we divide th…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: v1