Skip to main content

Showing 1–50 of 1,405 results for author: He, Z

.
  1. arXiv:2407.03157  [pdf, other

    cs.CL cs.AI cs.LG cs.SE

    Let the Code LLM Edit Itself When You Edit the Code

    Authors: Zhenyu He, Jun Zhang, Shengjie Luo, **g**g Xu, Zhi Zhang, Di He

    Abstract: In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequ… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in Progress

  2. arXiv:2407.02783  [pdf, ps, other

    cs.CL cs.AI

    52B to 1T: Lessons Learned via Tele-FLM Series

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: For the Tele-FLM-52B tech report, see also 2404.16645

  3. arXiv:2407.02350  [pdf, other

    cs.CV

    Conceptual Codebook Learning for Vision-Language Models

    Authors: Yi Zhang, Ke Yu, Siqi Wu, Zhihai He

    Abstract: In this paper, we propose Conceptual Codebook Learning (CoCoLe), a novel fine-tuning method for vision-language models (VLMs) to address the challenge of improving the generalization capability of VLMs while fine-tuning them on downstream tasks in a few-shot setting. We recognize that visual concepts, such as textures, shapes, and colors are naturally transferable across domains and play a crucial… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  4. arXiv:2407.01358  [pdf, other

    cs.CL

    Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models

    Authors: Xiaolin Xing, Zhiwei He, Haoyu Xu, Xing Wang, Rui Wang, Yu Hong

    Abstract: This paper investigates the cross-lingual inconsistencies observed in Large Language Models (LLMs), such as ChatGPT, Llama, and Baichuan, which have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  5. Generative Iris Prior Embedded Transformer for Iris Restoration

    Authors: Yubo Huang, Jia Wang, Peipei Li, Liuyu Xiang, Peigang Li, Zhaofeng He

    Abstract: Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/sawyercharlton/Gformer

    Journal ref: 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 510-515

  6. arXiv:2407.00023  [pdf, other

    cs.DC cs.LG

    Preble: Efficient Distributed Prompt Scheduling for LLM Serving

    Authors: Vikranth Srivatsa, Zijian He, Reyna Abhyankar, Dongming Li, Yiying Zhang

    Abstract: Prompts to large language models (LLMs) have evolved beyond simple user questions. For LLMs to solve complex problems, today's practices include domain-specific instructions, illustration of tool usages, and long context, such as textbook chapters in prompts. As such, many parts of prompts are repetitive across requests, and their attention computation results can be reused. However, today's LLM s… ▽ More

    Submitted 8 May, 2024; originally announced July 2024.

  7. arXiv:2407.00014  [pdf

    cs.RO eess.SY

    Simplifying Kinematic Parameter Estimation in sEMG Prosthetic Hands: A Two-Point Approach

    Authors: Gang Liu, Zhenxiang Wang, Ziyang He, Shanshan Guo, Rui Zhang, Dezhong Yao

    Abstract: Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinematic parameters. However, establishing these models traditionally requires complex kinematic sensor systems to collect corresponding kinematic data in synchronization with EMG, which is cumbersome and user-unfriendly. This paper presents a simplified approach utilizing only two data points to depict… ▽ More

    Submitted 1 May, 2024; originally announced July 2024.

    Comments: 13 pages

  8. arXiv:2406.19341  [pdf, other

    cs.CV

    Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation

    Authors: Yushun Tang, Shuoshuo Chen, Zhehan Kan, Yi Zhang, Qinghai Guo, Zhihai He

    Abstract: Fully test-time adaptation aims to adapt the network model based on sequential analysis of input samples during the inference stage to address the cross-domain performance degradation problem of deep neural networks. This work is based on the following interesting finding: in transformer-based image classification, the class token at the first transformer encoder layer can be learned to capture th… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: accepted by TMM

  9. arXiv:2406.17960  [pdf, other

    cs.CV cs.AI

    MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Mengjiao Shen, **gwei Yang, Chengju Liu, Qijun Chen

    Abstract: Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. Towards the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  10. arXiv:2406.17297  [pdf, other

    cs.CV cs.AI

    Towards Open-set Camera 3D Object Detection

    Authors: Zhuolin He, Xinrun Li, Heng Gao, Jiachen Tang, Shoumeng Qiu, Wenfu Wang, Lvjian Lu, Xuchong Qiu, Xiangyang Xue, Jian Pu

    Abstract: Traditional camera 3D object detectors are typically trained to recognize a predefined set of known object classes. In real-world scenarios, these detectors may encounter unknown objects outside the training categories and fail to identify them correctly. To address this gap, we present OS-Det3D (Open-set Camera 3D Object Detection), a two-stage training framework enhancing the ability of camera 3… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  11. arXiv:2406.17005  [pdf, other

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Cheng**g Wu, Ting Liu, Luoqi Liu, Xinyu Liu, **g Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, **gnan Luo , et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  12. arXiv:2406.16525  [pdf, other

    stat.ML cs.LG

    OAML: Outlier Aware Metric Learning for OOD Detection Enhancement

    Authors: Heng Gao, Zhuolin He, Shoumeng Qiu, Jian Pu

    Abstract: Out-of-distribution (OOD) detection methods have been developed to identify objects that a model has not seen during training. The Outlier Exposure (OE) methods use auxiliary datasets to train OOD detectors directly. However, the collection and learning of representative OOD samples may pose challenges. To tackle these issues, we propose the Outlier Aware Metric Learning (OAML) framework. The main… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  13. arXiv:2406.16321  [pdf, other

    cs.LG cs.AI

    Multimodal Graph Benchmark

    Authors: **g Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

    Abstract: Associating unstructured data with structured information is crucial for real-world tasks that require relevance search. However, existing graph learning benchmarks often overlook the rich semantic information associate with each node. To bridge such gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), the first comprehensive multi-modal graph benchmark that incorporates both textual and v… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: https://mm-graph-benchmark.github.io/

  14. arXiv:2406.15688  [pdf

    physics.app-ph

    Over 600 V Lateral AlN-on-AlN Schottky Barrier Diodes with Ultra-Low Ideality Factor

    Authors: Dinusha Herath Mudiyanselage, Dawei Wang, Ziyi He, Bingcheng Da, Houqiang Fu

    Abstract: This letter reports the demonstration of lateral AlN Schottky barrier diodes (SBDs) on single-crystal AlN substrates by metalorganic chemical vapor deposition (MOCVD) with an ultra-low ideality factor (η) of 1.65, a breakdown voltage (BV) of 640 V, and a record high normalized BV by the anode-to-cathode distance (LAC). The homoepitaxially grown AlN epilayers had much lower defect densities and exc… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  15. arXiv:2406.14913  [pdf, other

    physics.soc-ph cs.MA

    Cooperative bots exhibit nuanced effects on cooperation across strategic frameworks

    Authors: Zehua Si, Zhixue He, Chen Shen, Jun Tanimoto

    Abstract: The positive impact of cooperative bots on cooperation within evolutionary game theory is well documented; however, existing studies have predominantly used discrete strategic frameworks, focusing on deterministic actions with a fixed probability of one. This paper extends the investigation to continuous and mixed strategic approaches. Continuous strategies employ intermediate probabilities to con… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  16. arXiv:2406.14903  [pdf, other

    cs.AI

    GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

    Authors: Leyan Wang, Yonggang **, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He

    Abstract: As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individua… ▽ More

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  17. arXiv:2406.14473  [pdf, other

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, **gtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  18. arXiv:2406.13705  [pdf, other

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Qiaozhi Tan, Tong Chen, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, **lin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  19. arXiv:2406.13277  [pdf, other

    math.CO math.DG

    On area-minimizing subgraphs in integer lattices

    Authors: Zunwu He, Bobo Hua

    Abstract: We introduce area-minimizing subgraphs in an infinite graph via the formulation of functions of bounded variations initiated by De Giorgi. We classify area-minimizing subgraphs in the two-dimensional integer lattice up to isomorphisms, and prove general geometric properties for those in high-dimensional cases.

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 39 pages, 52 figures

  20. arXiv:2406.12878  [pdf, other

    physics.ins-det hep-ex nucl-ex

    Beam test results of the prototype of the multi wire drift chamber for the CSR external-target experiment

    Authors: Zhi Qin, Zhoubo He, Zhe Cao, Tao Chen, Zhi Deng, Limin Duan, Dong Guo, Rongjiang Hu, Jie Kong, Canwen Liu, Peng Ma, Xianglun Wei, Shihai Wen, Xiangjie Wen, Junwei Yan, Herun Yang, Zuoqiao Yang, Yuhong Yu, Zhigang Xiao

    Abstract: The half-size prototype of the multi wire drift chamber (MWDC) for the cooling storage ring (CSR) external-target experiment (CEE) was assembled and tested in 350 MeV/u Kr+Fe reactions on the heavy ion research facility in Lanzhou (HIRFL). The prototype consists of 6 sense layers, where the sense wires are stretched in three directions X, U and V, meeting $0^\circ$, $30^\circ$ and $-30^\circ$ with… ▽ More

    Submitted 15 May, 2024; originally announced June 2024.

  21. arXiv:2406.12074  [pdf, other

    cs.CL

    COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities

    Authors: Zihao He, Rebecca Dorn, Siyi Guo, Minh Duc Chu, Kristina Lerman

    Abstract: Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable creating computational representations or "digital twins" of populations that generate human-like responses mimicking the population's language, styles, and attitudes. We introduce Community-Cross-Instruct, a… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  22. arXiv:2406.11191  [pdf, other

    cs.CL

    A Survey on Human Preference Learning for Large Language Models

    Authors: Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

    Abstract: The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which ma… ▽ More

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: IEEE copyright statement added (also applied to the former version)

  23. arXiv:2406.10890  [pdf, other

    cs.CL cs.AI cs.LG

    RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models

    Authors: Zhuoran **, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Large language models (LLMs) inevitably memorize sensitive, copyrighted, and harmful knowledge from the training corpus; therefore, it is crucial to erase this knowledge from the models. Machine unlearning is a promising solution for efficiently removing specific knowledge by post hoc modifying models. In this paper, we propose a Real-World Knowledge Unlearning benchmark (RWKU) for LLM unlearning.… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 48 pages, 7 figures, 12 tables

  24. arXiv:2406.10819  [pdf, other

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dong** Chen, Yue Huang, Siyuan Wu, **gyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  25. Nurgle: Exacerbating Resource Consumption in Blockchain State Storage via MPT Manipulation

    Authors: Zheyuan He, Zihao Li, Ao Qiao, Xiapu Luo, Xiaosong Zhang, Ting Chen, Shuwei Song, Dijun Liu, Weina Niu

    Abstract: Blockchains, with intricate architectures, encompass various components, e.g., consensus network, smart contracts, decentralized applications, and auxiliary services. While offering numerous advantages, these components expose various attack surfaces, leading to severe threats to blockchains. In this study, we unveil a novel attack surface, i.e., the state storage, in blockchains. The state storag… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  26. arXiv:2406.08772  [pdf, other

    cs.CV cs.CL

    MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

    Authors: Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He

    Abstract: Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MM… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  27. arXiv:2406.08270  [pdf, other

    cs.IR

    Boosting Multimedia Recommendation via Separate Generic and Unique Awareness

    Authors: Zhuangzhuang He, Zihan Wang, Yonghui Yang, Haoyue Bai, Le Wu

    Abstract: Multimedia recommendation, which incorporates various modalities (e.g., images, texts, etc.) into user or item representation to improve recommendation quality, has received widespread attention. Recent methods mainly focus on cross-modal alignment with self-supervised learning to obtain higher quality representation. Despite remarkable performance, we argue that there is still a limitation: compl… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  28. arXiv:2406.08214  [pdf, other

    cs.IR

    Graph Bottlenecked Social Recommendation

    Authors: Yonghui Yang, Le Wu, Zihan Wang, Zhuangzhuang He, Richang Hong, Meng Wang

    Abstract: With the emergence of social networks, social recommendation has become an essential technique for personalized services. Recently, graph-based social recommendations have shown promising results by capturing the high-order social influence. Most empirical studies of graph-based social recommendations directly take the observed social networks into formulation, and produce user preferences based o… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  29. arXiv:2406.08035  [pdf, other

    cs.CV cs.AI

    LVBench: An Extreme Long Video Understanding Benchmark

    Authors: Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

    Abstract: Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sport… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  30. arXiv:2406.05355  [pdf, other

    physics.flu-dyn

    Revisit to the WGVC schemes: a nonlinear order-preserving and spectral-property-optimized methodology and its enhancement

    Authors: Kang He, Hongwei Liu, Tongbiao Guo, Xinliang Li, Zhiwei He

    Abstract: The numerical simulation of supersonic complex flow problems demands capabilities in identifying multiscale structures and capturing shocks, imposing stringent requirements on the numerical scheme. The capability to identify multiscale structures is closely related to the spectral properties of the numerical scheme. Currently, existing methods to improve the spectral properties of finite differenc… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  31. arXiv:2406.04640  [pdf, other

    cs.LG

    LinkGPT: Teaching Large Language Models To Predict Missing Links

    Authors: Zhongmou He, **g Zhu, Shengyi Qian, Joyce Chai, Danai Koutra

    Abstract: Large Language Models (LLMs) have shown promising results on various language and vision tasks. Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs). However, most studies have focused on node classification, while the use of LLMs for link prediction (LP) remains understudied. In this work, we propose a new task on LLMs, whe… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  32. arXiv:2406.04600  [pdf, other

    cs.CV

    1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation

    Authors: Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang

    Abstract: Tracking and segmenting multiple objects in complex scenes has always been a challenge in the field of video object segmentation, especially in scenarios where objects are occluded and split into parts. In such cases, the definition of objects becomes very ambiguous. The motivation behind the MOSE dataset is how to clearly recognize and distinguish objects in complex scenes. In this challenge, we… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  33. arXiv:2406.03978  [pdf, other

    cs.MA cs.LG

    Mini Honor of Kings: A Lightweight Environment for Multi-Agent Reinforcement Learning

    Authors: Lin Liu, Jian Zhao, Cheng Hu, Zhengtao Cao, Youpeng Zhao, Zhenbin Ye, Meng Meng, Wenjun Wang, Zhaofeng He, Houqiang Li, Xia Lin, Lanxiao Huang

    Abstract: Games are widely used as research environments for multi-agent reinforcement learning (MARL), but they pose three significant challenges: limited customization, high computational demands, and oversimplification. To address these issues, we introduce the first publicly available map editor for the popular mobile game Honor of Kings and design a lightweight environment, Mini Honor of Kings (Mini Ho… ▽ More

    Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  34. arXiv:2406.03888  [pdf, ps, other

    cs.IT eess.SP

    MSE-Based Training and Transmission Optimization for MIMO ISAC Systems

    Authors: Zhenyao He, Wei Xu, Hong Shen, Yonina C. Eldar, Xiaohu You

    Abstract: In this paper, we investigate a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system under typical block-fading channels. As a non-trivial extension to most existing works on ISAC, both the training and transmission signals sent by the ISAC transmitter are exploited for sensing. Specifically, we develop two training and transmission design schemes to minimize a… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  35. arXiv:2406.03796  [pdf, other

    physics.soc-ph

    Beyond a binary theorizing of prosociality

    Authors: Chen Shen, Zhixue He, Hao Guo, Shuyue Hu, Jun Tanimoto, Lei Shi, Petter Holme

    Abstract: A stylized experiment, the public goods game, has taught us the peculiar reproducible fact that humans tend to contribute more to shared resources than expected from economically rational assumptions. There have been two competing explanations for this phenomenon: either contributing to the public good is an innate human trait (the prosocial preference hypothesis) or a transitory effect while lear… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  36. arXiv:2406.03470  [pdf, other

    cs.NE cs.AI

    SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

    Authors: Kang You, Zekai Xu, Chen Nie, Zhijie Deng, Qinghai Guo, Xiang Wang, Zhezhi He

    Abstract: Spiking neural network (SNN) has attracted great attention due to its characteristic of high efficiency and accuracy. Currently, the ANN-to-SNN conversion methods can obtain ANN on-par accuracy SNN with ultra-low latency (8 time-steps) in CNN structure on computer vision (CV) tasks. However, as Transformer-based networks have achieved prevailing precision on both CV and natural language processing… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: * These authors contributed equally to this work

  37. VQUNet: Vector Quantization U-Net for Defending Adversarial Atacks by Regularizing Unwanted Noise

    Authors: Zhixun He, Mukesh Singhal

    Abstract: Deep Neural Networks (DNN) have become a promising paradigm when develo** Artificial Intelligence (AI) and Machine Learning (ML) applications. However, DNN applications are vulnerable to fake data that are crafted with adversarial attack algorithms. Under adversarial attacks, the prediction accuracy of DNN applications suffers, making them unreliable. In order to defend against adversarial attac… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 8 pages, 6 figures

    MSC Class: 94A08 ACM Class: I.5.4

    Journal ref: 2024 7th International Conference on Machine Vision and Applications (ICMVA)

  38. arXiv:2406.02147  [pdf, other

    cs.CV

    UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

    Authors: Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan, Peng Jia, Xianpeng Lang, Xiaodan Liang

    Abstract: 3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  39. arXiv:2406.02048  [pdf, other

    cs.IR

    Auto-Encoding or Auto-Regression? A Reality Check on Causality of Self-Attention-Based Sequential Recommenders

    Authors: Yueqi Wang, Zhankui He, Zhenrui Yue, Julian McAuley, Dong Wang

    Abstract: The comparison between Auto-Encoding (AE) and Auto-Regression (AR) has become an increasingly important topic with recent advances in sequential recommendation. At the heart of this discussion lies the comparison of BERT4Rec and SASRec, which serve as representative AE and AR models for self-attentive sequential recommenders. Yet the conclusion of this debate remains uncertain due to: (1) the lack… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  40. arXiv:2406.00432  [pdf, other

    cs.CV

    Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner

    Authors: Xing Cui, Peipei Li, Zekun Li, Xuannan Liu, Yueying Zou, Zhaofeng He

    Abstract: Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning ``how to drag'' through point dragging and often produce one deterministic estimation, which presents two key limitations: 1) Overlooking the inherently ill-posed nature of drag-based editing, where multiple results… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  41. arXiv:2405.20112  [pdf, other

    cs.CV

    RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection

    Authors: Zhiyuan He, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: The rapid advances in generative AI models have empowered the creation of highly realistic images with arbitrary content, raising concerns about potential misuse and harm, such as Deepfakes. Current research focuses on training detectors using large datasets of generated images. However, these training-based solutions are often computationally expensive and show limited generalization to unseen ge… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  42. arXiv:2405.19708  [pdf, other

    cs.CV cs.AI

    Text Guided Image Editing with Automatic Concept Locating and Forgetting

    Authors: Jia Li, Lijie Hu, Zhixian He, **gfeng Zhang, Tianhang Zheng, Di Wang

    Abstract: With the advancement of image-to-image diffusion models guided by text, significant progress has been made in image editing. However, a persistent challenge remains in seamlessly incorporating objects into images based on textual instructions, without relying on extra user-provided guidance. Text and images are inherently distinct modalities, bringing out difficulties in fully capturing the semant… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  43. arXiv:2405.18837  [pdf, other

    quant-ph

    Transformer for Parameterized Quantum Circuits Expressibility Prediction

    Authors: Fei Zhang, Jie Li, Zhimin He, Haozhen Situ

    Abstract: With the exponentially faster computation for certain problems, quantum computing has garnered significant attention in recent years. Variational Quantum Algorithm (VQA) is a crucial method to implement quantum computing, and an appropriate task-specific ansatz can effectively enhance the quantum advantage of VQAs. However, the vast search space makes it challenging to find the optimal task-specif… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  44. arXiv:2405.18058  [pdf, other

    cs.IR

    ReChorus2.0: A Modular and Task-Flexible Recommendation Library

    Authors: Jiayu Li, Hanyu Li, Zhiyu He, Weizhi Ma, Peijie Sun, Min Zhang, Shao** Ma

    Abstract: With the applications of recommendation systems rapidly expanding, an increasing number of studies have focused on every aspect of recommender systems with different data inputs, models, and task settings. Therefore, a flexible library is needed to help researchers implement the experimental strategies they require. Existing open libraries for recommendation scenarios have enabled reproducing vari… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures. Under review

  45. arXiv:2405.17816  [pdf, other

    cs.CV cs.LG

    Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection

    Authors: Yingwen Wu, Ruiji Yu, Xinwen Cheng, Zhengbao He, Xiaolin Huang

    Abstract: In the open world, detecting out-of-distribution (OOD) data, whose labels are disjoint with those of in-distribution (ID) samples, is important for reliable deep neural networks (DNNs). To achieve better detection performance, one type of approach proposes to fine-tune the model with auxiliary OOD datasets to amplify the difference between ID and OOD data through a separation loss defined on model… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  46. arXiv:2405.17774  [pdf, other

    cs.CV

    Gradually Vanishing Gap in Prototypical Network for Unsupervised Domain Adaptation

    Authors: Shanshan Wang, Hao Zhou, Xun Yang, Zhenwei He, Mengzhu Wang, Xingyi Zhang, Meng Wang

    Abstract: Unsupervised domain adaptation (UDA) is a critical problem for transfer learning, which aims to transfer the semantic information from labeled source domain to unlabeled target domain. Recent advancements in UDA models have demonstrated significant generalization capabilities on the target domain. However, the generalization boundary of UDA models remains unclear. When the domain discrepancy is to… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  47. arXiv:2405.17763  [pdf

    quant-ph physics.app-ph physics.atm-clus physics.optics

    Capturing dynamics and thermodynamics of a three-level quantum heat engine via programmable quantum circuits

    Authors: Gao-xiang Deng, Zhe He, Yu Liu, Wei Shao, Zheng Cui

    Abstract: This research employs the Kraus representation and Sz.-Nagy dilation theorem to model a three-level quantum heat on quantum circuits, investigating its dynamic evolution and thermodynamic performance. The feasibility of the dynamic model is validated by tracking the changes of population. On the basis of reinforcement learning algorithm, the optimal cycle of the quantum heat engine for maximal ave… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  48. arXiv:2405.17433  [pdf

    q-bio.MN q-bio.GN

    ScAtt: an Attention based architecture to analyze Alzheimer's disease at cell type level from single-cell RNA-sequencing data

    Authors: Xiaoxia Liu, Robert R Butler III, Prashnna K Gyawali, Frank M Longo, Zihuai He

    Abstract: Alzheimer's disease (AD) is a pervasive neurodegenerative disorder that leads to memory and behavior impairment severe enough to interfere with daily life activities. Understanding this disease pathogenesis can drive the development of new targets and strategies to prevent and treat AD. Recent advances in high-throughput single-cell RNA sequencing technology (scRNA-seq) have enabled the generation… ▽ More

    Submitted 12 March, 2024; originally announced May 2024.

  49. arXiv:2405.16720  [pdf, other

    cs.CL

    Large Scale Knowledge Washing

    Authors: Yu Wang, Ruihan Wu, Zexue He, Xiusi Chen, Julian McAuley

    Abstract: Large language models show impressive abilities in memorizing world knowledge, which leads to concerns regarding memorization of private information, toxic or sensitive knowledge, and copyrighted content. We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge. Previous unlearning methods usually define the reverse loss and update… ▽ More

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  50. arXiv:2405.16456  [pdf, other

    cs.LG cs.AI

    Dominant Shuffle: A Simple Yet Powerful Data Augmentation for Time-series Prediction

    Authors: Kai Zhao, Zuojie He, Alex Hung, Dan Zeng

    Abstract: Recent studies have suggested frequency-domain Data augmentation (DA) is effec tive for time series prediction. Existing frequency-domain augmentations disturb the original data with various full-spectrum noises, leading to excess domain gap between augmented and original data. Although impressive performance has been achieved in certain cases, frequency-domain DA has yet to be generalized to time… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: https://kaizhao.net/time-series