Showing 1–50 of 705 results for author: Cheng, X

Searching in archive cs.
  1. arXiv:2407.01512  [pdf, other]

    cs.RO cs.HC cs.LG

    Open-TeleVision: Teleoperation with Immersive Active Visual Feedback

    Authors: Xuxin Cheng, Jialong Li, Shiqi Yang, Ge Yang, Xiaolong Wang

    Abstract: Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations. The intuitiveness and ease of use of the teleoperation system are crucial for ensuring high-quality, diverse, and scalable data. To achieve this, we propose an immersive teleoperation system Open-TeleVision that allows operators to actively perceive the robot's surroundings in a…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Website: https://robot-tv.github.io/

  2. arXiv:2407.00934  [pdf, other]

    cs.CL

    CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

    Authors: Jingheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su

    Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which receives little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 16 pages, 8 tables, 2 figures. Under review

  3. arXiv:2407.00924  [pdf, other]

    cs.CL

    EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction

    Authors: Jingheng Ye, Shang Qin, Yinghui Li, Xuxin Cheng, Libo Qin, Hai-Tao Zheng, Peng Xing, Zishan Xu, Guo Cheng, Zhao Wei

    Abstract: Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 22 pages, 10 tables, 9 figures. Under review

  4. arXiv:2407.00608  [pdf, other]

    cs.AI cs.CL cs.CV

    Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

    Authors: Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

    Abstract: Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However, previous methods solely focus on the performance of the reconstruction task, degrading its ability to combine with different textual prompt. Besides, optimizin…

    Submitted 30 June, 2024; originally announced July 2024.

  5. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi…

    Submitted 28 June, 2024; originally announced June 2024.

  6. arXiv:2406.18957  [pdf, other]

    cs.DC cs.GT

    A Treatment of EIP-1559: Enhancing Transaction Fee Mechanism through Nth-Price Auction

    Authors: Kun Li, Guangpeng Qi, Guangyong Shang, Wanli Deng, Minghui Xu, Xiuzhen Cheng

    Abstract: With the widespread adoption of blockchain technology, the transaction fee mechanism (TFM) in blockchain systems has become a prominent research topic. An ideal TFM should satisfy user incentive compatibility (UIC), miner incentive compatibility (MIC), and miner-user side contract proofness ($c$-SCP). However, state-of-the-art works either fail to meet these three properties simultaneously or only…

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.18522  [pdf, other]

    cs.CV cs.CL

    ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

    Authors: Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan

    Abstract: We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic capabilities of the T2V models (e.g. Sora and Lumiere) in time-lapse video generation. In contrast to existing benchmarks that focus on the visual quality and textual relevance of generated videos, ChronoMagic-Bench focuses on the model's ability to generate time-lapse videos wi…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 31 pages, 15 figures

  8. arXiv:2406.17507  [pdf, other]

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights…

    Submitted 25 June, 2024; originally announced June 2024.

  9. arXiv:2406.14699  [pdf, other]

    cs.LG math.OC stat.ML

    Preferential Multi-Objective Bayesian Optimization

    Authors: Raul Astudillo, Kejun Li, Maegan Tucker, Chu Xin Cheng, Aaron D. Ames, Yisong Yue

    Abstract: Preferential Bayesian optimization (PBO) is a framework for optimizing a decision-maker's latent preferences over available design choices. While preferences often involve multiple conflicting objectives, existing work in PBO assumes that preferences can be encoded by a single objective function. For example, in robotic assistive devices, technicians often attempt to maximize user comfort while si…

    Submitted 20 June, 2024; originally announced June 2024.

  10. arXiv:2406.14023  [pdf, other]

    cs.CL cs.AI

    Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective

    Authors: Yuchen Wen, Keping Bi, Wei Chen, Jiafeng Guo, Xueqi Cheng

    Abstract: As Large Language Models (LLMs) become an important way of information seeking, there have been increasing concerns about the unethical content LLMs may generate. In this paper, we conduct a rigorous evaluation of LLMs' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses. Our attack methodology is inspired by psychometric principles…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Code and datasets are available at https://github.com/wen112358/ImplicitBiasPsychometricEvaluation

  11. arXiv:2406.13450  [pdf, other]

    cs.AI

    Federating to Grow Transformers with Constrained Resources without Model Sharing

    Authors: Shikun Shen, Yifei Zou, Yuan Yuan, Yanwei Zheng, Peng Li, Xiuzhen Cheng, Dongxiao Yu

    Abstract: The high resource consumption of large-scale models discourages resource-constrained users from developing their customized transformers. To this end, this paper considers a federated framework named Fed-Grow for multiple participants to cooperatively scale a transformer from their pre-trained small models. Under the Fed-Grow, a Dual-LiGO (Dual Linear Growth Operator) architecture is designed to h…

    Submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.13375  [pdf, other]

    cs.CL

    ALiiCE: Evaluating Positional Fine-grained Citation Generation

    Authors: Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng

    Abstract: Large Language Models (LLMs) can enhance the credibility and verifiability by generating text with citations. However, existing tasks and evaluation methods are predominantly limited to sentence-level statement, neglecting the significance of positional fine-grained citations that can appear anywhere within sentences. To facilitate further exploration of the fine-grained citation generation, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  13. arXiv:2406.13351  [pdf, other]

    cs.LG cs.AI cs.DC

    A Resource-Adaptive Approach for Federated Learning under Resource-Constrained Environments

    Authors: Ruirui Zhang, Xingze Wu, Yifei Zou, Zhenzhen Xie, Peng Li, Xiuzhen Cheng, Dongxiao Yu

    Abstract: The paper studies a fundamental federated learning (FL) problem involving multiple clients with heterogeneous constrained resources. Compared with the numerous training parameters, the computing and communication resources of clients are insufficient for fast local training and real-time knowledge sharing. Besides, training on clients with heterogeneous resources may result in the straggler proble…

    Submitted 19 June, 2024; originally announced June 2024.

  14. arXiv:2406.12468  [pdf, other]

    cs.CL cs.AI

    Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities

    Authors: Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei, Hongcheng Gao, Yilong Xu, Xueqi Cheng

    Abstract: The parametric knowledge memorized by large language models (LLMs) becomes outdated quickly. In-context editing (ICE) is currently the most effective method for updating the knowledge of LLMs. Recent advancements involve enhancing ICE by modifying the decoding strategy, obviating the need for altering internal model structures or adjusting external prompts. However, this enhancement operates acros…

    Submitted 18 June, 2024; originally announced June 2024.

  15. arXiv:2406.11685  [pdf, other]

    cs.LG cs.SI

    Edge Classification on Graphs: New Directions in Topological Imbalance

    Authors: Xueqi Cheng, Yu Wang, Yunchao Liu, Yuying Zhao, Charu C. Aggarwal, Tyler Derr

    Abstract: Recent years have witnessed the remarkable success of applying Graph machine learning (GML) to node/graph classification and link prediction. However, edge classification task that enjoys numerous real-world applications such as social network analysis and cybersecurity, has not seen significant advancement. To address this gap, our study pioneers a comprehensive approach to edge classification. W…

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.11668  [pdf, other]

    cs.CL

    "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jailbreak

    Authors: Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Jiayi Mao, Xueqi Cheng

    Abstract: "Jailbreak" is a major safety concern of Large Language Models (LLMs), which occurs when malicious prompts lead LLMs to produce harmful outputs, raising issues about the reliability and safety of LLMs. Therefore, an effective evaluation of jailbreaks is very crucial to develop its mitigation strategies. However, our research reveals that many jailbreaks identified by current evaluations may actual…

    Submitted 17 June, 2024; originally announced June 2024.

  17. arXiv:2406.11290  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy

    Authors: Hengran Zhang, Keping Bi, Jiafeng Guo, Xueqi Cheng

    Abstract: Utility and topical relevance are critical measures in information retrieval (IR), reflecting system and user perspectives, respectively. While topical relevance has long been emphasized, utility is a higher standard of relevance and is more useful for facilitating downstream tasks, e.g., in Retrieval-Augmented Generation (RAG). When we incorporate utility judgments into RAG, we realize that the t…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages

  18. arXiv:2406.11277  [pdf, other]

    cs.CL

    Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector

    Authors: Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen

    Abstract: Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4. In this paper, we propose an autonomous LLM-based agent framework, called HaluAgent, which enables relatively smaller LLMs (e.g. Baichuan2-Chat 7B) to actively select suitable tools for detecting multiple hallucination types such as text, c…

    Submitted 17 June, 2024; originally announced June 2024.

  19. arXiv:2406.07471  [pdf, other]

    cs.CV

    OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    Authors: Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge

    Abstract: Surgical scene perception via videos are critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of intelligent systems for surgical workflow analysis. Existing datasets for surgical workflow analysis, which typically face challenges such as small s…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Version 1

  20. arXiv:2406.05360  [pdf, other]

    cs.CL

    Flexible and Adaptable Summarization via Expertise Separation

    Authors: Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang

    Abstract: A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests o…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, published in SIGIR 2024

  21. arXiv:2406.05347  [pdf, other]

    q-bio.BM cs.AI cs.LG

    MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

    Authors: Bo Chen, Zhilei Bei, Xingyi Cheng, Pan Li, Jie Tang, Le Song

    Abstract: Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high quality MSA. Although various methods have been proposed to generate virtual MSA under these conditions, they fall short in compre…

    Submitted 10 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  22. arXiv:2406.05135  [pdf]

    cs.RO math.OC

    Smart Navigation System for Parking Assignment at Large Events: Incorporating Heterogeneous Driver Characteristics

    Authors: Xi Cheng, Gaofeng Su, Siyuan Feng, Ke Liu, Chen Zhu, Hui Lin, Jilin Song, Jianan Chen

    Abstract: Parking challenges escalate significantly during large events such as concerts or sports games, yet few studies address dynamic parking lot assignments for such occasions. This paper introduces a smart navigation system designed to optimize parking assignments swiftly during large events, utilizing a mixed search algorithm that accounts for the heterogeneous characteristics of drivers. We conducte…

    Submitted 14 May, 2024; originally announced June 2024.

  23. arXiv:2406.02058  [pdf, other]

    cs.CV cs.RO

    OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

    Authors: Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang

    Abstract: This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations.…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: technical report, 15 pages

  24. arXiv:2406.02017  [pdf, other]

    cs.LG stat.ML

    On the Mode-Seeking Properties of Langevin Dynamics

    Authors: Xiwei Cheng, Kexin Fu, Farzan Farnia

    Abstract: The Langevin Dynamics framework, which aims to generate samples from the score function of a probability distribution, is widely used for analyzing and interpreting score-based generative modeling. While the convergence behavior of Langevin Dynamics under unimodal distributions has been extensively studied in the literature, in practice the data distribution could consist of multiple distinct mode…

    Submitted 4 June, 2024; originally announced June 2024.

  25. arXiv:2406.01605  [pdf, other]

    eess.IV cs.CV

    An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

    Authors: Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

    Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the…

    Submitted 26 May, 2024; originally announced June 2024.

  26. arXiv:2406.01304  [pdf, other]

    cs.CL cs.AI cs.SE

    CodeR: Issue Resolving with Multi-Agent and Task Graphs

    Authors: Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

    Abstract: GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issue…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: https://github.com/NL2Code/CodeR

  27. arXiv:2406.01205  [pdf, other]

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and…

    Submitted 3 June, 2024; originally announced June 2024.

  28. arXiv:2406.00944  [pdf, other]

    cs.CL cs.AI cs.IR

    Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution

    Authors: Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng

    Abstract: Retrieval-augmented generation (RAG) utilizes retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs due to noisy or incorrect retrieved texts. This suggests that RAG possesses a duality including both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 23 pages

  29. arXiv:2406.00356  [pdf, other]

    eess.AS cs.SD

    AudioLCM: Text-to-Audio Generation with Latent Consistency Models

    Authors: Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao

    Abstract: Recent advancements in Latent Diffusion Models (LDMs) have propelled them to the forefront of various generative tasks. However, their iterative sampling process poses a significant computational burden, resulting in slow generation speeds and limiting their application in text-to-audio generation deployment. In this work, we introduce AudioLCM, a novel consistency-based model tailored for efficie…

    Submitted 1 June, 2024; originally announced June 2024.

  30. arXiv:2405.20852  [pdf, other]

    cs.CL

    Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

    Authors: Xuxin Cheng, Wanshi Xu, Zhihong Zhu, Hongxiang Li, Yuexian Zou

    Abstract: Spoken language understanding (SLU) is a core task in task-oriented dialogue systems, which aims at understanding the user's current goal through constructing semantic frames. SLU usually consists of two subtasks, including intent detection and slot filling. Although there are some SLU frameworks joint modeling the two subtasks and achieving high performance, most of them still overlook the inhere…

    Submitted 31 May, 2024; originally announced May 2024.

  31. arXiv:2405.19915  [pdf, other]

    cs.AI

    P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

    Authors: Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang

    Abstract: Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quantization algorithms but retained floating-point scaling factors, which yield non-negligible re-quantization overhead, limiting ViTs' hardware efficien…

    Submitted 30 May, 2024; originally announced May 2024.

  32. arXiv:2405.19689  [pdf, other]

    cs.CV cs.IR

    Uncertainty-aware sign language video retrieval with probability distribution modeling

    Authors: Xuan Wu, Hongxiang Li, Yuanjiang Luo, Xuxin Cheng, Xianwei Zhuang, Meng Cao, Keren Fu

    Abstract: Sign language video retrieval plays a key role in facilitating information access for the deaf community. Despite significant advances in video-text retrieval, the complexity and inherent uncertainty of sign language preclude the direct application of these techniques. Previous methods achieve the mapping between sign language video and text through fine-grained modal alignment. However, due to th…

    Submitted 30 May, 2024; originally announced May 2024.

  33. arXiv:2405.19688  [pdf, other]

    cs.CV

    DNPM: A Neural Parametric Model for the Synthesis of Facial Geometric Details

    Authors: Haitao Cao, Baoping Cheng, Qiran Pu, Haocheng Zhang, Bin Luo, Yixiang Zhuang, Juncong Lin, Liyan Chen, Xuan Cheng

    Abstract: Parametric 3D models have enabled a wide variety of computer vision and graphics tasks, such as modeling human faces, bodies and hands. In 3D face modeling, 3DMM is the most widely used parametric model, but can't generate fine geometric details solely from identity and expression inputs. To tackle this limitation, we propose a neural parametric model named DNPM for the facial geometric details, w…

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  34. arXiv:2405.19099  [pdf, other]

    cs.CR

    DataSafe: Copyright Protection with PUF Watermarking and Blockchain Tracking

    Authors: Xiaolong Xue, Guangyong Shang, Zhen Ma, Minghui Xu, Hechuan Guo, Kun Li, Xiuzhen Cheng

    Abstract: Digital watermarking methods are commonly used to safeguard digital media copyrights by confirming ownership and deterring unauthorized use. However, without reliable third-party oversight, these methods risk security vulnerabilities during watermark extraction. Furthermore, digital media lacks tangible ownership attributes, posing challenges for secure copyright transfer and tracing. This study i…

    Submitted 29 May, 2024; originally announced May 2024.

  35. arXiv:2405.19015  [pdf, other]

    eess.SY cs.LG cs.MA

    Distributed Management of Fluctuating Energy Resources in Dynamic Networked Systems

    Authors: Xiaotong Cheng, Ioannis Tsetis, Setareh Maghsudi

    Abstract: Modern power systems integrate renewable distributed energy resources (DERs) as an environment-friendly enhancement to meet the ever-increasing demands. However, the inherent unreliability of renewable energy renders developing DER management algorithms imperative. We study the energy-sharing problem in a system consisting of several DERs. Each agent harvests and distributes renewable energy in it…

    Submitted 29 May, 2024; originally announced May 2024.

  36. arXiv:2405.17816  [pdf, other]

    cs.CV cs.LG

    Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection

    Authors: Yingwen Wu, Ruiji Yu, Xinwen Cheng, Zhengbao He, Xiaolin Huang

    Abstract: In the open world, detecting out-of-distribution (OOD) data, whose labels are disjoint with those of in-distribution (ID) samples, is important for reliable deep neural networks (DNNs). To achieve better detection performance, one type of approach proposes to fine-tune the model with auxiliary OOD datasets to amplify the difference between ID and OOD data through a separation loss defined on model…

    Submitted 28 May, 2024; originally announced May 2024.

  37. arXiv:2405.17638  [pdf, other]

    cs.LG cs.AI

    The surprising efficiency of temporal difference learning for rare event prediction

    Authors: Xiaoou Cheng, Jonathan Weare

    Abstract: We quantify the efficiency of temporal difference (TD) learning over the direct, or Monte Carlo (MC), estimator for policy evaluation in reinforcement learning, with an emphasis on estimation of quantities related to rare events. Policy evaluation is complicated in the rare event setting by the long timescale of the event and by the need for \emph{relative accuracy} in estimates of very small valu…

    Submitted 27 May, 2024; originally announced May 2024.

  38. arXiv:2405.17336  [pdf, other]

    cs.CL

    XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser

    Authors: Xianfu Cheng, Hang Zhang, Jian Yang, Xiang Li, Weixiao Zhou, Kui Wu, Fei Liu, Wei Zhang, Tao Sun, Tongliang Li, Zhoujun Li

    Abstract: In the domain of document AI, semi-structured form parsing plays a crucial role. This task leverages techniques from key information extraction (KIE), dealing with inputs that range from plain text to intricate modal data comprising images and structural layouts. The advent of pre-trained multimodal models has driven the extraction of key information from form documents in different formats such a…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures, 6 tables

  39. Negative as Positive: Enhancing Out-of-distribution Generalization for Graph Contrastive Learning

    Authors: Zixu Wang, Bingbing Xu, Yige Yuan, Huawei Shen, Xueqi Cheng

    Abstract: Graph contrastive learning (GCL), standing as the dominant paradigm in the realm of graph pre-training, has yielded considerable progress. Nonetheless, its capacity for out-of-distribution (OOD) generalization has been relatively underexplored. In this work, we point out that the traditional optimization of InfoNCE in GCL restricts the cross-domain pairs only to be negative samples, which inevitab…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 5 pages, 5 figures, In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14-18, 2024, Washington, DC, USA

    ACM Class: I.2

  40. arXiv:2405.15816  [pdf, other]

    math.OC cs.AI cs.LG

    Riemannian Bilevel Optimization

    Authors: Sanchayan Dutta, Xiang Cheng, Suvrit Sra

    Abstract: We develop new algorithms for Riemannian bilevel optimization. We focus in particular on batch and stochastic gradient-based methods, with the explicit goal of avoiding second-order information such as Riemannian hyper-gradients. We propose and analyze $\mathrm{RF^2SA}$, a method that leverages first-order gradient information to navigate the complex geometry of Riemannian manifolds efficiently. N…

    Submitted 22 May, 2024; originally announced May 2024.

  41. arXiv:2405.15495  [pdf, other]

    cs.LG

    Towards Natural Machine Unlearning

    Authors: Zhengbao He, Tao Li, Xinwen Cheng, Zhehao Huang, Xiaolin Huang

    Abstract: Machine unlearning (MU) aims to eliminate information that has been learned from specific training data, namely forgetting data, from a pre-trained model. Currently, the mainstream of existing MU methods involves modifying the forgetting data with incorrect labels and subsequently fine-tuning the model. While learning such incorrect information can indeed remove knowledge, the process is quite unn…

    Submitted 24 May, 2024; originally announced May 2024.

  42. arXiv:2405.15349  [pdf, other]

    cs.CL

    UnKE: Unstructured Knowledge Editing in Large Language Models

    Authors: Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models, heavily relying on the assumption that structured knowledge is stored as key-value pairs locally in MLP layers or specific neurons. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by l…

    Submitted 24 May, 2024; originally announced May 2024.

  43. arXiv:2405.15256  [pdf, other]

    cs.LG

    FTMixer: Frequency and Time Domain Representations Fusion for Time Series Modeling

    Authors: Zhengnan Li, Yunxiao Qin, Xilong Cheng, Yuting Tan

    Abstract: Time series data can be represented in both the time and frequency domains, with the time domain emphasizing local dependencies and the frequency domain highlighting global dependencies. To harness the strengths of both domains in capturing local and global dependencies, we propose the Frequency and Time Domain Mixer (FTMixer). To exploit the global characteristics of the frequency domain, we intr…

    Submitted 24 May, 2024; originally announced May 2024.

  44. arXiv:2405.15245  [pdf, other]

    cs.LG cs.AI

    Cooperative Backdoor Attack in Decentralized Reinforcement Learning with Theoretical Guarantee

    Authors: Mengtong Gao, Yifei Zou, Zuyuan Zhang, Xiuzhen Cheng, Dongxiao Yu

    Abstract: The safety of decentralized reinforcement learning (RL) is a challenging problem since malicious agents can share their poisoned policies with benign agents. The paper investigates a cooperative backdoor attack in a decentralized reinforcement learning scenario. Differing from the existing methods that hide a whole backdoor attack behind their shared policies, our method decomposes the backdoor be…

    Submitted 24 May, 2024; originally announced May 2024.

  45. arXiv:2405.14347  [pdf, other]

    eess.SP cs.AI

    Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) Approach

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng

    Abstract: Integrated sensing and communication (ISAC) technology is essential for enabling the vehicular networks. However, the communication channel in this scenario exhibits time-varying characteristics, and the potential targets may move rapidly, creating a doubly-dynamic phenomenon. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, the…

    Submitted 23 May, 2024; originally announced May 2024.

  46. arXiv:2405.13810  [pdf, other]

    cs.LG cs.AI

    Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

    Authors: Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

    Abstract: Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individua…

    Submitted 22 May, 2024; originally announced May 2024.

  47. arXiv:2405.13792  [pdf, other]

    cs.CL cs.AI cs.IR

    xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

    Authors: Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao

    Abstract: This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively…

    Submitted 22 May, 2024; originally announced May 2024.

  48. arXiv:2405.13548  [pdf, other]

    cs.SE cs.CL

    ECLIPSE: Semantic Entropy-LCS for Cross-Lingual Industrial Log Parsing

    Authors: Wei Zhang, Xianfu Cheng, Yi Zhang, Jian Yang, Hongcheng Guo, Zhoujun Li, Xiaolin Yin, Xiangyuan Guan, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: Log parsing, a vital task for interpreting the vast and complex data produced within software architectures faces significant challenges in the transition from academic benchmarks to the industrial domain. Existing log parsers, while highly effective on standardized public datasets, struggle to maintain performance and efficiency when confronted with the sheer scale and diversity of real-world ind…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  49. arXiv:2405.13037  [pdf, other]

    cs.CL cs.AI

    Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation

    Authors: Cheng Niu, Xingguang Wang, Xuxin Cheng, Juntong Song, Tong Zhang

    Abstract: Dialogue State Tracking (DST) is designed to monitor the evolving dialogue state in the conversations and plays a pivotal role in developing task-oriented dialogue systems. However, obtaining the annotated data for the DST task is usually a costly endeavor. In this paper, we focus on employing LLMs to generate dialogue data to reduce dialogue collection and annotation costs. Specifically, GPT-4 is…

    Submitted 17 May, 2024; originally announced May 2024.

  50. arXiv:2405.11704  [pdf]

    cs.LG cs.AI

    Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks

    Authors: Taiyuan Mei, Yun Zi, Xiaohan Cheng, Zijun Gao, Qi Wang, Haowei Yang

    Abstract: The internal structure and operation mechanism of large-scale language models are analyzed theoretically, especially how Transformer and its derivative architectures can restrict computing efficiency while capturing long-term dependencies. Further, we dig deep into the efficiency bottleneck of the training phase, and evaluate in detail the contribution of adaptive optimization algorithms (such as…

    Submitted 19 May, 2024; originally announced May 2024.