Showing 1–50 of 436 results for author: Peng, X

Searching in archive cs.

  1. arXiv:2406.19602  [pdf, other]

    cs.CV cs.LG

    A Survey on Deep Clustering: From the Prior Perspective

    Authors: Yiding Lu, Haobin Li, Yunfan Li, Yijie Lin, Xi Peng

    Abstract: Facilitated by the powerful feature extraction ability of neural networks, deep clustering has achieved great success in analyzing high-dimensional and complex real-world data. The performance of deep clustering methods is affected by various factors such as network structures and learning objectives. However, as pointed out in this survey, the essence of deep clustering lies in the incorporation…

    Submitted 30 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.18629  [pdf, other]

    cs.LG cs.AI cs.CL

    Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

    Authors: Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia

    Abstract: Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring the correctness of each reasoning step is critical. To address this, we aim to enhance the robustness and factuality of LLMs by learning from human feedback. However, Direct Preference Optimization (DPO) has shown limited benef…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code, data, and models are available at https://github.com/dvlab-research/Step-DPO

  3. arXiv:2406.17304  [pdf, other]

    cs.CL

    Leveraging LLMs for Dialogue Quality Measurement

    Authors: Jinghan Jia, Abi Komma, Timothy Leffel, Xujun Peng, Ajay Nagesh, Tamer Soliman, Aram Galstyan, Anoop Kumar

    Abstract: In task-oriented conversational AI evaluation, unsupervised methods poorly correlate with human judgments, and supervised approaches lack generalization. Recent advances in large language models (LLMs) show robust zeroshot and few-shot capabilities across NLP tasks. This paper explores using LLMs for automated dialogue quality evaluation, experimenting with various configurations on public and pro…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.14185  [pdf, other]

    cs.DC cs.AI

    Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices

    Authors: Li Wang, Liang Li, Lianming Xu, Xian Peng, Aiguo Fei

    Abstract: The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementations of deep learning based intelligent services on extremely resource-constrained Internet of Things (IoT) scenarios. Yet it raises great challenges to perform complicated inference tasks relying on a cluster of IoT devices that are heterogeneous in their comp…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.11161  [pdf, other]

    cs.AI cs.MM

    Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

    Authors: Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann

    Abstract: Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing su…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 37 pages, 12 figures, Project: https://github.com/ZebangCheng/Emotion-LLaMA, Demo: https://huggingface.co/spaces/ZebangCheng/Emotion-LLaMA

  6. arXiv:2406.11147  [pdf, other]

    cs.SE cs.AI

    Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

    Authors: Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

    Abstract: Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique Vul-RAG, which leverages knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerability for the given code in…

    Submitted 19 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.11087  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    MemDPT: Differential Privacy for Memory Efficient Language Models

    Authors: Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Chen Ma, Songhang Deng, Mengchen Fu, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

    Abstract: Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages first version

  8. arXiv:2406.10018  [pdf, other]

    cs.SE

    STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

    Authors: Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, Yiling Lou

    Abstract: Repository-level code completion is challenging as it involves complicated contexts from multiple files in the repository. To date, researchers have proposed two technical categories to enhance LLM-based repository-level code completion, i.e., retrieval-augmented generation (RAG) and static analysis integration. This work performs the first study on the static analysis integration in LLM-based rep…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  9. arXiv:2406.09834  [pdf, other]

    cs.SE

    How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study

    Authors: Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

    Abstract: Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs,…

    Submitted 14 June, 2024; originally announced June 2024.

  10. arXiv:2406.06615  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    Language Guided Skill Discovery

    Authors: Seungeun Rho, Laura Smith, Tianyu Li, Sergey Levine, Xue Bin Peng, Sehoon Ha

    Abstract: Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity…

    Submitted 7 June, 2024; originally announced June 2024.

  11. arXiv:2406.04482  [pdf, other]

    cs.CL cs.AI cs.HC cs.SE

    Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs

    Authors: Claire Jin, Sudha Rao, Xiangyu Peng, Portia Botchway, Jessica Quaye, Chris Brockett, Bill Dolan

    Abstract: Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detec…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in Findings of the Association for Computational Linguistics: ACL 2024

  12. arXiv:2406.03807  [pdf, other]

    cs.AI cs.CL cs.RO

    Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering

    Authors: Yanming Liu, Xinyue Peng, Yuwei Zhang, Jiannan Cao, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

    Abstract: Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. LLMs…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 46pages first version

  13. arXiv:2405.18741  [pdf, other]

    cs.CL cs.AI

    Genshin: General Shield for Natural Language Processing with Large Language Models

    Authors: Xiao Peng, Tao Liu, Ying Wang

    Abstract: Large language models (LLMs) like ChatGPT, Gemini, or LLaMA have been trending recently, demonstrating considerable advancement and generalizability power in countless domains. However, LLMs create an even bigger black box exacerbating opacity, with interpretability limited to few approaches. The uncertainty and opacity embedded in LLMs' nature restrict their application in high-stakes domains lik…

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  14. arXiv:2405.18347  [pdf, other]

    cs.LG

    Dataset Growth

    Authors: Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You

    Abstract: Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to do manual cleaning against noise and redundancy given today's data scale. There are existing techniques for cleaning/selecting the collected data. H…

    Submitted 28 May, 2024; originally announced May 2024.

  15. arXiv:2405.17403  [pdf, other]

    cs.LG cs.AI

    A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

    Authors: Kai Wang, Yukun Zhou, Mingjia Shi, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

    Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many…

    Submitted 27 May, 2024; originally announced May 2024.

    ACM Class: I.2

  16. arXiv:2405.11126  [pdf, other]

    cs.CV cs.GR cs.LG

    Flexible Motion In-betweening with Diffusion Models

    Authors: Setareh Cohan, Guy Tevet, Daniele Reda, Xue Bin Peng, Michiel van de Panne

    Abstract: Motion in-betweening, a fundamental task in character animation, consists of generating motion sequences that plausibly interpolate user-provided keyframe constraints. It has long been recognized as a labor-intensive and challenging process. We investigate the potential of diffusion models in generating diverse human motions guided by keyframes. Unlike previous inbetweening methods, we propose a s…

    Submitted 23 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024. For project page and code, see https://setarehc.github.io/CondMDI/

  17. arXiv:2405.10492  [pdf]

    cs.CL cs.LG

    Automatic News Generation and Fact-Checking System Based on Language Processing

    Authors: Xirui Peng, Qiming Xu, Zheng Feng, Haopeng Zhao, Lianghao Tan, Yan Zhou, Zecheng Zhang, Chenwei Gong, Yingqiao Zheng

    Abstract: This paper explores an automatic news generation and fact-checking system based on language processing, aimed at enhancing the efficiency and quality of news production while ensuring the authenticity and reliability of the news content. With the rapid development of Natural Language Processing (NLP) and deep learning technologies, automatic news generation systems are capable of extracting key in…

    Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    ACM Class: I.5; H.4

  18. arXiv:2405.09054  [pdf, other]

    cs.CV

    Dim Small Target Detection and Tracking: A Novel Method Based on Temporal Energy Selective Scaling and Trajectory Association

    Authors: Weihua Gao, Wenlong Niu, Wenlong Lu, Pengcheng Wang, Zhaoyuan Qi, Xiaodong Peng, Zhen Yang

    Abstract: The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features an…

    Submitted 14 May, 2024; originally announced May 2024.

  19. arXiv:2405.06059  [pdf, other]

    cs.CL cs.AI

    A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds

    Authors: Christopher Z. Cui, Xiangyu Peng, Mark O. Riedl

    Abstract: Open-ended worlds are those in which there are no pre-specified goals or environmental reward signal. As a consequence, an agent must know how to perform a multitude of tasks. However, when a new task is presented to an agent, we expect it to be able to reuse some of what it knows from previous tasks to rapidly learn that new task. We introduce a novel technique whereby policies for different a pr…

    Submitted 9 May, 2024; originally announced May 2024.

  20. arXiv:2404.19264  [pdf, other]

    cs.RO

    DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

    Authors: Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

    Abstract: This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged rob…

    Submitted 30 April, 2024; originally announced April 2024.

  21. arXiv:2404.18947  [pdf, other]

    cs.LG cs.AI

    Multimodal Fusion on Low-quality Data: A Comprehensive Survey

    Authors: Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang

    Abstract: Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored especially under low-quality data settings. This paper surveys the common challenges…

    Submitted 5 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Feel free to comment on our manuscript: [email protected]

  22. arXiv:2404.18398  [pdf, other]

    cs.CL cs.MM

    MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis

    Authors: Xiang Li, Zhi-Qi Cheng, Jun-Yan He, Xiaojiang Peng, Alexander G. Hauptmann

    Abstract: Emotional Text-to-Speech (E-TTS) synthesis has gained significant attention in recent years due to its potential to enhance human-computer interaction. However, current E-TTS approaches often struggle to capture the complexity of human emotions, primarily relying on oversimplified emotional labels or single-modality inputs. To address these limitations, we propose the Multimodal Emotional Text-to-…

    Submitted 28 April, 2024; originally announced April 2024.

  23. arXiv:2404.17205  [pdf, other]

    cs.CV

    Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer

    Authors: Xinpeng Li, Teng Wang, Jian Zhao, Shuyi Mao, Jinbao Wang, Feng Zheng, Xiaojiang Peng, Xuelong Li

    Abstract: Emotion recognition aims to discern the emotional state of subjects within an image, relying on subject-centric and contextual visual cues. Current approaches typically follow a two-stage pipeline: first localize subjects by off-the-shelf detectors, then perform emotion classification through the late fusion of subject and context features. However, the complicated paradigm suffers from disjoint t…

    Submitted 28 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  24. arXiv:2404.17027  [pdf, other]

    cs.CL cs.AI

    Player-Driven Emergence in LLM-Driven Game Narrative

    Authors: Xiangyu Peng, Jessica Quaye, Sudha Rao, Weijia Xu, Portia Botchway, Chris Brockett, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge Leandro, Claire Jin, Bill Dolan

    Abstract: We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers t…

    Submitted 3 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted at IEEE Conference on Games 2024

    Journal ref: IEEE Conference on Games 2024

  25. arXiv:2404.15041  [pdf, other]

    cs.CV

    LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition

    Authors: Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li

    Abstract: Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in facial expression recognition (FER) task. However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the…

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  26. arXiv:2404.11721  [pdf, other]

    cs.AR

    Functionality Locality, Mixture & Control = Logic = Memory

    Authors: Xiangjun Peng

    Abstract: This work provides new insights and constructs to the field of computer architecture and systems, and these insights are expected to be useful for the broad software stack. First, this work introduces Functionality Locality: this form of Functionality Locality shows that functionalities can be changed with a single piece of information, by solely changing the access order. This broadens the scope…

    Submitted 17 April, 2024; originally announced April 2024.

  27. arXiv:2404.10685  [pdf, other]

    cs.CV cs.GR

    Generating Human Interaction Motions in Scenes with Text Control

    Authors: Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe

    Abstract: We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model,…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Project Page: https://research.nvidia.com/labs/toronto-ai/tesmo/

  28. arXiv:2404.07517  [pdf, other]

    cs.HC

    Efficient sEMG-based Cross-Subject Joint Angle Estimation via Hierarchical Spiking Attentional Feature Decomposition Network

    Authors: Xin Zhou, Chuang Lin, Can Wang, Xiaojiang Peng

    Abstract: Surface electromyography (sEMG) has demonstrated significant potential in simultaneous and proportional control (SPC). However, existing algorithms for predicting joint angles based on sEMG often suffer from high inference costs or are limited to specific subjects rather than cross-subject scenarios. To address these challenges, we introduced a hierarchical Spiking Attentional Feature Decompositio…

    Submitted 11 April, 2024; originally announced April 2024.

  29. arXiv:2404.05415  [pdf]

    cs.CL cs.AI

    Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations

    Authors: Yiming Li, Xueqing Peng, Jianfu Li, Xu Zuo, Suyuan Peng, Donghong Pei, Cui Tao, Hua Xu, Na Hong

    Abstract: In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPT) present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to compare the performance of GPT with t…

    Submitted 14 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  30. arXiv:2404.04212  [pdf, other]

    cs.CL

    Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation

    Authors: Tong Su, Xin Peng, Sarubi Thillainathan, David Guzmán, Surangika Ranathunga, En-Shiun Annie Lee

    Abstract: Parameter-efficient fine-tuning (PEFT) methods are increasingly vital in adapting large-scale pre-trained language models for diverse tasks, offering a balance between adaptability and computational efficiency. They are important in Low-Resource Language (LRL) Neural Machine Translation (NMT) to enhance translation accuracy with minimal resources. However, their practical effectiveness varies sign…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to the Findings of NAACL 2024

  31. arXiv:2404.00511  [pdf, other]

    cs.CL cs.CV cs.MM

    MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models

    Authors: Zebang Cheng, Fuqiang Niu, Yuxiang Lin, Zhi-Qi Cheng, Bowen Zhang, Xiaojiang Peng

    Abstract: This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations. We propose a novel Multimodal Emotion Recognition and Multimodal Emotion Cause Extraction (MER-MCE) framework that integrates text, audio, and visual modalities using specialized emotion encoders. Our approach sets itself apart from top-performing teams by leveragin…

    Submitted 11 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Ranked 3rd in SemEval '24 Task 3 with F1 of 0.3435, close to 1st & 2nd by 0.0339 & 0.0025

  32. arXiv:2404.00404  [pdf, other]

    cs.IT

    Value, Representation, Information and Communication

    Authors: Xiangjun Peng

    Abstract: A new analytic framework is first formalized via the usage of the Monadology (Leibniz 1898), to expand the understanding of Zermelo-Fraenkel-choice set theory (ZFC) and Von Neumann-Bernays-Godel set theory (NBG). Implicitly, the framework levels value, representation and information separately. Given the fact that there exists a coincidental equivalence between Von Neumann universe and originally-…

    Submitted 30 March, 2024; originally announced April 2024.

  33. arXiv:2403.19386  [pdf, other]

    cs.CV cs.AI

    PointCloud-Text Matching: Benchmark Datasets and a Baseline

    Authors: Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng, Peng Hu

    Abstract: In this paper, we present and study a new instance-level retrieval task: PointCloud-Text Matching (PTM), which aims to find the exact cross-modal instance that matches a given point-cloud query or text query. PTM could be applied to various scenarios, such as indoor/urban-canyon localization and scene retrieval. However, there exists no suitable and targeted dataset for PTM in practice. Therefore,…

    Submitted 28 March, 2024; originally announced March 2024.

  34. arXiv:2403.17712  [pdf, other]

    cs.CV

    Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark

    Authors: Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng

    Abstract: The widespread use of various chemical gases in industrial processes necessitates effective measures to prevent their leakage during transportation and storage, given their high toxicity. Thermal infrared-based computer vision detection techniques provide a straightforward approach to identify gas leakage areas. However, the development of high-quality algorithms has been challenging due to the lo…

    Submitted 26 March, 2024; originally announced March 2024.

  35. arXiv:2403.11220  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations

    Authors: Yuwei Zhang, Yan Wu, Yanming Liu, Xinyue Peng

    Abstract: Object detection methods under known single degradations have been extensively investigated. However, existing approaches require prior knowledge of the degradation type and train a separate model for each, limiting their practical applications in unpredictable environments. To address this challenge, we propose a chain-of-thought (CoT) prompted adaptive enhancer, CPA-Enhancer, for object detectio…

    Submitted 22 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  36. arXiv:2403.11145  [pdf, other]

    cs.CL

    A Challenge Dataset and Effective Models for Conversational Stance Detection

    Authors: Fuqiang Niu, Min Yang, Ang Li, Baoquan Zhang, Xiaojiang Peng, Bowen Zhang

    Abstract: Previous stance detection studies typically concentrate on evaluating stances within individual instances, thereby exhibiting limitations in effectively modeling multi-party discussions concerning the same specific topic, as naturally transpire in authentic social media interactions. This constraint arises primarily due to the scarcity of datasets that authentically replicate real social media con…

    Submitted 21 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Journal ref: LREC-COLING 2024

  37. arXiv:2403.08433  [pdf, other]

    cs.CV

    An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model

    Authors: Yuxin Tian, Mouxing Yang, Yunfan Li, Dayiheng Liu, Xingzhang Ren, Xi Peng, Jiancheng Lv

    Abstract: Recent studies applied Parameter Efficient Fine-Tuning techniques (PEFTs) to efficiently narrow the performance gap between pre-training and downstream. There are two important factors for various PEFTs, namely, the accessible data size and fine-tunable parameter size. A natural expectation for PEFTs is that the performance of various PEFTs is positively related to the data size and fine-tunable p…

    Submitted 18 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME2024

  38. arXiv:2403.07932  [pdf, other]

    cs.GT cs.AI

    Feint in Multi-Player Games

    Authors: Junyu Liu, Wangkai Jin, Xiangjun Peng

    Abstract: This paper introduces the first formalization, implementation and quantitative evaluation of Feint in Multi-Player Games. Our work first formalizes Feint from the perspective of Multi-Player Games, in terms of the temporal, spatial, and their collective impacts. The formalization is built upon Non-transitive Active Markov Game Model, where Feint can have a considerable amount of impacts. Then, our…

    Submitted 3 March, 2024; originally announced March 2024.

  39. arXiv:2403.07931  [pdf, other]

    cs.GT cs.GR

    Formalizing Feint Actions, and Example Studies in Two-Player Games

    Authors: Junyu Liu, Wangkai Jin, Xiangjun Peng

    Abstract: Feint actions refer to a set of deceptive actions, which enable players to obtain temporal advantages from their opponents. Such actions are regarded as widely-used tactic in most non-deterministic Two-player Games (e.g. boxing and fencing). However, existing literature does not provide comprehensive and concrete formalization on Feint actions, and their implications on Two-Player Games. We argue…

    Submitted 3 March, 2024; originally announced March 2024.

  40. arXiv:2403.07088  [pdf, other]

    cs.CL

    SPA: Towards A Computational Friendly Cloud-Base and On-Devices Collaboration Seq2seq Personalized Generation

    Authors: Yanming Liu, Xinyue Peng, Jiannan Cao, Le Dai, Xingzu Liu, Ruilin Nong, Weihao Liu

    Abstract: Large language models(LLMs) have shown its outperforming ability on various tasks and question answering. However, LLMs require substantial memory storage on low-resource devices. More critically, the computational speed on these devices is also severely limited. In this paper, we propose SPA(Side Plugin Adaption), a lightweight architecture for fast on-devices inference on the constraints of stri…

    Submitted 20 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 15 pages, second version of SPA(Side Plugin Adaption)

  41. arXiv:2403.06932  [pdf, other]

    cs.CL

    ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis

    Authors: Yanming Liu, Xinyue Peng, Tianyu Du, Jianwei Yin, Weihao Liu, Xuhong Zhang

    Abstract: Large language models (LLMs) have achieved commendable accomplishments in various natural language processing tasks. However, LLMs still encounter significant challenges when dealing with complex scenarios involving multiple entities. These challenges arise from the presence of implicit relationships that demand multi-step reasoning. In this paper, we propose a novel approach ERA-CoT, which aids L…

    Submitted 6 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 15 pages, second version of ERA-CoT

  42. arXiv:2403.06840  [pdf, other]

    cs.CL cs.AI

    RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback

    Authors: Yanming Liu, Xinyue Peng, Xuhong Zhang, Weihao Liu, Jianwei Yin, Jiannan Cao, Tianyu Du

    Abstract: Large language models (LLMs) demonstrate exceptional performance in numerous tasks but still heavily rely on knowledge stored in their parameters. Moreover, updating this knowledge incurs high training costs. Retrieval-augmented generation (RAG) methods address this issue by integrating external knowledge. The model can answer questions it couldn't previously by retrieving knowledge relevant to th…

    Submitted 6 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 20 pages, multiple figures. Providing second version RA-ISF

  43. arXiv:2403.06168  [pdf, other]

    cs.CV cs.AI

    DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

    Authors: Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

    Abstract: Due to the difficulty and labor-consuming nature of getting highly accurate or matting annotations, there only exists a limited amount of highly accurate labels available to the public. To tackle this challenge, we propose a DiffuMatting which inherits the strong Everything generation ability of diffusion and endows the power of "matting anything". Our DiffuMatting can 1). act as an anything matti…

    Submitted 10 March, 2024; originally announced March 2024.

  44. arXiv:2403.03740  [pdf, other]

    cs.CV cs.MM

    Self-supervised Photographic Image Layout Representation Learning

    Authors: Zhaoran Zhao, Peng Lu, Xujun Peng, Wenhao Guo

    Abstract: In the domain of image layout representation learning, the critical process of translating image layouts into succinct vector forms is increasingly significant across diverse applications, such as image retrieval, manipulation, and generation. Most approaches in this area heavily rely on costly labeled datasets and notably lack in adapting their modeling and learning methods to the specific nuance…

    Submitted 6 March, 2024; originally announced March 2024.

  45. arXiv:2403.02950  [pdf, other]

    cs.AI cs.CR

    A general approach to enhance the survivability of backdoor attacks by decision path coupling

    Authors: Yufei Zhao, Dingji Wang, Bihuan Chen, Ziqian Chen, Xin Peng

    Abstract: Backdoor attacks have been one of the emerging security threats to deep neural networks (DNNs), leading to serious consequences. One of the mainstream backdoor defenses is model reconstruction-based. Such defenses adopt model unlearning or pruning to eliminate backdoors. However, little attention has been paid to surviving such defenses. To bridge the gap, we propose Venom, the first generic ba…

    Submitted 5 March, 2024; originally announced March 2024.

  46. arXiv:2403.01169  [pdf, other]

    cs.CV

    Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection

    Authors: Chenchen Tao, Chong Wang, Yuexian Zou, Xiaohao Peng, Jiafei Wu, Jiangbo Qian

    Abstract: Most models for weakly supervised video anomaly detection (WS-VAD) rely on multiple instance learning, aiming to distinguish normal and abnormal snippets without specifying the type of anomaly. The ambiguous nature of anomaly definitions across contexts introduces bias in detecting abnormal and normal snippets within the abnormal bag. Taking the first step to show the model why it is anomalous, a…

    Submitted 2 March, 2024; originally announced March 2024.

  47. arXiv:2402.18879  [pdf]

    cs.CV

    Dose Prediction Driven Radiotherapy Paramters Regression via Intra- and Inter-Relation Modeling

    Authors: Jiaqi Cui, Yuanyuan Xu, Jianghong Xiao, Yuchen Fei, Jiliu Zhou, Xingcheng Peng, Yan Wang

    Abstract: Deep learning has facilitated the automation of radiotherapy by predicting accurate dose distribution maps. However, existing methods fail to derive the desirable radiotherapy parameters that can be directly input into the treatment planning system (TPS), impeding the full automation of radiotherapy. To enable more thorough automatic radiotherapy, in this paper, we propose a novel two-stage framew…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted by ISBI 2024

  48. arXiv:2402.18584  [pdf, other]

    cs.NE nlin.CD

    Adjusting Dynamics of Hopfield Neural Network via Time-variant Stimulus

    Authors: Xuenan Peng, Chengqing Li, Yicheng Zeng, Chun-Lai Li

    Abstract: As a paradigmatic model for nonlinear dynamics studies, the Hopfield Neural Network (HNN) demonstrates a high susceptibility to external disturbances owing to its intricate structure. This paper delves into the challenge of modulating HNN dynamics through time-variant stimuli. The effects of adjustments using two distinct types of time-variant stimuli, namely the Weight Matrix Stimulus (WMS) and t…

    Submitted 15 January, 2024; originally announced February 2024.

    Comments: 14 pages, 21 figures

    MSC Class: 68T07

  49. arXiv:2402.15919  [pdf, other]

    cs.CV cs.GR cs.LG eess.IV physics.optics

    Learning to See Through Dazzle

    Authors: Xiaopeng Peng, Erin F. Fleet, Abbie T. Watnik, Grover A. Swartzlander

    Abstract: Machine vision is susceptible to laser dazzle, where intense laser light can blind and distort its perception of the environment through oversaturation or permanent damage to sensor pixels. Here we employ a wavefront-coded phase mask to diffuse the energy of laser light and introduce a sandwich generative adversarial network (SGAN) to restore images from complex image degradations, such as varying…

    Submitted 4 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  50. arXiv:2402.15718  [pdf, other]

    stat.ML cs.LG

    A Duality Analysis of Kernel Ridge Regression in the Noiseless Regime

    Authors: Jihao Long, Xiaojun Peng, Lei Wu

    Abstract: In this paper, we conduct a comprehensive analysis of generalization properties of Kernel Ridge Regression (KRR) in the noiseless regime, a scenario crucial to scientific computing, where data are often generated via computer simulations. We prove that KRR can attain the minimax optimal rate, which depends on both the eigenvalue decay of the associated kernel and the relative smoothness of target…

    Submitted 23 February, 2024; originally announced February 2024.