Skip to main content

Showing 1–50 of 290 results for author: Shen, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.16810  [pdf, other

    cs.LG cs.AI cs.CL

    PISTOL: Dataset Compilation Pipeline for Structural Unlearning of LLMs

    Authors: Xinchi Qiu, William F. Shen, Yihong Chen, Nicola Cancedda, Pontus Stenetorp, Nicholas D. Lane

    Abstract: Recently, machine unlearning, which seeks to erase specific data stored in the pre-trained or fine-tuned models, has emerged as a crucial protective measure for LLMs. However, unlearning approaches for LLMs that have been considered thus far have focused on the removal of independent data points and have not taken into account that the stored facts are logically connected to one another and form a… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.15313  [pdf, other

    cs.IR cs.CL

    STARD: A Chinese Statute Retrieval Dataset with Real Queries Issued by Non-professionals

    Authors: Weihang Su, Yiran Hu, Anzhe Xie, Qingyao Ai, Zibing Que, Ning Zheng, Yun Liu, Weixing Shen, Yiqun Liu

    Abstract: Statute retrieval aims to find relevant statutory articles for specific queries. This process is the basis of a wide range of legal applications such as legal advice, automated judicial decisions, legal document drafting, etc. Existing statute retrieval benchmarks focus on formal and professional queries from sources like bar exams and legal case documents, thereby neglecting non-professional quer… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  3. arXiv:2406.14877  [pdf, other

    cs.CL

    Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video

    Authors: Zhengbang Yang, Haotian Xia, **gxi Li, Zezhi Chen, Zhuangdi Zhu, Weining Shen

    Abstract: Understanding sports is crucial for the advancement of Natural Language Processing (NLP) due to its intricate and dynamic nature. Reasoning over complex sports scenarios has posed significant challenges to current NLP technologies which require advanced cognitive capabilities. Toward addressing the limitations of existing benchmarks on sports understanding in the NLP field, we extensively evaluate… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2406.12252  [pdf, other

    cs.CL

    Language and Multimodal Models in Sports: A Survey of Datasets and Applications

    Authors: Haotian Xia, Zhengbang Yang, Yun Zhao, Yuqing Wang, **gxi Li, Rhys Tracy, Zhuangdi Zhu, Yuan-fang Wang, Hanjie Chen, Weining Shen

    Abstract: Recent integration of Natural Language Processing (NLP) and multimodal models has advanced the field of sports analytics. This survey presents a comprehensive review of the datasets and applications driving these innovations post-2020. We overviewed and categorized datasets into three primary types: language-based, multimodal, and convertible datasets. Language-based and multimodal datasets are fo… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.11507  [pdf, other

    cs.CV

    Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

    Authors: Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen

    Abstract: Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand distinct models for specific categories, resulting in substantial deployment costs. This raises concerns about multi-class anomaly detection, where a unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industrial Informatics

  6. arXiv:2406.07333  [pdf, other

    cs.CV

    Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection

    Authors: Haiming Yao, Wei Luo, Yunkang Cao, Yiheng Zhang, Wenyong Yu, Weiming Shen

    Abstract: Texture surface anomaly detection finds widespread applications in industrial settings. However, existing methods often necessitate gathering numerous samples for model training. Moreover, they predominantly operate within a close-set detection framework, limiting their ability to identify anomalies beyond the training dataset. To tackle these challenges, this paper introduces a novel zero-shot te… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: SUBMISSION TO IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS

  7. arXiv:2406.07176  [pdf, other

    cs.CV

    RAD: A Comprehensive Dataset for Benchmarking the Robustness of Image Anomaly Detection

    Authors: Yuqi Cheng, Yunkang Cao, Rui Chen, Weiming Shen

    Abstract: Robustness against noisy imaging is crucial for practical image anomaly detection systems. This study introduces a Robust Anomaly Detection (RAD) dataset with free views, uneven illuminations, and blurry collections to systematically evaluate the robustness of current anomaly detection methods. Specifically, RAD aims to identify foreign objects on working platforms as anomalies. The collection pro… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  8. arXiv:2406.05271  [pdf, other

    cs.CV

    USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

    Authors: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren

    Abstract: The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.04687  [pdf, other

    cs.LG cs.CV

    LogiCode: an LLM-Driven Framework for Logical Anomaly Detection

    Authors: Yiheng Zhang, Yunkang Cao, Xiaohao Xu, Weiming Shen

    Abstract: This paper presents LogiCode, a novel framework that leverages Large Language Models (LLMs) for identifying logical anomalies in industrial settings, moving beyond traditional focus on structural inconsistencies. By harnessing LLMs for logical reasoning, LogiCode autonomously generates Python codes to pinpoint anomalies such as incorrect component quantities or missing elements, marking a signific… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.04573  [pdf

    cs.CV

    Attention Fusion Reverse Distillation for Multi-Lighting Image Anomaly Detection

    Authors: Yiheng Zhang, Yunkang Cao, Tianhang Zhang, Weiming Shen

    Abstract: This study targets Multi-Lighting Image Anomaly Detection (MLIAD), where multiple lighting conditions are utilized to enhance imaging quality and anomaly detection performance. While numerous image anomaly detection methods have been proposed, they lack the capacity to handle multiple inputs for a single sample, like multi-lighting images in MLIAD. Hence, this study proposes Attention Fusion Rever… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2406.01014  [pdf, other

    cs.CL cs.CV

    Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

    Authors: Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

    Abstract: Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the tw… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 11 figures, 10 Tables

  12. arXiv:2405.18840  [pdf, other

    cs.CV

    Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation

    Authors: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Lingxi Xie, Qi Tian, Wei Shen

    Abstract: Open-vocabulary semantic segmentation seeks to label each pixel in an image with arbitrary text descriptions. Vision-language foundation models, especially CLIP, have recently emerged as powerful tools for acquiring open-vocabulary capabilities. However, fine-tuning CLIP to equip it with pixel-level prediction ability often suffers three issues: 1) high computational cost, 2) misalignment between… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.17846  [pdf, other

    cs.RO cs.AI

    Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs

    Authors: Yong Qi, Gabriel Kyebambo, Siyuan Xie, Wei Shen, Shenghui Wang, Bitao Xie, Bin He, Zhipeng Wang, Shuo Jiang

    Abstract: Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby preventing actions that might harm humans or cause property damage. Despite advances, including the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), challenges in ensuring consistent saf… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  14. arXiv:2405.17495  [pdf, other

    cs.LG cs.CR

    Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey

    Authors: Mang Ye, Wei Shen, Bo Du, Eduard Snezhko, Vassili Kovalev, Pong C. Yuen

    Abstract: Vertical Federated Learning (VFL) is a privacy-preserving distributed learning paradigm where different parties collaboratively learn models using partitioned features of shared samples, without leaking private data. Recent research has shown promising results addressing various challenges in VFL, highlighting its potential for practical applications in cross-domain collaboration. However, the cor… ▽ More

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 31 pages, 9 figures, 10 tables

  15. arXiv:2405.16849  [pdf, other

    cs.CV

    Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation

    Authors: Zhoujie Fu, Jiacheng Wei, Wenhao Shen, Chaoyue Song, Xiaofeng Yang, Fayao Liu, Xulei Yang, Guosheng Lin

    Abstract: In this work, we introduce a novel approach for creating controllable dynamics in 3D-generated Gaussians using casually captured reference videos. Our method transfers the motion of objects from reference videos to a variety of generated 3D Gaussians across different categories, ensuring precise and customizable motion transfer. We achieve this by employing blend skinning-based non-parametric shap… ▽ More

    Submitted 6 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project page: https://sync4dphys.github.io/

  16. arXiv:2405.14739  [pdf, other

    cs.CV

    FLoRA: Low-Rank Core Space for N-dimension

    Authors: Chongjie Si, Xuehui Wang, Xue Yang, Zhengqin Xu, Qingyun Li, Jifeng Dai, Yu Qiao, Xiaokang Yang, Wei Shen

    Abstract: Adapting pre-trained foundation models for various downstream tasks has been prevalent in artificial intelligence. Due to the vast number of tasks and high costs, adjusting all parameters becomes unfeasible. To mitigate this, several fine-tuning techniques have been developed to update the pre-trained model weights in a more resource-efficient manner, such as through low-rank adjustments. Yet, alm… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  17. arXiv:2405.14446  [pdf, other

    cs.LG cs.AI cs.CL cs.DC

    Worldwide Federated Training of Language Models

    Authors: Alex Iacob, Lorenzo Sani, Bill Marino, Preslav Aleksandrov, William F. Shen, Nicholas Donald Lane

    Abstract: The reliance of language model training on massive amounts of computation and vast datasets scraped from potentially low-quality, copyrighted, or sensitive data has come into question practically, legally, and ethically. Federated learning provides a plausible alternative by enabling previously untapped data to be voluntarily gathered from collaborating organizations. However, when scaled globally… ▽ More

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 19 pages, 8 figures, Under Review

    ACM Class: I.2.7

  18. arXiv:2405.10853  [pdf, other

    cs.LG cs.AI cs.DC

    The Future of Large Language Model Pre-training is Federated

    Authors: Lorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. Lane

    Abstract: Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources we can leverage for pre-training. Federated learning (FL) has the potential to unl… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, pre-print

  19. arXiv:2404.12152  [pdf, other

    cs.CL

    FecTek: Enhancing Term Weight in Lexicon-Based Retrieval with Feature Context and Term-level Knowledge

    Authors: Zunran Wang, Zhonghua Li, Wei Shen, Qi Ye, Liqiang Nie

    Abstract: Lexicon-based retrieval has gained siginificant popularity in text retrieval due to its efficient and robust performance. To further enhance performance of lexicon-based retrieval, researchers have been diligently incorporating state-of-the-art methodologies like Neural retrieval and text-level contrastive learning approaches. Nonetheless, despite the promising outcomes, current lexicon-based retr… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  20. arXiv:2404.11981  [pdf, other

    cs.CV

    Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation

    Authors: Chongjie Si, Xuehui Wang, Xiaokang Yang, Wei Shen

    Abstract: Weakly Incremental Learning for Semantic Segmentation (WILSS) leverages a pre-trained segmentation model to segment new classes using cost-effective and readily available image-level labels. A prevailing way to solve WILSS is the generation of seed areas for each new class, serving as a form of pixel-level supervision. However, a scenario usually arises where a pixel is concurrently predicted as a… ▽ More

    Submitted 19 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  21. arXiv:2404.11355  [pdf, other

    cs.CV

    Consisaug: A Consistency-based Augmentation for Polyp Detection in Endoscopy Image Analysis

    Authors: Ziyu Zhou, Wenyuan Shen, Chang Liu

    Abstract: Colorectal cancer (CRC), which frequently originates from initially benign polyps, remains a significant contributor to global cancer-related mortality. Early and accurate detection of these polyps via colonoscopy is crucial for CRC prevention. However, traditional colonoscopy methods depend heavily on the operator's experience, leading to suboptimal polyp detection rates. Besides, the public data… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: MLMI 2023

  22. arXiv:2404.10573  [pdf, other

    cs.AI cs.CE q-bio.BM

    AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation

    Authors: Lijun Liu, Jiali Yang, Jianfei Song, Xinglin Yang, Lele Niu, Zeqi Cai, Hui Shi, Tingjun Hou, Chang-yu Hsieh, Weiran Shen, Yafeng Deng

    Abstract: Recombinant adeno-associated virus (rAAV) vectors have revolutionized gene therapy, but their broad tropism and suboptimal transduction efficiency limit their clinical applications. To overcome these limitations, researchers have focused on designing and screening capsid libraries to identify improved vectors. However, the large sequence space and limited resources present challenges in identifyin… ▽ More

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  23. arXiv:2404.07443  [pdf

    physics.optics cs.ET cs.LG

    1-bit Quantized On-chip Hybrid Diffraction Neural Network Enabled by Authentic All-optical Fully-connected Architecture

    Authors: Yu Shao, Haiqi Gao, Yipeng Chen, Yujie liu, Junren Wen, Haidong He, Yuchuan Shao, Yueguang Zhang, Weidong Shen, Chenying Yang

    Abstract: Optical Diffraction Neural Networks (DNNs), a subset of Optical Neural Networks (ONNs), show promise in mirroring the prowess of electronic networks. This study introduces the Hybrid Diffraction Neural Network (HDNN), a novel architecture that incorporates matrix multiplication into DNNs, synergizing the benefits of conventional ONNs with those of DNNs to surmount the modulation limitations inhere… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  24. arXiv:2403.18072  [pdf, other

    stat.CO cs.LG stat.ME stat.ML

    Goal-Oriented Bayesian Optimal Experimental Design for Nonlinear Models using Markov Chain Monte Carlo

    Authors: Shijie Zhong, Wanggang Shen, Tommie Catanach, Xun Huan

    Abstract: Optimal experimental design (OED) provides a systematic approach to quantify and maximize the value of experimental data. Under a Bayesian approach, conventional OED maximizes the expected information gain (EIG) on model parameters. However, we are often interested in not the parameters themselves, but predictive quantities of interest (QoIs) that depend on the parameters in a nonlinear manner. We… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    MSC Class: 62K05; 62F15; 62B15

  25. arXiv:2403.16075  [pdf, other

    cs.LG

    IBCB: Efficient Inverse Batched Contextual Bandit for Behavioral Evolution History

    Authors: Yi Xu, Weiran Shen, Xiao Zhang, Jun Xu

    Abstract: Traditional imitation learning focuses on modeling the behavioral mechanisms of experts, which requires a large amount of interaction history generated by some fixed expert. However, in many streaming applications, such as streaming recommender systems, online decision-makers typically engage in online learning during the decision-making process, meaning that the interaction history generated by o… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  26. arXiv:2403.15124  [pdf, other

    cs.CV

    EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting

    Authors: Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, Wei Shen

    Abstract: Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Map**) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperat… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  27. arXiv:2403.13679  [pdf, other

    cs.CL

    RoleInteract: Evaluating the Social Interaction of Role-Playing Agents

    Authors: Hongzhan Chen, Hehong Chen, Ming Yan, Wenshen Xu, Xing Gao, Weizhou Shen, Xiaojun Quan, Chenliang Li, Ji Zhang, Fei Huang, **gren Zhou

    Abstract: Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and stylistic attributes of these agents, there has been a noticeable gap in assessing their… ▽ More

    Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  28. arXiv:2403.11083  [pdf, other

    cs.CV cs.CL

    Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning

    Authors: Xiaohao Xu, Yunkang Cao, Yongqi Chen, Weiming Shen, Xiaonan Huang

    Abstract: Anomaly detection is vital in various industrial scenarios, including the identification of unusual patterns in production lines and the detection of manufacturing defects for quality control. Existing techniques tend to be specialized in individual scenarios and lack generalization capacities. In this study, we aim to develop a generic anomaly detection model applicable across multiple scenarios.… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  29. arXiv:2403.07708  [pdf, other

    cs.CL cs.AI

    Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

    Authors: Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

    Abstract: Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences. Yet existing RLHF heavily relies on accurate and informative reward models, which are vulnerable and sensitive to noise from various sources, e.g. human labeling errors, making the pipeline fragile. In this work, we improve the effectiveness of the reward… ▽ More

    Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  30. arXiv:2403.05402  [pdf, other

    cs.CV

    DualBEV: CNN is All You Need in View Transformation

    Authors: Peidong Li, Wancheng Shen, Qihao Huang, Dixiao Cui

    Abstract: Camera-based Bird's-Eye-View (BEV) perception often struggles between adopting 3D-to-2D or 2D-to-3D view transformation (VT). The 3D-to-2D VT typically employs resource intensive Transformer to establish robust correspondences between 3D and 2D feature, while the 2D-to-3D VT utilizes the Lift-Splat-Shoot (LSS) pipeline for real-time application, potentially missing distant information. To address… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 16 pages, 6 figures, Tech Report

  31. arXiv:2403.05171  [pdf, other

    cs.LG cs.AI

    Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation

    Authors: Xiaoying Zhang, Jean-Francois Ton, Wei Shen, Hongning Wang, Yang Liu

    Abstract: We introduce Adversarial Policy Optimization (AdvPO), a novel solution to the pervasive issue of reward over-optimization in Reinforcement Learning from Human Feedback (RLHF) for Large Language Models (LLMs). Over-optimization occurs when a reward model serves as an imperfect proxy for human preference, and RL-driven policy optimization erroneously exploits reward inaccuracies. In this paper, we b… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  32. arXiv:2402.15862  [pdf, other

    cs.CL

    SportQA: A Benchmark for Sports Understanding in Large Language Models

    Authors: Haotian Xia, Zhengbang Yang, Yuqing Wang, Rhys Tracy, Yun Zhao, Dongdong Huang, Zezhi Chen, Yan Zhu, Yuan-fang Wang, Weining Shen

    Abstract: A deep understanding of sports, a field rich in strategic and dynamic content, is crucial for advancing Natural Language Processing (NLP). This holds particular significance in the context of evaluating and advancing Large Language Models (LLMs), given the existing gap in specialized benchmarks. To bridge this gap, we introduce SportQA, a novel benchmark specifically designed for evaluating LLMs i… ▽ More

    Submitted 17 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: NAACL 2024

  33. arXiv:2402.10259  [pdf, other

    cs.CV cs.GR

    GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

    Authors: Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian

    Abstract: Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or hig… ▽ More

    Submitted 20 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Project page: https://gaussianobject.github.io/

  34. arXiv:2402.05808  [pdf, other

    cs.AI cs.CL cs.LG

    Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

    Authors: Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie **, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuan**g Huang

    Abstract: In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for o… ▽ More

    Submitted 17 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Preprint. Codes released: https://github.com/WooooDyy/LLM-Reverse-Curriculum-RL

  35. arXiv:2402.01391  [pdf, other

    cs.SE cs.CL

    StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

    Authors: Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuan**g Huang, Tao Gui

    Abstract: The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation quality. However, the lengthy code generated by LLMs in response to complex human requirements makes RL exploration a challenge. Also, since the unit te… ▽ More

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 13 pages, 5 figures

  36. arXiv:2401.17618  [pdf, other

    cs.CR cs.OS

    Beyond Control: Exploring Novel File System Objects for Data-Only Attacks on Linux Systems

    Authors: **meng Zhou, Jiayi Hu, Ziyue Pan, Jiaxun Zhu, Guoren Li, Wenbo Shen, Yulei Sui, Zhiyun Qian

    Abstract: The widespread deployment of control-flow integrity has propelled non-control data attacks into the mainstream. In the domain of OS kernel exploits, by corrupting critical non-control data, local attackers can directly gain root access or privilege escalation without hijacking the control flow. As a result, OS kernels have been restricting the availability of such non-control data. This forces att… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 14 pages, in submission of the 31th ACM Conference on Computer and Communications Security (CCS), 2024

  37. Ambush from All Sides: Understanding Security Threats in Open-Source Software CI/CD Pipelines

    Authors: Ziyue Pan, Wenbo Shen, Xingkai Wang, Yutian Yang, Rui Chang, Yao Liu, Chengwei Liu, Yang Liu, Kui Ren

    Abstract: The continuous integration and continuous deployment (CI/CD) pipelines are widely adopted on Internet hosting platforms, such as GitHub. With the popularity, the CI/CD pipeline faces various security threats. However, current CI/CD pipelines suffer from malicious code and severe vulnerabilities. Even worse, people have not been fully aware of its attack surfaces and the corresponding impacts. Th… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Journal ref: IEEE Transactions on Dependable and Secure Computing (Volume: 21, Issue: 1, Jan.-Feb. 2024)

  38. arXiv:2401.17050  [pdf, other

    cs.CV cs.AI

    ViTree: Single-path Neural Tree for Step-wise Interpretable Fine-grained Visual Categorization

    Authors: Danning Lao, Qi Liu, Jiazi Bu, Junchi Yan, Wei Shen

    Abstract: As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to explain the decision-making process, which can be indirect and lack intrinsic illustration. In this research, we introduce ViTree, a novel approach for fine-gr… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

  39. arXiv:2401.16402  [pdf, other

    cs.CV cs.AI

    A Survey on Visual Anomaly Detection: Challenge, Approach, and Prospect

    Authors: Yunkang Cao, Xiaohao Xu, Jiangning Zhang, Yuqi Cheng, Xiaonan Huang, Guansong Pang, Weiming Shen

    Abstract: Visual Anomaly Detection (VAD) endeavors to pinpoint deviations from the concept of normality in visual data, widely applied across diverse domains, e.g., industrial defect inspection, and medical lesion detection. This survey comprehensively examines recent advancements in VAD by identifying three primary challenges: 1) scarcity of training data, 2) diversity of visual modalities, and 3) complexi… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Work in progress. Yunkang Cao, Xiaohao Xu, and Jiangning Zhang contribute equally to this work

  40. arXiv:2401.16158  [pdf, other

    cs.CL cs.CV

    Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

    Authors: Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang

    Abstract: Mobile device agent based on Multimodal Large Language Models (MLLM) is becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface. Based on the perceived vision context, it the… ▽ More

    Submitted 18 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by ICLR 2024 Workshop in Large Language Model (LLM) Agents

  41. arXiv:2401.14613  [pdf, ps, other

    cs.GT

    Multiplayer General Lotto game

    Authors: Bonan Ni, Yan Liu, Weiran Shen, Zihe Wang

    Abstract: In this paper, we explore the multiplayer General Lotto Blotto game over a single battlefield, a notable variant of the Colonel Blotto game. In this version, each player employs a probability distribution for resource allocation, ensuring that the expected expenditure does not surpass their budget. We first establish the existence of a Nash equilibrium for a modified version of this game, in which… ▽ More

    Submitted 31 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  42. arXiv:2401.11458  [pdf, other

    cs.CL

    Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

    Authors: Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

    Abstract: The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by complex annotation and training requirements. This reliance limits the applicability of RLHF and hinders the development of professional assistants tailored to d… ▽ More

    Submitted 6 May, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Accepted by ICML2024

  43. arXiv:2401.08332  [pdf, other

    cs.CV

    Generative Denoise Distillation: Simple Stochastic Noises Induce Efficient Knowledge Transfer for Dense Prediction

    Authors: Zhaoge Liu, Xiaohao Xu, Yunkang Cao, Weiming Shen

    Abstract: Knowledge distillation is the process of transferring knowledge from a more powerful large model (teacher) to a simpler counterpart (student). Numerous current approaches involve the student imitating the knowledge of the teacher directly. However, redundancy still exists in the learned representations through these prevalent methods, which tend to learn each spatial location's features indiscrimi… ▽ More

    Submitted 17 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  44. arXiv:2401.07324  [pdf, other

    cs.AI cs.CL

    Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

    Authors: Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang

    Abstract: Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete various tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers accurately but also excel in task planning, tool invocation, and result summarizati… ▽ More

    Submitted 16 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: On progress, github repo: https://github.com/X-PLUG/Multi-LLM-Agent

  45. arXiv:2401.06785  [pdf, other

    cs.CL cs.AI

    Human-Instruction-Free LLM Self-Alignment with Limited Samples

    Authors: Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

    Abstract: Aligning large language models (LLMs) with human values is a vital task for LLM practitioners. Current alignment techniques have several limitations: (1) requiring a large amount of annotated data; (2) demanding heavy human involvement; (3) lacking a systematic mechanism to continuously improve. In this work, we study aligning LLMs to a new domain with limited samples (e.g. < 100). We propose an a… ▽ More

    Submitted 6 January, 2024; originally announced January 2024.

  46. arXiv:2401.06080  [pdf, other

    cs.AI

    Secrets of RLHF in Large Language Models Part II: Reward Modeling

    Authors: Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie **, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuan**g Huang , et al. (2 additional authors not shown)

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for human preferences to drive reinforcement learning optimization. While reward models are often considered central to achieving high performance, they f… ▽ More

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  47. ModuleGuard:Understanding and Detecting Module Conflicts in Python Ecosystem

    Authors: Ruofan Zhu, Xingyu Wang, Chengwei Liu, Zhengzi Xu, Wenbo Shen, Rui Chang, Yang Liu

    Abstract: Python has become one of the most popular programming languages for software development due to its simplicity, readability, and versatility. As the Python ecosystem grows, developers face increasing challenges in avoiding module conflicts, which occur when different packages have the same namespace modules. Unfortunately, existing work has neither investigated the module conflict comprehensively… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: The paper was accepted by ICSE24

    MSC Class: 65-04 ACM Class: D.2; K.6.3

  48. arXiv:2312.15253  [pdf, other

    cs.CV cs.AI

    Efficient Deformable Tissue Reconstruction via Orthogonal Neural Plane

    Authors: Chen Yang, Kailing Wang, Yuehao Wang, Qi Dou, Xiaokang Yang, Wei Shen

    Abstract: Intraoperative imaging techniques for reconstructing deformable tissues in vivo are pivotal for advanced surgical systems. Existing methods either compromise on rendering quality or are excessively computationally intensive, often demanding dozens of hours to perform, which significantly hinders their practical application. In this paper, we introduce Fast Orthogonal Plane (Forplane), a novel, eff… ▽ More

    Submitted 23 December, 2023; originally announced December 2023.

  49. arXiv:2312.11034  [pdf, other

    cs.LG

    Appeal: Allow Mislabeled Samples the Chance to be Rectified in Partial Label Learning

    Authors: Chongjie Si, Xuehui Wang, Yan Wang, Xiaokang Yang, Wei Shen

    Abstract: In partial label learning (PLL), each instance is associated with a set of candidate labels among which only one is ground-truth. The majority of the existing works focuses on constructing robust classifiers to estimate the labeling confidence of candidate labels in order to identify the correct one. However, these methods usually struggle to identify and rectify mislabeled samples. To help these… ▽ More

    Submitted 28 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Under review. An extended version of 2024 AAAI oral paper "Partial Label Learning with a Partner"

  50. arXiv:2312.09979  [pdf, other

    cs.CL

    LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

    Authors: Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuan**g Huang

    Abstract: Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to align the model with a broader range of downstream tasks or notably improve its performance on a specific task. However, we find that large-scale increase… ▽ More

    Submitted 8 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 14 pages, 7 figures