Skip to main content

Showing 1–50 of 1,899 results for author: Xue, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01527  [pdf, other

    cs.CL

    KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

    Authors: Jiayi Yuan, Hongyi Liu, Shaochen, Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye **, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

    Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01461  [pdf, other

    cs.CL

    Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement

    Authors: Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Xiaoqing Zheng, Xuan**g Huang

    Abstract: The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses heavily relies on the quality of user prompts. However, these prompts often tend to be brief and vague, thereby significantly limiting the full potential of LLMs. Moreover, harmful prompts can be meticulously crafted and manipulated by adversaries to jailbreak LLMs, inducing them to produce potentially… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.01292  [pdf, other

    cs.RO

    Preserving Relative Localization of FoV-Limited Drone Swarm via Active Mutual Observation

    Authors: Lianjie Guo, Zaitian Gongye, Ziyi Xu, Yingjian Wang, Xin Zhou, **ni Zhou, Fei Gao

    Abstract: Relative state estimation is crucial for vision-based swarms to estimate and compensate for the unavoidable drift of visual odometry. For autonomous drones equipped with the most compact sensor setting -- a stereo camera that provides a limited field of view (FoV), the demand for mutual observation for relative state estimation conflicts with the demand for environment observation. To balance the… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024, 8 pages, 10 figures

  4. arXiv:2407.01219  [pdf, other

    cs.CL

    Searching for Best Practices in Retrieval-Augmented Generation

    Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuan**g Huang

    Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  5. arXiv:2407.01093  [pdf, other

    cs.CL cs.AI cs.MA

    IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation

    Authors: Senyu Han, Lu Chen, Li-Min Lin, Zhengshan Xu, Kai Yu

    Abstract: Large language models have demonstrated their capabilities in storyline creation and human-like character role-playing. Current language model agents mainly focus on reasonable behaviors from the level of individuals, and their behaviors might be hard to constraint on the level of the whole storyline. In this paper we introduce IBSEN, a director-actor coordinate agent framework that generates dram… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Main

  6. arXiv:2407.00983  [pdf, other

    cs.CV

    FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models

    Authors: Ruinan **, Zikang Xu, Yuan Zhong, Qiongsong Yao, Qi Dou, S. Kevin Zhou, Xiaoxiao Li

    Abstract: The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmark… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 29 pages, 17 figures

  7. arXiv:2407.00934  [pdf, other

    cs.CL

    CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

    Authors: **gheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su

    Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which receives little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 16 pages, 8 tables, 2 figures. Under review

  8. arXiv:2407.00924  [pdf, other

    cs.CL

    EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction

    Authors: **gheng Ye, Shang Qin, Yinghui Li, Xuxin Cheng, Libo Qin, Hai-Tao Zheng, Peng Xing, Zishan Xu, Guo Cheng, Zhao Wei

    Abstract: Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 22 pages, 10 tables, 9 figures. Under review

  9. arXiv:2407.00063  [pdf, other

    cs.IR cs.AI cs.LG

    An Interpretable Alternative to Neural Representation Learning for Rating Prediction -- Transparent Latent Class Modeling of User Reviews

    Authors: Giuseppe Serra, Peter Tino, Zhao Xu, Xin Yao

    Abstract: Nowadays, neural network (NN) and deep learning (DL) techniques are widely adopted in many applications, including recommender systems. Given the sparse and stochastic nature of collaborative filtering (CF) data, recent works have critically analyzed the effective improvement of neural-based approaches compared to simpler and often transparent algorithms for recommendation. Previous results showed… ▽ More

    Submitted 17 June, 2024; originally announced July 2024.

  10. arXiv:2407.00031  [pdf, other

    cs.DC cs.SE

    Supercharging Federated Learning with Flower and NVIDIA FLARE

    Authors: Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac, Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

    Abstract: Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years while focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in re… ▽ More

    Submitted 21 May, 2024; originally announced July 2024.

  11. arXiv:2406.19844  [pdf, other

    cs.CV cs.RO

    StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction

    Authors: Jiaheng Zhuang, Guoan Wang, Siyu Zhang, Xiyang Wang, Hangning Zhou, Ziyao Xu, Chi Zhang, Zhiheng Li

    Abstract: 3D multi-object tracking and trajectory prediction are two crucial modules in autonomous driving systems. Generally, the two tasks are handled separately in traditional paradigms and a few methods have started to explore modeling these two tasks in a joint manner recently. However, these approaches suffer from the limitations of single-frame training and inconsistent coordinate representations bet… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  12. arXiv:2406.19720  [pdf

    cs.HC cs.AI

    CUPID: Improving Battle Fairness and Position Satisfaction in Online MOBA Games with a Re-matchmaking System

    Authors: Ge Fan, Chaoyun Zhang, Kai Wang, Yingjie Li, Junyang Chen, Zenglin Xu

    Abstract: The multiplayer online battle arena (MOBA) genre has gained significant popularity and economic success, attracting considerable research interest within the Human-Computer Interaction community. Enhancing the gaming experience requires a deep understanding of player behavior, and a crucial aspect of MOBA games is matchmaking, which aims to assemble teams of comparable skill levels. However, exist… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 38 pages, accepted by CSCW 24

  13. arXiv:2406.19693  [pdf, other

    cs.RO cs.CV

    MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?

    Authors: **ming Li, Yichen Zhu, Zhiyuan Xu, **dong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang

    Abstract: It is fundamentally challenging for robots to serve as useful assistants in human environments because this requires addressing a spectrum of sub-problems across robotics, including perception, language understanding, reasoning, and planning. The recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated their exceptional abilities in solving complex mathematical problems, m… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  14. arXiv:2406.19065  [pdf, other

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang **g, Haining Tan, **g** Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  15. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  16. Start from Zero: Triple Set Prediction for Automatic Knowledge Graph Completion

    Authors: Wen Zhang, Ya**g Xu, Peng Ye, Zhiwei Huang, Zezhong Xu, Jiaoyan Chen, Jeff Z. Pan, Huajun Chen

    Abstract: Knowledge graph (KG) completion aims to find out missing triples in a KG. Some tasks, such as link prediction and instance completion, have been proposed for KG completion. They are triple-level tasks with some elements in a missing triple given to predict the missing element of the triple. However, knowing some elements of the missing triple in advance is not always a realistic setting. In this p… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Paper accepted by TKDE in 2024

  17. arXiv:2406.17898  [pdf, other

    cs.RO cs.AI

    Human-centered In-building Embodied Delivery Benchmark

    Authors: Zhuoqun Xu, Yang Liu, Xiaoqi Li, Jiyao Zhang, Hao Dong

    Abstract: Recently, the concept of embodied intelligence has been widely accepted and popularized, leading people to naturally consider the potential for commercialization in this field. In this work, we propose a specific commercial scenario simulation, human-centered in-building embodied delivery. Furthermore, for this scenario, we have developed a brand-new virtual environment system from scratch, constr… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  18. arXiv:2406.17005  [pdf, other

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Cheng**g Wu, Ting Liu, Luoqi Liu, Xinyu Liu, **g Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, **gnan Luo , et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  19. arXiv:2406.16221  [pdf, other

    cs.LG cs.AI cs.GR econ.EM stat.ME

    F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data

    Authors: Zexing Xu, Linjun Zhang, Sitan Yang, Rasoul Etesami, Hanghang Tong, Huan Zhang, Jiawei Han

    Abstract: Demand prediction is a crucial task for e-commerce and physical retail businesses, especially during high-stake sales events. However, the limited availability of historical data from these peak periods poses a significant challenge for traditional forecasting methods. In this paper, we propose a novel approach that leverages strategically chosen proxy data reflective of potential sales patterns f… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    MSC Class: 68T07; 68T05; 62M10; 62M20; 90C90; 91B84

  20. arXiv:2406.16062  [pdf, other

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  21. arXiv:2406.15718  [pdf, other

    cs.CL

    Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

    Authors: Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

    Abstract: As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to \textit{duplex models} so that these LLMs can lis… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  22. arXiv:2406.15716  [pdf, other

    eess.IV cs.CV

    Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

    Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

    Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the flu… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  23. arXiv:2406.14643  [pdf, other

    cs.CV cs.AI cs.CL

    Holistic Evaluation for Interleaved Text-and-Image Generation

    Authors: Minqian Liu, Zhiyang Xu, Zihao Lin, Trevor Ashby, Joy Rimchala, Jiaxin Zhang, Lifu Huang

    Abstract: Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in its evaluation still significantly lags behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress. 13 pages, 5 figure, 6 tables

  24. arXiv:2406.13007  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, **gyuan Xiao , et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  25. arXiv:2406.12935  [pdf, other

    cs.CR cs.AI cs.LG

    ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

    Authors: Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

    Abstract: Large language models (LLMs) are expected to follow instructions from users and engage in conversations. Techniques to enhance LLMs' instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates are shown to be effective in optimizing LLM performance, their impact on safety alignment of LLMs has been less understo… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  26. arXiv:2406.12257  [pdf, other

    cs.AI cs.CR

    CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

    Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran

    Abstract: The remarkable performance of large language models (LLMs) in generation tasks has enabled practitioners to leverage publicly available models to power custom applications, such as chatbots and virtual assistants. However, the data used to train or fine-tune these LLMs is often undisclosed, allowing an attacker to compromise the data and inject backdoors into the models. In this paper, we develop… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  27. arXiv:2406.12009  [pdf, other

    cs.CL

    FinTruthQA: A Benchmark Dataset for Evaluating the Quality of Financial Information Disclosure

    Authors: Ziyue Xu, Peilin Zhou, Xinyu Shi, Jiageng Wu, Yikang Jiang, Bin Ke, Jie Yang

    Abstract: Accurate and transparent financial information disclosure is crucial in the fields of accounting and finance, ensuring market efficiency and investor confidence. Among many information disclosure platforms, the Chinese stock exchanges' investor interactive platform provides a novel and interactive way for listed firms to disclose information of interest to investors through an online question-and-… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  28. arXiv:2406.11739  [pdf, other

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou , et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  29. arXiv:2406.10928  [pdf, other

    cs.CR cs.AI cs.NI

    Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask

    Authors: **gyu Xiao, Zhiyao Xu, Qingsong Zou, Qing Li, Dan Zhao, Dong Fang, Ruoyu Li, Wenxin Tang, Kang Li, Xudong Zuo, Penghui Hu, Yong Jiang, Zixuan Weng, Michael R. Lyv

    Abstract: Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations of users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effec… ▽ More

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  30. arXiv:2406.10847  [pdf, other

    cs.AI cs.CE cs.CL cs.MA

    TorchOpera: A Compound AI System for LLM Safety

    Authors: Shanshan Han, Yuhang Yao, Zijian Hu, Dimitris Stripelis, Zhaozhuo Xu, Chaoyang He

    Abstract: We introduce TorchOpera, a compound AI system for enhancing the safety and quality of prompts and responses for Large Language Models. TorchOpera ensures that all user prompts are safe, contextually grounded, and effectively processed, while enhancing LLM responses to be relevant and high quality. TorchOpera utilizes the vector database for contextual grounding, rule-based wrappers for flexible mo… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  31. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  32. arXiv:2406.10635  [pdf, other

    cs.RO cs.DB cs.OS

    ROSfs: A User-Level File System for ROS

    Authors: Zijun Xu, Xuanjun Wen, Yanjie Song, Shu Yin

    Abstract: We present ROSfs, a novel user-level file system for the Robot Operating System (ROS). ROSfs interprets a robot file as a group of sub-files, with each having a distinct label. ROSfs applies a time index structure to enhance the flexible data query while the data file is under modification. It provides multi-robot systems (MRS) with prompt cross-robot data acquisition and collaboration. We impleme… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  33. arXiv:2406.10490  [pdf, other

    stat.ML cs.LG

    Active, anytime-valid risk controlling prediction sets

    Authors: Ziyu Xu, Nikos Karampatziakis, Paul Mineiro

    Abstract: Rigorously establishing the safety of black-box machine learning models concerning critical risk measures is important for providing guarantees about model behavior. Recently, Bates et. al. (JACM '24) introduced the notion of a risk controlling prediction set (RCPS) for producing prediction sets that are statistically guaranteed low risk from machine learning models. Our method extends this notion… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 25 pages, 6 figures

  34. arXiv:2406.10200  [pdf, other

    cs.CV cs.AI cs.MM

    SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation

    Authors: Ziang Xu, Jens Rittscher, Sharib Ali

    Abstract: Polyps are early cancer indicators, so assessing occurrences of polyps and their removal is critical. They are observed through a colonoscopy screening procedure that generates a stream of video frames. Segmenting polyps in their natural video screening procedure has several challenges, such as the co-existence of imaging artefacts, motion blur, and floating debris. Most existing polyp segmentatio… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 12 pages

  35. arXiv:2406.09371  [pdf, other

    cs.CV cs.LG

    LRM-Zero: Training Large Reconstruction Models with Synthesized Data

    Authors: Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan

    Abstract: We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D data… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 23 pages, 8 figures. Our code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/

  36. arXiv:2406.09324  [pdf, other

    cs.CR cs.AI cs.CL

    Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

    Authors: Zhao Xu, Fan Liu, Hao Liu

    Abstract: Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work primarily overlooks the diverse key fac… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  37. arXiv:2406.09272  [pdf, other

    cs.CV cs.AI cs.SD eess.AS

    Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

    Authors: Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman

    Abstract: Generating realistic audio for human interactions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinat… ▽ More

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://vision.cs.utexas.edu/projects/action2sound

  38. arXiv:2406.08773  [pdf, other

    cs.CV

    DenoiseReID: Denoising Model for Representation Learning of Person Re-Identification

    Authors: Zhengrui Xu, Guan'an Wang, Xiaowen Huang, Jitao Sang

    Abstract: In this paper, we propose a novel Denoising Model for Representation Learning and take Person Re-Identification (ReID) as a benchmark task, named DenoiseReID, to improve feature discriminative with joint feature extraction and denoising. In the deep learning epoch, backbones which consists of cascaded embedding layers (e.g. convolutions or transformers) to progressively extract useful features, be… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  39. arXiv:2406.08464  [pdf, other

    cs.CL cs.AI

    Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

    Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Ye** Choi, Bill Yuchen Lin

    Abstract: High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Link: https://magpie-align.github.io/

  40. arXiv:2406.08455  [pdf, other

    cs.RO

    AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind

    Authors: Wei Ding, Fanhong Li, Ziteng Ji, Zhengrong Xue, Jia Liu

    Abstract: We propose AToM-Bot, a novel task generation and execution framework for proactive robot-human interaction, which leverages the human mental and physical state inference capabilities of the Vision Language Model (VLM) prompted by the Affective Theory of Mind (AToM). Without requiring explicit commands by humans, AToM-Bot proactively generates and follows feasible tasks to improve general human wel… ▽ More

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  41. arXiv:2406.08192  [pdf, other

    cs.CV

    2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation

    Authors: Zhensong Xu, Jiangtao Yao, Cheng**g Wu, Ting Liu, Luoqi Liu

    Abstract: Complex video object segmentation serves as a fundamental task for a wide range of downstream applications such as video editing and automatic data annotation. Here we present the 2nd place solution in the MOSE track of PVUW 2024. To mitigate problems caused by tiny objects, similar objects and fast movements in MOSE. We use instance segmentation to generate extra pretraining data from the valid a… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 5pages, 4 figures, technique report for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation

  42. arXiv:2406.07979  [pdf, other

    cs.LG cs.AI cs.IR

    Heuristic Learning with Graph Neural Networks: A Unified Framework for Link Prediction

    Authors: Juzheng Zhang, Lanning Wei, Zhen Xu, Quanming Yao

    Abstract: Link prediction is a fundamental task in graph learning, inherently shaped by the topology of the graph. While traditional heuristics are grounded in graph topology, they encounter challenges in generalizing across diverse graphs. Recent research efforts have aimed to leverage the potential of heuristics, yet a unified formulation accommodating both local and global heuristics remains undiscovered… ▽ More

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  43. arXiv:2406.07754  [pdf, other

    cs.CV

    HOI-Swap: Swap** Objects in Videos with Hand-Object Interaction Awareness

    Authors: Zihui Xue, Mi Luo, Changan Chen, Kristen Grauman

    Abstract: We study the problem of precisely swap** objects in videos, with a focus on those interacted with by hands, given one user-provided reference object image. Despite the great advancements that diffusion models have made in video editing recently, these models often fall short in handling the intricacies of hand-object interactions (HOI), failing to produce realistic edits -- especially when objec… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project website: https://vision.cs.utexas.edu/projects/HOI-Swap/

  44. arXiv:2406.07520  [pdf, other

    cs.CV cs.AI cs.GR

    Neural Gaffer: Relighting Any Object via Diffusion

    Authors: Haian **, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, ** Sun, Noah Snavely

    Abstract: Single-image relighting is a challenging task that involves reasoning about the complex interplay between geometry, materials, and lighting. Many prior methods either support only specific categories of images, such as portraits, or require special capture conditions, like using a flashlight. Alternatively, some methods explicitly decompose a scene into intrinsic components, such as normals and BR… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project Website: https://neural-gaffer.github.io

  45. arXiv:2406.07471  [pdf, other

    cs.CV

    OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    Authors: Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kai**g Zhou, Zongyuan Ge

    Abstract: Surgical scene perception via videos are critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of intelligent systems for surgical workflow analysis. Existing datasets for surgical workflow analysis, which typically face challenges such as small s… ▽ More

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Version 1

  46. arXiv:2406.07239  [pdf, other

    cs.CL

    On the Hallucination in Simultaneous Machine Translation

    Authors: Meizhi Zhong, Kehai Chen, Zhengshan Xue, Lemao Liu, Mingming Yang, Min Zhang

    Abstract: It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information. While many efforts have been made to enhance performance for SiMT, few of them attempt to understand and analyze hallucination in SiMT. Therefore, we conduct a comprehensive analysis of hallucination in SiMT from two perspectives: understanding the dis… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  47. arXiv:2406.07115  [pdf, other

    cs.CL cs.AI cs.LG

    Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

    Authors: Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang

    Abstract: Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to enhance their reasoning capabilities on complex tasks, thus taking on the role of intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2024] utilizes the depth-first search-based decision tree (DFSDT) method for reasoning with $16000+$ real-world APIs, whi… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  48. arXiv:2406.06839  [pdf, other

    cs.CL

    EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction

    Authors: Li Yang, Qifan Wang, Jianfeng Chi, Jiahao Liu, **gang Wang, Fuli Feng, Zenglin Xu, Yi Fang, Lifu Huang, Dongfang Liu

    Abstract: Product attribute value extraction involves identifying the specific values associated with various attributes from a product profile. While existing methods often prioritize the development of effective models to improve extraction performance, there has been limited emphasis on extraction efficiency. However, in real-world scenarios, products are typically associated with multiple attributes, ne… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  49. arXiv:2406.06622  [pdf, other

    cs.CL cs.AI cs.CR

    Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs

    Authors: Fan Liu, Zhao Xu, Hao Liu

    Abstract: Although safely enhanced Large Language Models (LLMs) have achieved remarkable success in tackling various complex tasks in a zero-shot manner, they remain susceptible to jailbreak attacks, particularly the unknown jailbreak attack. To enhance LLMs' generalized defense capabilities, we propose a two-stage adversarial tuning framework, which generates adversarial prompts to explore worst-case scena… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  50. arXiv:2406.06282  [pdf, other

    cs.LG

    PowerInfer-2: Fast Large Language Model Inference on a Smartphone

    Authors: Zhenliang Xue, Yixin Song, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen

    Abstract: This paper introduces PowerInfer-2, a framework designed for high-speed inference of Large Language Models (LLMs) on smartphones, particularly effective for models whose sizes exceed the device's memory capacity. The key insight of PowerInfer-2 is to utilize the heterogeneous computation, memory, and I/O resources in smartphones by decomposing traditional matrix computations into fine-grained neur… ▽ More

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 14 pages, 11 figures