Showing 1–50 of 725 results for author: Lu, X

Searching in archive cs.
  1. arXiv:2407.01523  [pdf, other]

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00369  [pdf, other]

    cs.CL

    How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models

    Authors: Jaeyoung Lee, Ximing Lu, Jack Hessel, Faeze Brahman, Youngjae Yu, Yonatan Bisk, Yejin Choi, Saadia Gabriel

    Abstract: Given the growing influx of misinformation across news and social media, there is a critical need for systems that can provide effective real-time verification of news claims. Large language or multimodal model-based verification has been proposed to scale up online policing mechanisms for mitigating the spread of false and harmful content. While these can potentially reduce the burden on human fact-check…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.18510  [pdf, other]

    cs.CL

    WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

    Authors: Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri

    Abstract: We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel jailbreaks. Compared to prior work that performed red-teaming via recruited human workers, gradient-based optimization, or iterative revision with…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.18259  [pdf, other]

    cs.CL cs.AI

    Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated

    Authors: Jiazhou Ji, Ruizhe Li, Shujun Li, Jie Guo, Weidong Qiu, Zheng Huang, Chiyu Chen, Xiaoyu Jiang, Xinru Lu

    Abstract: As LLMs rapidly advance, concerns are increasing about the actual authorship of texts we see online and in the real world. The task of distinguishing LLM-authored texts is complicated by the nuanced and overlapping behaviors of both machines and humans. In this paper, we challenge the current practice of considering LLM-generated text detection a binary classification task of differentia…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 2 figures

  5. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo , et al. (12 additional authors not shown)

    Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: a Complex Video Object Segmentation track based on the MOSE dataset and a Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  6. arXiv:2406.15720  [pdf, other]

    cs.CL

    Scaling Laws for Fact Memorization of Large Language Models

    Authors: Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu

    Abstract: Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law r…

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.14874  [pdf, other]

    cs.CV

    TraceNet: Segment one thing efficiently

    Authors: Mingyuan Wu, Zichuan Liu, Haozhen Zheng, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt

    Abstract: Efficient single instance segmentation is essential for unlocking features in mobile imaging applications, such as capture or editing. Existing on-the-fly mobile imaging applications scope the segmentation task to portraits or the salient subject due to computational constraints. Instance segmentation, despite its recent developments towards efficient networks, is still heavy due to the co…

    Submitted 21 June, 2024; originally announced June 2024.

  8. arXiv:2406.14507  [pdf, other]

    cs.LG cs.AI

    On Newton's Method to Unlearn Neural Networks

    Authors: Nhung Bui, Xinyang Lu, See-Kiong Ng, Bryan Kian Hsian Low

    Abstract: Machine unlearning facilitates personal data ownership, including the "right to be forgotten". The proliferation of applications of neural networks (NNs) trained on users' personal data calls for the development of algorithms to unlearn an NN. Since retraining is costly, efficiency is often achieved through approximate unlearning, which aims to unlearn a trained NN to be close to the retr…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.12221  [pdf, other]

    cs.CL

    On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation

    Authors: Xueru Wen, Xinyu Lu, Xinyan Guan, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun

    Abstract: Hallucination occurs when large language models (LLMs) exhibit behavior that deviates from the boundaries of their knowledge during the response generation process. Previous learning-based methods focus on detecting knowledge boundaries and finetuning models with instance-level feedback, but they suffer from inaccurate signals due to off-policy data sampling and coarse-grained feedback. In this pa…

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.12195  [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: Effectively implementing quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  12. arXiv:2406.10956  [pdf, other]

    cs.SD cs.LG eess.AS

    Robust Channel Learning for Large-Scale Radio Speaker Verification

    Authors: Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

    Abstract: Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learnin…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 11 figures

  13. arXiv:2406.10744

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  14. arXiv:2406.09900  [pdf, other]

    cs.CL

    GEB-1.3B: Open Lightweight Large Language Model

    Authors: Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu

    Abstract: Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, and have even surpassed human-level performance in several tasks. Despite their success, the resource-intensive demands of these models, requiring significant computational power for both training and inference, limit their deployment to high-performance servers. Additionally, the ex…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: GEB-1.3B technical report

  15. arXiv:2406.07409  [pdf, other]

    stat.ML cs.IT cs.LG eess.SP math.OC

    Accelerating Ill-conditioned Hankel Matrix Recovery via Structured Newton-like Descent

    Authors: HanQin Cai, Longxiu Huang, Xiliang Lu, Juntao You

    Abstract: This paper studies the robust Hankel recovery problem, which simultaneously removes the sparse outliers and fills in missing entries from the partial observation. We propose a novel non-convex algorithm, coined Hankel Structured Newton-Like Descent (HSNLD), to tackle the robust Hankel recovery problem. HSNLD is highly efficient with linear convergence, and its convergence rate is independent of th…

    Submitted 11 June, 2024; originally announced June 2024.

    MSC Class: 15A29; 15A83; 47B35; 90C17; 90C26; 90C53

  16. arXiv:2406.06925  [pdf, other]

    cs.LG cs.IR

    Non-autoregressive Personalized Bundle Generation

    Authors: Wenchuan Yang, Cheng Yang, Jichao Li, Yuejin Tan, Xin Lu, Chuan Shi

    Abstract: The personalized bundle generation problem, which aims to create a preferred bundle for a user from numerous candidate items, receives increasing attention in recommendation. However, existing works ignore the order-invariant nature of the bundle and adopt sequential modeling methods as the solution, which might introduce inductive bias and cause a large latency in prediction. To address this proble…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to Information Processing & Management

  17. arXiv:2406.04842  [pdf, other]

    cs.CV

    3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation

    Authors: Feiyu Pan, Hao Fang, Xiankai Lu

    Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video, emphasizing modeling dense text-video relations. The current RVOS methods typically use independently pre-trained vision and language models as backbones, resulting in a significant domain gap between video and text. In cross-modal feature interaction, text features are only used a…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2406.01388  [pdf, other]

    cs.CV

    AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation

    Authors: Junhao Cheng, Xi Lu, Hanhui Li, Khun Loun Zai, Baiqiao Yin, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: As cutting-edge Text-to-Image (T2I) generation models already excel at producing remarkable single images, an even more challenging task, i.e., multi-turn interactive image generation, has begun to attract the attention of related research communities. This task requires models to interact with users over multiple turns to generate a coherent sequence of images. However, since users may switch subject…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Multi-turn interactive image generation

  19. arXiv:2406.01252  [pdf, other]

    cs.CL cs.AI stat.ML

    Towards Scalable Automated Alignment of LLMs: A Survey

    Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

    Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approach…

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2405.18734  [pdf, other]

    cs.CV cs.RO

    PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram

    Authors: Sifan Zhou, Zhihang Yuan, Dawei Yang, Xubin Wen, Xing Hu, Yuguang Shi, Ziyu Zhao, Xiaobo Lu

    Abstract: Real-time and high-performance 3D object detection plays a critical role in autonomous driving and robotics. Recent pillar-based 3D object detectors have gained significant attention due to their compact representation and low computational overhead, making them suitable for onboard deployment and quantization. However, existing pillar-based detectors still suffer from information loss along heigh…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 17 pages, 3 figures

  21. arXiv:2405.17220  [pdf, other]

    cs.CL

    RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

    Authors: Tianyu Yu, Haoye Zhang, Yuan Yao, Yunkai Dang, Da Chen, Xiaoman Lu, Ganqu Cui, Taiwen He, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

    Abstract: Learning from feedback reduces the hallucination of multimodal large language models (MLLMs) by aligning them with human preferences. While traditional methods rely on labor-intensive and time-consuming manual labeling, recent approaches employing models as automatic labelers have shown promising results without human intervention. However, these methods heavily rely on costly proprietary models l…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Website: https://github.com/RLHF-V/RLAIF-V

  22. arXiv:2405.16940  [pdf, other]

    cs.CV

    Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

    Authors: Fengfan Zhou, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Lizhuang Ma, Hefei Ling

    Abstract: Adversarial attacks on Face Recognition (FR) systems have proven highly effective in compromising pure FR models, yet adversarial examples may be ineffective against complete FR systems, as Face Anti-Spoofing (FAS) models are often incorporated and can detect a significant number of them. To address this under-explored and essential problem, we propose a novel setting of adversarially attacking both…

    Submitted 27 May, 2024; originally announced May 2024.

  23. arXiv:2405.16057  [pdf, other]

    cs.CL cs.LG

    SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

    Authors: Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li

    Abstract: Large Language Models (LLMs) have become pivotal in advancing the field of artificial intelligence, yet their immense sizes pose significant challenges for both fine-tuning and deployment. Current post-training pruning methods, while reducing the sizes of LLMs, often fail to maintain their original performance. To address these challenges, this paper introduces SPP, a Sparsity-Preserved Parameter-…

    Submitted 25 May, 2024; originally announced May 2024.

  24. arXiv:2405.14854  [pdf, other]

    cs.CV cs.LG

    TerDiT: Ternary Diffusion Models with Transformers

    Authors: Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan, Hongsheng Li

    Abstract: Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs). Among these diffusion models, diffusion transformers have demonstrated superior image generation capabilities, achieving lower FID scores and higher scalability.…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 18 pages, 13 figures

  25. arXiv:2405.14014  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

    Authors: Fangqiang Ding, Xiangyu Wen, Lawrence Zhu, Yiming Li, Chris Xiaoxuan Lu

    Abstract: 3D occupancy-based perception pipelines have significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment…

    Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 16 pages, 3 figures

  26. arXiv:2405.13227  [pdf]

    cs.LG physics.app-ph

    A rapid approach to urban traffic noise mapping with a generative adversarial network

    Authors: Xinhao Yang, Zhen Han, Xiaodong Lu, Yuan Zhang

    Abstract: With rapid urbanisation and the accompanying increase in traffic density, traffic noise has become a major concern in urban planning. However, traditional grid noise mapping methods have limitations in terms of time consumption, software costs, and a lack of parameter integration interfaces. These limitations hinder their ability to meet the need for iterative updates and rapid performance feedbac…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: submitted to Applied Acoustics as a technical note

  27. Generative Students: Using LLM-Simulated Student Profiles to Support Question Item Evaluation

    Authors: Xinyi Lu, Xu Wang

    Abstract: Evaluating the quality of automatically generated question items has been a long-standing challenge. In this paper, we leverage LLMs to simulate student profiles and generate responses to multiple-choice questions (MCQs). The generative students' responses to MCQs can further support question item evaluation. We propose Generative Students, a prompt architecture designed based on the KLI framework…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: To be published in L@S'24: Proceedings of the Eleventh ACM Conference on Learning @ Scale

  28. arXiv:2405.10818  [pdf]

    cs.SI

    Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System

    Authors: Jiawei Feng, Mengsi Cai, Fangze Dai, Tianci Bu, Xiaoyu Zhang, Huijun Zheng, Xin Lu

    Abstract: In the rapidly evolving automotive industry, Systems-on-Chips (SoCs) are playing an increasingly crucial role in enhancing vehicle intelligence, connectivity, and safety features. For enterprises whose business encompasses automotive SoCs, the sustained and stable provision and receipt of SoC relevant goods or services are essential. Considering the imperative for a resilient and adaptable supply…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.10428 by other authors

  29. arXiv:2405.10767  [pdf, other]

    cs.HC cs.AI

    Evaluating Saliency Explanations in NLP by Crowdsourcing

    Authors: Xiaotian Lu, Jiyi Li, Zhen Wan, Xiaofeng Lin, Koh Takeuchi, Hisashi Kashima

    Abstract: Deep learning models have performed well on many NLP tasks. However, their internal mechanisms are typically difficult for humans to understand. The development of methods to explain models has become a key issue in the reliability of deep learning models in many important applications. Various saliency explanation methods, which give each feature of input a score proportional to the contribution…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 13 pages, 4 figures, Accepted for LREC-Coling 2024 (Oral)

  30. arXiv:2405.09455  [pdf, ps, other]

    stat.CO cs.IT

    Efficient pooling designs and screening performance in group testing for two type defectives

    Authors: Hiroyasu Matsushima, Yusuke Tajima, Xiao-Nan Lu, Masakazu Jimbo

    Abstract: Group testing is used to find a few defectives among a large number of items. Testing n items one by one requires n tests, but if the ratio of defectives is small, group testing is an efficient way to reduce the number of tests. Much research has been conducted on group testing for a single type of defective. In this paper, we consider the case where two types of defe…

    Submitted 15 May, 2024; originally announced May 2024.

  31. arXiv:2405.08423  [pdf, other]

    eess.IV cs.CV

    NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

    Authors: Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

    Abstract: Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co…

    Submitted 14 May, 2024; originally announced May 2024.

  32. arXiv:2405.08322  [pdf, other]

    cs.CV

    StraightPCF: Straight Point Cloud Filtering

    Authors: Dasith de Silva Edirimuni, Xuequan Lu, Gang Li, Lei Wei, Antonio Robles-Kelly, Hongdong Li

    Abstract: Point cloud filtering is a fundamental 3D vision task, which aims to remove noise while recovering the underlying clean surfaces. State-of-the-art methods remove noise by moving noisy points along stochastic trajectories to the clean surfaces. These methods often require regularization within the training objective and/or during post-processing, to ensure fidelity. In this paper, we introduce Stra…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted to the IEEE/CVF CVPR Conference, 2024

  33. arXiv:2405.08114  [pdf, other]

    cs.CV

    RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations

    Authors: Chengde Lin, Xijun Lu, Guangxi Chen

    Abstract: Synthesizing high-quality photorealistic images with textual descriptions as a condition is very challenging. Generative Adversarial Networks (GANs), the classical model for this task, frequently suffer from low consistency between image and text descriptions and insufficient richness in synthesized images. Recently, conditional affine transformations (CAT), such as conditional batch normalization…

    Submitted 13 May, 2024; originally announced May 2024.

  34. Quality-aware Selective Fusion Network for V-D-T Salient Object Detection

    Authors: Liuxin Bao, Xiaofei Zhou, Xiankai Lu, Yaoqi Sun, Haibing Yin, Zhenghui Hu, Jiyong Zhang, Chenggang Yan

    Abstract: Depth images and thermal images contain the spatial geometry information and surface temperature information, which can act as complementary information for the RGB modality. However, the quality of the depth and thermal images is often unreliable in some challenging scenarios, which will result in the performance degradation of the two-modal based salient object detection (SOD). Meanwhile, some r…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Image Processing (TIP)

  35. QuakeBERT: Accurate Classification of Social Media Texts for Rapid Earthquake Impact Assessment

    Authors: Jin Han, Zhe Zheng, Xin-Zheng Lu, Ke-Yin Chen, Jia-Rui Lin

    Abstract: Social media aids disaster response but suffers from noise, which hinders accurate impact assessment and decision making for resilient cities, a problem few studies have considered. To address it, this study proposes the first domain-specific LLM and an integrated method for rapid earthquake impact assessment. First, a few categories are introduced to classify and filter microblogs considering t…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: International Journal of Disaster Risk Reduction, 2024

  36. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  37. arXiv:2405.03467  [pdf, ps, other]

    cs.GT

    Welfare Loss in Connected Resource Allocation

    Authors: Xiaohui Bei, Alexander Lam, Xinhang Lu, Warut Suksompong

    Abstract: We study the allocation of indivisible goods that form an undirected graph and investigate the worst-case welfare loss when requiring that each agent must receive a connected subgraph. Our focus is on both egalitarian and utilitarian welfare. Specifically, we introduce the concept of egalitarian (resp., utilitarian) price of connectivity, which captures the worst-case ratio between the optimal ega…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Appears in the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024

  38. arXiv:2404.18919  [pdf, other]

    cs.CV

    TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

    Authors: Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: Recent advances in diffusion models can generate high-quality and stunning images from text. However, multi-turn image generation, which is in high demand in real-world scenarios, still faces challenges in maintaining semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen…

    Submitted 29 April, 2024; originally announced April 2024.

  39. arXiv:2404.17128  [pdf, other]

    q-bio.NC cs.SI

    Network Structure Trumps Neuron Dynamics: Insights from Drosophila Connectome Simulations

    Authors: Xiaoyu Zhang, Pengcheng Yang, Jiawei Feng, Qiang Luo, Wei Lin, Xin Lu

    Abstract: Despite the success of artificial neural networks, the necessity of real network structures in simulating intelligence remains unclear. Utilizing the largest adult Drosophila connectome data set, we constructed a large-scale network communication model framework based on simple neuronal activation mechanisms to simulate the activation behavior observed in the connectome. The results demonstrate th…

    Submitted 30 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  40. arXiv:2404.16348  [pdf, other]

    cs.CV

    Dual Expert Distillation Network for Generalized Zero-Shot Learning

    Authors: Zhijie Rao, Jingcai Guo, Xiaocheng Lu, Jingming Liang, Jie Zhang, Haozhao Wang, Kang Wei, Xiaofeng Cao

    Abstract: Zero-shot learning has consistently yielded remarkable progress via modeling nuanced one-to-one visual-attribute correlation. Existing studies resort to refining a uniform mapping function to align and correlate the sample regions and subattributes, ignoring two crucial issues: 1) the inherent asymmetry of attributes; and 2) the unutilized channel information. This paper addresses these issues by…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 9 pages, 4 figures; Accepted to IJCAI 2024

  41. arXiv:2404.14591  [pdf, other]

    cs.CE

    Predicting the Temporal Dynamics of Prosthetic Vision

    Authors: Yuchen Hou, Laya Pullela, Jiaxin Su, Sriya Aluru, Shivani Sista, Xiankun Lu, Michael Beyeler

    Abstract: Retinal implants are a promising treatment option for degenerative retinal disease. While numerous models have been developed to simulate the appearance of elicited visual percepts ("phosphenes"), these models often either focus solely on spatial characteristics or inadequately capture the complex temporal dynamics observed in clinical trials, which vary heavily across implant technologies, subjec…

    Submitted 1 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  42. arXiv:2404.07794

    cs.CV

    DGMamba: Domain Generalization via Generalized State Space Model

    Authors: Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan

    Abstract: Domain generalization (DG) aims at solving distribution shift problems in various scenes. Existing approaches are based on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs), which suffer from limited receptive fields or quadratic complexity. Mamba, as an emerging state space model (SSM), possesses linear complexity and global receptive fields. Despite this, it can…

    Submitted 9 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  43. arXiv:2404.05316

    cs.LG

    HOEG: A New Approach for Object-Centric Predictive Process Monitoring

    Authors: Tim K. Smit, Hajo A. Reijers, Xixi Lu

    Abstract: Predictive Process Monitoring focuses on predicting future states of ongoing process executions, such as forecasting the remaining time. Recent developments in Object-Centric Process Mining have enriched event data with objects and their explicit relations between events. To leverage this enriched data, we propose the Heterogeneous Object Event Graph encoding (HOEG), which integrates events and ob…

    Submitted 16 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted to the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 2024

  44. arXiv:2404.05198

    cs.GT

    Fair Lotteries for Participatory Budgeting

    Authors: Haris Aziz, Xinhang Lu, Mashbat Suzuki, Jeremy Vollen, Toby Walsh

    Abstract: In pursuit of participatory budgeting (PB) outcomes with broader fairness guarantees, we initiate the study of lotteries over discrete PB outcomes. As the projects have heterogeneous costs, the amount spent may not be equal ex ante and ex post. To address this, we develop a technique to bound the amount by which the ex-post spend differs from the ex-ante spend -- the property is termed budget bala…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Appears in the 38th AAAI Conference on Artificial Intelligence (AAAI), 2024

  45. arXiv:2404.03602

    cs.CL

    Evaluating LLMs at Detecting Errors in LLM Responses

    Authors: Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

    Abstract: With Large Language Models (LLMs) being widely used across various tasks, detecting errors in their responses is increasingly crucial. However, little research has been conducted on error detection of LLM responses. Collecting error annotations on LLM responses is challenging due to the subjective nature of many NLP tasks, and thus previous research focuses on tasks of little practical value (e.g.…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Benchmark and code: https://github.com/psunlpgroup/ReaLMistake

  46. arXiv:2404.00795

    cs.SE

    Towards Practical Requirement Analysis and Verification: A Case Study on Software IP Components in Aerospace Embedded Systems

    Authors: Zhi Ma, Cheng Wen, Jie Su, Ming Zhao, Bin Yu, Xu Lu, Cong Tian

    Abstract: IP-based software design is a crucial research field that aims to improve efficiency and reliability by reusing complex software components known as intellectual property (IP) components. To ensure the reusability of these components, particularly in security-sensitive software systems, it is necessary to analyze the requirements and perform formal verification for each IP component. However, conv…

    Submitted 31 March, 2024; originally announced April 2024.

  47. arXiv:2404.00140

    cs.AI cs.LG

    Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks

    Authors: Xiaolei Lu, Jianghong Ma

    Abstract: Explainability algorithms aimed at interpreting decision-making AI systems usually consider balancing two critical dimensions: 1) faithfulness, where explanations accurately reflect the model's inference process; and 2) plausibility, where explanations are consistent with domain experts. However, the question arises: do faithfulness and plausibility inherently conflict? In this study…

    Submitted 29 March, 2024; originally announced April 2024.

  48. arXiv:2403.19622

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, mak…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  49. arXiv:2403.19334

    cs.CV

    Test-Time Domain Generalization for Face Anti-Spoofing

    Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma

    Abstract: Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks. While domain generalization (DG) methods have been developed to enhance FAS performance, they predominantly focus on learning domain-invariant features during training, which may not guarantee generalizability to unseen data that differs largely from the source distributions. Our insight is…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  50. arXiv:2403.17532

    cs.AI

    KC-GenRe: A Knowledge-constrained Generative Re-ranking Method Based on Large Language Models for Knowledge Graph Completion

    Authors: Yilin Wang, Minghao Hu, Zhen Huang, Dongsheng Li, Dong Yang, Xicheng Lu

    Abstract: The goal of knowledge graph completion (KGC) is to predict missing facts among entities. Previous methods for KGC re-ranking are mostly built on non-generative language models to obtain the probability of each candidate. Recently, generative large language models (LLMs) have shown outstanding performance on several tasks such as information extraction and dialog systems. Leveraging them for KGC re…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for publication in the proceedings of LREC-COLING 2024