
Showing 1–50 of 746 results for author: Zhu, L

Searching in archive cs.
  1. arXiv:2407.01530 [pdf, other]

    eess.IV cs.CV

    xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart

    Authors: Tianrun Chen, Chaotao Ding, Lanyun Zhu, Tao Xu, Deyi Ji, Ying Zang, Zejian Li

    Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet-structured deep learning neural network that leverages Vision-LSTM (…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.18539 [pdf, other]

    cs.CV cs.GR

    TexPainter: Generative Mesh Texturing with Multi-view Consistency

    Authors: Hongkun Zhang, Zherong Pan, Congyi Zhang, Lifeng Zhu, Xifeng Gao

    Abstract: The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a multi-view consistent texture image poses a major obstacle to the output quality. In this paper, we propose a novel method to enforce multi-view consistency. Our meth…

    Submitted 17 May, 2024; originally announced June 2024.

    Comments: Accepted by SIGGRAPH 2024

  3. arXiv:2406.17697 [pdf, other]

    cs.LG cs.AI cs.CV

    HGTDP-DTA: Hybrid Graph-Transformer with Dynamic Prompt for Drug-Target Binding Affinity Prediction

    Authors: Xi Xiao, Wentao Wang, Jiacheng Xie, Lijing Zhu, Gaofei Chen, Zhengji Li, Tianyang Wang, Min Xu

    Abstract: Drug-target binding affinity (DTA) is a key criterion for drug screening. Existing experimental methods are time-consuming and rely on limited structural and domain information. While learning-based methods can model sequence and structural information, they struggle to integrate contextual data and often lack comprehensive modeling of drug-target interactions. In this study, we propose a novel DT…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.14360 [pdf, other]

    cs.CV

    Deblurring Neural Radiance Fields with Event-driven Bundle Adjustment

    Authors: Yunshan Qi, Lin Zhu, Yifan Zhao, Nan Bao, Jia Li

    Abstract: Neural Radiance Fields (NeRF) achieve impressive 3D representation learning and novel view synthesis results with high-quality multi-view images as input. However, motion blur in images often occurs in low-light and high-speed motion scenes, which significantly degrades the reconstruction quality of NeRF. Previous deblurring NeRF methods struggle to estimate information during the exposure ti…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.13890 [pdf, other]

    cs.CL cs.AI

    ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

    Authors: Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu

    Abstract: LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical eval…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.13645 [pdf, other]

    eess.IV cs.CV

    Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

    Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

    Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging,…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  7. arXiv:2406.13583 [pdf, other]

    cs.CV

    Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation

    Authors: Qian Chen, Lei Zhu, Hangzhou He, Xinliang Zhang, Shuang Zeng, Qiushi Ren, Yanye Lu

    Abstract: The primary goal of continual learning (CL) in the medical image segmentation field is to solve the "catastrophic forgetting" problem, where the model totally forgets previously learned features when it is extended to new categories (class-level) or tasks (task-level). Due to privacy protection, historical data labels are inaccessible. Prevalent continual learning methods primarily focus…

    Submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.13149 [pdf, other]

    cs.CV

    High-Fidelity Facial Albedo Estimation via Texture Quantization

    Authors: Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng

    Abstract: Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps. However, a lack of diversity in subjects limits their ability to recover high-fidelity results. In this paper, we present a novel facial albedo recons…

    Submitted 18 June, 2024; originally announced June 2024.

  9. ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

    Authors: Junhao Lin, Lei Zhu, Jiaxing Shen, Huazhu Fu, Qing Zhang, Liansheng Wang

    Abstract: With the rapid development of depth sensors, more and more RGB-D videos can be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, existing salient object detection (SOD) works focus only on either static RGB-D images or RGB videos, ignoring the combination of RGB-D and video information. In this paper, we first collect a new annotated RGB-D vi…

    Submitted 18 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (2024)

  10. arXiv:2406.11837 [pdf, other]

    cs.CV

    Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

    Authors: Lei Zhu, Fangyun Wei, Yanye Lu, Dong Chen

    Abstract: In the realm of image quantization exemplified by VQGAN, the process encodes images into discrete tokens drawn from a codebook with a predefined size. Recent advancements, particularly with LLAMA 3, reveal that enlarging the codebook significantly enhances model performance. However, VQGAN and its derivatives, such as VQGAN-FC (Factorized Codes) and VQGAN-EMA, continue to grapple with challenges r…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.10475 [pdf, other]

    cs.CV

    Discrete Latent Perspective Learning for Segmentation and Detection

    Authors: Deyi Ji, Feng Zhao, Lanyun Zhu, Wenwei Jin, Hongtao Lu, Jieping Ye

    Abstract: In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework,…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: ICML 2024 Spotlight

  12. arXiv:2406.09795 [pdf, other]

    cs.LG math.NA

    DeltaPhi: Learning Physical Trajectory Residual for PDE Solving

    Authors: Xihang Yue, Linchao Zhu, Yi Yang

    Abstract: Although neural operator networks theoretically approximate any operator mapping, the limited generalization capability prevents them from learning correct physical dynamics when potential data biases exist, particularly in the practical PDE solving scenario where the available data amount is restricted or the resolution is extremely low. To address this issue, we propose and formulate the Physica…

    Submitted 14 June, 2024; originally announced June 2024.

  13. arXiv:2406.09692 [pdf, other]

    cs.CE cs.CG

    SplineGen: a generative model for B-spline approximation of unorganized points

    Authors: Qiang Zou, Lizhen Zhu

    Abstract: This paper presents a learning-based method to solve the traditional parameterization and knot placement problems in B-spline approximation. Different from conventional heuristic methods or recent AI-based methods, the proposed method does not assume ordered or fixed-size data points as input. There is also no need for manually setting the number of knots. It casts the parameterization and knot pl…

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.05720 [pdf, other]

    cs.AI cs.MA

    VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft

    Authors: Yubo Dong, Xukun Zhu, Zhengzhe Pan, Linchao Zhu, Yi Yang

    Abstract: In this paper, we aim to evaluate multi-agent systems against complex dependencies, including spatial, causal, and temporal constraints. First, we construct a new benchmark, named VillagerBench, within the Minecraft environment. VillagerBench comprises diverse tasks crafted to test various aspects of multi-agent collaboration, from workload distribution to dynamic adaptation and synchronized task e…

    Submitted 9 June, 2024; originally announced June 2024.

  15. arXiv:2406.04888 [pdf, other]

    cs.CV

    Zero-Shot Video Editing through Adaptive Sliding Score Distillation

    Authors: Lianghan Zhu, Yanqi Bao, Jing Huo, Jing Wu, Yu-Kun Lai, Wenbin Li, Yang Gao

    Abstract: The burgeoning field of text-based video generation (T2V) has reignited significant interest in the research of controllable video editing. Although pre-trained T2V-based editing models have achieved efficient editing capabilities, current works are still plagued by two major challenges. Firstly, the inherent limitations of T2V models lead to content inconsistencies and motion discontinuities betw…

    Submitted 7 June, 2024; originally announced June 2024.

  16. arXiv:2406.04542 [pdf, other]

    cs.CV cs.GR

    M&M VTO: Multi-Garment Virtual Try-On and Editing

    Authors: Luyang Zhu, Yingwei Li, Nan Liu, Hao Peng, Dawei Yang, Ira Kemelmacher-Shlizerman

    Abstract: We present M&M VTO, a mix and match virtual try-on method that takes as input multiple garment images, text description for garment layout and an image of a person. An example input includes: an image of a shirt, an image of a pair of pants, "rolled sleeves, shirt tucked in", and an image of a person. The output is a visualization of how those garments (in the desired layout) would look on th…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Highlight. Project website: https://mmvto.github.io/

  17. arXiv:2406.03092 [pdf, other]

    cs.CL

    FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models

    Authors: Xihang Yue, Linchao Zhu, Yi Yang

    Abstract: To process contexts with unlimited length using Large Language Models (LLMs), recent studies explore hierarchically managing the long text. Only a few text fragments are taken from the external memory and passed into the temporary working memory, i.e., the LLM's context window. However, existing approaches handle the text fragments in isolation without considering their structural connections, thereby…

    Submitted 5 June, 2024; originally announced June 2024.

  18. arXiv:2406.03065 [pdf, other]

    cs.LG cs.CV

    Decision Boundary-aware Knowledge Consolidation Generates Better Instance-Incremental Learner

    Authors: Qiang Nie, Weifu Fu, Yuhuan Lin, Jialin Li, Yifeng Zhou, Yong Liu, Lei Zhu, Chengjie Wang

    Abstract: Instance-incremental learning (IIL) focuses on learning continually with data of the same classes. Compared to class-incremental learning (CIL), IIL is seldom explored because IIL suffers less from catastrophic forgetting (CF). However, besides retaining knowledge, in real-world deployment scenarios where the class space is always predefined, continual and cost-effective model promotion with t…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 14 pages

  19. arXiv:2406.01080 [pdf, other]

    cs.CR cs.DC cs.LG

    No Vandalism: Privacy-Preserving and Byzantine-Robust Federated Learning

    Authors: Zhibo Xing, Zijian Zhang, Zi'ang Zhang, Jiamou Liu, Liehuang Zhu, Giovanni Russello

    Abstract: Federated learning allows several clients to train one machine learning model jointly without sharing private data, providing privacy protection. However, traditional federated learning is vulnerable to poisoning attacks, which cannot only decrease the model performance, but also implant malicious backdoors. In addition, direct submission of local model parameters can also lead to the privacy lea…

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2406.01047 [pdf, other]

    cs.DC cs.AI cs.LG

    An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

    Authors: Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda

    Abstract: Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization while trying not to affect user experience on cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2406.00683 [pdf, other]

    eess.IV cs.CV cs.MM

    Exploiting Frequency Correlation for Hyperspectral Image Reconstruction

    Authors: Muge Yan, Lizhi Wang, Lin Zhu, Hua Huang

    Abstract: Deep priors have emerged as potent methods in hyperspectral image (HSI) reconstruction. While most methods emphasize space-domain learning using image space priors like non-local similarity, frequency-domain learning using image frequency priors remains neglected, limiting the reconstruction capability of networks. In this paper, we first propose a Hyperspectral Frequency Correlation (HFC) prior r…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 14 pages, 11 figures

  22. arXiv:2406.00672 [pdf, other]

    cs.CV

    Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

    Authors: Xuenian Wang, Shanshan Shi, Renao Yan, Qiehe Sun, Lianghui Zhu, Tian Guan, Yonghong He

    Abstract: In the field of whole slide image (WSI) classification, multiple instance learning (MIL) serves as a promising approach, commonly decoupled into feature extraction and aggregation. In this paradigm, our observation reveals that discriminative embeddings are crucial for aggregation to the final prediction. Among all feature updating strategies, task-oriented ones can capture characteristics specifi…

    Submitted 2 June, 2024; originally announced June 2024.

  23. arXiv:2405.19718 [pdf, other]

    cs.CV

    LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising

    Authors: Yuxing Duan, Shihan Peng, Lin Zhu, Wei Zhang, Yi Chang, Sheng Zhong, Luxin Yan

    Abstract: Event cameras have significant advantages in capturing dynamic scene information but are prone to noise interference, particularly in challenging conditions like low threshold and low illumination. However, most existing research focuses on gentle situations, hindering event camera applications in realistic complex scenarios. To tackle this limitation and advance the field, we construct a new pa…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024

  24. arXiv:2405.19326 [pdf, other]

    cs.CV cs.GR cs.HC

    Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models

    Authors: Tianrun Chen, Chunan Yu, Jing Li, Jianqi Zhang, Lanyun Zhu, Deyi Ji, Yong Zhang, Ying Zang, Zejian Li, Lingyun Sun

    Abstract: In this paper, we introduce a new task: Zero-Shot 3D Reasoning Segmentation for part searching and localization in objects, which is a new paradigm for 3D segmentation that transcends the limitations of previous category-specific 3D semantic segmentation, 3D instance segmentation, and open-vocabulary 3D segmentation. We design a simple baseline method, Reasoning3D, with the capability to understand…

    Submitted 29 May, 2024; originally announced May 2024.

  25. arXiv:2405.19298 [pdf, other]

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score, an all-around LMM-based no-reference IQA (NR-IQA)…

    Submitted 29 May, 2024; originally announced May 2024.

  26. arXiv:2405.19131 [pdf, other]

    cs.DC

    Learning Interpretable Scheduling Algorithms for Data Processing Clusters

    Authors: Zhibo Hu, Chen Wang, Helen Paik, Yanfeng Shu, Liming Zhu

    Abstract: Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like Decima) have been attempted to…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 20 pages, 18 figures

    MSC Class: 68M20; ACM Class: I.2.8; D.4.1

  27. arXiv:2405.18428 [pdf, other]

    cs.CV cs.AI

    DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

    Authors: Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang

    Abstract: Diffusion models with large-scale pre-training have achieved significant success in the field of visual content generation, particularly exemplified by Diffusion Transformers (DiT). However, DiT models have faced challenges with scalability and quadratic complexity efficiency. In this paper, we aim to leverage the long sequence modeling capability of Gated Linear Attention (GLA) Transformers, expa…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Code is released at https://github.com/hustvl/DiG

  28. arXiv:2405.18425 [pdf, other]

    cs.CV cs.AI

    ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

    Authors: Bencheng Liao, Xinggang Wang, Lianghui Zhu, Qian Zhang, Chang Huang

    Abstract: Recently, linear complexity sequence modeling networks have achieved modeling capabilities similar to Vision Transformers on a variety of computer vision tasks, while using fewer FLOPs and less memory. However, their advantage in terms of actual runtime speed is not significant. To address this issue, we introduce Gated Linear Attention (GLA) for vision, leveraging its superior hardware-awareness…

    Submitted 28 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress. Code is available at https://github.com/hustvl/ViG

  29. arXiv:2405.17872 [pdf, other]

    cs.CV

    HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction

    Authors: Haoyu Zhao, Xingyue Zhao, Lingting Zhu, Weixi Zheng, Yongchao Xu

    Abstract: Robot-assisted minimally invasive surgery benefits from enhancing dynamic scene reconstruction, as it improves surgical outcomes. While Neural Radiance Fields (NeRF) have been effective in scene reconstruction, their slow inference speeds and lengthy training durations limit their applicability. To overcome these limitations, 3D Gaussian Splatting (3D-GS) based methods have emerged as a recent tre…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 13 pages, 4 figures

  30. arXiv:2405.16599 [pdf, other]

    cs.RO

    MCGMapper: Light-Weight Incremental Structure from Motion and Visual Localization With Planar Markers and Camera Groups

    Authors: Yusen Xie, Zhenmin Huang, Kai Chen, Lei Zhu, Jun Ma

    Abstract: Structure from Motion (SfM) and visual localization in indoor texture-less scenes and industrial scenarios present prevalent yet challenging research topics. Existing SfM methods designed for natural scenes typically yield low accuracy or map-building failures due to insufficient robust feature extraction in such settings. Visual markers, with their artificially designed features, can effectively…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures

  31. arXiv:2405.16417 [pdf, other]

    cs.CV

    CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

    Authors: Lin Zhu, Yifeng Yang, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye

    Abstract: Recent vision-language pre-trained models (VL-PTMs) have shown remarkable success in open-vocabulary tasks. However, downstream use cases often involve further fine-tuning of VL-PTMs, which may distort their general knowledge and impair their ability to handle distribution shifts. In real-world scenarios, machine learning systems inevitably encounter both covariate shifts (e.g., changes in image s…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  32. arXiv:2405.14014 [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

    Authors: Fangqiang Ding, Xiangyu Wen, Lawrence Zhu, Yiming Li, Chris Xiaoxuan Lu

    Abstract: The 3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment…

    Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 16 pages, 3 figures

  33. arXiv:2405.12432 [pdf, ps, other]

    cs.IT eess.SP

    Power Measurement Based Channel Estimation for IRS-Enhanced Wireless Coverage

    Authors: He Sun, Lipeng Zhu, Weidong Mei, Rui Zhang

    Abstract: In this paper, we study an IRS-assisted coverage enhancement problem for a given region, aiming to optimize the passive reflection of the IRS for improving the average communication performance in the region by accounting for both deterministic and random channels in the environment. To this end, we first derive the closed-form expression of the average received signal power in terms of the determ…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.08275

  34. arXiv:2405.12217 [pdf, other]

    cs.CV cs.AI cs.LG

    Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning

    Authors: Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Salman Khan, Xin Gao, Lina Yao

    Abstract: Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this work investigates in-context learning (ICL) as an e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 17 pages, 7 figures, 7 tables

  35. arXiv:2405.11171 [pdf, other]

    cs.LG

    Graph Feedback Bandits with Similar Arms

    Authors: Han Qi, Guo Fei, Li Zhu

    Abstract: In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by clinical trials and recommendation problems, we assume that two arms are connected if and only if they are similar (i.e., their means are close enough). We establish a regret lower bound for this novel feedback structure and introduce two UCB-based algorithms: D-UCB with problem-independent regre…

    Submitted 18 May, 2024; originally announced May 2024.

  36. arXiv:2405.10691 [pdf, other]

    eess.IV cs.CV

    LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

    Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

    Abstract: The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants' brain development with higher accuracy, statistical power and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers from a notorious dropout problem, resulting in incomplete datasets wit…

    Submitted 17 May, 2024; originally announced May 2024.

  37. arXiv:2405.10570 [pdf]

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features…

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  38. arXiv:2405.10467 [pdf, other]

    cs.AI cs.SE

    Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents

    Authors: Yue Liu, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, Jon Whittle

    Abstract: Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking…

    Submitted 24 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  39. arXiv:2405.08035 [pdf, other]

    cs.HC cs.AI

    A LLM-based Controllable, Scalable, Human-Involved User Simulator Framework for Conversational Recommender Systems

    Authors: Lixi Zhu, Xiaowen Huang, Jitao Sang

    Abstract: Conversational Recommender System (CRS) leverages real-time feedback from users to dynamically model their preferences, thereby enhancing the system's ability to provide personalized recommendations and improving the overall user experience. CRS has demonstrated significant promise, prompting researchers to concentrate their efforts on developing user simulators that are both more realistic and tr…

    Submitted 12 May, 2024; originally announced May 2024.

  40. arXiv:2405.04133 [pdf, other]

    cs.CV

    Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method

    Authors: Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, Haoliang Li

    Abstract: The generative model has made significant advancements in the creation of realistic videos, which causes security issues. However, this emerging risk has not been adequately addressed due to the absence of a benchmark dataset for AI-generated videos. In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms with various semantic contents. Besides,…

    Submitted 7 May, 2024; originally announced May 2024.

  41. arXiv:2405.04108 [pdf, other]

    cs.CR cs.AI

    A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model

    Authors: Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo

    Abstract: Recent booming development of Generative Artificial Intelligence (GenAI) has facilitated an emerging model commercialization for the purpose of reinforcement on model performance, such as licensing or trading Deep Neural Network (DNN) models. However, DNN model trading may trigger concerns about unauthorized replication or misuse of the model, so that the benefit of the model ownership will b…

    Submitted 7 May, 2024; originally announced May 2024.

  42. arXiv:2405.04064 [pdf, other]

    cs.AI

    MFA-Net: Multi-Scale feature fusion attention network for liver tumor segmentation

    Authors: Yanli Yuan, Bingbing Wang, Chuan Zhang, Jingyi Xu, Ximeng Liu, Liehuang Zhu

    Abstract: Segmentation of organs of interest in medical CT images is beneficial for diagnosis of diseases. Though recent methods based on Fully Convolutional Neural Networks (F-CNNs) have shown success in many segmentation tasks, fusing features from images with different scales is still a challenge: (1) Due to the lack of spatial awareness, F-CNNs share the same weights at different spatial locations. (2)…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024

  43. arXiv:2405.03316 [pdf, other]

    cs.LG cs.CR

    Provably Unlearnable Examples

    Authors: Derui Wang, Minhui Xue, Bo Li, Seyit Camtepe, Liming Zhu

    Abstract: The exploitation of publicly accessible data has led to escalating concerns regarding data privacy and intellectual property (IP) breaches in the age of artificial intelligence. As a strategy to safeguard both data privacy and IP-related domain knowledge, efforts have been undertaken to render shared data unlearnable for unauthorized models in the wild. Existing methods apply empirically optimized…

    Submitted 6 May, 2024; originally announced May 2024.

  44. arXiv:2405.02696  [pdf, other]

    cs.CR cs.AI

    DiffuseTrace: A Transparent and Flexible Watermarking Scheme for Latent Diffusion Model

    Authors: Liangqi Lei, Keke Gai, **g Yu, Liehuang Zhu

    Abstract: Latent Diffusion Models (LDMs) enable a wide range of applications but raise ethical concerns regarding illegal utilization. Adding watermarks to generative model outputs is a vital technique employed for copyright tracking and mitigating potential risks associated with AI-generated content. However, post-hoc watermarking techniques are susceptible to evasion. Existing watermarking methods for LDMs…

    Submitted 4 May, 2024; originally announced May 2024.

  45. arXiv:2405.01215  [pdf, ps, other]

    cs.IT eess.SP

    Movable Antenna Enhanced Wireless Sensing Via Antenna Position Optimization

    Authors: Wenyan Ma, Lipeng Zhu, Rui Zhang

    Abstract: In this paper, we propose a new wireless sensing system equipped with the movable-antenna (MA) array, which can flexibly adjust the positions of antenna elements for improving the sensing performance over conventional antenna arrays with fixed-position antennas (FPAs). First, we show that the angle estimation performance in wireless sensing is fundamentally determined by the array geometry, where…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures

  46. arXiv:2404.19063  [pdf, other]

    cs.CL

    SuperCLUE-Fin: Graded Fine-Grained Analysis of Chinese LLMs on Diverse Financial Tasks and Applications

    Authors: Liang Xu, Lei Zhu, Yaotong Wu, Hang Xue

    Abstract: The SuperCLUE-Fin (SC-Fin) benchmark is a pioneering evaluation framework tailored for Chinese-native financial large language models (FLMs). It assesses FLMs across six financial application domains and twenty-five specialized tasks, encompassing theoretical knowledge and practical applications such as compliance, risk management, and investment analysis. Using multi-turn, open-ended conversation…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 11 pages, 19 figures, and tables

  47. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.16581  [pdf, other]

    cs.CV

    AudioScenic: Audio-Driven Video Scene Editing

    Authors: Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

    Abstract: Audio-driven visual scene editing endeavors to manipulate the visual background while leaving the foreground content unchanged, according to the given audio signals. Unlike current efforts focusing primarily on image editing, audio-driven video scene editing has not been extensively addressed. In this paper, we introduce AudioScenic, an audio-driven framework designed for video scene editing. Audi…

    Submitted 25 April, 2024; originally announced April 2024.

  49. arXiv:2404.16579  [pdf, other]

    cs.AI cs.RO

    Neural Interaction Energy for Multi-Agent Trajectory Prediction

    Authors: Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

    Abstract: Maintaining temporal stability is crucial in multi-agent trajectory prediction. Insufficient regularization to uphold this stability often results in fluctuations in kinematic states, leading to inconsistent predictions and the amplification of errors. In this study, we introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE). This framework assesses the…

    Submitted 25 April, 2024; originally announced April 2024.

  50. arXiv:2404.15643  [pdf, ps, other]

    cs.IT eess.SP

    Dynamic Beam Coverage for Satellite Communications Aided by Movable-Antenna Array

    Authors: Lipeng Zhu, Xiangyu Pi, Wenyan Ma, Zhenyu Xiao, Rui Zhang

    Abstract: Due to the ultra-dense constellation, efficient beam coverage and interference mitigation are crucial to low-earth orbit (LEO) satellite communication systems, while the conventional directional antennas and fixed-position antenna (FPA) arrays both have limited degrees of freedom (DoFs) in beamforming to adapt to the time-varying coverage requirement of terrestrial users. To address this challenge…

    Submitted 24 April, 2024; originally announced April 2024.