Showing 1–50 of 5,834 results for author: Wang, J

Searching in archive cs.
  1. arXiv:2407.01523  [pdf, other]

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01191  [pdf, other]

    cs.RO cs.AI cs.CV

    MARS: Multimodal Active Robotic Sensing for Articulated Characterization

    Authors: Hongliang Zeng, ** Zhang, Chengjiong Wu, Jiahua Wang, Tingyu Ye, Fang Li

    Abstract: Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point cloud, a single-modal approach, often neglecting vital texture and lighting details and assuming ideal conditions like optimal viewpoints, unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characteri…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.01178  [pdf, other]

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled…

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  4. arXiv:2407.01094  [pdf, other]

    cs.CV

    Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

    Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

    Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to…

    Submitted 1 July, 2024; originally announced July 2024.

  5. arXiv:2407.01085  [pdf, other]

    cs.LG cs.CL

    Rethinking LLM-based Preference Evaluation

    Authors: Zhengyu Hu, Linxin Song, Jieyu Zhang, Zheyuan Xiao, Jingang Wang, Zhenyu Chen, Jieyu Zhao, Hui Xiong

    Abstract: Recently, large language model (LLM)-based preference evaluation has been widely adopted to compare pairs of model responses. However, a severe bias towards lengthy responses has been observed, raising concerns about the reliability of this evaluation method. In this work, we designed a series of controlled experiments to study the major impacting factors of the metric of LLM-based preference eval…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.01065  [pdf, other]

    cs.LG

    Improve ROI with Causal Learning and Conformal Prediction

    Authors: Meng Ai, Zhuo Chen, Jibin Wang, Jing Shang, Tao Tao, Zhen Li

    Abstract: In the commercial sphere, such as operations and maintenance, advertising, and marketing recommendations, intelligent decision-making utilizing data mining and neural network technologies is crucial, especially in resource allocation to optimize ROI. This study delves into the Cost-aware Binary Treatment Assignment Problem (C-BTAP) across different industries, with a focus on the state-of-the-art…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ICDE 2024; Link: https://icde2024.github.io/papers.html

  7. arXiv:2407.01031  [pdf, other]

    cs.LG cs.CL

    PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs

    Authors: Dan Peng, Zhihui Fu, Jun Wang

    Abstract: Recent advancements in large language models (LLMs) have indeed showcased their impressive capabilities. On mobile devices, the wealth of valuable, non-public data generated daily holds great promise for locally fine-tuning personalized LLMs, while maintaining privacy through on-device processing. However, the constraints of mobile device resources pose challenges to direct on-device LLM fine-tuni…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to the ACL 2024 Workshop on Privacy in Natural Language Processing (PrivateNLP)

  8. arXiv:2407.00979  [pdf, other]

    cs.CV

    Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval

    Authors: Hanwen Su, Ge Song, Kai Huang, Jiyan Wang, Ming Yang

    Abstract: In this paper, we study the problem of zero-shot sketch-based image retrieval (ZS-SBIR). The prior methods tackle the problem in a two-modality setting with only category labels or even no textual information involved. However, the growing prevalence of Large-scale pre-trained Language Models (LLMs), which have demonstrated great knowledge learned from web-scale data, can provide us with an opport…

    Submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2407.00949  [pdf, ps, other]

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr…

    Submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.00896  [pdf, other]

    eess.SP cs.AI

    Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions

    Authors: Yupeng Li, Gang Li, Zirui Wen, Shuangfeng Han, Shijian Gao, Guangyi Liu, Jiangzhou Wang

    Abstract: The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation metho…

    Submitted 30 June, 2024; originally announced July 2024.

  11. arXiv:2407.00731  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models Struggle in Token-Level Clinical Named Entity Recognition

    Authors: Qiuhao Lu, Rui Li, Andrew Wen, Jinlian Wang, Liwei Wang, Hongfang Liu

    Abstract: Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: AMIA 2024 Annual Symposium Proceedings

  12. arXiv:2407.00696  [pdf, other]

    cs.LG

    Graph in Graph Neural Network

    Authors: Jiongshu Wang, Jing Yang, Jiankang Deng, Hatice Gunes, Siyang Song

    Abstract: Existing Graph Neural Networks (GNNs) are limited to process graphs each of whose vertices is represented by a vector or a single value, limited their representing capability to describe complex objects. In this paper, we propose the first GNN (called Graph in Graph Neural (GIG) Network) which can process graph-style data (called GIG sample) whose vertices are further represented by graphs. Given…

    Submitted 30 June, 2024; originally announced July 2024.

    MSC Class: 68T05

  13. arXiv:2407.00657  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Real-Time Music Accompaniment Separation with MMDenseNet

    Authors: Chun-Hsiang Wang, Chung-Che Wang, Jun-You Wang, Jyh-Shing Roger Jang, Yen-Hsun Chu

    Abstract: Music source separation aims to separate polyphonic music into different types of sources. Most existing methods focus on enhancing the quality of separated results by using a larger model structure, rendering them unsuitable for deployment on edge devices. Moreover, these methods may produce low-quality output when the input duration is short, making them impractical for real-time applications. T…

    Submitted 30 June, 2024; originally announced July 2024.

  14. arXiv:2407.00634  [pdf, other]

    cs.CV cs.LG

    Tarsier: Recipes for Training and Evaluating Large Video Description Models

    Authors: Jiawei Wang, Liping Yuan, Yuchen Zhang

    Abstract: Generating fine-grained video descriptions is a fundamental challenge in video understanding. In this work, we introduce Tarsier, a family of large-scale video-language models designed to generate high-quality video descriptions. Tarsier employs CLIP-ViT to encode frames separately and then uses an LLM to model temporal relationships. Despite its simple architecture, we demonstrate that with a met…

    Submitted 30 June, 2024; originally announced July 2024.

  15. arXiv:2407.00625  [pdf, other]

    cs.LO

    Nonlinear Craig Interpolant Generation over Unbounded Domains by Separating Semialgebraic Sets

    Authors: Hao Wu, Jie Wang, Bican Xia, Xiakun Li, Naijun Zhan, Ting Gan

    Abstract: Interpolation-based techniques become popular in recent years, as they can improve the scalability of existing verification techniques due to their inherent modularity and local reasoning capabilities. Synthesizing Craig interpolants is the cornerstone of these techniques. In this paper, we investigate nonlinear Craig interpolant synthesis for two polynomial formulas of the general form, essenti…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 21 pages (with appendix); accepted by the 26th International Symposium on Formal Methods (FM2024)

  16. arXiv:2407.00623  [pdf, other]

    cs.CV

    Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

    Authors: Yiquan Li, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Bo Li, Chaowei Xiao

    Abstract: Diffusion Purification, purifying noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images res…

    Submitted 30 June, 2024; originally announced July 2024.

  17. arXiv:2407.00578  [pdf, other]

    cs.RO

    UniQuad: A Unified and Versatile Quadrotor Platform Series for UAV Research and Application

    Authors: Yichen Zhang, Xinyi Chen, Peize Liu, Junzhe Wang, Hetai Zou, Shaojie Shen

    Abstract: As quadrotors take on an increasingly diverse range of roles, researchers often need to develop new hardware platforms tailored for specific tasks, introducing significant engineering overhead. In this article, we introduce the UniQuad series, a unified and versatile quadrotor platform series that offers high flexibility to adapt to a wide range of common tasks, excellent customizability for advan…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Submitted to 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA-X40)

  18. Generative Iris Prior Embedded Transformer for Iris Restoration

    Authors: Yubo Huang, Jia Wang, Peipei Li, Liuyu Xiang, Peigang Li, Zhaofeng He

    Abstract: Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/sawyercharlton/Gformer

    Journal ref: 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 510-515

  19. arXiv:2407.00187  [pdf, other]

    cs.RO cs.CV cs.GR

    SMPLOlympics: Sports Environments for Physically Simulated Humanoids

    Authors: Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani

    Abstract: We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for m…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Project page: https://smplolympics.github.io/SMPLOlympics

  20. arXiv:2406.20098  [pdf, other]

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  21. arXiv:2406.20015  [pdf, other]

    cs.CL cs.AI

    ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

    Authors: Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen Wan, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

    Abstract: Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. Due to the lack of benchmarks, the community still needs to fully understand the hallucination issues within these models. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. Specifically, we assess the LLM's hallucinations through two perspectives: depth and bre…

    Submitted 28 June, 2024; originally announced June 2024.

  22. arXiv:2406.19741  [pdf, other]

    cs.RO cs.AI

    ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

    Authors: Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

    Abstract: We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: This document contains 26 pages and 13 figures

  23. arXiv:2406.19736  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

    Authors: Jihao Liu, Xin Huang, Jinliang Zheng, Boxiao Liu, Jia Wang, Osamu Yoshie, Yu Liu, Hongsheng Li

    Abstract: This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs). While existing visual instruction datasets often focus on question-answering, they struggle to generalize to broader application scenarios such as creative writing, summarization, or image analysis…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Dataset and models are available at https://github.com/jihaonew/MM-Instruct

  24. arXiv:2406.18916  [pdf, other]

    cs.CL cs.AI

    TrustUQA: A Trustful Framework for Unified Structured Data Question Answering

    Authors: Wen Zhang, Long Jin, Yushan Zhu, Jiaoyan Chen, Zhiwei Huang, Junjie Wang, Yin Hua, Lei Liang, Huajun Chen

    Abstract: Natural language question answering (QA) over structured data sources such as tables and knowledge graphs (KGs) have been widely investigated, for example with Large Language Models (LLMs). The main solutions include question to formal query parsing and retrieval-based answer generation. However, current methods of the former often suffer from weak generalization, failing to dealing with multiple…

    Submitted 27 June, 2024; originally announced June 2024.

  25. arXiv:2406.18846  [pdf, other]

    cs.CE

    AFBench: A Large-scale Benchmark for Airfoil Design

    Authors: Jian Liu, Jianyu Wu, Hairun Xie, Guoqing Zhang, Jing Wang, Wei Liu, Wanli Ouyang, Junjun Jiang, Xianming Liu, Shixiang Tang, Miao Zhang

    Abstract: Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to prohibitively high cost in time and money, there is still lack of open-source and large-scale benchmarks in this field. It is mainly the case for airfoil inverse design, which requires to generate and edit diverse geometric-qualified and aerodynamic-qualified ai…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPS 2024 Dataset & Benchmark Track

  26. arXiv:2406.18832  [pdf, other]

    cs.CL

    OutlierTune: Efficient Channel-Wise Quantization for Large Language Models

    Authors: Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao

    Abstract: Quantizing the activations of large language models (LLMs) has been a significant challenge due to the presence of structured outliers. Most existing methods focus on the per-token or per-tensor quantization of activations, making it difficult to achieve both accuracy and hardware efficiency. To address this problem, we propose OutlierTune, an efficient per-channel post-training quantization (PTQ)…

    Submitted 26 June, 2024; originally announced June 2024.

  27. arXiv:2406.18579  [pdf, other]

    cs.CV cs.IR

    Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

    Authors: Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Jie Wang, Joemon M. Jose

    Abstract: Image-text matching (ITM) is a fundamental problem in computer vision. The key issue lies in jointly learning the visual and textual representation to estimate their similarity accurately. Most existing methods focus on feature enhancement within modality or feature interaction across modalities, which, however, neglects the contextual information of the object representation based on the inter-ob…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 22pages, 5 Figures, 6 tables, the extension of CMSEI in WACV23, and submitted to ACM TIST. arXiv admin note: text overlap with arXiv:2210.08908

  28. arXiv:2406.18462  [pdf, other]

    cs.CV cs.GR

    GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

    Authors: Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian

    Abstract: Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Project page: https://taoranyi.com/gaussiandreamerpro/

  29. arXiv:2406.18453  [pdf, other]

    cs.CV

    Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference

    Authors: Yuan Gao, Yajing Luo, Junhong Wang, Kui Jia, Gui-Song Xia

    Abstract: Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair. This is arguably achieved by incorporating (i) 3D/2.5D shape perception from a single image, (ii) render-and-compare simulation, and (iii) rich semantic cue awareness to furnish (coarse) reference-query correspondence. Existing methods implement (i) by a 3D CAD mo…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: The codes are available at https://github.com/ethanygao/training-free_generalizable_relative_pose

  30. arXiv:2406.18375  [pdf, other]

    cs.CV

    From Majority to Minority: A Diffusion-based Augmentation for Underrepresented Groups in Skin Lesion Analysis

    Authors: Janet Wang, Yunsung Chung, Zhengming Ding, Jihun Hamm

    Abstract: AI-based diagnoses have demonstrated dermatologist-level performance in classifying skin cancer. However, such systems are prone to under-performing when tested on data from minority groups that lack sufficient representation in the training sets. Although data collection and annotation offer the best means for promoting minority groups, these processes are costly and time-consuming. Prior works h…

    Submitted 26 June, 2024; originally announced June 2024.

  31. arXiv:2406.18360  [pdf, other]

    cs.CV

    XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis

    Authors: Hao Li, Ming Yuan, Yan Zhang, Chenming Wu, Chen Zhao, Chunyu Song, Haocheng Feng, Errui Ding, Dingwen Zhang, Jingdong Wang

    Abstract: Thoroughly testing autonomy systems is crucial in the pursuit of safe autonomous driving vehicles. It necessitates creating safety-critical scenarios that go beyond what can be safely collected from real-world data, as many of these scenarios occur infrequently on public roads. However, the evaluation of most existing NVS methods relies on sporadic sampling of image frames from the training data,…

    Submitted 26 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: project page: https://3d-aigc.github.io/XLD/

  32. arXiv:2406.18327  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-modal Evidential Fusion Network for Trusted PET/CT Tumor Segmentation

    Authors: Yuxuan Qi, Li Lin, Jiajun Wang, Jingya Zhang, Bin Zhang

    Abstract: Accurate segmentation of tumors in PET/CT images is important in computer-aided diagnosis and treatment of cancer. The key issue of such a segmentation problem lies in the effective integration of complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by network…

    Submitted 26 June, 2024; originally announced June 2024.

  33. arXiv:2406.18204  [pdf, other]

    cs.NI

    Analysis of Channel Uncertainty in Trusted Wireless Services via Repeated Interactions

    Authors: Bingwen Chen, Xintong Ling, Weihang Cao, Jiaheng Wang, Zhi Ding

    Abstract: The coexistence of heterogeneous sub-networks in 6G poses new security and trust concerns and thus calls for a perimeterless-security model. Blockchain radio access network (B-RAN) provides a trust-building approach via repeated interactions rather than relying on pre-established trust or central authentication. Such a trust-building process naturally supports dynamic trusted services across vario…

    Submitted 26 June, 2024; originally announced June 2024.

  34. arXiv:2406.18198  [pdf, other]

    cs.CV

    VDG: Vision-Only Dynamic Gaussian for Driving Simulation

    Authors: Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

    Abstract: Dynamic Gaussian splatting has led to impressive scene reconstruction and image synthesis advances in novel views. Existing methods, however, heavily rely on pre-computed poses and Gaussian initialization by Structure from Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised VO into our pose-free dynamic Gaussian method (V…

    Submitted 26 June, 2024; originally announced June 2024.

  35. arXiv:2406.17795  [pdf, other]

    cs.CV cs.GR

    RACon: Retrieval-Augmented Simulated Character Locomotion Control

    Authors: Yuxuan Mu, Shihao Zou, Kangning Yin, Zheng Tian, Li Cheng, Weinan Zhang, Jun Wang

    Abstract: In computer animation, driving a simulated character with lifelike motion is challenging. Current generative models, though able to generalize to diverse motions, often pose challenges to the responsiveness of end-user control. To address these issues, we introduce RACon: Retrieval-Augmented Simulated Character Locomotion Control. Our end-to-end hierarchical reinforcement learning method utilizes…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted in ICME2024 for oral presentation

  36. arXiv:2406.17503  [pdf, other]

    cs.LG

    WAVE: Weight Template for Adaptive Initialization of Variable-sized Models

    Authors: Fu Feng, Yucheng Xie, Jing Wang, Xin Geng

    Abstract: The expansion of model parameters underscores the significance of pre-trained models; however, the constraints encountered during model deployment necessitate models of variable sizes. Consequently, the traditional pre-training and fine-tuning paradigm fails to address the initialization problem when target models are incompatible with pre-trained models. We tackle this issue from a multitasking p…

    Submitted 25 June, 2024; originally announced June 2024.

  37. arXiv:2406.17342  [pdf, other]

    cs.CV cs.AI

    Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds

    Authors: Hongliang Zeng, ** Zhang, Fang Li, Jiahua Wang, Tingyu Ye, Pengteng Guo

    Abstract: In the field of 2D image generation modeling and representation learning, Masked Generative Encoder (MAGE) has demonstrated the synergistic potential between generative modeling and representation learning. Inspired by this, we propose Point-MAGE to extend this concept to point cloud data. Specifically, this framework first utilizes a Vector Quantized Variational Autoencoder (VQVAE) to reconstruct…

    Submitted 25 June, 2024; originally announced June 2024.

  38. arXiv:2406.17276  [pdf, other]

    cs.CL

    OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

    Authors: Jikai Wang, Yi Su, Juntao Li, Qinrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang

    Abstract: Autoregressive language models demonstrate excellent performance in various scenarios. However, the inference efficiency is limited by its one-step-one-word generation mode, which has become a pressing problem recently as the models become increasingly larger. Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step, realizing lossless a…

    Submitted 25 June, 2024; originally announced June 2024.

  39. arXiv:2406.17262  [pdf, other]

    cs.CL

    D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

    Authors: Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang

    Abstract: The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time a…

    Submitted 25 June, 2024; originally announced June 2024.

  40. arXiv:2406.16925  [pdf, other]

    cs.CL cs.AI

    Analyzing Multi-Head Attention on Trojan BERT Models

    Authors: Jingwei Wang

    Abstract: This project investigates the behavior of multi-head attention in Transformer models, specifically focusing on the differences between benign and trojan models in the context of sentiment analysis. Trojan attacks cause models to perform normally on clean inputs but exhibit misclassifications when presented with inputs containing predefined triggers. We characterize attention head functions in troj…

    Submitted 12 June, 2024; originally announced June 2024.

  41. arXiv:2406.16560  [pdf]

    cs.SI physics.soc-ph

    GNNTAL: A Novel Model for Identifying Critical Nodes in Complex Networks

    Authors: Hao Wang, Ting Luo, Shuang-** Yang, Ming Jing, Jian Wang, Na Zhao

    Abstract: Identification of critical nodes is a prominent topic in the study of complex networks. Numerous methods have been proposed, yet most exhibit inherent limitations. Traditional approaches primarily analyze specific structural features of the network; however, node influence is typically the result of a combination of multiple factors. Machine learning-based methods struggle to effectively represent…

    Submitted 24 June, 2024; originally announced June 2024.

  42. arXiv:2406.16272  [pdf, other]

    cs.CV cs.AI

    Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

    Authors: Zhiyuan Chang, Mingyang Li, Junjie Wang, Yi Liu, Qing Wang, Yang Liu

    Abstract: Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 11 pages, 3 figures

  43. arXiv:2406.16026  [pdf]

    physics.med-ph cs.LG eess.IV

    CEST-KAN: Kolmogorov-Arnold Networks for CEST MRI Data Analysis

    Authors: Jiawen Wang, Pei Cai, Ziyan Wang, Huabin Zhang, Jianpan Huang

    Abstract: Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same ne…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  44. arXiv:2406.15968  [pdf, other]

    cs.CL cs.LG

    ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

    Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

    Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the pretraining data used for training them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraini…

    Submitted 22 June, 2024; originally announced June 2024.

  45. arXiv:2406.15848  [pdf, other]

    cs.CV

    Quality-guided Skin Tone Enhancement for Portrait Photography

    Authors: Shiqi Gao, Huiyu Duan, Xinyue Li, Kang Fu, Yicong Peng, Qihang Xu, Yuanyuan Chang, Jia Wang, Xiongkuo Min, Guangtao Zhai

    Abstract: In recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a mapping from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable the learning-based enhancement models to adjust an image contin…

    Submitted 22 June, 2024; originally announced June 2024.

  46. arXiv:2406.15846  [pdf, other]

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  47. arXiv:2406.15716  [pdf, other]

    eess.IV cs.CV

    Predicting fluorescent labels in label-free microscopy images with pix2pix and adaptive loss in Light My Cells challenge

    Authors: Han Liu, Hao Li, Jiacheng Wang, Yubo Fan, Zhoubing Xu, Ipek Oguz

    Abstract: Fluorescence labeling is the standard approach to reveal cellular structures and other subcellular constituents for microscopy images. However, this invasive procedure may perturb or even kill the cells and the procedure itself is highly time-consuming and complex. Recently, in silico labeling has emerged as a promising alternative, aiming to use machine learning models to directly predict the flu…

    Submitted 21 June, 2024; originally announced June 2024.

  48. arXiv:2406.15695  [pdf, other]

    cs.CL

    SS-Bench: A Benchmark for Social Story Generation and Evaluation

    Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu

    Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timelin…

    Submitted 21 June, 2024; originally announced June 2024.

  49. arXiv:2406.15346  [pdf, other]

    cs.LG cs.AI

    Privacy Preserved Blood Glucose Level Cross-Prediction: An Asynchronous Decentralized Federated Learning Approach

    Authors: Chengzhe Piao, Taiyu Zhu, Yu Wang, Stephanie E Baldeweg, Paul Taylor, Pantelis Georgiou, Jiahao Sun, Jun Wang, Kezhi Li

    Abstract: Newly diagnosed Type 1 Diabetes (T1D) patients often struggle to obtain effective Blood Glucose (BG) prediction models due to the lack of sufficient BG data from Continuous Glucose Monitoring (CGM), presenting a significant "cold start" problem in patient care. Utilizing population models to address this challenge is a potential solution, but collecting patient data for training population models…

    Submitted 21 June, 2024; originally announced June 2024.

  50. arXiv:2406.15306  [pdf]

    cs.LG cs.CL cs.CV

    Advanced Multimodal Deep Learning Architecture for Image-Text Matching

    Authors: Jinyin Wang, Haijing Zhang, Yihao Zhong, Yingbin Liang, Rongwei Ji, Yiru Cang

    Abstract: Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship. With the advent of the multimedia information age, image, and text data show explosive growth, and how to accurately realize the efficient and accurate semantic correspondence between them has become the core issue of common concern in academia and industry.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17460 by other authors