Skip to main content

Showing 1–50 of 125 results for author: Zhong, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.02043  [pdf, other

    cs.CL

    Concise and Precise Context Compression for Tool-Using Language Models

    Authors: Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che

    Abstract: Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, occupying the input window as well as slowing down the decoding process. Given the progress in general-purpose compression, soft context compression is a suita… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.00569  [pdf, other

    cs.CV cs.AI cs.CL

    Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

    Authors: Weihong Zhong, Xiaocheng Feng, Liang Zhao, Qiming Li, Lei Huang, Yuxuan Gu, Weitao Ma, Yuan Xu, Bing Qin

    Abstract: Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, w… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Main Conference. 21 pages, 20 figures

  3. arXiv:2406.19827  [pdf, other

    cs.LG

    Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory

    Authors: Wenliang Zhong, Haoyu Tang, Qinghai Zheng, Mingzhu Xu, Yupeng Hu, Liqiang Nie

    Abstract: The rapid evolution of deep learning and large language models has led to an exponential growth in the demand for training data, prompting the development of Dataset Distillation methods to address the challenges of managing large datasets. Among these, Matching Training Trajectories (MTT) has been a prominent approach, which replicates the training trajectory of an expert network on real data wit… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 11 pages

  4. arXiv:2406.15796  [pdf, other

    cs.CL

    Rethinking Entity-level Unlearning for Large Language Models

    Authors: Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Bing Qin

    Abstract: Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Work in progress

  5. arXiv:2406.02987  [pdf, other

    cs.CV

    Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

    Authors: Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton, Boxin Du, Shioulin Sam, Karim Bouyarmane, Ismail Tutar, Junzhou Huang

    Abstract: Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing the visual representations with LLMs leveraging some visual adapters. In this paper, we first establish that adapters using query-based Transformers such as Q-former is a simplified Multi-instance Learning method without considering instance heterogeneity/correlation. We then propose… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  6. arXiv:2405.20314  [pdf, ps, other

    cs.CL

    S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs

    Authors: Wei Zhong, Manasa Bharadwaj

    Abstract: Speculative decoding (SD) has attracted a significant amount of research attention due to the substantial speedup it can achieve for LLM inference. However, despite the high speedups they offer, speculative decoding methods often achieve optimal performance on high-end devices or with a substantial GPU memory overhead. Given limited memory and the necessity of quantization, a high-performing model… ▽ More

    Submitted 1 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  7. arXiv:2405.17132  [pdf, other

    cs.LG

    Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

    Authors: Chun**g Gan, Binbin Hu, Bo Huang, Ziqi Liu, Jian Ma, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

    Abstract: Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook th… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  8. arXiv:2405.15600  [pdf, ps, other

    stat.ML cs.LG econ.EM stat.ME

    Transfer Learning for Spatial Autoregressive Models

    Authors: Hao Zeng, Wei Zhong, Xingbai Xu

    Abstract: The spatial autoregressive (SAR) model has been widely applied in various empirical economic studies to characterize the spatial dependence among subjects. However, the precision of estimating the SAR model diminishes when the sample size of the target data is limited. In this paper, we propose a new transfer learning framework for the SAR model to borrow the information from similar source data t… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  9. arXiv:2405.11273  [pdf, other

    cs.AI cs.CL cs.CV cs.MM

    Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

    Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To ad… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures. Project Website: https://uni-moe.github.io/. Working in progress

  10. arXiv:2404.13646  [pdf, other

    math.NA cs.LG

    Physics-informed Mesh-independent Deep Compositional Operator Network

    Authors: Weiheng Zhong, Hadi Meidani

    Abstract: Solving parametric Partial Differential Equations (PDEs) for a broad range of parameters is a critical challenge in scientific computing. To this end, neural operators, which learn map**s from parameters to solutions, have been successfully used. However, the training of neural operators typically demands large training datasets, the acquisition of which can be prohibitively expensive. To addres… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  11. arXiv:2404.12602  [pdf

    cs.CV cs.LG

    A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks

    Authors: Minzhe Huang, Changwei Nie, Weihong Zhong

    Abstract: In recent years, Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology. With the rise of counterfeit face generation techniques, the challenge posed by digitally edited faces to face anti-spoofing is escalating. Existing FAS technologies primarily focus on intercepting physically forged faces and lack a robust solution for cross-domain FAS cha… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  12. arXiv:2404.01735  [pdf, other

    cs.IR cs.MM

    CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling

    Authors: Yunshan Ma, Yingzhi He, Wenjun Zhong, Xiang Wang, Roger Zimmermann, Tat-Seng Chua

    Abstract: Product bundling has been a prevailing marketing strategy that is beneficial in the online shop** scenario. Effective product bundling methods depend on high-quality item representations, which need to capture both the individual items' semantics and cross-item relations. However, previous item representation learning methods, either feature fusion or graph learning, suffer from inadequate cross… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: arXiv preprint, 10 pages, 4 figures, 6 tables

    ACM Class: H.3.0

  13. arXiv:2403.11211  [pdf

    cs.CV

    RCdpia: A Renal Carcinoma Digital Pathology Image Annotation dataset based on pathologists

    Authors: Qingrong Sun, Weixiang Zhong, Jie Zhou, Chong Lai, Xiaodong Teng, Maode Lai

    Abstract: The annotation of digital pathological slide data for renal cell carcinoma is of paramount importance for correct diagnosis of artificial intelligence models due to the heterogeneous nature of the tumor. This process not only facilitates a deeper understanding of renal cell cancer heterogeneity but also aims to minimize noise in the data for more accurate studies. To enhance the applicability of t… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures, 1 table

  14. arXiv:2403.08967  [pdf, other

    cs.CV cs.AI

    PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning

    Authors: Qifeng Zhou, Wenliang Zhong, Yuzhi Guo, Michael Xiao, Hehuan Ma, Junzhou Huang

    Abstract: In the field of computational histopathology, both whole slide images (WSIs) and diagnostic captions provide valuable insights for making diagnostic decisions. However, aligning WSIs with diagnostic captions presents a significant challenge. This difficulty arises from two main factors: 1) Gigapixel WSIs are unsuitable for direct input into deep learning models, and the redundancy and correlation… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  15. arXiv:2403.05890  [pdf, other

    cs.LG cs.DC

    Towards Efficient Replay in Federated Incremental Learning

    Authors: Yichen Li, Qunwei Li, Haozhao Wang, Ruixuan Li, Wenliang Zhong, Guannan Zhang

    Abstract: In Federated Learning (FL), the data in each client is typically assumed fixed or static. However, data often comes in an incremental manner in real-world applications, where the data domain may increase dynamically. In this work, we study catastrophic forgetting with data heterogeneity in Federated Incremental Learning (FIL) scenarios where edge clients may lack enough storage space to retain ful… ▽ More

    Submitted 3 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  16. arXiv:2403.03514  [pdf, other

    cs.CL

    CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models

    Authors: Zexuan Qiu, **g**g Li, Shijue Huang, Wanjun Zhong, Irwin King

    Abstract: Develo** Large Language Models (LLMs) with robust long-context capabilities has been the recent research focus, resulting in the emergence of long-context LLMs proficient in Chinese. However, the evaluation of these models remains underdeveloped due to a lack of benchmarks. To address this gap, we present CLongEval, a comprehensive Chinese benchmark for evaluating long-context LLMs. CLongEval is… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 19 pages, 4 figures

  17. Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction

    Authors: Wangjie Zhong, Leimin Tian, Duy Tho Le, Hamid Rezatofighi

    Abstract: Social robots often rely on visual perception to understand their users and the environment. Recent advancements in data-driven approaches for computer vision have demonstrated great potentials for applying deep-learning models to enhance a social robot's visual perception. However, the high computational demands of deep-learning methods, as opposed to the more resource-efficient shallow-learning… ▽ More

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: accepted to HRI 2024 (LBR track)

  18. arXiv:2402.16288  [pdf, other

    cs.CL cs.AI cs.IR

    PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering

    Authors: Yiming Du, Hongru Wang, Zhengyi Zhao, Bin Liang, Baojun Wang, Wanjun Zhong, Zezhong Wang, Kam-Fai Wong

    Abstract: Long-term memory plays a critical role in personal interaction, considering long-term memory can better leverage world knowledge, historical information, and preferences in dialogues. Our research introduces PerLTQA, an innovative QA dataset that combines semantic and episodic memories, including world knowledge, profiles, social relationships, events, and dialogues. This dataset is collected to i… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  19. arXiv:2402.11905  [pdf, other

    cs.CL

    Learning to Edit: Aligning LLMs with Knowledge Editing

    Authors: Yuxin Jiang, Yufei Wang, Chuhan Wu, Wanjun Zhong, Xingshan Zeng, Jiahui Gao, Liangyou Li, Xin Jiang, Lifeng Shang, Ruiming Tang, Qun Liu, Wei Wang

    Abstract: Knowledge editing techniques, aiming to efficiently modify a minor proportion of knowledge in large language models (LLMs) without negatively impacting performance across other inputs, have garnered widespread attention. However, existing methods predominantly rely on memorizing the updated knowledge, impeding LLMs from effectively combining the new knowledge with their inherent knowledge when ans… ▽ More

    Submitted 5 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 17 pages, 8 figures, 9 tables. ACL 2024 main camera-ready version

  20. arXiv:2402.02718  [pdf, other

    cs.IR cs.AI

    Denoising Time Cycle Modeling for Recommendation

    Authors: Sicong Xie, Qunwei Li, Weidi Xu, Kaiming Shen, Shaohu Chen, Wenliang Zhong

    Abstract: Recently, modeling temporal patterns of user-item interactions have attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns of user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noises, which limits the performance of target-related time cycle modeling and affect the recommendation perform… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

  21. arXiv:2401.17167  [pdf, other

    cs.CL

    Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

    Authors: Shijue Huang, Wanjun Zhong, Jianqiao Lu, Qi Zhu, Jiahui Gao, Weiwen Liu, Yutai Hou, Xingshan Zeng, Yasheng Wang, Lifeng Shang, Xin Jiang, Ruifeng Xu, Qun Liu

    Abstract: The recent trend of using Large Language Models (LLMs) as tool agents in real-world applications underscores the necessity for comprehensive evaluations of their capabilities, particularly in complex scenarios involving planning, creating, and using tools. However, existing benchmarks typically focus on simple synthesized queries that do not reflect real-world complexity, thereby offering limited… ▽ More

    Submitted 3 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL2024 Findings

  22. arXiv:2401.15670  [pdf, other

    cs.CL cs.AI cs.LG

    YODA: Teacher-Student Progressive Learning for Language Models

    Authors: Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, Wenyong Huang, Yanlin Wang, Fei Mi, Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually generalize and handle more complex problems, and refine their skills with continuous feedback. Inspired by this, this paper introduces YODA, a novel teacher-stude… ▽ More

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: 14 pages, 4 figures, 3 tables

  23. arXiv:2401.06300  [pdf, other

    quant-ph cond-mat.dis-nn cs.AI cs.LG

    Advantage of Quantum Neural Networks as Quantum Information Decoders

    Authors: Weishun Zhong, Oles Shtanko, Ramis Movassagh

    Abstract: A promising strategy to protect quantum information from noise-induced errors is to encode it into the low-energy states of a topological quantum memory device. However, readout errors from such memory under realistic settings is less understood. We study the problem of decoding quantum information encoded in the groundspaces of topological stabilizer Hamiltonians in the presence of generic pertur… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 25 pages, 5 figures

  24. arXiv:2401.05975  [pdf, other

    cs.IR cs.AI

    End-to-end Learnable Clustering for Intent Learning in Recommendation

    Authors: Yue Liu, Shihao Zhu, Jun Xia, Yingwei Ma, Jian Ma, Wenliang Zhong, Xinwang Liu, Guannan Zhang, Kejun Zhang

    Abstract: Intent learning, which aims to learn users' intents for user understanding and item recommendation, has become a hot research spot in recent years. However, the existing methods suffer from complex and cumbersome alternating optimization, limiting the performance and scalability. To this end, we propose a novel intent learning method termed \underline{ELCRec}, by unifying behavior representation l… ▽ More

    Submitted 2 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: 24 pages

  25. arXiv:2401.02035  [pdf, ps, other

    cs.IT

    Efficient Information Geometry Approach for Massive MIMO-OFDM Channel Estimation

    Authors: Jiyuan Yang, Yan Chen, Mingrui Fan, An-An Lu, Wen Zhong, Xiqi Gao, Xiaohu You, Xiang-Gen Xia, Dirk Slock

    Abstract: We investigate the channel estimation for massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. We revisit the information geometry approach (IGA) for massive MIMO-OFDM channel estimation. By using the constant magnitude property of the entries of the measurement matrix, we find that the second-order natural parameters of the distributions on all th… ▽ More

    Submitted 3 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

  26. arXiv:2312.17266  [pdf

    eess.IV cs.AI cs.CV cs.RO

    Automatic laminectomy cutting plane planning based on artificial intelligence in robot assisted laminectomy surgery

    Authors: Zhuofu Li, Yonghong Zhang, Chengxia Wang, Shanshan Liu, Xiongkang Song, Xuquan Ji, Shuai Jiang, Woquan Zhong, Lei Hu, Weishi Li

    Abstract: Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. 7 key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

  27. arXiv:2312.17109  [pdf, other

    cs.CV cs.AI cs.CL

    MIVC: Multiple Instance Visual Component for Visual-Language Models

    Authors: Wenyi Wu, Qi Li, Wenliang Zhong, Junzhou Huang

    Abstract: Vision-language models have been widely explored across a wide range of tasks and achieve satisfactory performance. However, it's under-explored how to consolidate entity understanding through a varying number of images and to align it with the pre-trained language models for generative tasks. In this paper, we propose MIVC, a general multiple instance visual component to bridge the gap between va… ▽ More

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted at WACV 2024

  28. arXiv:2312.11370  [pdf, other

    cs.CL

    G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

    Authors: Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong

    Abstract: Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities, which encourages extensive research on their application in mathematical problem solving. However, current work has been largely focused on text-based mathematical problems, with limited investigation in problems involving geometric information. Addressing this gap, we aim to enable… ▽ More

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 10 pages

  29. arXiv:2312.01916  [pdf, other

    cs.IR

    PEACE: Prototype lEarning Augmented transferable framework for Cross-domain rEcommendation

    Authors: Chun**g Gan, Bo Huang, Binbin Hu, Jian Ma, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, Guannan Zhang, Wenliang Zhong

    Abstract: To help merchants/customers to provide/access a variety of services through miniapps, online service platforms have occupied a critical position in the effective content delivery, in which how to recommend items in the new domain launched by the service provider for customers has become more urgent. However, the non-negligible gap between the source and diversified target domains poses a considera… ▽ More

    Submitted 17 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted by WSDM 2024

  30. arXiv:2312.01700  [pdf, other

    cs.CL cs.AI

    Data Management For Large Language Models: A Survey

    Authors: Zige Wang, Wanjun Zhong, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Data plays a fundamental role in the training of Large Language Models (LLMs). Effective data management, particularly in the formulation of a well-suited training dataset, holds significance for enhancing model performance and improving training efficiency during pretraining and supervised fine-tuning phases. Despite the considerable importance of data management, the current research community s… ▽ More

    Submitted 25 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Work in progress

  31. arXiv:2312.00553  [pdf

    cs.HC eess.SP

    A Spatio-Temporal Graph Convolutional Network for Gesture Recognition from High-Density Electromyography

    Authors: Wenjuan Zhong, Yuyang Zhang, Peiwen Fu, Wenxuan Xiong, Mingming Zhang

    Abstract: Accurate hand gesture prediction is crucial for effective upper-limb prosthetic limbs control. As the high flexibility and multiple degrees of freedom exhibited by human hands, there has been a growing interest in integrating deep networks with high-density surface electromyography (HD-sEMG) grids to enhance gesture recognition capabilities. However, many existing methods fall short in fully explo… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

  32. arXiv:2311.17571  [pdf, other

    cs.CV

    LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching

    Authors: Wenhao Zhong, Jie Jiang

    Abstract: Image matching that finding robust and accurate correspondences across images is a challenging task under extreme conditions. Capturing local and global features simultaneously is an important way to mitigate such an issue but recent transformer-based decoders were still stuck in the issues that CNN-based encoders only extract local features and the transformers lack locality. Inspired by the loca… ▽ More

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 8 pages of main text, 7 pages of supplementary material, 3 pages of references, 6 figures in main text and 8 figures in supplementary material, 5 tables in main text and 2 tables in supplementary material

  33. arXiv:2311.13864  [pdf, other

    cs.LG

    Which Matters Most in Making Fund Investment Decisions? A Multi-granularity Graph Disentangled Learning Framework

    Authors: Chun**g Gan, Binbin Hu, Bo Huang, Tianyu Zhao, Yingru Lin, Wenliang Zhong, Zhiqiang Zhang, Jun Zhou, Chuan Shi

    Abstract: In this paper, we highlight that both conformity and risk preference matter in making fund investment decisions beyond personal interest and seek to jointly characterize these aspects in a disentangled manner. Consequently, we develop a novel M ulti-granularity Graph Disentangled Learning framework named MGDL to effectively perform intelligent matching of fund investment products. Benefiting from… ▽ More

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Accepted by SIGIR 2023

  34. arXiv:2311.07536  [pdf, other

    cs.CL

    A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

    Authors: Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Wei Wang, Min Zhang

    Abstract: The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA). Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate not just recognition of visual elements, but also a deep comprehension of the visual information in conjunction w… ▽ More

    Submitted 27 January, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 18 pages, 13pages; working in progress

  35. arXiv:2311.07395  [pdf

    cs.RO cs.AI

    Predicting Continuous Locomotion Modes via Multidimensional Feature Learning from sEMG

    Authors: Peiwen Fu, Wenjuan Zhong, Yuyang Zhang, Wenxuan Xiong, Yuzhou Lin, Yanlong Tai, Lin Meng, Mingming Zhang

    Abstract: Walking-assistive devices require adaptive control methods to ensure smooth transitions between various modes of locomotion. For this purpose, detecting human locomotion modes (e.g., level walking or stair ascent) in advance is crucial for improving the intelligence and transparency of such robotic systems. This study proposes Deep-STF, a unified end-to-end deep learning model designed for integra… ▽ More

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 10 pages,7 figures

  36. arXiv:2311.05232  [pdf, other

    cs.CL

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    Authors: Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

    Abstract: The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses subs… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Work in progress; 49 pages

  37. arXiv:2310.20410  [pdf, other

    cs.CL

    FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models

    Authors: Yuxin Jiang, Yufei Wang, Xingshan Zeng, Wanjun Zhong, Liangyou Li, Fei Mi, Lifeng Shang, Xin Jiang, Qun Liu, Wei Wang

    Abstract: The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality, rather than assessing whether the response follows constraints stated in the instruction. To fill this research gap, in this paper, we propose FollowBench, a Multi-level Fine-grained Constraints Following… ▽ More

    Submitted 5 June, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 22 pages, 11 figures, 16 tables. ACL 2024 main camera-ready version

  38. arXiv:2310.09430  [pdf, ps, other

    cs.CL cs.AI

    Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning

    Authors: Qiming Bao, Gael Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neset Tan, Yang Chen, Michael Witbrock, Jiamou Liu

    Abstract: Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness when performing logical reasoning has not been sufficiently assessed. To comprehensively evaluate this ability, we develop three new logical reasoning datasets name… ▽ More

    Submitted 30 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: The short version (v3) was accepted for oral presentation at the first LLM@IJCAI 2023 non-archival symposium; the full version is under review

  39. arXiv:2310.01446  [pdf, other

    cs.CL cs.AI

    Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning

    Authors: Jianpeng Zhou, Wanjun Zhong, Yanlin Wang, Jiahai Wang

    Abstract: Large Language Models (LLMs) are showcasing impressive ability in handling complex reasoning tasks. In real-world situations, problems often span a spectrum of complexities. Humans inherently adjust their problem-solving approaches based on task complexity. However, most methodologies that leverage LLMs tend to adopt a uniform approach: utilizing consistent models, prompting methods, and degrees o… ▽ More

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: 10 pages

  40. arXiv:2310.00533  [pdf, other

    cs.CL cs.AI cs.LG

    SELF: Self-Evolution with Language Feedback

    Authors: Jianqiao Lu, Wanjun Zhong, Wenyong Huang, Yufei Wang, Qi Zhu, Fei Mi, Baojun Wang, Weichao Wang, Xingshan Zeng, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable versatility across various domains. To further advance LLMs, we propose 'SELF' (Self-Evolution with Language Feedback), a novel approach that enables LLMs to self-improve through self-reflection, akin to human learning processes. SELF initiates with a meta-skill learning process that equips the LLMs with capabilities for self-feedback and s… ▽ More

    Submitted 1 February, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: 20 pages, 4 figures, 11 tables

  41. arXiv:2309.10444  [pdf, other

    cs.AI cs.CL

    Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models

    Authors: Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Gaël Gendron, Timothy Pistotti, Alice Huang, Paul Denny, Michael Witbrock, Jiamou Liu

    Abstract: Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own educational content. When learnersourcing multiple-choice questions, creating explanations for the solution of a question is a crucial step; it helps other stud… ▽ More

    Submitted 10 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: The short version (v4) was accepted as a non-archival workshop paper at AGI@ICLR 2024; the full version is under review

  42. Practical Program Repair via Preference-based Ensemble Strategy

    Authors: Wenkang Zhong, Chuanyi Li, Kui Liu, Tongtong Xu, Tegawendé F. Bissyandé, Jidong Ge, Bin Luo, Vincent Ng

    Abstract: To date, over 40 Automated Program Repair (APR) tools have been designed with varying bug-fixing strategies, which have been demonstrated to have complementary performance in terms of being effective for different bug classes. Intuitively, it should be feasible to improve the overall bug-fixing performance of APR via assembling existing tools. Unfortunately, simply invoking all available APR tools… ▽ More

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: accepted by icse2024 early

  43. arXiv:2308.16437  [pdf, other

    cs.IR cs.LG

    AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction

    Authors: Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, **jie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang

    Abstract: Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existin… ▽ More

    Submitted 30 August, 2023; originally announced August 2023.

  44. arXiv:2308.06926  [pdf, other

    cs.CV

    OpenGCD: Assisting Open World Recognition with Generalized Category Discovery

    Authors: Fulin Gao, Weimin Zhong, Zhixing Cao, Xin Peng, Zhi Li

    Abstract: A desirable open world recognition (OWR) system requires performing three tasks: (1) Open set recognition (OSR), i.e., classifying the known (classes seen during training) and rejecting the unknown (unseen$/$novel classes) online; (2) Grou** and labeling these unknown as novel known classes; (3) Incremental learning (IL), i.e., continual learning these novel classes and retaining the memory of o… ▽ More

    Submitted 14 August, 2023; originally announced August 2023.

  45. arXiv:2308.04914  [pdf, ps, other

    cs.AI

    Service Reservation and Pricing for Green Metaverses: A Stackelberg Game Approach

    Authors: Xumin Huang, Yuan Wu, Jiawen Kang, Jiangtian Nie, Weifeng Zhong, Dong In Kim, Shengli Xie

    Abstract: Metaverse enables users to communicate, collaborate and socialize with each other through their digital avatars. Due to the spatio-temporal characteristics, co-located users are served well by performing their software components in a collaborative manner such that a Metaverse service provider (MSP) eliminates redundant data transmission and processing, ultimately reducing the total energy consump… ▽ More

    Submitted 9 August, 2023; originally announced August 2023.

  46. arXiv:2308.02097  [pdf, other

    cs.CV

    Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation

    Authors: **yuan Liu, Zhu Liu, Guanyao Wu, Long Ma, Risheng Liu, Wei Zhong, Zhongxuan Luo, Xin Fan

    Abstract: Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation. Early efforts focus on boosting the performance for only one task, \emph{e.g.,} fusion or segmentation, making it hard to reach~`Best of Both Worlds'. To overcome this issue, in this paper, we propose a \textbf{M}ulti-\textbf{i}nteractive \textbf{F}eature learning architecture for image fusi… ▽ More

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV 2023. The source code and benchmark are available at https://github.com/**yuanLiu-CV/SegMiF

  47. arXiv:2308.01538  [pdf, other

    cond-mat.dis-nn cond-mat.stat-mech cs.AI quant-ph stat.ML

    Non-equilibrium physics: from spin glasses to machine and neural learning

    Authors: Weishun Zhong

    Abstract: Disordered many-body systems exhibit a wide range of emergent phenomena across different scales. These complex behaviors can be utilized for various information processing tasks such as error correction, learning, and optimization. Despite the empirical success of utilizing these systems for intelligent tasks, the underlying principles that govern their emergent intelligent behaviors remain largel… ▽ More

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: PhD dissertation, MIT Physics (2023). This arXiv version contains a updated summary of the thesis in Chapter 1

  48. arXiv:2307.13209  [pdf

    cs.RO cs.AI

    Gait Cycle-Inspired Learning Strategy for Continuous Prediction of Knee Joint Trajectory from sEMG

    Authors: Xueming Fu, Hao Zheng, Luyan Liu, Wenjuan Zhong, Haowen Liu, Wenxuan Xiong, Yuyang Zhang, Yifeng Chen, Dong Wei, Mingjie Dong, Yefeng Zheng, Mingming Zhang

    Abstract: Predicting lower limb motion intent is vital for controlling exoskeleton robots and prosthetic limbs. Surface electromyography (sEMG) attracts increasing attention in recent years as it enables ahead-of-time prediction of motion intentions before actual movement. However, the estimation performance of human joint trajectory remains a challenging problem due to the inter- and intra-subject variatio… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

  49. arXiv:2307.12966  [pdf, other

    cs.CL

    Aligning Large Language Models with Human: A Survey

    Authors: Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect (hallucinated) information. Hence, aligning LLMs w… ▽ More

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: work in progress

  50. arXiv:2306.15255  [pdf, other

    cs.CV cs.CL

    GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

    Authors: Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou

    Abstract: In this report, we present our champion solution for Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2023. Essentially, to accurately ground in a video, an effective egocentric feature extractor and a powerful grounding model are required. Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and… ▽ More

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, 4 tables, the champion solution for Ego4D Natural Language Queries Challenge in CVPR 2023