Showing 1–50 of 247 results for author: Wan, X

Searching in archive cs.

  1. arXiv:2406.19280  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

    Authors: Junying Chen, Ruyi Ouyang, Anningzhe Gao, Shunian Chen, Guiming Hardy Chen, Xidong Wang, Ruifei Zhang, Zhenyang Cai, Ke Ji, Guangjun Yu, Xiang Wan, Benyou Wang

    Abstract: The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of medical vision-text data, stemming from data privacy concerns and high annotation costs. While pioneering approaches utilize PubMed's large-scale, de-i…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.18365  [pdf, other]

    cs.CL

    Themis: Towards Flexible and Interpretable NLG Evaluation

    Authors: Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan

    Abstract: The evaluation of natural language generation (NLG) tasks is a significant and longstanding research issue. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the impro…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.18326  [pdf, other]

    cs.CL cs.AI

    PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models

    Authors: Huixuan Zhang, Yun Lin, Xiaojun Wan

    Abstract: Large language models (LLMs) are known to be trained on vast amounts of data, which may unintentionally or intentionally include data from commonly used benchmarks. This inclusion can lead to inflated scores on model leaderboards, yet result in disappointing performance in real-world applications. To address this benchmark contamination problem, we first propose a set of requirements that p…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.18321  [pdf, other]

    cs.CL cs.AI

    MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

    Authors: Meng Fang, Xiangpeng Wan, Fei Lu, Fei Xing, Kai Zou

    Abstract: Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset. The data…

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.18193  [pdf, ps, other]

    cs.CV cs.AI

    MammothModa: Multi-Modal Large Language Model

    Authors: Qi She, Junwen Pan, Xin Wan, Rui Zhang, Dawei Lu, Kai Huang

    Abstract: In this report, we introduce MammothModa, yet another multi-modal large language model (MLLM) designed to achieve state-of-the-art performance starting from an elementary baseline. We focus on three key design insights: (i) Integrating Visual Capabilities while Maintaining Complex Language Understanding: In addition to the vision encoder, we incorporated the Visual Attention Experts into the LLM t…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Technical report

  6. arXiv:2406.18034  [pdf, other]

    cs.CL

    LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them

    Authors: Wenya Xie, Qingying Xiao, Yu Zheng, Xidong Wang, Junying Chen, Ke Ji, Anningzhe Gao, Xiang Wan, Feng Jiang, Benyou Wang

    Abstract: The recent success of Large Language Models (LLMs) has had a significant impact on the healthcare field, providing patients with medical advice, diagnostic information, and more. However, due to a lack of professional medical knowledge, patients are easily misled by erroneous information generated by LLMs, which may result in serious medical problems. To address this issue, we focus on tuning th…

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.16150  [pdf, other]

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paid less attention to the issue we term Intensity Confusion, wherein the intensity values of certain background voxels approach those of the foreground voxels within br…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  8. arXiv:2406.15708  [pdf, other]

    cs.CL cs.AI cs.LG

    Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization

    Authors: Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Sercan O. Arik

    Abstract: Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar selection, ES). Despite their shared objective, the…

    Submitted 21 June, 2024; originally announced June 2024.

  9. arXiv:2406.15699  [pdf, other]

    cs.CV

    Self-Supervised Alignment Learning for Medical Image Segmentation

    Authors: Haofeng Li, Yiming Ouyang, Xiang Wan

    Abstract: Recently, self-supervised learning (SSL) methods have been used in pre-training the segmentation models for 2D and 3D medical images. Most of these methods are based on reconstruction, contrastive learning and consistency regularization. However, the spatial correspondence of 2D slices from a 3D medical image has not been fully exploited. In this paper, we propose a novel self-supervised alignment…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by the 2024 IEEE International Symposium on Biomedical Imaging (ISBI 2024)

  10. arXiv:2406.13219  [pdf, other]

    cs.CV cs.CL

    MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

    Authors: Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, Xiaojun Wan

    Abstract: Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues, which can manifest as misreading and misrecognition errors due to the complexity of multimodal knowledge. Previous benchmarks have not systematically analyzed the performance of editing methods in correcting these two error types. To better represent and correct these errors, we decompose multimodal kno…

    Submitted 19 June, 2024; originally announced June 2024.

  11. arXiv:2406.11370  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments

    Authors: Han Zhou, Xingchen Wan, Yinhong Liu, Nigel Collier, Ivan Vulić, Anna Korhonen

    Abstract: Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise LLM evaluators, which compare two generated texts and determine the preferred one, have been employed in a wide range of applications. However, LLMs exhibit preference biases and worrying sensitivity to prompt designs. In thi…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, 1 table (12 pages, 4 figures, 6 tables including references and appendices)

  12. arXiv:2406.09950  [pdf, other]

    cs.SD cs.CL eess.AS

    An efficient text augmentation approach for contextualized Mandarin speech recognition

    Authors: Naijun Zheng, Xucheng Wan, Kai Liu, Ziqing Du, Zhou Huan

    Abstract: Although contextualized automatic speech recognition (ASR) systems are commonly used to improve the recognition of uncommon words, their effectiveness is hindered by the inherent limitations of speech-text data availability. To address this challenge, our study proposes to leverage extensive text-only datasets and contextualize pre-trained ASR models using a straightforward text-augmentation (TA)…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  13. arXiv:2406.08842  [pdf, other]

    cs.CL

    ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions

    Authors: Xu Zhang, Xunjian Yin, Xiaojun Wan

    Abstract: While substantial advancements have been made in developing large language models (LLMs), achieving control over their behavior can be difficult. Direct preference optimization (DPO) assumes the existence of a latent reward function to evaluate the responses of LLMs. This assumption indicates a strict preference ordering of different responses to the same input. However, there always exist contrad…

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.07967  [pdf, other]

    cs.CL cs.LG

    Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

    Authors: Jie Ruan, Xiao Pu, Mingqi Gao, Xiaojun Wan, Yuesheng Zhu

    Abstract: Human evaluation is viewed as a reliable but expensive and time-consuming evaluation method for NLG. To save labor and costs, researchers usually perform human evaluation on a small subset of data sampled from the whole dataset in practice. However, different selection subsets will lead to different rankings of the systems. To give a more correct inter-system ranking and make the gold standar…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: With Appendix

  15. arXiv:2406.07935  [pdf, other]

    cs.CL cs.LG

    Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation

    Authors: Jie Ruan, Wenqing Wang, Xiaojun Wan

    Abstract: Human evaluation serves as the gold standard for assessing the quality of Natural Language Generation (NLG) systems. Nevertheless, the evaluation guideline, as a pivotal element ensuring reliable and reproducible human assessment, has received limited attention. Our investigation revealed that only 29.84% of recent papers involving human evaluation at top conferences release their evaluation guidel…

    Submitted 12 June, 2024; originally announced June 2024.

  16. arXiv:2406.00606  [pdf, other]

    cs.CL

    LLMs Could Autonomously Learn Without External Supervision

    Authors: Ke Ji, Junying Chen, Anningzhe Gao, Wenya Xie, Xiang Wan, Benyou Wang

    Abstract: In the quest for super-human performance, Large Language Models (LLMs) have traditionally been tethered to human-annotated datasets and predefined training objectives, a process that is both labor-intensive and inherently limited. This paper presents a transformative approach: Autonomous Learning for LLMs, a self-sufficient learning paradigm that frees models from the constraints of human supervisi…

    Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: 20 pages, 8 figures

  17. arXiv:2405.21013  [pdf, other]

    cs.CV

    StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

    Authors: Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Langu…

    Submitted 4 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  18. arXiv:2405.19765  [pdf, other]

    cs.CV cs.AI

    Towards Unified Multi-granularity Text Detection with Interactive Attention

    Authors: Xingyu Wan, Chengquan Zhang, Pengyuan Lyu, Sen Fan, Zihan Ni, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Existing OCR engines or document image analysis systems typically rely on training separate models for text detection in varying scenarios and granularities, leading to significant computational complexity and resource demands. In this paper, we introduce "Detect Any Text" (DAT), an advanced paradigm that seamlessly unifies scene text detection, layout analysis, and document page detection into a…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  19. arXiv:2405.15362  [pdf, other]

    cs.LG cs.CL cs.DC

    Pipeline Parallelism with Controllable Memory

    Authors: Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin

    Abstract: Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block and we show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules,…

    Submitted 10 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  20. arXiv:2405.15119  [pdf, other]

    cs.LG stat.ML

    Bayesian Optimization of Functions over Node Subsets in Graphs

    Authors: Huidong Liang, Xingchen Wan, Xiaowen Dong

    Abstract: We address the problem of optimizing over functions defined on node subsets in a graph. The optimization of such functions is often a non-trivial task given their combinatorial, black-box and expensive-to-evaluate nature. Although various algorithms have been introduced in the literature, most are either task-specific or computationally inefficient and only utilize information about the graph stru…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 26 pages with 20 figures

  21. arXiv:2405.14524  [pdf, other]

    cs.NI

    QoE-Aware and Secure UAV-Aided Rate-Splitting Multiple Access Based Communications

    Authors: Abuzar B. M. Adam, Xiaoyu Wan, Mohammed Saleh Ali Muthanna

    Abstract: In this work, we address the issue of quality of experience (QoE) in unmanned aerial vehicle (UAV) aided multiuser rate-splitting multiple access (RSMA) networks under secrecy constraints. The problem is formulated as maximization of sum mean opinion scores (MOSs) of the users. The problem is decomposed into two subproblems: the beamforming and rate allocation subproblem and the UAV trajectory subproblem. For, beamf…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  22. arXiv:2405.13517  [pdf, other]

    cs.CR cs.CL

    WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

    Authors: Baizhou Huang, Xiaojun Wan

    Abstract: With the increasing use of large language models (LLMs) in daily life, concerns have emerged regarding their potential misuse and societal impact. Watermarking is proposed to trace the usage of specific models by injecting patterns into their generated texts. An ideal watermark should produce outputs that are nearly indistinguishable from those of the original LLM (imperceptibility), while ensurin…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 9 pages

  23. arXiv:2405.07429  [pdf, other]

    cs.RO

    JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation

    Authors: Xubo Luo, Xue Wan, Yixing Gao, Yaolin Tian, Wei Zhang, Leizheng Shu

    Abstract: Unmanned aerial vehicle (UAV) visual localization in planetary scenes aims to estimate the absolute pose of the UAV in the world coordinate system through satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 8 pages

  24. arXiv:2405.04294  [pdf, other]

    cs.AI

    Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework

    Authors: Xiangpeng Wan, Haicheng Deng, Kai Zou, Shiqi Xu

    Abstract: Structured finance, which involves restructuring diverse assets into securities like MBS, ABS, and CDOs, enhances capital market efficiency but presents significant due diligence challenges. This study explores the integration of artificial intelligence (AI) with traditional asset review processes to improve efficiency and accuracy in structured finance. Using both open-source and closed-source l…

    Submitted 7 May, 2024; originally announced May 2024.

  25. arXiv:2405.03152  [pdf, other]

    eess.AS cs.SD

    MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

    Authors: Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLM), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. H…

    Submitted 6 May, 2024; originally announced May 2024.

  26. arXiv:2405.00542  [pdf, other]

    eess.IV cs.CV

    UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement

    Authors: Ruiquan Ge, Zhaojie Fang, Pengxue Wei, Zhanghao Chen, Hongyang Jiang, Ahmed Elazab, Wangting Li, Xiang Wan, Shaochong Zhang, Changmiao Wang

    Abstract: Fundus photography, in combination with the ultra-wide-angle fundus (UWF) techniques, has become an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow unlike UWF scanning laser ophthalmoscopy (UWF-SLO…

    Submitted 1 May, 2024; originally announced May 2024.

  27. arXiv:2404.18532  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    MileBench: Benchmarking MLLMs in Long Context

    Authors: Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang

    Abstract: Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on specific task…

    Submitted 15 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 31 pages, 13 figures, 14 tables; We add results of GPT-4o in this version

  28. arXiv:2404.11818  [pdf, other]

    cs.IR

    Automated Similarity Metric Generation for Recommendation

    Authors: Liang Qu, Yun Lin, Wei Yuan, Xiaojun Wan, Yuhui Shi, Hongzhi Yin

    Abstract: The embedding-based architecture has become the dominant approach in modern recommender systems, mapping users and items into a compact vector space. It then employs predefined similarity metrics, such as the inner product, to calculate similarity scores between user and item embeddings, thereby guiding the recommendation of items that align closely with a user's preferences. Given the critical ro…

    Submitted 17 April, 2024; originally announced April 2024.

  29. arXiv:2404.06795  [pdf, other]

    cs.LG

    Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

    Authors: Zhuo Li, He Zhao, Zhen Li, Tongliang Liu, Dandan Guo, Xiang Wan

    Abstract: Real-world datasets are usually class-imbalanced and corrupted by label noise. To solve the joint issue of long-tailed distribution and label noise, most previous works aim to design a noise detector to distinguish the noisy and clean samples. Despite their effectiveness, they may be limited in handling the joint issue effectively in a unified way. In this work, we develop a novel pseudo l…

    Submitted 10 April, 2024; originally announced April 2024.

  30. arXiv:2404.05466  [pdf, other]

    cs.CV

    Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

    Authors: He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie

    Abstract: Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video. Current mainstream lip-reading approaches only use a single visual encoder to model input videos of a single scale. In this paper, we propose to enhance lip-reading by incorporating multi-scale video data and multi-encoder. Specifically, we first propose a novel multi-s…

    Submitted 30 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 6 pages, 3 figures, Accepted at ICMEW 2024

  31. arXiv:2403.03640  [pdf, other]

    cs.CL cs.AI

    Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

    Authors: Xidong Wang, Nuo Chen, Junyin Chen, Yan Hu, Yidong Wang, Xiangbo Wu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang

    Abstract: Despite the vast repository of global medical knowledge predominantly being in English, local languages are crucial for delivering tailored healthcare services, particularly in areas with limited medical resources. To extend the reach of medical AI advancements to a broader population, we aim to develop medical LLMs across the six most widely spoken languages, encompassing a global population of 6…

    Submitted 28 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: Preprint

  32. arXiv:2403.02962  [pdf, other]

    cs.AI

    WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction

    Authors: Zheng Li, Xiang Chen, Xiaojun Wan

    Abstract: Tabular data, as a crucial form of data representation, exists in diverse formats on the Web. When confronted with complex and irregular tables, manual modification becomes a laborious task. This paper investigates the performance of Large Language Models (LLMs) in the context of table editing tasks. Existing research mainly focuses on regular-shaped tables, wherein instructions are used to genera…

    Submitted 5 March, 2024; originally announced March 2024.

  33. arXiv:2403.01798  [pdf, other]

    cs.NI cs.LG

    Towards Fair and Efficient Learning-based Congestion Control

    Authors: Xudong Liao, Han Tian, Chaoliang Zeng, Xinchen Wan, Kai Chen

    Abstract: Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance than traditional TCP schemes. However, they fail to provide consistently good convergence properties, including fairness, fast convergence and stability, due to the mismatch between their objective functions and these properties. Despite being intuiti…

    Submitted 4 March, 2024; originally announced March 2024.

  34. arXiv:2403.01373  [pdf, other]

    cs.CL

    Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models

    Authors: Huixuan Zhang, Junzhe Zhang, Xiaojun Wan

    Abstract: Large-scale vision-language models have demonstrated impressive skill in handling tasks that involve both areas. Nevertheless, these models frequently generate inaccurate information, i.e., hallucinations. In this study, we concentrate on a specific type of hallucination, number hallucination, which refers to models incorrectly identifying the number of certain o…

    Submitted 6 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: 10 pages

  35. arXiv:2403.00292  [pdf, other]

    cs.CL

    DPP-Based Adversarial Prompt Searching for Language Models

    Authors: Xu Zhang, Xiaojun Wan

    Abstract: Language models risk generating mindless and offensive content, which hinders their safe deployment. Therefore, it is crucial to discover and modify potential toxic outputs of pre-trained language models before deployment. In this work, we elicit toxic content by automatically searching for a prompt that directs pre-trained language models towards the generation of a specific target output. The pr…

    Submitted 1 March, 2024; originally announced March 2024.

  36. arXiv:2402.19404  [pdf, other]

    cs.CV cs.CL

    EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning

    Authors: Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Xiaojun Wan

    Abstract: News image captioning requires a model to generate an informative caption rich in entities, with the news image and the associated news article. Though Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in addressing various vision-language tasks, our research finds that current MLLMs still bear limitations in handling entity information on news image captioning task.…

    Submitted 6 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  37. arXiv:2402.15116  [pdf, other]

    cs.CV cs.AI cs.CL

    Large Multimodal Agents: A Survey

    Authors: Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li

    Abstract: Large language models (LLMs) have achieved superior performance in powering text-based AI agents, endowing them with decision-making and reasoning abilities akin to humans. Concurrently, there is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain. This extension enables AI agents to interpret and respond to diverse multimodal user queries, thereb…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 15 pages, 4 figures

  38. arXiv:2402.12946  [pdf, other]

    cs.CV

    Cell Graph Transformer for Nuclei Classification

    Authors: Wei Lou, Guanbin Li, Xiang Wan, Haofeng Li

    Abstract: Nuclei classification is a critical step in computer-aided diagnosis with histopathology images. In the past, various methods have employed graph neural networks (GNN) to analyze cell graphs that model inter-cell relationships by considering nuclei as vertices. However, they are limited by the GNN mechanism that only passes messages among local nodes via fixed edges. To address the issue, we devel…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: AAAI 2024, Code and models are available at https://github.com/lhaof/CGT

  39. arXiv:2402.12938  [pdf, other]

    cs.CV

    UniCell: Universal Cell Nucleus Classification via Prompt Learning

    Authors: Junjia Huang, Haofeng Li, Xiang Wan, Guanbin Li

    Abstract: The recognition of multi-class cell nuclei can significantly facilitate the process of histopathological diagnosis. Numerous pathological datasets are currently available, but their annotations are inconsistent. Most existing methods require individual training on each dataset to deduce the relevant labels and lack the use of common knowledge across datasets, consequently restricting the quality o…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: AAAI 2024, Code and models are available at https://github.com/lhaof/UniCell

  40. arXiv:2402.12055  [pdf, other]

    cs.CL

    Are LLM-based Evaluators Confusing NLG Quality Criteria?

    Authors: Xinyu Hu, Mingqi Gao, Sen Hu, Yang Zhang, Yicheng Chen, Teng Xu, Xiaojun Wan

    Abstract: Some prior work has shown that LLMs perform well in NLG evaluation for different tasks. However, we discover that LLMs seem to confuse different evaluation criteria, which reduces their reliability. For further verification, we first consider avoiding issues of inconsistent conceptualization and vague expression in existing NLG quality criteria themselves. So we summarize a clear hierarchical clas…

    Submitted 28 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024

  41. arXiv:2402.11684  [pdf, other]

    cs.CL cs.AI

    ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models

    Authors: Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang

    Abstract: Large vision-language models (LVLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they require considerable computational resources for training and deployment. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data.…

    Submitted 17 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 22 pages

  42. arXiv:2402.11493  [pdf, other]

    cs.CL

    Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation

    Authors: Xunjian Yin, Xu Zhang, Jie Ruan, Xiaojun Wan

    Abstract: In recent years, substantial advancements have been made in the development of large language models, achieving remarkable performance across diverse tasks. To evaluate the knowledge ability of language models, previous studies have proposed many benchmarks based on question-answering pairs. We argue that it is not reliable and comprehensive to evaluate language models with a fixed question or…

    Submitted 29 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted as the main paper of ACL 2024

  43. arXiv:2402.11307  [pdf, other]

    cs.CV

    ICHPro: Intracerebral Hemorrhage Prognosis Classification Via Joint-attention Fusion-based 3d Cross-modal Network

    Authors: Xinlei Yu, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Taosheng Xu, Xiang Wan, Changmiao Wang

    Abstract: Intracerebral Hemorrhage (ICH) is the deadliest subtype of stroke, necessitating timely and accurate prognostic evaluation to reduce mortality and disability. However, the multi-factorial nature and complexity of ICH make methods based solely on computed tomography (CT) image features inadequate. Despite the capacity of cross-modal networks to fuse additional information, the effective combination…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 6 pages, 4 figures, 4 tables; accepted by ISBI

  44. arXiv:2402.03526 [pdf, other]

    cs.CV

    nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model

    Authors: Haifan Gong, Luoyao Kang, Yitao Wang, Xiang Wan, Haofeng Li

    Abstract: In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with their local receptive fields, and Transformers have a heavy computational load when applied to high-dimens…

    Submitted 10 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Code is available at https://github.com/lhaof/nnMamba

  45. arXiv:2402.02314 [pdf, other]

    cs.LG cs.AI cs.CL

    Selecting Large Language Model to Fine-tune via Rectified Scaling Law

    Authors: Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang

    Abstract: The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options. Given constrained resources, fine-tuning all models and making selections afterward is unrealistic. In this work, we formulate this resource-constrained selection task as predicting fine-tuning performance and illustrate its natural connection with Sc…

    Submitted 28 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Journal ref: ICML 2024

  46. arXiv:2402.01383 [pdf, other]

    cs.CL

    LLM-based NLG Evaluation: Current Status and Challenges

    Authors: Mingqi Gao, Xinyu Hu, Jie Ruan, Xiao Pu, Xiaojun Wan

    Abstract: Evaluating natural language generation (NLG) is a vital but challenging problem in artificial intelligence. Traditional evaluation metrics that mainly capture content overlap (e.g., n-gram overlap) between system outputs and references are far from satisfactory, and large language models (LLMs) such as ChatGPT have demonstrated great potential in NLG evaluation in recent years. Various automatic evaluation me…

    Submitted 26 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  47. arXiv:2401.10241 [pdf, other]

    cs.DC cs.AI cs.LG

    Zero Bubble Pipeline Parallelism

    Authors: Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin

    Abstract: Pipeline parallelism is one of the key components for large-scale distributed training, yet its efficiency suffers from pipeline bubbles which were deemed inevitable. In this work, we introduce a scheduling strategy that, to our knowledge, is the first to successfully achieve zero pipeline bubbles under synchronous training semantics. The key idea behind this improvement is to split the backward c…

    Submitted 30 November, 2023; originally announced January 2024.

  48. arXiv:2401.04961 [pdf, other]

    cs.CV

    ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection

    Authors: Yuncheng Jiang, Zixun Zhang, Yiwen Hu, Guanbin Li, Xiang Wan, Song Wu, Shuguang Cui, Silin Huang, Zhen Li

    Abstract: Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Code available at https://github.com/yuncheng97/ECC-PolypDet/tree/main

  49. arXiv:2401.03114 [pdf, other]

    cs.LG

    GLISP: A Scalable GNN Learning System by Exploiting Inherent Structural Properties of Graphs

    Authors: Zhongshu Zhu, Bin **g, Xiaopei Wan, Zhizhen Liu, Lei Liang, Jun Zhou

    Abstract: As a powerful tool for modeling graph data, Graph Neural Networks (GNNs) have received increasing attention in both academia and industry. Nevertheless, it is notoriously difficult to deploy GNNs on industrial-scale graphs, due to their huge data size and complex topological structures. In this paper, we propose GLISP, a sampling-based GNN learning system for industrial-scale graphs. By exploiting…

    Submitted 5 January, 2024; originally announced January 2024.

  50. arXiv:2312.05497 [pdf, other]

    cs.CL

    History Matters: Temporal Knowledge Editing in Large Language Model

    Authors: Xunjian Yin, ** Jiang, Liming Yang, Xiaojun Wan

    Abstract: The imperative task of revising or updating the knowledge stored within large language models arises from two distinct sources: intrinsic errors inherent in the model which should be corrected and outdated knowledge due to external shifts in the real world which should be updated. Prevailing efforts in model editing conflate these two distinct categories of edits arising from distinct reasons and…

    Submitted 14 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: AAAI 2024. 9 pages, 3 figures