
Showing 1–50 of 207 results for author: Zhu, K

Searching in archive cs.
  1. arXiv:2407.00341  [pdf, other]

    cs.CL

    Iterative Data Augmentation with Large Language Models for Aspect-based Sentiment Analysis

    Authors: Haiyun Li, Qihuang Zhong, Ke Zhu, Juhua Liu, Bo Du, Dacheng Tao

    Abstract: Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task, which aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited labeled data, data augmentation (DA) has become the standard for improving the performance of ABSA. However, current DA methods usually have some shortcomings: 1) poor fluency and coherence, 2) lack of diver…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Work in process

  2. arXiv:2406.19859  [pdf, other]

    cs.AI cs.HC cs.MM

    MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

    Authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Qi He, Wangmeng Xiang, Hanyuan Chen, Jin-Peng Lan, Xianhui Lin, Kang Zhu, Bin Luo, Yifeng Geng, Xuansong Xie, Alexander G. Hauptmann

    Abstract: MetaDesigner revolutionizes artistic typography synthesis by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement. At the core of this framework lies a multi-agent system comprising the Pipeline, Glyph, and Texture agents, which collectively enable the creation of customized WordArt, ranging from semantic enhancements to the imposition…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 18 pages, 16 figures, Project: https://modelscope.cn/studios/WordArt/WordArt

  3. arXiv:2406.19820  [pdf, other]

    cs.CL cs.AI

    BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Haotian Wang, Kun Zhu, Xiyuan Du, Weijiang Yu, Ming Liu, Bing Qin

    Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities. Nevertheless, they still suffer from factual errors when tackling knowledge-intensive tasks. Retrieval-augmented reasoning represents a promising approach. However, significant challenges still persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-sourc…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  4. arXiv:2406.13923  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Authors: Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, Yubo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PI…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.12708  [pdf, other]

    cs.CL

    AgentReview: Exploring Peer Review Dynamics with LLM Agents

    Authors: Yiqiao Jin, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, Jindong Wang

    Abstract: Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of peer review analyses often rely on exploration and statistics of existing peer review data, which do not adequately address the multivariate nature of the process, account for the latent variables, and are further constrained by privacy concerns due to the sensitive nature of the data. We…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 22 pages, 10 figures

  6. arXiv:2406.10774  [pdf, other]

    cs.CL cs.LG

    Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

    Authors: Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han

    Abstract: As the demand for long-context large language models (LLMs) increases, models with context windows of up to 128K or 1M tokens are becoming increasingly prevalent. However, long-context LLM inference is challenging since the inference speed decreases significantly as the sequence length grows. This slowdown is primarily caused by loading a large KV cache during self-attention. Previous works have s…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  7. arXiv:2406.10289  [pdf, other]

    cs.CL cs.AI cs.IR

    VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

    Authors: Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, Tong Zhang

    Abstract: The proliferation of fake news poses a significant threat not only by disseminating misleading information but also by undermining the very foundations of democracy. The recent advance of generative artificial intelligence has further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmente…

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  8. arXiv:2406.06007  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

    Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

    Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen…

    Submitted 10 June, 2024; originally announced June 2024.

  9. arXiv:2406.05250  [pdf, other]

    cs.AI cs.AR cs.LG

    LLM-Enhanced Bayesian Optimization for Efficient Analog Layout Constraint Generation

    Authors: Guojin Chen, Keren Zhu, Seunggeun Kim, Hanqing Zhu, Yao Lai, Bei Yu, David Z. Pan

    Abstract: Analog layout synthesis faces significant challenges due to its dependence on manual processes, considerable time requirements, and performance instability. Current Bayesian Optimization (BO)-based techniques for analog layout synthesis, despite their potential for automation, suffer from slow convergence and extensive data needs, limiting their practical application. This paper presents the \text…

    Submitted 19 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.04747  [pdf, ps, other]

    cs.DC

    Approximated Coded Computing: Towards Fast, Private and Secure Distributed Machine Learning

    Authors: Houming Qiu, Kun Zhu, Nguyen Cong Luong, Dusit Niyato

    Abstract: In a large-scale distributed machine learning system, coded computing has attracted wide-spread attention since it can effectively alleviate the impact of stragglers. However, several emerging problems greatly limit the performance of coded distributed systems. Firstly, an existence of colluding workers who collude results with each other leads to serious privacy leakage issues. Secondly, there ar…

    Submitted 7 June, 2024; originally announced June 2024.

  11. arXiv:2406.04197  [pdf, other]

    cs.CL

    DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning

    Authors: Shangqing Tu, Kejian Zhu, Yushi Bai, Zijun Yao, Lei Hou, Juanzi Li

    Abstract: The advancement of large language models (LLMs) relies on evaluation using public benchmarks, but data contamination can lead to overestimated performance. Previous researches focus on detecting contamination by determining whether the model has seen the exact same data during training. In this work, we argue that even training on data similar to benchmark data inflates performance on in-distribut…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  12. arXiv:2406.02787  [pdf, other]

    cs.CL cs.AI cs.LG

    Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

    Authors: Wenyue Hua, Kaijie Zhu, Lingyao Li, Lizhou Fan, Shuhang Lin, Mingyu Jin, Haochen Xue, Zelong Li, JinDong Wang, Yongfeng Zhang

    Abstract: This study intends to systematically disentangle pure logic reasoning and text understanding by investigating the contrast across abstract and contextualized logical problems from a comprehensive set of domains. We explore whether LLMs demonstrate genuine reasoning capabilities across various domains when the underlying logical structure remains constant. We focus on two main questions (1) Can abs…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  13. arXiv:2406.01549  [pdf, other]

    cs.CL cs.AI

    An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation

    Authors: Kun Zhu, Xiaocheng Feng, Xiyuan Du, Yuxuan Gu, Weijiang Yu, Haotian Wang, Qianglong Chen, Zheng Chu, Jingchang Chen, Bing Qin

    Abstract: Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content but only achieve suboptimal noise compression. In this paper, we propose to introduce the information bottlenec…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ACL24 Main

  14. arXiv:2405.18842  [pdf, other]

    cs.CV

    Descriptive Image Quality Assessment in the Wild

    Authors: Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue

    Abstract: With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-wor…

    Submitted 12 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  15. arXiv:2405.11794  [pdf, other]

    cs.CV

    ViViD: Video Virtual Try-on using Diffusion Models

    Authors: Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha

    Abstract: Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying the technique of image-based try-on to the video domain in a frame-wise manner will cause temporal-inconsistent outcomes while previous video-based try-on solutions can only generate low visual quality and blurring results. In this work, we present ViViD, a novel framework employing powerful…

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  16. arXiv:2405.10313  [pdf, other]

    cs.AI cs.CL cs.CY cs.LG

    How Far Are We From AGI

    Authors: Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, Jiaxuan You

    Abstract: The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiv…

    Submitted 16 May, 2024; originally announced May 2024.

  17. arXiv:2405.08418  [pdf, other]

    cs.RO

    The Stiffness of 3-PRS PM Across Parasitic and Orientational Workspace

    Authors: Hassen Nigatu, Li Jihao, Keqi Zhu, Junhan Zhang, Haotian Guo, Guodong Lu, Doik Kim

    Abstract: This study investigates the stiffness characteristics of the Sprint Z3 head, also known as 3-PRS Parallel Kinematics Machines, which are among the most extensively researched and viably successful manipulators for precision machining applications. Despite the wealth of research on these robotic manipulators, no previous work has demonstrated their stiffness performance within the parasitic motion…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.18575

  18. arXiv:2404.11613  [pdf, other]

    cs.CV

    InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

    Authors: Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao

    Abstract: 3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant proper…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Project page: https://johanan528.github.io/Infusion

  19. arXiv:2404.10501  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Self-Supervised Visual Preference Alignment

    Authors: Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang

    Abstract: This paper makes the first attempt towards unsupervised preference alignment in Vision-Language Models (VLMs). We generate chosen and rejected responses with regard to the original and augmented image pairs, and conduct preference alignment with direct preference optimization. It is based on a core idea: properly designed augmentation to the image input will induce VLM to generate false but hard n…

    Submitted 16 April, 2024; originally announced April 2024.

  20. arXiv:2404.10260  [pdf, other]

    q-bio.BM cs.AI

    HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights

    Authors: Xiaomin Fang, Jie Gao, Jing Hu, Lihang Liu, Yang Xue, Xiaonan Zhang, Kunrui Zhu

    Abstract: While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Limited by the accuracy of complex predictio…

    Submitted 17 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  21. arXiv:2403.17395  [pdf, other]

    cs.AI

    An Open-source End-to-End Logic Optimization Framework for Large-scale Boolean Network with Reinforcement Learning

    Authors: Zhen Li, Kaixiang Zhu, Xuegong Zhou, Lingli Wang

    Abstract: We propose an open-source end-to-end logic optimization framework for large-scale boolean network with reinforcement learning.

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 5 pages, 4 figures, 1 table

  22. arXiv:2403.15075  [pdf, other]

    cs.IR cs.AI

    Bilateral Unsymmetrical Graph Contrastive Learning for Recommendation

    Authors: Jiaheng Yu, Jing Li, Yue He, Kai Zhu, Shuyi Zhang, Wen Hu

    Abstract: Recent methods utilize graph contrastive Learning within graph-structured user-item interaction data for collaborative filtering and have demonstrated their efficacy in recommendation tasks. However, they ignore that the difference relation density of nodes between the user- and item-side causes the adaptability of graphs on bilateral nodes to be different after multi-hop graph interaction calcula…

    Submitted 22 March, 2024; originally announced March 2024.

  23. arXiv:2403.12471  [pdf, other]

    cs.RO

    Theoretical Modeling and Bio-inspired Trajectory Optimization of A Multiple-locomotion Origami Robot

    Authors: Keqi Zhu, Haotian Guo, Wei Yu, Hassen Nigatu, Tong Li, Huixu Dong

    Abstract: Recent research on mobile robots has focused on increasing their adaptability to unpredictable and unstructured environments using soft materials and structures. However, the determination of key design parameters and control over these compliant robots are predominantly iterated through experiments, lacking a solid theoretical foundation. To improve their efficiency, this paper aims to provide ma…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages

  24. arXiv:2403.09194  [pdf, other]

    cs.CV

    Intention-driven Ego-to-Exo Video Generation

    Authors: Hongchen Luo, Kai Zhu, Wei Zhai, Yang Cao

    Abstract: Ego-to-exo video generation refers to generating the corresponding exocentric video according to the egocentric video, providing valuable applications in AR/VR and embodied AI. Benefiting from advancements in diffusion model techniques, notable progress has been achieved in video generation. However, existing methods build upon the spatiotemporal consistency assumptions between adjacent frames, wh…

    Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  25. arXiv:2403.07257  [pdf, other]

    cs.AR cs.ET

    The Dawn of AI-Native EDA: Opportunities and Challenges of Large Circuit Models

    Authors: Lei Chen, Yiqi Chen, Zhufei Chu, Wenji Fang, Tsung-Yi Ho, Ru Huang, Yu Huang, Sadaf Khan, Min Li, Xingquan Li, Yu Li, Yun Liang, Jinwei Liu, Yi Liu, Yibo Lin, Guojie Luo, Zhengyuan Shi, Guangyu Sun, Dimitrios Tsaras, Runsheng Wang, Ziyi Wang, Xinming Wei, Zhiyao Xie, Qiang Xu, Chenhao Xue, et al. (14 additional authors not shown)

    Abstract: Within the Electronic Design Automation (EDA) domain, AI-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies. These solutions often repurpose deep learning models from other domains, such as vision, text, and graph analytics, applying them to circuit design without tailoring to the unique complexities of electronic circuits. Suc…

    Submitted 1 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: The authors are ordered alphabetically. Contact: qxu@cse[dot]cuhk[dot]edu[dot]hk, gluo@pku[dot]edu[dot]cn, yuan.mingxuan@huawei[dot]com

  26. arXiv:2403.05182  [pdf, other]

    cs.HC cs.GR

    ViboPneumo: A Vibratory-Pneumatic Finger-Worn Haptic Device for Altering Perceived Texture Roughness in Mixed Reality

    Authors: Shaoyu Cai, Zhenlin Chen, Haichen Gao, Ya Huang, Qi Zhang, Xinge Yu, Kening Zhu

    Abstract: Extensive research has been done in haptic feedback for texture simulation in virtual reality (VR). However, it is challenging to modify the perceived tactile texture of existing physical objects which usually serve as anchors for virtual objects in mixed reality (MR). In this paper, we present ViboPneumo, a finger-worn haptic device that uses vibratory-pneumatic feedback to modulate (i.e., increa…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 13 pages, 12 figures

  27. arXiv:2403.05170  [pdf, other]

    cs.CV

    DiffuLT: How to Make Diffusion Model Useful for Long-tail Recognition

    Authors: Jie Shao, Ke Zhu, Hanxiao Zhang, Jianxin Wu

    Abstract: This paper proposes a new pipeline for long-tail (LT) recognition. Instead of re-weighting or re-sampling, we utilize the long-tailed dataset itself to generate a balanced proxy that can be optimized through cross-entropy (CE). Specifically, a randomly initialized diffusion model, trained exclusively on the long-tailed dataset, is employed to synthesize new samples for underrepresented classes. Th…

    Submitted 8 March, 2024; originally announced March 2024.

  28. arXiv:2403.03172  [pdf, other]

    cs.AI cs.LG

    Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination

    Authors: Liangzhou Wang, Kaiwen Zhu, Fengming Zhu, Xinghu Yao, Shujie Zhang, Deheng Ye, Haobo Fu, Qiang Fu, Wei Yang

    Abstract: Reaching consensus is key to multi-agent coordination. To accomplish a cooperative task, agents need to coherently select optimal joint actions to maximize the team reward. However, current cooperative multi-agent reinforcement learning (MARL) methods usually do not explicitly take consensus into consideration, which may cause miscoordination problem. In this paper, we propose a model-based consen…

    Submitted 5 March, 2024; originally announced March 2024.

  29. arXiv:2403.02708  [pdf, other]

    cs.SI

    Backfire Effect Reveals Early Controversy in Online Media

    Authors: Songtao Peng, Chenbo Fua, Han Han, Ye Wu, Kailun Zhu, Qi Xuan, Yong Min

    Abstract: The rapid development of online media has significantly facilitated the public's information consumption, knowledge acquisition, and opinion exchange. However, it has also led to more violent conflicts in online discussions. Therefore, controversy detection becomes important for computational and social sciences. Previous research on detection methods has primarily focused on larger datasets and m…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 17 pages, 6 figures

  30. arXiv:2403.01777  [pdf, other]

    cs.CL cs.CV

    NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models

    Authors: Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu Jin, Lingyao Li, Haoyang Ling, Jinkui Chi, Jindong Wang, Xin Ma, Yongfeng Zhang

    Abstract: Understanding the reasoning capabilities of Multimodal Large Language Models (MLLMs) is an important area of research. In this study, we introduce a dynamic benchmark, NPHardEval4V, aimed at addressing the existing gaps in evaluating the pure reasoning abilities of MLLMs. Our benchmark aims to provide a venue to disentangle the effect of various factors such as image recognition and instruction fo…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures, 2 tables

  31. arXiv:2402.18936  [pdf, ps, other]

    cs.NI eess.SP

    Energy-Efficient UAV Swarm Assisted MEC with Dynamic Clustering and Scheduling

    Authors: Jialiuyuan Li, Jiayuan Chen, Changyan Yi, Tong Zhang, Kun Zhu, Jun Cai

    Abstract: In this paper, the energy-efficient unmanned aerial vehicle (UAV) swarm assisted mobile edge computing (MEC) with dynamic clustering and scheduling is studied. In the considered system model, UAVs are divided into multiple swarms, with each swarm consisting of a leader UAV and several follower UAVs to provide computing services to end-users. Unlike existing work, we allow UAVs to dynamically clust…

    Submitted 29 February, 2024; originally announced February 2024.

  32. arXiv:2402.18409  [pdf, other]

    cs.AI cs.CL cs.CV

    A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models

    Authors: Xiujie Song, Mengyue Wu, Kenny Q. Zhu, Chunhao Zhang, Yanyi Chen

    Abstract: Large Vision-Language Models (LVLMs), despite their recent success, are hardly comprehensively tested for their cognitive abilities. Inspired by the prevalent use of the "Cookie Theft" task in human cognition test, we propose a novel evaluation benchmark to evaluate high-level cognitive ability of LVLMs using images with rich semantics. It defines eight reasoning capabilities and consists of an im…

    Submitted 14 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  33. arXiv:2402.18062  [pdf, other]

    cs.RO cs.AI

    Generative AI for Unmanned Vehicle Swarms: Challenges, Applications and Opportunities

    Authors: Guangyuan Liu, Nguyen Van Huynh, Hongyang Du, Dinh Thai Hoang, Dusit Niyato, Kun Zhu, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Dong In Kim

    Abstract: With recent advances in artificial intelligence (AI) and robotics, unmanned vehicle swarms have received great attention from both academia and industry due to their potential to provide services that are difficult and dangerous to perform by humans. However, learning and coordinating movements and actions for a large number of unmanned vehicles in complex and dynamic environments introduce signif…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 23 pages

  34. arXiv:2402.15985  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Phonetic and Lexical Discovery of a Canine Language using HuBERT

    Authors: Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

    Abstract: This paper delves into the pioneering exploration of potential communication patterns within dog vocalizations and transcends traditional linguistic analysis barriers, which heavily relies on human priori knowledge on limited datasets to find sound units in dog vocalization. We present a self-supervised approach with HuBERT, enabling the accurate classification of phoneme labels and the identifica…

    Submitted 24 February, 2024; originally announced February 2024.

  35. arXiv:2402.14865  [pdf, other]

    cs.CL cs.AI cs.LG

    Dynamic Evaluation of Large Language Models by Meta Probing Agents

    Authors: Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie

    Abstract: Evaluation of large language models (LLMs) has raised great concerns in the community due to the issue of data contamination. Existing work designed evaluation protocols using well-defined algorithms for specific tasks, which cannot be easily extended to diverse scenarios. Moreover, current evaluation benchmarks can only provide the overall benchmark results and cannot support a fine-grained and m…

    Submitted 7 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: International Conference on Machine Learning (ICML) 2024

  36. arXiv:2402.07033  [pdf, other]

    cs.LG cs.AI cs.DC cs.OS

    Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

    Authors: Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci

    Abstract: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks. However, running them on resource-constrained settings, where GPU memory resources are not abundant, is challenging due to huge model sizes. Existing systems that offload model weights to CPU memory suffer from the significant overhead of frequently moving data between CP…

    Submitted 10 February, 2024; originally announced February 2024.

  37. arXiv:2402.06262  [pdf, other]

    cs.CL cs.AI

    On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference

    Authors: Siyu Ren, Kenny Q. Zhu

    Abstract: Despite the recent success associated with Large Language Models (LLMs), they are notably cost-prohibitive to deploy in resource-constrained environments due to their excessive memory and computational demands. In addition to model parameters, the key-value cache is also stored in GPU memory, growing linearly with batch size and sequence length. As a remedy, recent works have proposed various evic…

    Submitted 17 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  38. arXiv:2402.04247  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

    Authors: Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

    Abstract: Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents, called scientific LLM agents, also introduce novel vulnerabilities that demand careful consideration for safety. However, there exists a notab…

    Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  39. arXiv:2402.04009  [pdf, other]

    cs.CV cs.AI

    Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning

    Authors: Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu

    Abstract: In finetuning a large pretrained model to downstream tasks, parameter-efficient fine-tuning (PEFT) methods can effectively finetune pretrained models with few trainable parameters, but suffer from high GPU memory consumption and slow training speed. Because learnable parameters from these methods are entangled with the pretrained model, gradients related to the frozen pretrained model's parameters…

    Submitted 6 February, 2024; originally announced February 2024.

  40. arXiv:2402.02319  [pdf]

    cs.RO

    Smart Textile-Driven Soft Spine Exosuit for Lifting Tasks in Industrial Applications

    Authors: Kefan Zhu, Bibhu Sharma, Phuoc Thien Phan, James Davies, Mai Thanh Thai, Trung Thien Hoang, Chi Cong Nguyen, Adrienne Ji, Emanuele Nicotra, Nigel H. Lovell, Thanh Nho Do

    Abstract: Work related musculoskeletal disorders (WMSDs) are often caused by repetitive lifting, making them a significant concern in occupational health. Although wearable assist devices have become the norm for mitigating the risk of back pain, most spinal assist devices still possess a partially rigid structure that impacts the user comfort and flexibility. This paper addresses this issue by presenting a…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: 6 pages, 7 figures

  41. arXiv:2401.15885  [pdf, other]

    cs.CV

    Rectify the Regression Bias in Long-Tailed Object Detection

    Authors: Ke Zhu, Minghao Fu, Jie Shao, Tianyu Liu, Jianxin Wu

    Abstract: Long-tailed object detection faces great challenges because of its extremely imbalanced class distribution. Recent methods mainly focus on the classification bias and its loss function design, while ignoring the subtle influence of the regression branch. This paper shows that the regression bias exists and does adversely and seriously impact the detection accuracy. While existing methods fail to h…

    Submitted 31 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  42. arXiv:2401.13478  [pdf, other]

    cs.IR cs.CL cs.CV cs.MM

    SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

    Authors: Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

    Abstract: Multi-modal information retrieval (MMIR) is a rapidly evolving field, where significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap, where chart and table images described in…

    Submitted 11 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: camera-ready version for ACL 2024 Findings

  43. arXiv:2401.11944  [pdf, other]

    cs.CL cs.AI cs.CV

    CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

    Authors: Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu

    Abstract: As the capabilities of large multimodal models (LMMs) continue to advance, evaluating the performance of LMMs emerges as an increasing need. Additionally, there is an even larger gap in evaluating the advanced knowledge and reasoning abilities of LMMs in non-English contexts such as Chinese. We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to e…

    Submitted 18 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  44. arXiv:2312.11511  [pdf, other]

    cs.CL cs.AI cs.LG

    ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity

    Authors: Henry Bae, Aghyad Deeb, Alex Fleury, Kehang Zhu

    Abstract: We present ComplexityNet, a streamlined language model designed for assessing task complexity. This model predicts the likelihood of accurate output by various language models, each with different capabilities. Our initial application of ComplexityNet involves the Mostly Basic Python Problems (MBPP) dataset. We pioneered the creation of the first set of labels to define task complexity. Complexity…

    Submitted 29 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  45. arXiv:2312.11111  [pdf, other]

    cs.AI cs.CL cs.HC

    The Good, The Bad, and Why: Unveiling Emotions in Generative AI

    Authors: Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Xinyi Wang, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

    Abstract: Emotion significantly impacts our daily behaviors and interactions. While recent generative AI models, such as large language models, have shown impressive performance in various tasks, it remains unclear whether they truly comprehend emotions. This paper aims to address this gap by incorporating psychological theories to gain a holistic understanding of emotions in generative AI models. Specifica…

    Submitted 7 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: International Conference on Machine Learning (ICML) 2024; an extension to EmotionPrompt (arXiv:2307.11760)

  46. arXiv:2312.07910  [pdf, other]

    cs.AI cs.CL cs.LG

    PromptBench: A Unified Library for Evaluation of Large Language Models

    Authors: Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie

    Abstract: The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components that are easily used and extended by researchers: prompt construction, prompt engineering, dataset and model loading, adversarial prompt attack, dynamic evaluat…

    Submitted 5 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: An extension to PromptBench (arXiv:2306.04528) for unified evaluation of LLMs using the same name; code: https://github.com/microsoft/promptbench

  47. arXiv:2312.07856  [pdf, other]

    cs.CV cs.AI

    DTL: Disentangled Transfer Learning for Visual Recognition

    Authors: Minghao Fu, Ke Zhu, Jianxin Wu

    Abstract: As pre-trained models rapidly grow larger, the cost of fine-tuning on downstream tasks steadily increases, too. To economically fine-tune these models, parameter-efficient transfer learning (PETL) is proposed, which only tunes a tiny subset of trainable parameters to efficiently learn quality representations. However, current PETL methods are facing the dilemma that during training the GPU mem…

    Submitted 2 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  48. arXiv:2312.06971  [pdf, other]

    cs.CV

    CCM: Adding Conditional Controls to Text-to-Image Consistency Models

    Authors: Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha

    Abstract: Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, the way to add new conditional controls to the pretrained CMs has not been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) ControlNet trained for diffusion models…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project Page: https://swiftforce.github.io/CCM

  49. arXiv:2312.04854  [pdf, other]

    cs.CL cs.AI

    Apollo's Oracle: Retrieval-Augmented Reasoning in Multi-Agent Debates

    Authors: Haotian Wang, Xiyuan Du, Weijiang Yu, Qianglong Chen, Kun Zhu, Zheng Chu, Lian Yan, Yi Guan

    Abstract: Multi-agent debate systems are designed to derive accurate and consistent conclusions through adversarial interactions among agents. However, these systems often encounter challenges due to cognitive constraints, manifesting as (1) agents' obstinate adherence to incorrect viewpoints and (2) their propensity to abandon correct viewpoints. These issues are primarily responsible for the ineffectivene…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 16 pages, 7 figures

  50. arXiv:2312.01732  [pdf, other]

    cs.CV

    Likelihood-Aware Semantic Alignment for Full-Spectrum Out-of-Distribution Detection

    Authors: Fan Lu, Kai Zhu, Kecheng Zheng, Wei Zhai, Yang Cao

    Abstract: Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously. However, existing out-of-distribution (OOD) detectors tend to overfit the covariance information and ignore intrinsic semantic correlation, inadequate for adapting to complex domain transformations. To address this issue, we…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 16 pages, 7 figures