Showing 1–50 of 1,145 results for author: Hu, X

Searching in archive cs.
  1. arXiv:2407.01527  [pdf, other]

    cs.CL

    KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

    Authors: Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

    Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00952  [pdf, other]

    cs.LG cs.CL cs.DC

    SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao

    Abstract: The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is the depletion of high-quality public datasets within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm recently h…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  3. arXiv:2407.00466  [pdf, other]

    cs.CL cs.AI

    BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

    Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

    Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an…

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2407.00082  [pdf, other]

    cs.IR cs.AI cs.LG

    Adapting Job Recommendations to User Preference Drift with Behavioral-Semantic Fusion Learning

    Authors: Xiao Han, Chen Zhu, Xiao Hu, Chuan Qin, Xiangyu Zhao, Hengshu Zhu

    Abstract: Job recommender systems are crucial for aligning job opportunities with job-seekers in online job-seeking. However, users tend to adjust their job preferences to secure employment opportunities continually, which limits the performance of job recommendations. The inherent frequency of preference drift poses a challenge to promptly and precisely capture user preferences. To address this issue, we p…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: Accepted by KDD 24 Research Track

  5. arXiv:2406.19783  [pdf, other]

    cs.SE cs.CL

    NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations

    Authors: Junkai Chen, Zhenhao Li, Xing Hu, Xin Xia

    Abstract: Large language models (LLMs) achieve promising results in code generation based on a given natural language description. They have been integrated into open-source projects and commercial products to facilitate daily coding activities. The natural language description in the prompt is crucial for LLMs to comprehend users' requirements. Prior studies uncover that LLMs are sensitive to the changes i…

    Submitted 28 June, 2024; originally announced June 2024.

  6. arXiv:2406.19651  [pdf, other]

    cs.DB cs.AI

    CANDY: A Benchmark for Continuous Approximate Nearest Neighbor Search with Dynamic Data Ingestion

    Authors: Xianzhi Zeng, Zhuoyan Wu, Xinjing Hu, Xuanhua Shi, Shixuan Sun, Shuhao Zhang

    Abstract: Approximate K Nearest Neighbor (AKNN) algorithms play a pivotal role in various AI applications, including information retrieval, computer vision, and natural language processing. Although numerous AKNN algorithms and benchmarks have been developed recently to evaluate their effectiveness, the dynamic nature of real-world data presents significant challenges that existing benchmarks fail to addres…

    Submitted 28 June, 2024; originally announced June 2024.

  7. arXiv:2406.19544  [pdf, other]

    cs.SE

    Where Are Large Language Models for Code Generation on GitHub?

    Authors: Xiao Yu, Lei Liu, Xing Hu, Jacky Wai Keung, Jin Liu, Xin Xia

    Abstract: The increasing use of Large Language Models (LLMs) in software development has garnered significant attention from researchers assessing the quality of the code they generate. However, much of the research focuses on controlled datasets such as HumanEval, which fail to adequately represent how developers actually utilize LLMs' code generation capabilities or clarify the characteristics of LLM-gene…

    Submitted 27 June, 2024; originally announced June 2024.

  8. arXiv:2406.18365  [pdf, other]

    cs.CL

    Themis: Towards Flexible and Interpretable NLG Evaluation

    Authors: Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan

    Abstract: The evaluation of natural language generation (NLG) tasks is a significant and longstanding research issue. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the impro…

    Submitted 26 June, 2024; originally announced June 2024.

  9. arXiv:2406.18284  [pdf, other]

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual…

    Submitted 26 June, 2024; originally announced June 2024.

  10. arXiv:2406.18021  [pdf, other]

    cs.SD cs.LG eess.AS

    SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR

    Authors: Shuaishuai Ye, Shunfei Chen, Xinhui Hu, Xinkang Xu

    Abstract: In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Cl…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 2 figures

  11. arXiv:2406.16786  [pdf, other]

    cs.CE

    Generalized and high-efficiency arbitrary-positioned buffer for smoothed particle hydrodynamics

    Authors: Shuoguo Zhang, Yu Fan, Yaru Ren, Bin Qian, Xiangyu Hu

    Abstract: This paper develops an arbitrary-positioned buffer for the smoothed particle hydrodynamics (SPH) method, whose generality and high efficiency are achieved through two techniques. First, with the local coordinate system established at each arbitrary-positioned in-/outlet, particle positions in the global coordinate system are transformed into those in it via coordinate transformation. Since one loc…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 34 pages and 17 figures

  12. arXiv:2406.16743  [pdf, other]

    cs.CL

    Adversarial Contrastive Decoding: Boosting Safety Alignment of Large Language Models via Opposite Prompt Optimization

    Authors: Zhengyue Zhao, Xiaoyun Zhang, Kaidi Xu, Xing Hu, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen

    Abstract: With the widespread application of Large Language Models (LLMs), it has become a significant concern to ensure their safety and prevent harmful responses. While current safe-alignment methods based on instruction fine-tuning and Reinforcement Learning from Human Feedback (RLHF) can effectively reduce harmful responses from LLMs, they often require high-quality datasets and heavy computational over…

    Submitted 24 June, 2024; originally announced June 2024.

  13. arXiv:2406.16694  [pdf, other]

    cs.CL

    Task Oriented In-Domain Data Augmentation

    Authors: Xiao Liang, Xinyu Hu, Simiao Zuo, Yeyun Gong, Qiang Lou, Yi Liu, Shao-Lun Huang, Jian Jiao

    Abstract: Large Language Models (LLMs) have shown superior performance in various applications and fields. To achieve better performance on specialized domains such as law and advertisement, LLMs are often continually pre-trained on in-domain data. However, existing approaches suffer from two major issues. First, in-domain data are scarce compared with general domain-agnostic data. Second, data used for contin…

    Submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.16583  [pdf, other]

    cs.LG cs.CV

    Personalized federated learning based on feature fusion

    Authors: Wolong Xing, Zhenkui Shi, Hongyan Peng, Xiantao Hu, Xianxian Li

    Abstract: Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy. However, due to the heterogeneity of data, models, and devices, the final global model may need to perform better for tasks on each client. Communication bottlenecks, data heterogeneity, and model heterogeneity have been common challenges in federated learning. In t…

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.15485  [pdf, other]

    cs.CL cs.CV

    SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection

    Authors: Xingjian Hu, Baole Wei, Liangcai Gao

    Abstract: Text line detection is a key task in historical document analysis facing many challenges of arbitrary-shaped text lines, dense texts, and text lines with high aspect ratios, etc. In this paper, we propose a general framework for historical document text detection (SegHist), enabling existing segmentation-based text detection methods to effectively address the challenges, especially text lines with…

    Submitted 25 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by ICDAR2024

  16. arXiv:2406.15477  [pdf]

    cs.CL cs.AI

    CrisisSense-LLM: Instruction Fine-Tuned Large Language Model for Multi-label Social Media Text Classification in Disaster Informatics

    Authors: Kai Yin, Chengkai Liu, Ali Mostafavi, Xia Hu

    Abstract: In the field of crisis/disaster informatics, social media is increasingly being used for improving situational awareness to inform response and relief efforts. Efficient and accurate text classification tools have been a focal area of investigation in crisis informatics. However, current methods mostly rely on single-label text classification models, which fail to capture different insights embed…

    Submitted 16 June, 2024; originally announced June 2024.

  17. arXiv:2406.15245  [pdf, other]

    cs.CL cs.LG

    Unsupervised Morphological Tree Tokenizer

    Authors: Qingyang Zhu, Xiang Hu, Pengyu Ji, Wei Wu, Kewei Tu

    Abstract: As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of word…

    Submitted 21 June, 2024; originally announced June 2024.

  18. arXiv:2406.14558  [pdf, other]

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges…

    Submitted 20 June, 2024; originally announced June 2024.

  19. arXiv:2406.14045  [pdf, other]

    cs.LG cs.AI

    Understanding Different Design Choices in Training Large Time Series Models

    Authors: Yu-Neng Chuang, Songchen Li, Jiayi Yuan, Guanchu Wang, Kwei-Herng Lai, Leisheng Yu, Sirui Ding, Chia-Yuan Chang, Qiaoyu Tan, Daochen Zha, Xia Hu

    Abstract: Inspired by Large Language Models (LLMs), Time Series Forecasting (TSF), a long-standing task in time series analysis, is undergoing a transition towards Large Time Series Models (LTSMs), aiming to train universal transformer-based models for TSF. However, training LTSMs on heterogeneous time series data poses unique challenges, including diverse frequencies, dimensions, and patterns across datase…

    Submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.13919  [pdf, other]

    cs.AI

    SPL: A Socratic Playground for Learning Powered by Large Language Model

    Authors: Liang Zhang, Jionghao Lin, Ziyi Kuang, Sheng Xu, Mohammed Yeasin, Xiangen Hu

    Abstract: Dialogue-based Intelligent Tutoring Systems (ITSs) have significantly advanced adaptive and personalized learning by automating sophisticated human tutoring strategies within interactive dialogues. However, replicating the nuanced patterns of expert human communication remains a challenge in Natural Language Processing (NLP). Recent advancements in NLP, particularly Large Language Models (LLMs) su…

    Submitted 20 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  21. arXiv:2406.13219  [pdf, other]

    cs.CV cs.CL

    MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

    Authors: Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, Xiaojun Wan

    Abstract: Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues, which can manifest as misreading and misrecognition errors due to the complexity of multimodal knowledge. Previous benchmarks have not systematically analyzed the performance of editing methods in correcting these two error types. To better represent and correct these errors, we decompose multimodal kno…

    Submitted 19 June, 2024; originally announced June 2024.

  22. arXiv:2406.13185  [pdf, other]

    cs.CL

    Learnable In-Context Vector for Visual Question Answering

    Authors: Yingzhe Peng, Chenduo Hao, Xu Yang, Jiawei Peng, Xinting Hu, Xin Geng

    Abstract: As language models continue to scale, Large Language Models (LLMs) have exhibited emerging capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs) as context. Inspired by these advancements, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, applying ICL us…

    Submitted 18 June, 2024; originally announced June 2024.

  23. arXiv:2406.12757  [pdf, other]

    cs.CV

    MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

    Authors: Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Bo Du, Yu Wu

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' n…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 5 figures

  24. arXiv:2406.11643  [pdf, other]

    cs.CV

    AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

    Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

    Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyM…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11357  [pdf, other]

    cs.CL cs.AI

    Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities

    Authors: Zhonghao Li, Xuming Hu, Aiwei Liu, Kening Zheng, Sirui Huang, Hui Xiong

    Abstract: Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-extensive tasks. To address this, Retrieval-Augmented Generation (RAG) incorporates external document chunks to expand LLM knowledge. Furthermore, compressing information from document chunks through extraction or summarization can improve LLM performance. Nonetheless, LLMs still struggle…

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  26. arXiv:2406.11345  [pdf, other]

    cs.CL cs.AI

    Full-ECE: A Metric For Token-level Calibration on Large Language Models

    Authors: Han Liu, Yupeng Zhang, Bingning Wang, Weipeng Chen, Xiaolin Hu

    Abstract: Deep Neural Networks (DNNs) excel in various domains but face challenges in providing accurate uncertainty estimates, which are crucial for high-stakes applications. Large Language Models (LLMs) have recently emerged as powerful tools, demonstrating exceptional performance in language tasks. However, traditional calibration metrics such as Expected Calibration Error (ECE) and classwise-ECE (cw-ECE…

    Submitted 17 June, 2024; originally announced June 2024.

  27. arXiv:2406.11309  [pdf, other]

    cs.CV

    BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models

    Authors: Xuefeng Hu, Ke Zhang, Min Sun, Albert Chen, Cheng-Hao Kuo, Ram Nevatia

    Abstract: Large-scale pretrained vision-language models like CLIP have demonstrated remarkable zero-shot image classification capabilities across diverse domains. To enhance CLIP's performance while preserving the zero-shot paradigm, various test-time prompt tuning methods have been introduced to refine class embeddings through unsupervised learning objectives during inference. However, these methods often…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Preprint updated from our earlier manuscript submitted to ICLR 2024 (https://openreview.net/forum?id=KNtcoAM5Gy)

  28. arXiv:2406.11213  [pdf, other]

    cs.SE

    A Survey of AIOps for Failure Management in the Era of Large Language Models

    Authors: Lingzhe Zhang, Tong Jia, Mengxi Jia, Yifan Wu, Aiwei Liu, Yong Yang, Zhonghai Wu, Xuming Hu, Philip S. Yu, Ying Li

    Abstract: As software systems grow increasingly intricate, Artificial Intelligence for IT Operations (AIOps) methods have been widely used in software system failure management to ensure the high availability and reliability of large-scale distributed software systems. However, these methods still face several challenges, such as lack of cross-platform generality and cross-task flexibility. Fortunately, rec…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 35 pages

  29. arXiv:2406.11193  [pdf, other]

    cs.CL

    MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

    Authors: Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu

    Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechan…

    Submitted 16 June, 2024; originally announced June 2024.

  30. arXiv:2406.09781  [pdf, other]

    cs.CV

    GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding

    Authors: Yiqi Wu, Xiaodan Hu, Ziming Fu, Siling Zhou, Jiangong Li

    Abstract: Animal ethology is a crucial aspect of animal research, and animal behavior labeling is the foundation for studying animal behavior. This process typically involves labeling video clips with behavioral semantic tags, a task that is complex, subjective, and multimodal. With the rapid development of multimodal large language models (LLMs), new applications have emerged for animal behavior understandi…

    Submitted 14 June, 2024; originally announced June 2024.

  31. arXiv:2406.09723  [pdf, other]

    cs.LG cs.AI

    When Will Gradient Regularization Be Harmful?

    Authors: Yang Zhao, Hao Zhang, Xiuyuan Hu

    Abstract: Gradient regularization (GR), which aims to penalize the gradient norm atop the loss function, has shown promising results in training modern over-parameterized deep neural networks. However, can we trust this powerful technique? This paper reveals that GR can cause performance degeneration in adaptive optimization scenarios, particularly with learning rate warmup. Our empirical and theoretical an…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: ICML 2024 paper

    MSC Class: 55N31; ACM Class: I.4.0

  32. arXiv:2406.09701  [pdf, other]

    cs.SE

    Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models

    Authors: Qiheng Mao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia, Jianling Sun

    Abstract: Software vulnerabilities pose significant risks to the security and integrity of software systems. Prior studies have proposed a series of approaches to vulnerability detection using deep learning or pre-trained models. However, detailed explanations of vulnerabilities, beyond merely detecting their occurrence, are still lacking. Recently, large language models (LLMs) have shown a re…

    Submitted 14 June, 2024; originally announced June 2024.

  33. arXiv:2406.09393  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Autoregressive Training with Dynamic Oracles

    Authors: Jianing Yang, Harshine Visvanathan, Yilin Wang, Xinyi Hu, Matthew Gormley

    Abstract: Many tasks within NLP can be framed as sequential decision problems, ranging from sequence tagging to text generation. However, for many tasks, the standard training methods, including maximum likelihood (teacher forcing) and scheduled sampling, suffer from exposure bias and a mismatch between metrics employed during training and inference. DAgger provides a solution to mitigate these problems, ye…

    Submitted 13 June, 2024; originally announced June 2024.

  34. arXiv:2406.09089  [pdf, other]

    cs.LG

    DiffPoGAN: Diffusion Policies with Generative Adversarial Networks for Offline Reinforcement Learning

    Authors: Xuemin Hu, Shen Li, Yingfen Xu, Bo Tang, Long Chen

    Abstract: Offline reinforcement learning (RL) can learn optimal policies from pre-collected offline datasets without interacting with the environment, but the sampled actions of the agent often cannot cover the action distribution under a given state, resulting in the extrapolation error issue. Recent works address this issue by employing generative adversarial networks (GANs). However, these methods often…

    Submitted 13 June, 2024; originally announced June 2024.

  35. arXiv:2406.08115  [pdf, other]

    cs.DC cs.AI

    Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey

    Authors: Feng Liang, Zhen Zhang, Haifeng Lu, Chengming Li, Victor C. M. Leung, Yanyi Guo, Xiping Hu

    Abstract: With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have become the key to high-performance deep learning. The large-scale environment with large volumes of datasets, models, and computational and communication resources raises various unique challenges for…

    Submitted 12 June, 2024; originally announced June 2024.

  36. arXiv:2406.07451  [pdf, other]

    cs.LG

    An Optimism-based Approach to Online Evaluation of Generative Models

    Authors: Xiaoyan Hu, Ho-fung Leung, Farzan Farnia

    Abstract: Existing frameworks for evaluating and comparing generative models typically target an offline setting, where the evaluator has access to full batches of data produced by the models. However, in many practical scenarios, the goal is to identify the best model using the fewest generated samples to minimize the costs of querying data from the models. Such an online comparison is challenging with cur…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: arXiv version

  37. arXiv:2406.07444  [pdf, other]

    cs.CL

    On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations

    Authors: Shiao Meng, Xuming Hu, Aiwei Liu, Fukun Ma, Yawen Yang, Shuang Li, Lijie Wen

    Abstract: Driven by the demand for cross-sentence and large-scale relation extraction, document-level relation extraction (DocRE) has attracted increasing research interest. Despite the continuous improvement in performance, we find that existing DocRE models which initially perform well may make more mistakes when merely changing the entity names in the document, hindering the generalization to novel entit…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

    MSC Class: 68T50; ACM Class: I.2.7

  38. arXiv:2406.07147  [pdf]

    cs.HC cs.AI cs.CY

    Wearable Device-Based Physiological Signal Monitoring: An Assessment Study of Cognitive Load Across Tasks

    Authors: Ling He, Yanxin Chen, Wenqi Wang, Shuting He, Xiaoqiang Hu

    Abstract: This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution cognitive load assessment on EEG data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students (SVS). By jointly analyzing these two critical physiological indicators, the research delves into their application value in assessing cognitive load among…

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.06544  [pdf, other]

    cs.AR cs.AI

    TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators

    Authors: Yifan Qin, Zheyu Yan, Zixuan Pan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference execution. However, practical deployment is often hindered by the challenge of dealing with the massive amount of model weight parameters impacted by the inherent device variations within non-volatile computing-in-memory (NV…

    Submitted 8 May, 2024; originally announced June 2024.

  40. arXiv:2406.06374  [pdf, other]

    cs.RO cs.CV

    Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation

    Authors: Shenghao Li, Luchao Pang, Xianglong Hu

    Abstract: This paper presents a novel approach to visual simultaneous localization and mapping (SLAM) using multiple RGB-D cameras. The proposed method, Multicam-SLAM, significantly enhances the robustness and accuracy of SLAM systems by capturing more comprehensive spatial information from various perspectives. This method enables the accurate determination of pose relationships among multiple cameras with…

    Submitted 23 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  41. arXiv:2406.05862  [pdf, other]

    cs.CL cs.AI cs.CV

    II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

    Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang, et al. (1 additional author not shown)

    Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap,…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 100 pages, 82 figures, add citations

  42. arXiv:2406.05682  [pdf, other]

    cs.LG cs.AI

    From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR

    Authors: Ran Xu, Yiwen Lu, Chang Liu, Yong Chen, Yan Sun, Xiao Hu, Joyce C Ho, Carl Yang

    Abstract: Electronic Health Records (EHRs) contain rich patient information and are crucial for clinical research and practice. In recent years, deep learning models have been applied to EHRs, but they often rely on massive features, which may not be readily available for all patients. We propose HTP-Star, which leverages hypergraph structures with a pretrain-then-finetune framework for modeling EHR data, e…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: CHIL 2024

  43. arXiv:2406.05536  [pdf, other]

    cs.DB

    Output-Optimal Algorithms for Join-Aggregate Queries

    Authors: Xiao Hu

    Abstract: The classic Yannakakis framework proposed in 1981 is still the state-of-the-art approach for tackling acyclic join-aggregate queries defined over commutative semi-rings. It has been shown that the time complexity of the Yannakakis framework is $O(N + \mathrm{OUT})$ for any free-connex join-aggregate query, where $N$ is the input size of database and $\mathrm{OUT}$ is the output size of the query result. This is a…

    Submitted 26 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  44. arXiv:2406.05247  [pdf, other]

    cs.IR

    Measuring Fairness in Large-Scale Recommendation Systems with Missing Labels

    Authors: Yulong Dong, Kun **, Xinghai Hu, Yang Liu

    Abstract: In large-scale recommendation systems, the vast array of items makes it infeasible to obtain accurate user preferences for each product, resulting in a common issue of missing labels. Typically, only items previously recommended to users have associated ground truth data. Although there is extensive research on fairness concerning fully observed user-item interactions, the challenge of fairness in…

    Submitted 7 June, 2024; originally announced June 2024.

  45. arXiv:2406.03768  [pdf, other]

    cs.LG cs.AI

    Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective

    Authors: Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu

    Abstract: Pre-trained large language models (LLMs) based on Transformer have demonstrated striking in-context learning (ICL) abilities. With a few demonstration input-label pairs, they can predict the label for an unseen input without any parameter updates. In this paper, we show an exciting phenomenon that SVD-based weight pruning can enhance ICL performance, and more surprisingly, pruning weights in deep la…

    Submitted 6 June, 2024; originally announced June 2024.

  46. arXiv:2406.03283  [pdf, other]

    cs.SE cs.AI

    Enhancing Repository-Level Code Generation with Integrated Contextual Information

    Authors: Zhiyuan Pan, Xing Hu, Xin Xia, Xiaohu Yang

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, repository-level code generation presents unique challenges, particularly due to the need to utilize information spread across multiple files within a repository. Existing retrieval-based approaches sometimes fall short as they are limited in obtaining a broader and deeper repository context.…

    Submitted 5 June, 2024; originally announced June 2024.

  47. arXiv:2406.03250  [pdf, other]

    cs.CV cs.AI

    Prompt-based Visual Alignment for Zero-shot Policy Transfer

    Authors: Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, QiCheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen

    Abstract: Overfitting in RL has become one of the main obstacles to applications in reinforcement learning (RL). Existing methods do not provide explicit semantic constraint for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains. Besides, abundant data from multiple domains are needed. To address these issue…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ICML2024

  48. arXiv:2406.02148  [pdf, other]

    cs.CL cs.AI

    Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models

    Authors: Qingkai Min, Qipeng Guo, Xiangkun Hu, Songfang Huang, Zheng Zhang, Yue Zhang

    Abstract: Cross-document event coreference resolution (CDECR) involves clustering event mentions across multiple documents that refer to the same real-world events. Existing approaches utilize fine-tuning of small language models (SLMs) like BERT to address the compatibility among the contexts of event mentions. However, due to the complexity and diversity of contexts, these models are prone to learning sim…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-24 Main

  49. arXiv:2406.02096  [pdf, other]

    cs.RO

    MS-Mapping: Multi-session LiDAR Mapping with Wasserstein-based Keyframe Selection

    Authors: Xiangcheng Hu, ** Wu, Jianhao Jiao, Wei Zhang, ** Tan

    Abstract: Large-scale multi-session LiDAR mapping plays a crucial role in various applications but faces significant challenges in data redundancy and pose graph scalability. This paper presents MS-Mapping, a novel multi-session LiDAR mapping system that combines an incremental mapping scheme with support for various LiDAR-based odometry, enabling high-precision and consistent map assembly in large-scale env…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures

  50. arXiv:2406.01154  [pdf, other]

    cs.CV

    UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation

    Authors: Zehui Lin, Zhuoneng Zhang, Xindi Hu, Zhifan Gao, Xin Yang, Yue Sun, Dong Ni, Tao Tan

    Abstract: Ultrasound is a widely used imaging modality in clinical practice due to its low cost, portability, and safety. Current research in general AI for healthcare focuses on large language models and general segmentation models, with insufficient attention to solutions addressing both disease prediction and tissue segmentation. In this study, we propose a novel universal framework for ultrasound, namel…

    Submitted 20 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.