Skip to main content

Showing 1–50 of 3,966 results for author: Zhang, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19592  [pdf, other

    cs.PL

    Dataflow-Based Optimization for Quantum Intermediate Representation Programs

    Authors: Junjie Luo, Haoyu Zhang, Jianjun Zhao

    Abstract: This paper proposes QDFO, a dataflow-based optimization approach to Microsoft QIR. QDFO consists of two main functions: one is to preprocess the QIR code so that the LLVM optimizer can capture more optimization opportunities, and the other is to optimize the QIR code so that duplicate loading and constructing of qubits and qubit arrays can be avoided. We evaluated our work on the IBM Challenge Dat… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19156  [pdf, other

    cs.LG

    Heterogeneous Causal Metapath Graph Neural Network for Gene-Microbe-Disease Association Prediction

    Authors: Kexin Zhang, Feng Huang, Luotao Liu, Zhankun Xiong, Hongyu Zhang, Yuan Quan, Wen Zhang

    Abstract: The recent focus on microbes in human medicine highlights their potential role in the genetic framework of diseases. To decode the complex interactions among genes, microbes, and diseases, computational predictions of gene-microbe-disease (GMD) associations are crucial. Existing methods primarily address gene-disease and microbe-disease associations, but the more intricate triple-wise GMD associat… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.19126  [pdf, other

    physics.optics cs.AI

    Super-resolution imaging using super-oscillatory diffractive neural networks

    Authors: Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin

    Abstract: Optical super-oscillation enables far-field super-resolution imaging beyond diffraction limits. However, the existing super-oscillatory lens for the spatial super-resolution imaging system still confronts critical limitations in performance due to the lack of a more advanced design method and the limited design degree of freedom. Here, we propose an optical super-oscillatory diffractive neural net… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures, 1 table

  4. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Ya**g Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, **g Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  5. arXiv:2406.18914  [pdf, other

    eess.SY cs.RO

    Verification and Synthesis of Compatible Control Lyapunov and Control Barrier Functions

    Authors: Hongkai Dai, Chuanrui Jiang, Hongchao Zhang, Andrew Clark

    Abstract: Safety and stability are essential properties of control systems. Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) have been proposed to ensure safety and stability respectively. However, previous approaches typically verify and synthesize the CBFs and CLFs separately, satisfying their respective constraints, without proving that the CBFs and CLFs are compatible with each oth… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.18873  [pdf, other

    cs.AR

    LayoutCopilot: An LLM-powered Multi-agent Collaborative Framework for Interactive Analog Layout Design

    Authors: Bingyang Liu, Haoyi Zhang, Xiaohan Gao, Zichen Kong, Xiyuan Tang, Yibo Lin, Runsheng Wang, Ru Huang

    Abstract: Analog layout design heavily involves interactive processes between humans and design tools. The tools are usually designed to use scripting commands or visualized buttons for manipulation, especially for those interactive automation functionalities, which have a steep learning curve and cumbersome user experience, making a notable barrier to their adoption by designers. Aiming to address such a u… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 8pages, 8figures

  7. arXiv:2406.18836  [pdf, other

    cs.CV cs.IR

    Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs

    Authors: Huaying Zhang, Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper proposes a novel zero-shot composed image retrieval (CIR) method considering the query-target relationship by masked image-text pairs. The objective of CIR is to retrieve the target image using a query image and a query text. Existing methods use a textual inversion network to convert the query image into a pseudo word to compose the image and text and use a pre-trained visual-language… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper in IEEE ICIP 2024

  8. arXiv:2406.18562  [pdf, other

    cs.CV cs.LG

    Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation

    Authors: Kimia Hamidieh, Haoran Zhang, Swami Sankaranarayanan, Marzyeh Ghassemi

    Abstract: Supervised learning methods have been found to exhibit inductive biases favoring simpler features. When such features are spuriously correlated with the label, this can result in suboptimal performance on minority subgroups. Despite the growing popularity of methods which learn from unlabeled data, the extent to which these representations rely on spurious features for prediction is unclear. In th… ▽ More

    Submitted 28 May, 2024; originally announced June 2024.

  9. arXiv:2406.18542  [pdf, other

    cs.CV eess.SP

    Generative AI Empowered LiDAR Point Cloud Generation with Multimodal Transformer

    Authors: Mohammad Farzanullah, Han Zhang, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci

    Abstract: Integrated sensing and communications is a key enabler for the 6G wireless communication systems. The multiple sensing modalities will allow the base station to have a more accurate representation of the environment, leading to context-aware communications. Some widely equipped sensors such as cameras and RADAR sensors can provide some environmental perceptions. However, they are not enough to gen… ▽ More

    Submitted 20 May, 2024; originally announced June 2024.

    Comments: 6 pages, 4 figures, conference

  10. arXiv:2406.18539  [pdf, other

    cs.CV cs.GR

    TexPainter: Generative Mesh Texturing with Multi-view Consistency

    Authors: Hongkun Zhang, Zherong Pan, Congyi Zhang, Lifeng Zhu, Xifeng Gao

    Abstract: The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a multi-view consistent texture image poses a major obstacle to the output quality. In this paper, we propose a novel method to enforce multi-view consistency. Our meth… ▽ More

    Submitted 17 May, 2024; originally announced June 2024.

    Comments: accepted by Siggraph 2024

  11. arXiv:2406.18443  [pdf, other

    cs.CV

    Unveiling the Unknown: Conditional Evidence Decoupling for Unknown Rejection

    Authors: Zhaowei Wu, Binyi Su, Hua Zhang, Zhong Zhou

    Abstract: In this paper, we focus on training an open-set object detector under the condition of scarce training samples, which should distinguish the known and unknown categories. Under this challenging scenario, the decision boundaries of unknowns are difficult to learn and often ambiguous. To mitigate this issue, we develop a novel open-set object detection framework, which delves into conditional eviden… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  12. arXiv:2406.18326  [pdf, other

    cs.CL cs.AI

    PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models

    Authors: Huixuan Zhang, Yun Lin, Xiaojun Wan

    Abstract: Large language models (LLMs) are known to be trained on vast amounts of data, which may unintentionally or intentionally include data from commonly used benchmarks. This inclusion can lead to cheatingly high scores on model leaderboards, yet result in disappointing performance in real-world applications. To address this benchmark contamination problem, we first propose a set of requirements that p… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  13. arXiv:2406.18192  [pdf, other

    cs.CL cs.AI

    Methodology of Adapting Large English Language Models for Specific Cultural Contexts

    Authors: Wen**g Zhang, Siqi Xiao, Xuejiao Lei, Ning Wang, Huazheng Zhang, Meijuan An, Bikun Yang, Zhaoxiang Liu, Kai Wang, Shiguo Lian

    Abstract: The rapid growth of large language models(LLMs) has emerged as a prominent trend in the field of artificial intelligence. However, current state-of-the-art LLMs are predominantly based on English. They encounter limitations when directly applied to tasks in specific cultural domains, due to deficiencies in domain-specific knowledge and misunderstandings caused by differences in cultural values. To… ▽ More

    Submitted 26 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, 2 figures

  14. arXiv:2406.18102  [pdf

    eess.IV cs.CV

    A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

    Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wen**g Xu, Huixiang Zhi

    Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accuratel… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  15. arXiv:2406.16866  [pdf, other

    cs.CV

    Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

    Authors: Jierun Chen, Fangyun Wei, ****g Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S. -H. Gary Chan, Hongyang Zhang

    Abstract: Referring expression comprehension (REC) involves localizing a target instance based on a textual description. Recent advancements in REC have been driven by large multimodal models (LMMs) like CogVLM, which achieved 92.44% accuracy on RefCOCO. However, this study questions whether existing benchmarks such as RefCOCO, RefCOCO+, and RefCOCOg, capture LMMs' comprehensive capabilities. We begin with… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  16. arXiv:2406.16858  [pdf, other

    cs.CL cs.LG

    EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

    Authors: Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang

    Abstract: Inference with modern Large Language Models (LLMs) is expensive and time-consuming, and speculative sampling has proven to be an effective solution. Most speculative sampling methods such as EAGLE use a static draft tree, implicitly assuming that the acceptance rate of draft tokens depends only on their position. Interestingly, we found that the acceptance rate of draft tokens is also context-depe… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  17. arXiv:2406.16279  [pdf, other

    cs.CV

    SegNet4D: Effective and Efficient 4D LiDAR Semantic Segmentation in Autonomous Driving Environments

    Authors: Neng Wang, Ruibin Guo, Chenghao Shi, Hui Zhang, Huimin Lu, Zhiqiang Zheng, Xieyuanli Chen

    Abstract: 4D LiDAR semantic segmentation, also referred to as multi-scan semantic segmentation, plays a crucial role in enhancing the environmental understanding capabilities of autonomous vehicles. It entails identifying the semantic category of each point in the LiDAR scan and distinguishing whether it is dynamic, a critical aspect in downstream tasks such as path planning and autonomous navigation. Exist… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  18. arXiv:2406.16253  [pdf, other

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo , et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  19. arXiv:2406.16221  [pdf, other

    cs.LG cs.AI cs.GR econ.EM stat.ME

    F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data

    Authors: Zexing Xu, Linjun Zhang, Sitan Yang, Rasoul Etesami, Hanghang Tong, Huan Zhang, Jiawei Han

    Abstract: Demand prediction is a crucial task for e-commerce and physical retail businesses, especially during high-stake sales events. However, the limited availability of historical data from these peak periods poses a significant challenge for traditional forecasting methods. In this paper, we propose a novel approach that leverages strategically chosen proxy data reflective of potential sales patterns f… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    MSC Class: 68T07; 68T05; 62M10; 62M20; 90C90; 91B84

  20. arXiv:2406.16150  [pdf, other

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paied less attention to the issue we term as \textit{Intensity Confusion}, wherein the intensity values of certain background voxels approach those of the foreground voxels within br… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  21. arXiv:2406.16026  [pdf

    physics.med-ph cs.LG eess.IV

    CEST-KAN: Kolmogorov-Arnold Networks for CEST MRI Data Analysis

    Authors: Jiawen Wang, Pei Cai, Ziyan Wang, Huabin Zhang, Jianpan Huang

    Abstract: Purpose: This study aims to propose and investigate the feasibility of using Kolmogorov-Arnold Network (KAN) for CEST MRI data analysis (CEST-KAN). Methods: CEST MRI data were acquired from twelve healthy volunteers at 3T. Data from ten subjects were used for training, while the remaining two were reserved for testing. The performance of multi-layer perceptron (MLP) and KAN models with the same ne… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  22. arXiv:2406.15970  [pdf, ps, other

    cs.GT cs.AI cs.CC

    Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

    Authors: Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul W. Goldberg, Vincent Conitzer

    Abstract: We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before. An example is the absentminded driver game, as well as team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer se… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Long version of the paper that got accepted to the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024). 35 pages, 10 figures, 1 table

    MSC Class: 91A05; 91A06; 91A10; 91A11; 91A18; 91A35; 91A68; 68T37; 68Q17; 68Q25 ACM Class: I.2; J.4; F.2

  23. arXiv:2406.15778  [pdf, other

    cs.CV cs.AI

    ObjectNLQ @ Ego4D Episodic Memory Challenge 2024

    Authors: Yisen Feng, Haoyu Zhang, Yuquan Xie, Zai**g Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024. Both challenges require the localization of actions within long video sequences using textual queries. To enhance localization accuracy, our method not only processes the temporal information of videos but also identifies fine-grained objects spatial… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The solution for the Natural Language Query track and Goal Step track at CVPR EgoVis Workshop 2024

  24. arXiv:2406.15771  [pdf, other

    cs.CV cs.AI

    HCQA @ Ego4D EgoSchema Challenge 2024

    Authors: Haoyu Zhang, Yuquan Xie, Yisen Feng, Zai**g Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our champion solution for Ego4D EgoSchema Challenge in CVPR 2024. To deeply integrate the powerful egocentric captioning model and question reasoning model, we propose a novel Hierarchical Comprehension scheme for egocentric video Question Answering, named HCQA. It consists of three stages: Fine-grained Caption Generation, Context-driven Summarization, and Inference-guid… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The champion solution for Ego4D EgoSchema Challenge in CVPR EgoVis Workshop 2024

  25. arXiv:2406.15306  [pdf

    cs.LG cs.CL cs.CV

    Advanced Multimodal Deep Learning Architecture for Image-Text Matching

    Authors: **yin Wang, Hai**g Zhang, Yihao Zhong, Yingbin Liang, Rongwei Ji, Yiru Cang

    Abstract: Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship. With the advent of the multimedia information age, image, and text data show explosive growth, and how to accurately realize the efficient and accurate semantic correspondence between them has become the core issue of common concern in academia and industry.… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17460 by other authors

  26. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  27. arXiv:2406.15187  [pdf, other

    cs.AI cs.IR

    UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis

    Authors: Yulong Hui, Yao Lu, Huanchen Zhang

    Abstract: The use of Retrieval-Augmented Generation (RAG) has improved Large Language Models (LLMs) in collaborating with external data, yet significant challenges exist in real-world scenarios. In areas such as academic literature and finance question answering, data are often found in raw text and tables in HTML or PDF formats, which can be lengthy and highly unstructured. In this paper, we introduce a be… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  28. arXiv:2406.15177  [pdf, other

    cs.MM

    EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot

    Authors: Hao Fei, Han Zhang, Bin Wang, Lizi Liao, Qian Liu, Erik Cambria

    Abstract: This paper introduces EmpathyEar, a pioneering open-source, avatar-based multimodal empathetic chatbot, to fill the gap in traditional text-only empathetic response generation (ERG) systems. Leveraging the advancements of a large language model, combined with multimodal encoders and generators, EmpathyEar supports user inputs in any combination of text, sound, and vision, and produces multimodal e… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Demonstration Paper

  29. arXiv:2406.15034  [pdf, other

    cs.CV

    SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition

    Authors: Liutao Yu, Liwei Huang, Chenlin Zhou, Han Zhang, Zhengyu Ma, Huihui Zhou, Yonghong Tian

    Abstract: Video action recognition (VAR) plays crucial roles in various domains such as surveillance, healthcare, and industrial automation, making it highly significant for the society. Consequently, it has long been a research spot in the computer vision field. As artificial neural networks (ANNs) are flourishing, convolution neural networks (CNNs), including 2D-CNNs and 3D-CNNs, as well as variants of th… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024 workshop - Human Brain and Artificial Intelligence

  30. arXiv:2406.14962  [pdf, other

    cs.CV

    Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning

    Authors: Suyi Li, Chenyi Jiang, Shidong Wang, Yang Long, Zheng Zhang, Haofeng Zhang

    Abstract: Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object, consequently decreasing the classification performance towards novel compositions. Previous remarkable works primarily addr… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  31. arXiv:2406.14898  [pdf, other

    cs.CR cs.CL

    Safely Learning with Private Data: A Federated Learning Framework for Large Language Model

    Authors: JiaYing Zheng, HaiNan Zhang, LingXiang Wang, WangJie Qiu, HongWei Zheng, ZhiMing Zheng

    Abstract: Private data, being larger and quality-higher than public data, can greatly improve large language models (LLM). However, due to privacy concerns, this data is often dispersed in multiple silos, making its secure utilization for LLM training a challenge. Federated learning (FL) is an ideal solution for training models with distributed private data, but traditional frameworks like FedAvg are unsuit… ▽ More

    Submitted 26 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  32. arXiv:2406.14833  [pdf, other

    cs.CL

    Efficient Continual Pre-training by Mitigating the Stability Gap

    Authors: Yiduo Guo, Jie Fu, Huishuai Zhang, Dongyan Zhao, Yikang Shen

    Abstract: Continual pre-training has increasingly become the predominant approach for adapting Large Language Models (LLMs) to new domains. This process involves updating the pre-trained LLM with a corpus from a new domain, resulting in a shift in the training distribution. To study the behavior of LLMs during this shift, we measured the model's performance throughout the continual pre-training process. we… ▽ More

    Submitted 27 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  33. arXiv:2406.14455  [pdf, other

    cs.CV

    MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction

    Authors: Luhui Cai, Weiming Zeng, Hongyu Chen, Hua Zhang, Yueyang Li, Hongjie Yan, Lingbin Bian, Nizhuan Wang

    Abstract: Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) through the integration of both imaging and non-imaging data. However, the effectiveness of GDL based methods heavily depends on the quality of modeling the multi-modal population graphs and tends to degrade as the graph scale increases. Furthermore, these methods often constrain… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  34. arXiv:2406.14312  [pdf, other

    cs.CL cs.AI

    Infusing clinical knowledge into tokenisers for language models

    Authors: Abul Hasan, **ge Wu, Quang Ngoc Nguyen, Salomé Andres, Imane Guellil, Huayu Zhang, Arlene Casey, Beatrice Alex, Bruce Guthrie, Honghan Wu

    Abstract: This study introduces a novel knowledge enhanced tokenisation mechanism, K-Tokeniser, for clinical text processing. Technically, at initialisation stage, K-Tokeniser populates global representations of tokens based on semantic types of domain concepts (such as drugs or diseases) from either a domain ontology like Unified Medical Language System or the training data of the task related corpus. At t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures

  35. arXiv:2406.14232  [pdf, other

    cs.LG cs.AI

    Enhancing robustness of data-driven SHM models: adversarial training with circle loss

    Authors: Xiangli Yang, Xijie Deng, Hanwei Zhang, Yang Zou, Jianxi Yang

    Abstract: Structural health monitoring (SHM) is critical to safeguarding the safety and reliability of aerospace, civil, and mechanical infrastructure. Machine learning-based data-driven approaches have gained popularity in SHM due to advancements in sensors and computational power. However, machine learning models used in SHM are vulnerable to adversarial examples -- even small changes in input can lead to… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  36. arXiv:2406.14217  [pdf, other

    cs.LG cs.CR

    Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning

    Authors: Yu**g Wang, Hainan Zhang, Sijia Wen, Wangjie Qiu, Binghui Guo

    Abstract: Federated learning is highly susceptible to model poisoning attacks, especially those meticulously crafted for servers. Traditional defense methods mainly focus on updating assessments or robust aggregation against manually crafted myopic attacks. When facing advanced attacks, their defense stability is notably insufficient. Therefore, it is imperative to develop adaptive defenses against such adv… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  37. arXiv:2406.14088  [pdf, other

    cs.DC cs.AI cs.CL cs.LG

    ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

    Authors: Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 13 pages (15 pages with references), 13 figures

  38. arXiv:2406.14071  [pdf, other

    stat.ML cs.LG

    Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits

    Authors: Ziyi Huang, Henry Lam, Haofeng Zhang

    Abstract: Bayesian bandit algorithms with approximate Bayesian inference have been widely used in real-world applications. Nevertheless, their theoretical justification is less investigated in the literature, especially for contextual bandit problems. To fill this gap, we propose a general theoretical framework to analyze stochastic linear bandits in the presence of approximate inference and conduct regret… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  39. arXiv:2406.14066  [pdf, other

    cs.AI cs.PF

    Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

    Authors: Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang

    Abstract: Reducing the inference latency of large language models (LLMs) is crucial, and speculative decoding (SD) stands out as one of the most effective techniques. Rather than letting the LLM generate all tokens directly, speculative decoding employs effective proxies to predict potential outputs, which are then verified by the LLM without compromising the generation quality. Yet, deploying SD in real on… ▽ More

    Submitted 25 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  40. arXiv:2406.13975  [pdf, other

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, **gyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  41. arXiv:2406.13892  [pdf, other

    cs.CL

    Adaptable Logical Control for Large Language Models

    Authors: Honghua Zhang, Po-Nien Kung, Masahiro Yoshida, Guy Van den Broeck, Nanyun Peng

    Abstract: Despite the success of Large Language Models (LLMs) on various tasks following human instructions, controlling model generation at inference time poses a persistent challenge. In this paper, we introduce Ctrl-G, an adaptable framework that facilitates tractable and flexible control of LLM generation to reliably follow logical constraints. Ctrl-G combines any production-ready LLM with a Hidden Mark… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  42. SRL-VIC: A Variable Stiffness-Based Safe Reinforcement Learning for Contact-Rich Robotic Tasks

    Authors: Heng Zhang, Gokhan Solak, Gustavo J. G. Lahr, Arash Ajoudani

    Abstract: Reinforcement learning (RL) has emerged as a promising paradigm in complex and continuous robotic tasks, however, safe exploration has been one of the main challenges, especially in contact-rich manipulation tasks in unstructured environments. Focusing on this issue, we propose SRL-VIC: a model-free safe RL framework combined with a variable impedance controller (VIC). Specifically, safety critic… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE RA-L,video is available at https://youtu.be/ksWXR3vByoQ

    Journal ref: IEEE Robotics and Automation Letters, vol. 9, no. 6, pp. 5631-5638, June 2024

  43. arXiv:2406.13660  [pdf, other

    cs.CL cs.AI

    Towards Minimal Targeted Updates of Language Models with Targeted Negative Training

    Authors: Lily H. Zhang, Rajesh Ranganath, Arya Tafvizi

    Abstract: Generative models of language exhibit impressive capabilities but still place non-negligible probability mass over undesirable outputs. In this work, we address the task of updating a model to avoid unwanted outputs while minimally changing model behavior otherwise, a challenge we refer to as a minimal targeted update. We first formalize the notion of a minimal targeted update and propose a method… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Published in Transactions of Machine Learning Research

  44. arXiv:2406.13606  [pdf, other

    cs.CV

    DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning

    Authors: Xiaowen Ma, Jiawei Yang, Rui Che, Huanting Zhang, Wei Zhang

    Abstract: Remote sensing change detection (RSCD) aims to identify the changes of interest in a region by analyzing multi-temporal remote sensing images, and has an outstanding value for local development monitoring. Existing RSCD methods are devoted to contextual modeling in the spatial domain to enhance the changes of interest. Despite the satisfactory performance achieved, the lack of knowledge in the fre… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ICME 2024 Oral

  45. arXiv:2406.13233  [pdf, other

    cs.AI

    AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

    Authors: Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

    Abstract: Mixture of experts (MoE) has become the standard for constructing production-level large language models (LLMs) due to its promise to boost model capacity without causing significant overheads. Nevertheless, existing MoE methods usually enforce a constant top-k routing for all tokens, which is arguably restrictive because various tokens (e.g., "<EOS>" vs. "apple") may require various numbers of ex… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  46. arXiv:2406.13219  [pdf, other

    cs.CV cs.CL

    MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

    Authors: Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, Xiaojun Wan

    Abstract: Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues, which can manifest as misreading and misrecognition errors due to the complexity of multimodal knowledge. Previous benchmarks have not systematically analyzed the performance of editing methods in correcting these two error types. To better represent and correct these errors, we decompose multimodal kno… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  47. arXiv:2406.13116  [pdf, ps, other

    cs.GT

    A Lower Bound on Swap Regret in Extensive-Form Games

    Authors: Constantinos Daskalakis, Gabriele Farina, Noah Golowich, Tuomas Sandholm, Brian Hu Zhang

    Abstract: Recent simultaneous works by Peng and Rubinstein [2024] and Dagan et al. [2024] have demonstrated the existence of a no-swap-regret learning algorithm that can reach $ε$ average swap regret against an adversary in any extensive-form game within $m^{\tilde{\mathcal O}(1/ε)}$ rounds, where $m$ is the number of nodes in the game tree. However, the question of whether a $\mathrm{poly}(m, 1/ε)$-round a… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  48. Blitzcrank: Fast Semantic Compression for In-memory Online Transaction Processing

    Authors: Yiming Qiao, Yihan Gao, Huanchen Zhang

    Abstract: We present BLITZCRANK, a high-speed semantic compressor designed for OLTP databases. Previous solutions are inadequate for compressing row-stores: they suffer from either low compression factor due to a coarse compression granularity or suboptimal performance due to the inefficiency in handling dynamic data sets. To solve these problems, we first propose novel semantic models that support fast inf… ▽ More

    Submitted 28 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages, 19 figures

    Journal ref: PVLDB, 17(10): 2528 - 2540, 2024

  49. arXiv:2406.12442  [pdf, other

    cs.CL cs.AI

    Abstraction-of-Thought Makes Language Models Better Reasoners

    Authors: Ruixin Hong, Hongming Zhang, Xiaoman Pan, Dong Yu, Changshui Zhang

    Abstract: Abstract reasoning, the ability to reason from the abstract essence of a problem, serves as a key to generalization in human reasoning. However, eliciting language models to perform reasoning with abstraction remains unexplored. This paper seeks to bridge this gap by introducing a novel structured reasoning format called Abstraction-of-Thought (AoT). The uniqueness of AoT lies in its explicit requ… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in Process

  50. arXiv:2406.12367  [pdf, other

    cs.CV cs.LG cs.MM

    Competitive Learning for Achieving Content-specific Filters in Video Coding for Machines

    Authors: Honglei Zhang, Jukka I. Ahonen, Nam Le, Ruiying Yang, Francesco Cricri

    Abstract: This paper investigates the efficacy of jointly optimizing content-specific post-processing filters to adapt a human oriented video/image codec into a codec suitable for machine vision tasks. By observing that artifacts produced by video/image codecs are content-dependent, we propose a novel training strategy based on competitive learning principles. This strategy assigns training samples to filte… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to be preseneted in ICIP 2024