Search | arXiv e-print repository

Robust and Scalable Model Editing for Large Language Models

Authors: Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun

Abstract: Large language models (LLMs) can make predictions using parametric knowledge--knowledge encoded in the model weights--or contextual knowledge--knowledge presented in the context. In many scenarios, a desirable behavior is that LLMs give precedence to contextual knowledge when it conflicts with the parametric knowledge, and fall back to using their parametric knowledge when the context is irrelevant. This enables updating and correcting the model's knowledge by in-context editing instead of retraining. Previous works have shown that LLMs are inclined to ignore contextual knowledge and fail to reliably fall back to parametric knowledge when presented with irrelevant context. In this work, we discover that, with proper prompting methods, instruction-finetuned LLMs can be highly controllable by contextual knowledge and robust to irrelevant context. Utilizing this feature, we propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing. To better evaluate the robustness of model editors, we collect a new dataset that contains irrelevant questions that are more challenging than the ones in existing datasets. Empirical results show that our method outperforms current state-of-the-art methods by a large margin. Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs (and vice versa). The source code can be found at https://github.com/thunlp/EREN.

Submitted 26 March, 2024; originally announced March 2024.

Comments: LREC-COLING 2024 paper, 16 pages, 4 figures

arXiv:2403.17336 [pdf, other]

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

Authors: Zhiyuan Yu, Xiaogeng Liu, Shunning Liang, Zach Cameron, Chaowei Xiao, Ning Zhang

Abstract: Recent advancements in generative AI have enabled ubiquitous access to large language models (LLMs). Empowered by their exceptional capabilities to understand and generate human-like text, these models are being increasingly integrated into our society. At the same time, there are also concerns about the potential misuse of this powerful technology, prompting defensive measures from service providers. To overcome such protection, jailbreaking prompts have recently emerged as one of the most effective mechanisms to circumvent security restrictions and elicit harmful content originally designed to be prohibited. Due to the rapid development of LLMs and their ease of access via natural languages, the frontline of jailbreak prompts is largely seen in online forums and among hobbyists. To gain a better understanding of the threat landscape of semantically meaningful jailbreak prompts, we systematized existing prompts and measured their jailbreak effectiveness empirically. Further, we conducted a user study involving 92 participants with diverse backgrounds to unveil the process of manually creating jailbreak prompts. We observed that users often succeeded in generating jailbreak prompts regardless of their expertise in LLMs. Building on the insights from the user study, we also developed a system using AI as the assistant to automate the process of jailbreak prompt generation.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Accepted by USENIX Security 2024

arXiv:2403.16586 [pdf, other]

Intrinsic Dipole Hall effect in tMoTe$_2$ moiré: magnetoelectricity and contact-free signature of topological transitions

Authors: Feng-Ren Fan, Cong Xiao, Wang Yao

Abstract: We discover an intrinsic dipole Hall effect in a variety of magnetic insulating states at integer fillings of twisted MoTe$_2$ moiré superlattice, including topologically trivial and nontrivial ferro-, antiferro-, and ferri-magnetic configurations. The dipole Hall current, in linear response to in-plane electric field, generates an in-plane orbital magnetization $M_{\parallel}$ along the field, through which an AC field can drive magnetization oscillation up to THz range. Upon the continuous topological phase transitions from trivial to quantum anomalous Hall states in both ferromagnetic and antiferromagnetic configurations, the dipole Hall current and $M_{\parallel}$ have an abrupt sign change, enabling contact-free detection of the transitions through the magnetic stray field. In configurations where the linear response is forbidden by symmetry, the dipole Hall current and $M_{\parallel}$ appear as a crossed nonlinear response to both in-plane and out-of-plane electric fields. These magnetoelectric phenomena showcase novel functionalities of insulators from the interplay between magnetism, topology and electrical polarization.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 5 pages, 4 figures

arXiv:2403.12497 [pdf, other]

Formation of Polar Crown Filaments Magnetic Fields by Supergranular Helicity Injection

Authors: Huanxin Chen, Chun Xia, Hechao Chen

Abstract: To understand the magnetic fields of the polar crown filaments (PCFs) at high latitudes near polar regions of the Sun, we perform magnetofrictional numerical simulations on the long-term magnetic evolution of bipolar fields with roughly east-west polarity inversion lines (PILs) in a three-dimensional (3D) spherical wedge domain near polar regions. The vortical motions induced by the Coriolis effect at the boundaries of several supergranular cells inject magnetic helicity from the photospheric boundary into the solar atmosphere. Supergranular-scale helicity injection, transfer, and condensation produce strongly sheared magnetic fields. Magnetic reconnections at footpoints of the sheared fields produce magnetic flux ropes (MFRs) with helicity signs consistent with the observed Hemispheric Helicity Rule (HHR). The cross-sectional area of MFRs exhibits an uneven distribution, resembling a "foot-node-foot" periodic configuration. Experiments with different tilt directions of PILs indicate that the PCFs preferably form along PILs with the western end close to the polar region. The bending of PILs caused by supergranular flows, forming S-shape (Z-shape) PIL segments, promotes the formation of dextral (sinistral) MFRs. The realistic magnetic models we obtained can serve as starting points for the study of the plasma formation and eruption of PCFs.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 16 pages, 12 figures

arXiv:2403.10351 [pdf, other]

TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale

Authors: Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han

Abstract: The advent of large language models (LLMs) has significantly advanced natural language processing tasks like text summarization. However, their large size and computational demands, coupled with privacy concerns in data transmission, limit their use in resource-constrained and privacy-centric settings. To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact, local model. Initially, LLMs extract a set of aspect-triple rationales and summaries, which are refined using a dual-scoring method for quality. Next, a smaller local model is trained with these tasks, employing a curriculum learning strategy that evolves from simple to complex tasks. Our method enhances local model performance on various benchmarks (CNN/DailyMail, XSum, and ClinicalTrial), outperforming baselines by 4.5%, 8.5%, and 7.4%, respectively. It also improves interpretability by providing insights into the summarization rationale.

Submitted 15 March, 2024; originally announced March 2024.

Comments: NAACL'24

arXiv:2403.09513 [pdf, other]

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

Authors: Yu Wang, Xiaogeng Liu, Yu Li, Muhao Chen, Chaowei Xiao

Abstract: With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), the imperative to ensure their safety has become increasingly pronounced. However, with the integration of additional modalities, MLLMs are exposed to new vulnerabilities, rendering them prone to structure-based jailbreak attacks, where semantic content (e.g., "harmful text") has been injected into the images to mislead MLLMs. In this work, we aim to defend against such threats. Specifically, we propose Adaptive Shield Prompting (AdaShield), which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks without fine-tuning MLLMs or training additional modules (e.g., post-stage content detector). Initially, we present a manually designed static defense prompt, which thoroughly examines the image and instruction content step by step and specifies response methods to malicious queries. Furthermore, we introduce an adaptive auto-refinement framework, consisting of a target MLLM and an LLM-based defense prompt generator (Defender). These components collaboratively and iteratively communicate to generate a defense prompt. Extensive experiments on the popular structure-based jailbreak attacks and benign datasets show that our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks without compromising the model's general capabilities evaluated on standard benign tasks. Our code is available at https://github.com/rain305f/AdaShield.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Multimodal Large Language Models Defense, 25 Pages

arXiv:2403.08361 [pdf, other]

Search for cosmic-ray boosted sub-MeV dark matter-electron scatterings in PandaX-4T

Authors: Xiaofeng Shang, Abdusalam Abdukerim, Zihao Bo, Wei Chen, Xun Chen, Chen Cheng, Zhaokan Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Lisheng Geng, Karl Giboni, Xuyuan Guo, Chencheng Han, Ke Han, Changda He, **rong He, Di Huang, Junting Huang, Zhou Huang, Ruquan Hou, Yu Hou, Xiangdong Ji, Yonglin Ju, Chenxiang Li, et al. (67 additional authors not shown)

Abstract: We report the first search for the elastic scatterings between cosmic-ray boosted sub-MeV dark matter and electrons in the PandaX-4T liquid xenon experiment. Sub-MeV dark matter particles can be accelerated by scattering with electrons in the cosmic rays and produce detectable electron recoil signals in the detector. Using the commissioning data from PandaX-4T of 0.63~tonne$\cdot$year exposure, we set new constraints on DM-electron scattering cross sections for DM masses ranging from 10~eV/$c^2$ to 3~keV/$c^2$.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 6 pages, 3 figures

arXiv:2403.08216 [pdf, other]

PaddingFlow: Improving Normalizing Flows with Padding-Dimensional Noise

Authors: Qinglong Meng, Chongkun Xia, Xueqian Wang

Abstract: Normalizing flow is a generative modeling approach with efficient sampling. However, flow-based models suffer from two issues: 1) If the target distribution lies on a manifold, flow-based models might perform badly due to the mismatch between the dimensions of the latent target distribution and the data distribution. 2) Discrete data might make flow-based models collapse into a degenerate mixture of point masses. To sidestep these two issues, we propose PaddingFlow, a novel dequantization method, which improves normalizing flows with padding-dimensional noise. To implement PaddingFlow, only the dimension of normalizing flows needs to be modified. Thus, our method is easy to implement and computationally cheap. Moreover, the padding-dimensional noise is only added to the padding dimension, which means PaddingFlow can dequantize without changing data distributions. Implementing existing dequantization methods requires changing data distributions, which might degrade performance. We validate our method on the main benchmarks of unconditional density estimation, including five tabular datasets and four image datasets for Variational Autoencoder (VAE) models, and the Inverse Kinematics (IK) experiments, which are conditional density estimation. The results show that PaddingFlow can perform better in all experiments in this paper, which means PaddingFlow is widely suitable for various tasks. The code is available at: https://github.com/AdamQLMeng/PaddingFlow.

Submitted 23 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07392 [pdf, other]

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Authors: Chunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, Yifeng Shi

Abstract: Although Vision Transformer (ViT) has achieved significant success in computer vision, it does not perform well in dense prediction tasks due to the lack of inner-patch information interaction and the limited diversity of feature scale. Most existing studies are devoted to designing vision-specific transformers to solve the above problems, which introduce additional pre-training costs. Therefore, we present a plain, pre-training-free, and feature-enhanced ViT backbone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature representation in ViT. (2) We propose a simple and efficient CNN-Transformer bidirectional fusion interaction module that performs multi-scale fusion across hierarchical features, which is beneficial for handling dense prediction tasks. (3) We evaluate the performance of ViT-CoMer across various dense prediction tasks, different frameworks, and multiple advanced pre-training. Notably, our ViT-CoMer-L achieves 64.3% AP on COCO val2017 without extra training data, and 62.1% mIoU on ADE20K val, both of which are comparable to state-of-the-art methods. We hope ViT-CoMer can serve as a new backbone for dense prediction tasks to facilitate future research. The code will be released at https://github.com/Traffic-X/ViT-CoMer.

Submitted 27 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: CVPR2024

arXiv:2403.07344 [pdf]

Electronic Structure of Superconducting Infinite-Layer Lanthanum Nickelates

Authors: Wenjie Sun, Zhicheng Jiang, Chengliang Xia, Bo Hao, Yueying Li, Shengjun Yan, Maosen Wang, Hongquan Liu, Jianyang Ding, Jiayu Liu, Zhengtai Liu, Jishan Liu, Hanghui Chen, Dawei Shen, Yuefeng Nie

Abstract: Revealing the momentum-resolved electronic structure of infinite-layer nickelates is essential for understanding this new class of unconventional superconductors, but has been hindered by the formidable challenges in improving the sample quality. In this work, we report for the first time the angle-resolved photoemission spectroscopy of superconducting La$_{0.8}$Sr$_{0.2}$NiO$_{2}$ films prepared by molecular beam epitaxy and in situ atomic-hydrogen reduction. The measured Fermi topology closely matches theoretical calculations, showing a large Ni-$d_{x^2-y^2}$ derived Fermi sheet that evolves from hole-like to electron-like along $k_{z}$, and a three-dimensional (3D) electron pocket centered at the Brillouin zone corner. The Ni-$d_{x^2-y^2}$ derived bands show a mass enhancement ($m^*/m_{\rm{DFT}}$) of 2-3, while the 3D electron band shows negligible band renormalization. Moreover, the Ni-$d_{x^2-y^2}$ derived states also display a band dispersion anomaly at higher binding energy, reminiscent of the waterfall feature and kinks observed in cuprates.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 29 pages, 13 figures

arXiv:2403.06974 [pdf, other]

Memory-based Adapters for Online 3D Scene Perception

Authors: Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu

Abstract: In this paper, we propose a new framework for online 3D scene perception. Conventional 3D scene perception methods are offline, i.e., take an already reconstructed 3D scene geometry as input, which is not applicable in robotic applications where the input data is streaming RGB-D videos rather than a complete 3D scene reconstructed from pre-collected RGB-D videos. To deal with online 3D scene perception tasks where data collection and perception should be performed simultaneously, the model should be able to process 3D scenes frame by frame and make use of the temporal information. To this end, we propose an adapter-based plug-and-play module for the backbone of a 3D scene perception model, which constructs memory to cache and aggregate the extracted RGB-D features to empower offline models with temporal learning ability. Specifically, we propose a queued memory mechanism to cache the supporting point cloud and image features. Then we devise aggregation modules which operate directly on the memory and pass temporal information to the current frame. We further propose a 3D-to-2D adapter to enhance image features with strong global context. Our adapters can be easily inserted into mainstream offline architectures of different tasks and significantly boost their performance on online tasks. Extensive experiments on ScanNet and SceneNN datasets demonstrate our approach achieves leading performance on three 3D scene perception tasks compared with state-of-the-art online methods by simply finetuning existing offline models, without any model and task-specific designs. Project page: https://xuxw98.github.io/Online3D/.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR24. Link: https://xuxw98.github.io/Online3D/

arXiv:2403.05145 [pdf]

doi 10.1080/10619127.2023.2198913

Investigating the Proton Structure: The FAMU experiment

Authors: A. Vacchi, A. Adamczak, D. Bakalov, G. Baldazzi, M. Baruzzo, R. Benocci, R. Bertoni, M. Bonesini, H. Cabrera, S. Carsi, D. Cirrincione, F. Chignoli, M. Clemenza, L. Colace, M. Danailov, P. Danev, A. de Bari, C. De Vecchi, M. De Vincenzi, E. Fasci, K. S. Gadedjisso-Tossou, L. Gianfrani, A. D. Hillier, K. Ishida, P. J. C. King, et al. (24 additional authors not shown)

Abstract: The article gives the motivations for the measurement of the hyperfine splitting (hfs) in the ground state of muonic hydrogen to explore the properties of the proton at low momentum transfer. It summarizes these proposed measurement methods and finally describes the FAMU experiment in more detail.

Submitted 8 March, 2024; originally announced March 2024.

Journal ref: Nuclear Physics News 33:4, 9-16, 2023

arXiv:2403.04957 [pdf, other]

Automatic and Universal Prompt Injection Attacks against Large Language Models

Authors: Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao

Abstract: Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions. However, their capabilities can be exploited through prompt injection attacks. These attacks manipulate LLM-integrated applications into producing responses aligned with the attacker's injected content, deviating from the user's actual requests. The substantial risks posed by these attacks underscore the need for a thorough understanding of the threats. Yet, research in this area faces challenges due to the lack of a unified goal for such attacks and their reliance on manually crafted prompts, complicating comprehensive assessments of prompt injection robustness. We introduce a unified framework for understanding the objectives of prompt injection attacks and present an automated gradient-based method for generating highly effective and universal prompt injection data, even in the face of defensive measures. With only five training samples (0.3% relative to the test data), our attack can achieve superior performance compared with baselines. Our findings emphasize the importance of gradient-based testing, which can avoid overestimation of robustness, especially for defense mechanisms.

Submitted 7 March, 2024; originally announced March 2024.

Comments: Pre-print, code is available at https://github.com/SheltonLiu-N/Universal-Prompt-Injection

arXiv:2403.04213 [pdf, ps, other]

Representations of non-finitely graded Lie algebras related to Virasoro algebra

Authors: Chunguang Xia, Tianyu Ma, Xiao Dong, Ming**g Zhang

Abstract: In this paper, we study representations of non-finitely graded Lie algebras $\mathcal{W}(ε)$ related to Virasoro algebra, where $ε = \pm 1$. Precisely speaking, we completely classify the free $\mathcal{U}(\mathfrak h)$-modules of rank one over $\mathcal{W}(ε)$, and find that these module structures are rather different from those of other graded Lie algebras. We also determine the simplicity and isomorphism classes of these modules.

Submitted 3 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2403.04192 [pdf, other]

doi 10.1103/PhysRevLett.132.106601

Orbital Magneto-Nonlinear Anomalous Hall Effect in Kagome Magnet Fe$_3$Sn$_2$

Authors: Lujunyu Wang, Jiaojiao Zhu, Haiyun Chen, Hui Wang, **** Liu, Yue-Xin Huang, Bingyan Jiang, Jiaji Zhao, Hengjie Shi, Guang Tian, Haoyu Wang, Yugui Yao, Dapeng Yu, Zhiwei Wang, Cong Xiao, Shengyuan A. Yang, Xiaosong Wu

Abstract: It has been theoretically predicted that perturbation of the Berry curvature by electromagnetic fields gives rise to intrinsic nonlinear anomalous Hall effects that are independent of scattering. Two types of nonlinear anomalous Hall effects are expected. The electric nonlinear Hall effect has recently begun to receive attention, while very few studies are concerned with the magneto-nonlinear Hall effect. Here, we combine experiment and first-principles calculations to show that the kagome ferromagnet Fe$_3$Sn$_2$ displays such a magneto-nonlinear Hall effect. By systematic field angular and temperature-dependent transport measurements, we unambiguously identify a large anomalous Hall current that is linear in both applied in-plane electric and magnetic fields, utilizing a unique in-plane configuration. We clarify its dominant orbital origin and connect it to the magneto-nonlinear Hall effect. The effect is governed by the intrinsic quantum geometric properties of Bloch electrons. Our results demonstrate the significance of the quantum geometry of electron wave functions from the orbital degree of freedom and open up a new direction in Hall transport effects.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 18 pages, 4 figures, featured in Physics: Viewpoint and Editors' suggestions

Journal ref: Phys. Rev. Lett. 132, 106601 (2024)

arXiv:2403.01778 [pdf, other]

HOSCF: Efficient decoupling algorithms for finding the best rank-one approximation of higher-order tensors

Authors: Chuanfu Xiao, Zeyu Li, Chao Yang

Abstract: Best rank-one approximation is one of the most fundamental tasks in tensor computation. In order to fully exploit modern multi-core parallel computers, it is necessary to develop decoupling algorithms for computing the best rank-one approximation of higher-order tensors at large scales. In this paper, we first build a bridge between the rank-one approximation of tensors and the eigenvector-dependent nonlinear eigenvalue problem (NEPv), and then develop an efficient decoupling algorithm, namely the higher-order self-consistent field (HOSCF) algorithm, inspired by the famous self-consistent field (SCF) iteration frequently used in computational chemistry. The convergence theory of the HOSCF algorithm and an estimation of the convergence speed are further presented. In addition, we propose an improved HOSCF (iHOSCF) algorithm that incorporates the Rayleigh quotient iteration, which can significantly accelerate the convergence of HOSCF. Numerical experiments show that the proposed algorithms can efficiently converge to the best rank-one approximation of both synthetic and real-world tensors and can scale with high parallel scalability on a modern parallel computer.

Submitted 4 March, 2024; originally announced March 2024.

MSC Class: 15A18; 15A69; 15A72; 65F15; 68W10

arXiv:2402.18667 [pdf, other]

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

Authors: Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin, Caiming Xiong

Abstract: This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents. Despite LLMs' advancements, existing benchmarks fail to assess their format-following proficiency adequately. FoFo fills this gap with a diverse range of real-world formats and instructions, developed through an AI-Human collaborative method. Our evaluation across both open-source (e.g., Llama 2, WizardLM) and closed-source (e.g., GPT-4, PALM2, Gemini) LLMs highlights three key findings: open-source models significantly lag behind closed-source ones in format adherence; LLMs' format-following performance is independent of their content generation quality; and LLMs' format proficiency varies across different domains. These insights suggest the need for specialized tuning for format-following skills and highlight FoFo's role in guiding the selection of domain-specific AI agents. FoFo is released here at https://github.com/SalesforceAIResearch/FoFo.

Submitted 28 February, 2024; originally announced February 2024.

Comments: The first two authors contributed equally

arXiv:2402.18649 [pdf, other]

A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems

Authors: Fangzhou Wu, Ning Zhang, Somesh Jha, Patrick McDaniel, Chaowei Xiao

Abstract: Large Language Model (LLM) systems are inherently compositional, with an individual LLM serving as the core foundation and additional layers of objects such as plugins, sandboxes, and so on. Along with the great potential, there are also increasing concerns over the security of such probabilistic intelligent systems. However, existing studies on LLM security often focus on the individual LLM, without examining the ecosystem through the lens of LLM systems with other objects (e.g., Frontend, Webtool, Sandbox, and so on). In this paper, we systematically analyze the security of LLM systems, instead of focusing on the individual LLMs. To do so, we build on top of the information flow and formulate the security of LLM systems as constraints on the alignment of the information flow within the LLM and between the LLM and other objects. Based on this construction and the unique probabilistic nature of LLMs, the attack surface of the LLM system can be decomposed into three key components: (1) multi-layer security analysis, (2) analysis of the existence of constraints, and (3) analysis of the robustness of these constraints. To ground this new attack surface, we propose a multi-layer and multi-step approach and apply it to the state-of-the-art LLM system, OpenAI GPT4. Our investigation exposes several security issues, not just within the LLM itself but also in its integration with other components. We found that although OpenAI GPT4 has designed numerous safety constraints to improve its safety features, these safety constraints are still vulnerable to attackers. To further demonstrate the real-world threats of our discovered vulnerabilities, we construct an end-to-end attack where an adversary can illicitly acquire the user's chat history, all without the need to manipulate the user's input or gain direct access to OpenAI GPT4. Our demo is in the link: https://fzwark.github.io/LLM-System-Attack-Demo/

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17980 [pdf, other]

Emergence of Large-Scale Structures in Holographic Superfluid Turbulence

Authors: Wei-Can Yang, Chuan-Yin Xia, Yu Tian, Makoto Tsubota, Hua-Bi Zeng

Abstract: In two-dimensional turbulence systems, the emergence of large-scale structures holds profound physical implications, particularly as it indicates the occurrence of inverse energy cascades, thereby garnering significant attention. In this paper, we report a novel formation of vortex clusters in the background of a near-extreme Reissner-Nordström black hole holographic model. At temperatures nearing absolute zero, we observe not only the formation of vortex clusters but also the emergence of an inverse energy cascade. Distinct from typical quantum systems, the genesis of holographic vortex clusters is rooted in unique quantum dissipation properties, characterized by the near immobilization of vortex dipoles at low temperatures. Through a comparative analysis with the dynamics of the Gross-Pitaevskii equation, our investigation enhances the understanding of inverse energy cascades under these extreme conditions, thereby broadening our comprehension of quantum turbulence.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 8 pages, 6 figures

arXiv:2402.17624 [pdf, other]

CustomSketching: Sketch Concept Extraction for Sketch-based Image Synthesis and Editing

Authors: Chufeng Xiao, Hongbo Fu

Abstract: Personalization techniques for large text-to-image (T2I) models allow users to incorporate new concepts from reference images. However, existing methods primarily rely on textual descriptions, leading to limited control over customized images and failing to support fine-grained and local editing (e.g., shape, pose, and details). In this paper, we identify sketches as an intuitive and versatile representation that can facilitate such control, e.g., contour lines capturing shape information and flow lines representing texture. This motivates us to explore a novel task of sketch concept extraction: given one or more sketch-image pairs, we aim to extract a special sketch concept that bridges the correspondence between the images and sketches, thus enabling sketch-based image synthesis and editing at a fine-grained level. To accomplish this, we introduce CustomSketching, a two-stage framework for extracting novel sketch concepts. Considering that an object can often be depicted by a contour for general shapes and additional strokes for internal details, we introduce a dual-sketch representation to reduce the inherent ambiguity in sketch depiction. We employ a shape loss and a regularization loss to balance fidelity and editability during optimization. Through extensive experiments, a user study, and several applications, we show our method is effective and superior to the adapted baselines.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17584 [pdf, other]

Triangle singularity in the $J/ψ\to φπ^+ a_0^-(π^- η),\; φπ^- a_0^+(π^+ η)$ decays

Authors: C. W. Xiao, J. M. Dias, L. R. Dai, W. H. Liang, E. Oset

Abstract: We study the $J/ψ\to φπ^+ a_0(980)^- (a_0^- \to π^- η)$ decay, evaluating the double mass distribution in terms of the $π^- η$ and $π^+ a^-_0$ invariant masses. We show that the $π^- η$ mass distribution exhibits the typical cusp structure of the $a_0(980)$ seen in recent high statistics experiments, and the $π^+ a^-_0$ spectrum shows clearly a peak around $M_{\rm inv}(π^+ a^-_0)=1420 \,{\rm MeV}$, corresponding to a triangle singularity. When integrating over the two invariant masses we find a branching ratio for this decay of the order of $10^{-5}$, which is easily accessible in present laboratories. We also call attention to the fact that the signal obtained is compatible with a bump experimentally observed in the $ηπ^+π^-$ mass distribution in the $J/ψ\to φηπ^+π^-$ decay and encourage further analysis to extract from there the $φπ^+ a_0^-$ and $φπ^- a_0^+$ decay modes.

Submitted 24 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: 8 pages, 3 figures; V2: discussion added, references added, version to appear in Phys. Rev. D

arXiv:2402.17166 [pdf, other]

Layer Coherence Origin of Intrinsic Planar Hall Effect in 2D Limit

Authors: Huiyuan Zheng, Dawei Zhai, Cong Xiao, Wang Yao

Abstract: The intrinsic planar Hall effect has attracted intensive interest inspired by recent experiments. Existing theories of this effect require three dimensional orbital motion, or strong spin-orbit coupling of certain forms, which do not exist in van der Waals thin films. Here, we uncover a new origin of the planar Hall effect - as an intrinsic property of layer coherent electrons - that allows its presence even in bilayer and trilayer atomically thin limit. As examples, we show that the effect can be triggered by strain and interlayer sliding respectively in twisted bilayer graphene and trilayer transition metal dichalcogenides, where the effect features rich tunability and even stronger magnitude than those induced by topological nodal structures in bulk materials. The layer mechanism also provides a new route towards quantized Hall response upon a topological phase transition induced by in-plane magnetic field. These results unveil the unexplored potential of quantum layertronics and moiré flat band for planar Hall transport.

Submitted 12 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 6 pages, 5 figures

arXiv:2402.16965 [pdf, other]

WIPI: A New Web Threat for LLM-Driven Web Agents

Authors: Fangzhou Wu, Shutong Wu, Yulong Cao, Chaowei Xiao

Abstract: With the fast development of large language models (LLMs), LLM-driven Web Agents (Web Agents for short) have obtained tons of attention due to their superior capability where LLMs serve as the core part of making decisions like the human brain equipped with multiple web tools to actively interact with external deployed websites. As uncountable Web Agents have been released and such LLM systems are experiencing rapid development and drawing closer to widespread deployment in our daily lives, an essential and pressing question arises: "Are these Web Agents secure?". In this paper, we introduce a novel threat, WIPI, that indirectly controls Web Agent to execute malicious instructions embedded in publicly accessible webpages. To launch a successful WIPI works in a black-box environment. This methodology focuses on the form and content of indirect instructions within external webpages, enhancing the efficiency and stealthiness of the attack. To evaluate the effectiveness of the proposed methodology, we conducted extensive experiments using 7 plugin-based ChatGPT Web Agents, 8 Web GPTs, and 3 different open-source Web Agents. The results reveal that our methodology achieves an average attack success rate (ASR) exceeding 90% even in pure black-box scenarios. Moreover, through an ablation study examining various user prefix instructions, we demonstrated that the WIPI exhibits strong robustness, maintaining high performance across diverse prefix instructions.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.16679 [pdf, other]

Unveiling the Initiation Route of Coronal Mass Ejections through their Slow Rise Phase

Authors: Chen Xing, Guillaume Aulanier, Xin Cheng, Chun Xia, Mingde Ding

Abstract: Understanding the early evolution of coronal mass ejections (CMEs), in particular their initiation, is the key to forecasting solar eruptions and induced disastrous space weather. Although many initiation mechanisms have been proposed, a full understanding of CME initiation, which is identified as a slow rise of CME progenitors in kinematics before the impulsive acceleration, remains elusive. Here, with a state-of-the-art thermal-magnetohydrodynamics simulation, we determine a complete CME initiation route in which multiple mainstream mechanisms occur in sequence yet are tightly coupled. The slow rise is first triggered and driven by the developing hyperbolic flux tube (HFT) reconnection. Subsequently, the slow rise continues as driven by the coupling of the HFT reconnection and the early development of torus instability. The end of the slow rise, i.e., the onset of the impulsive acceleration, is induced by the start of the fast magnetic reconnection coupled with the torus instability. These results unveil that the CME initiation is a complicated process involving multiple physical mechanisms, thus being hardly resolved by a single initiation mechanism.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 35 pages, 15 figures, accepted for publication in ApJ

arXiv:2402.14968 [pdf, other]

Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

Authors: Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao

Abstract: Despite the general capabilities of Large Language Models (LLMs), these models still require fine-tuning or adaptation with customized data when meeting specific business demands. However, this process inevitably introduces new threats, particularly the Fine-tuning based Jailbreak Attack (FJAttack) under the setting of Language-Model-as-a-Service (LMaaS), where the model's safety is significantly compromised when the fine-tuning examples uploaded by users contain just a few harmful examples. Though potential defenses have been proposed in which service providers integrate safety examples into the fine-tuning dataset to reduce safety issues, such approaches require incorporating a substantial amount of data, making them inefficient. To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. In particular, service providers will construct prefixed safety examples with a secret prompt, acting as a "backdoor trigger". By integrating prefixed safety examples into the fine-tuning dataset, the subsequent fine-tuning process effectively acts as the "backdoor attack", establishing a strong correlation between the secret prompt and safety generations. Consequently, safe responses are ensured once service providers prepend this secret prompt ahead of any user input during inference. Our comprehensive experiments demonstrate that through the Backdoor Enhanced Safety Alignment with adding as few as 11 prefixed safety examples, the maliciously fine-tuned LLMs will achieve similar safety performance as the original aligned models without harming the benign performance. Furthermore, we also present the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data.

Submitted 20 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14744 [pdf, other]

Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation

Authors: Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Noboru Koshizuka, Chuan Xiao

Abstract: This paper introduces a novel approach using Large Language Models (LLMs) integrated into an agent framework for flexible and effective personal mobility generation. LLMs overcome the limitations of previous models by effectively processing semantic data and offering versatility in modeling various tasks. Our approach addresses three research questions: aligning LLMs with real-world urban mobility data, developing reliable activity generation strategies, and exploring LLM applications in urban mobility. The key technical contribution is a novel LLM agent framework that accounts for individual activity patterns and motivations, including a self-consistency approach to align LLMs with real-world activity data and a retrieval-augmented strategy for interpretable activity generation. We evaluate our LLM agent framework and compare it with state-of-the-art personal mobility generation approaches, demonstrating the effectiveness of our approach and its potential applications in urban mobility. Overall, this study marks the pioneering work of designing an LLM agent framework for activity generation based on real-world human activity data, offering a promising tool for urban mobility analysis.

Submitted 23 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Source codes are available at https://github.com/Wangjw6/LLMob/

arXiv:2402.14167 [pdf, other]

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

Authors: Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Abstract: Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. Instead of solely using a large DPM for the entire sampling trajectory, T-Stitch first leverages a smaller DPM in the initial steps as a cheap drop-in replacement of the larger DPM and switches to the larger DPM at a later stage. Our key insight is that different diffusion models learn similar encodings under the same training data distribution and smaller models are capable of generating good global structures in the early steps. Extensive experiments demonstrate that T-Stitch is training-free, generally applicable for different architectures, and complements most existing fast sampling techniques with flexible speed and quality trade-offs. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without performance drop on class-conditional ImageNet generation. We further show that our method can also be used as a drop-in technique to not only accelerate the popular pretrained stable diffusion (SD) models but also improve the prompt alignment of stylized SD models from the public model zoo. Code is released at https://github.com/NVlabs/T-Stitch

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13720 [pdf, other]

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding

Authors: Weilin Zhao, Yuxiang Huang, Xu Han, Wang Xu, Chaojun Xiao, Xinrong Zhang, Yewei Fang, Kaihuo Zhang, Zhiyuan Liu, Maosong Sun

Abstract: Speculative decoding is a widely used method that accelerates the generation process of large language models (LLMs) with no compromise in model performance. It achieves this goal by using an existing smaller model for drafting and then employing the target LLM to verify the draft in a low-cost parallel manner. Under such a drafting-verification framework, drafting efficiency has become a bottleneck in the final speedup of speculative decoding. Therefore, generating longer drafts at less cost can lead to better decoding speedup. To achieve this, we introduce Ouroboros, which can generate draft phrases to parallelize the drafting process and meanwhile lengthen drafts in a training-free manner. The experimental results on various typical text generation tasks show that Ouroboros can achieve speedups of up to $2.4\times$ over speculative decoding and $3.9\times$ over vanilla decoding, without fine-tuning draft and target models.

Submitted 26 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12327 [pdf, other]

Shall We Team Up: Exploring Spontaneous Cooperation of Competing LLM Agents

Authors: Zengqing Wu, Run Peng, Shuyuan Zheng, Qianying Liu, Xu Han, Brian Inhyuk Kwon, Makoto Onizuka, Shaojie Tang, Chuan Xiao

Abstract: Large Language Models (LLMs) have increasingly been utilized in social simulations, where they are often guided by carefully crafted instructions to stably exhibit human-like behaviors during simulations. Nevertheless, we doubt the necessity of shaping agents' behaviors for accurate social simulations. Instead, this paper emphasizes the importance of spontaneous phenomena, wherein agents deeply engage in contexts and make adaptive decisions without explicit directions. We explored spontaneous cooperation across three competitive scenarios and successfully simulated the gradual emergence of cooperation, findings that align closely with human behavioral data. This approach not only aids the computational social science community in bridging the gap between simulations and real-world dynamics but also offers the AI community a novel method to assess LLMs' capability of deliberate reasoning.

Submitted 2 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Source codes available at https://github.com/wuzengqing001225/SABM_ShallWeTeamUp

arXiv:2402.11354 [pdf, other]

Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search

Authors: Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa

Abstract: Approximate nearest neighbor search (ANNS) in high-dimensional spaces is a pivotal challenge in the field of machine learning. In recent years, graph-based methods have emerged as the superior approach to ANNS, establishing a new state of the art. Although various optimizations for graph-based ANNS have been introduced, they predominantly rely on heuristic methods that lack formal theoretical backing. This paper aims to enhance routing within graph-based ANNS by introducing a method that offers a probabilistic guarantee when exploring a node's neighbors in the graph. We formulate the problem as probabilistic routing and develop two baseline strategies by incorporating locality-sensitive techniques. Subsequently, we introduce PEOs, a novel approach that efficiently identifies which neighbors in the graph should be considered for exact distance computation, thus significantly improving efficiency in practice. Our experiments demonstrate that equipping PEOs can increase throughput on a commonly utilized graph index (HNSW) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times.

Submitted 17 February, 2024; originally announced February 2024.

Comments: Source code will be released at GitHub soon

arXiv:2402.10196 [pdf, other]

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

Authors: Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun

Abstract: Language agents powered by large language models (LLMs) have seen exploding development. Their capability of using language as a vehicle for thought and communication lends an incredible level of flexibility and versatility. People have quickly capitalized on this capability to connect LLMs to a wide range of external components and environments: databases, tools, the Internet, robotic embodiment, etc. Many believe an unprecedentedly powerful automation technology is emerging. However, new automation technologies come with new safety risks, especially for intricate systems like language agents. There is a surprisingly large gap between the speed and scale of their development and deployment and our understanding of their safety risks. Are we building a house of cards? In this position paper, we present the first systematic effort in mapping adversarial attacks against language agents. We first present a unified conceptual framework for agents with three major components: Perception, Brain, and Action. Under this framework, we present a comprehensive discussion and propose 12 potential attack scenarios against different components of an agent, covering different attack strategies (e.g., input manipulation, adversarial demonstrations, jailbreaking, backdoors). We also draw connections to successful attack strategies previously applied to LLMs. We emphasize the urgency to gain a thorough understanding of language agent risks before their widespread deployment.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08183 [pdf, other]

Pixel Sentence Representation Learning

Authors: Chenghao Xiao, Zhuoxu Huang, Danlu Chen, G Thomas Hudson, Yizhi Li, Haoran Duan, Chenghua Lin, Jie Fu, Jungong Han, Noura Al Moubayed

Abstract: Pretrained language models are long known to be subpar in capturing sentence and document-level semantics. Though heavily investigated, transferring perturbation-based methods from unsupervised visual representation learning to NLP remains an unsolved problem. This is largely due to the discreteness of subword units brought by tokenization of language models, limiting small perturbations of inputs to form semantics-preserved positive pairs. In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process. Drawing from cognitive and linguistic sciences, we introduce an unsupervised visual sentence representation learning framework, employing visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to texts to be perceived as continuous. Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision, achieving comparable performance in semantic textual similarity (STS) to existing state-of-the-art NLP methods. Additionally, we unveil our method's inherent zero-shot cross-lingual transferability and a unique leapfrogging pattern across languages during iterative training. To our knowledge, this is the first representation learning method devoid of traditional language models for understanding sentence and document semantics, marking a stride closer to human-like textual comprehension. Our code is available at https://github.com/gowitheflow-1998/Pixel-Linguist

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.07756 [pdf, other]

Extrinsic Contribution to Nonlinear Current Induced Spin Polarization

Authors: Ruda Guo, Yue-Xin Huang, Xiaoxin Yang, Yi Liu, Cong Xiao, Zhe Yuan

Abstract: Nonlinear spin polarization occurring in the second order of driving electric current is the dominant source of nonequilibrium magnetization in centrosymmetric or weakly noncentrosymmetric nonmagnetic materials, and induces nonlinear spin-orbit torque in magnets. Up to now, only the intrinsic mechanism based on anomalous spin polarizability dipole, which is the spin counterpart of Berry curvature dipole, has been studied, while disorder induced mechanisms are still missing. Here, we derive these contributions, which include not only the anomalous distribution function due to skew scattering and coordinate shift, but also interband coherence effects given by disorder induced spin shift and electric field induced anomalous scattering amplitude. We demonstrate these terms and show their importance in a minimal model. A scaling law for nonlinear current-induced spin polarization is constructed, which may help analyze experimental data in the future.

Submitted 7 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.07587 [pdf, other]

Magnetized strangelets with anomalous magnetic moment and Coulomb interactions

Authors: Huai-Min Chen, Xiao-Wei Li, Cheng-Jun Xia, Jing-Tao Wang, Guang-Xiong Peng

Abstract: We study the magnetized strangelets in the baryon density-dependent quark mass model, including the effects of both confinement and lead-order perturbation interactions. The properties of magnetized strangelets are investigated under the field strength $2\times 10^{17}$ G, where the anisotropy caused by the strong magnetic field is insignificant and the system can be treated approximately as isotropic. The consideration of anomalous magnetic moments in the energy spectrum naturally solves the difficulty of infrared divergence encountered in integrating the density of states. The Coulomb interaction is accounted for in a self-consistent treatment. The energy per baryon, mechanically stable radius, strangeness and electric charge of magnetized strangelets are presented, and their dependence on the field strength and on the confinement and perturbation parameters is investigated.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.04617 [pdf, other]

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

Authors: Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun

Abstract: Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to the out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which will introduce expensive computational overhead and uncontrollable change in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process long sequences with a limited context window and well capture long-distance dependencies. Without any training, InfLLM enables LLMs that are pre-trained on sequences consisting of a few thousand tokens to achieve comparable performance with competitive baselines that continually train these LLMs on long sequences. Even when the sequence length is scaled to $1,024$K, InfLLM still effectively captures long-distance dependencies. Our code can be found at https://github.com/thunlp/InfLLM.

Submitted 28 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.03804 [pdf, other]

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs

Authors: Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun

Abstract: Sparse computation offers a compelling solution for the inference of Large Language Models (LLMs) in low-resource scenarios by dynamically skipping the computation of inactive neurons. While traditional approaches focus on ReLU-based LLMs, leveraging zeros in activation values, we broaden the scope of sparse LLMs beyond zero activation values. We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold, demonstrating that non-ReLU LLMs also exhibit sparse activation. To find the most efficient activation function for sparse computation, we propose a systematic framework to examine the sparsity of LLMs from three aspects: the trade-off between sparsity and performance, the predictivity of sparsity, and the hardware affinity. We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$. The results indicate that models employing ReLU$^2$ excel across all three evaluation aspects, highlighting its potential as an efficient activation function for sparse LLMs. We will release the code to facilitate future research.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.02539 [pdf, other]

$a_0(1710)$-$f_0(1710)$ mixing effect in the $D_{s}^{+} \rightarrow K_S^{0} K_S^{0} π^{+}$ decay

Authors: Yu-Wen Peng, Wei Liang, Xiaonu Xiong, Chu-Wen Xiao

Abstract: With the measurements of the decay $D^+_s \rightarrow K^0_S K^0_S π^+$ by the BESIII Collaboration, we investigate this three-body weak decay via the chiral unitary approach for the final state interaction, where the resonances $S(980)$ and $S(1710)$ are dynamically reproduced with the interaction of eleven coupled channels, and the $W$-external and -internal emission mechanisms are considered at the quark level. Besides, we also take into account the contribution from the $P$-wave resonance $K^*(892)^+$ and make a combined fit of the $K^0_S K^0_S$ and $K^0_S π^+$ invariant mass spectra measured by the BESIII Collaboration. The fitted results show that the enhancement around 1.7 GeV in $K^0_S K^0_S$ mass spectrum is overlapped with two visible peaks, indicating the mixing signal originated from the resonances $a_0(1710)$ and $f_0(1710)$ due to their different poles (masses). Thus, the decay $D^+_s \rightarrow K^0_S K^0_S π^+$ is helpful to reveal their molecular nature with the mixing signal, which can be more precisely measured in the future.

Submitted 8 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: 17 pages, 7 figures, 2 tables

arXiv:2402.01920 [pdf, other]

Preference Poisoning Attacks on Reward Model Learning

Authors: Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik

Abstract: Learning utility, or reward, models from pairwise comparisons is a fundamental component in a number of application domains. These approaches inherently entail collecting preference information from people, with feedback often provided anonymously. Since preferences are subjective, there is no gold standard to compare against; yet, reliance of high-impact systems on preference learning creates a strong motivation for malicious actors to skew data collected in this fashion to their ends. We investigate the nature and extent of this vulnerability systematically by considering a threat model in which an attacker can flip a small subset of preference comparisons with the goal of either promoting or demoting a target outcome. First, we propose two classes of algorithmic approaches for these attacks: a principled gradient-based framework, and several variants of rank-by-distance methods. Next, we demonstrate the efficacy of best attacks in both these classes in successfully achieving malicious goals on datasets from three diverse domains: autonomous control, recommendation system, and textual prompt-response preference learning. We find that the best attacks are often highly successful, achieving in the most extreme case 100% success rate with only 0.3% of the data poisoned. However, which attack is best can vary significantly across domains, demonstrating the value of our comprehensive vulnerability analysis that involves several classes of attack algorithms. In addition, we observe that the simpler and more scalable rank-by-distance approaches are often competitive with the best, and on occasion significantly outperform gradient-based methods. Finally, we show that several state-of-the-art defenses against other classes of poisoning attacks exhibit, at best, limited efficacy in our setting.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.01077 [pdf, ps, other]

Recent Advances in Predictive Modeling with Electronic Health Records

Authors: Jiaqi Wang, Junyu Luo, Muchao Ye, Xiaochen Wang, Yuan Zhong, Aofei Chang, Guanjie Huang, Ziyi Yin, Cao Xiao, Jimeng Sun, Fenglong Ma

Abstract: The development of electronic health records (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This survey systematically reviews recent advances in deep learning-based predictive models using EHR data. Specifically, we begin by introducing the background of EHR data and providing a mathematical definition of the predictive modeling task. We then categorize and summarize predictive deep models from multiple perspectives. Furthermore, we present benchmarks and toolkits relevant to predictive modeling in healthcare. Finally, we conclude this survey by discussing open challenges and suggesting promising directions for future research.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00532 [pdf, other]

Quantum Metric Nonlinear Spin-Orbit Torque Enhanced by Topological Bands

Authors: Xukun Feng, Weikang Wu, Hui Wang, Weibo Gao, Lay Kee Ang, Y. X. Zhao, Cong Xiao, Shengyuan A. Yang

Abstract: Effects manifesting quantum geometry have been a focus of physics research. Here, we reveal that quantum metric plays a crucial role in nonlinear electric spin response, leading to a quantum metric spin-orbit torque. We argue that enhanced quantum metric can occur at band (anti)crossings, so the nonlinear torque could be amplified in topological metals with nodal features close to Fermi level. By applying our theory to magnetic Kane-Mele model and monolayer CrSBr, which feature nodal lines and Weyl points, we demonstrate that the quantum metric torque dominates the response, and its magnitude is significantly enhanced by topological band structures, which even surpasses the previously reported linear torques and is sufficient to drive magnetic switching by itself.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.17634 [pdf, ps, other]

The global well-posedness and Newtonian limit for the relativistic Boltzmann equation in a periodic box

Authors: Chuqi Cao, Jing Ouyang, Yong Wang, Changguo Xiao

Abstract: In this paper, we study the Newtonian limit for relativistic Boltzmann equation in a periodic box $\mathbb{T}^3$. We first establish the global-in-time mild solutions of relativistic Boltzmann equation with uniform-in-$\mathfrak{c}$ estimates and time decay rate. Then we rigorously justify the global-in-time Newtonian limits from the relativistic Boltzmann solutions to the solution of Newtonian Boltzmann equation in $L^1_pL^{\infty}_x$. Moreover, if the initial data of Newtonian Boltzmann equation belong to $W^{1,\infty}(\mathbb{T}^3\times\mathbb{R}^3)$, based on a decomposition and $L^2-L^\infty$ argument, the global-in-time Newtonian limit is proved in $L^{\infty}_{x,p}$. The convergence rates of Newtonian limit are obtained both in $L^1_pL^{\infty}_x$ and $L^{\infty}_{x,p}$.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 56 pages, All comments are welcome

arXiv:2401.15770 [pdf, other]

PILOT: Legal Case Outcome Prediction with Case Law

Authors: Lang Cao, Zifeng Wang, Cao Xiao, Jimeng Sun

Abstract: Machine learning shows promise in predicting the outcome of legal cases, but most research has concentrated on civil law cases rather than case law systems. We identified two unique challenges in making legal case outcome predictions with case law. First, it is crucial to identify relevant precedent cases that serve as fundamental evidence for judges during decision-making. Second, it is necessary to consider the evolution of legal principles over time, as early cases may adhere to different legal contexts. In this paper, we proposed a new framework named PILOT (PredictIng Legal case OuTcome) for case outcome prediction. It comprises two modules for relevant case retrieval and temporal pattern handling, respectively. To benchmark the performance of existing legal case outcome prediction models, we curated a dataset from a large-scale case law database. We demonstrate the importance of accurately identifying precedent cases and mitigating the temporal shift when making predictions for case law, as our method shows a significant improvement over the prior methods that focus on civil law case outcome predictions.

Submitted 12 April, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.14302 [pdf, ps, other]

Correlation function and the inverse problem in the $BD$ interaction

Authors: Hai-Peng Li, Jing-Yu Yi, Chu-Wen Xiao, De-Liang Yao, Wei-Hong Liang, Eulogio Oset

Abstract: We study the correlation functions of the $B^0 D^+, B^+ D^0$ system, which develops a bound state of approximately $40$ MeV, using inputs consistent with the $T_{cc}(3875)$ state. Then we address the inverse problem starting from these correlation functions to determine the scattering observables related to the system, including the existence of the bound state and its molecular nature. The important output of the approach is the uncertainty with which these observables can be obtained, considering errors in the $B^0 D^+, B^+ D^0$ correlation functions typical of current values in present correlation functions. We find that it is possible to obtain scattering lengths and effective ranges with relatively high precision and to establish the existence of a bound state. Although the pole position is obtained with errors of the order of $50 \%$ of the binding energy, the molecular probability of the state is obtained with a very small error of the order of $6\%$. All these findings can serve as motivation to perform such measurements in future runs of high energy hadron collisions.

Submitted 28 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 16 pages, 3 figures, 7 tables; V2: version to be published in Chinese Physics C

arXiv:2401.13958 [pdf]

doi 10.1063/5.0183701

Unveiling a Novel Metal-to-Metal Transition in LuH2: Critically Challenging Superconductivity Claims in Lutetium Hydrides

Authors: Dong Wang, Ningning Wang, Caoshun Zhang, Chunsheng Xia, Weicheng Guo, Xia Yin, Kejun Bu, Takeshi Nakagawa, Jianbo Zhang, Federico Gorelli, Philip Dalladay-Simpson, Thomas Meier, Xujie Lü, Liling Sun, Jinguang Cheng, Qiaoshi Zeng, Yang Ding, Ho-kwang Mao

Abstract: Following the recent report by Dasenbrock-Gammon et al. (2023) of near-ambient superconductivity in nitrogen-doped lutetium trihydride (LuH3-δNε), significant debate has emerged surrounding the composition and interpretation of the observed sharp resistance drop. Here, we meticulously revisit these claims through comprehensive characterization and investigations. We definitively identify the reported material as lutetium dihydride (LuH2), resolving the ambiguity surrounding its composition. Under similar conditions (270-295 K and 1-2 GPa), we replicate the reported sharp decrease in electrical resistance with a 30% success rate, aligning with Dasenbrock-Gammon et al.'s observations. However, our extensive investigations reveal this phenomenon to be a novel, pressure-induced metal-to-metal transition intrinsic to LuH2, distinct from superconductivity. Intriguingly, nitrogen doping exerts minimal impact on this transition. Our work not only elucidates the fundamental properties of LuH2 and LuH3 but also critically challenges the notion of superconductivity in these lutetium hydride systems. These findings pave the way for future research on lutetium hydride systems while emphasizing the crucial importance of rigorous verification in claims of ambient temperature superconductivity.

Submitted 28 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Journal ref: Matter Radiat. Extremes 9, 037401 (2024)

arXiv:2401.13478 [pdf, other]

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

Authors: Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

Abstract: Multi-modal information retrieval (MMIR) is a rapidly evolving field, where significant progress, particularly in image-text pairing, has been made through advanced representation learning and cross-modality alignment research. However, current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap, where chart and table images described in scholarly language usually do not play a significant role. To bridge this gap, we develop a specialised scientific MMIR (SciMMIR) benchmark by leveraging open-access paper collections to extract data relevant to the scientific domain. This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents. We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines. We conducted zero-shot and fine-tuning evaluations on prominent multi-modal image-captioning and visual language models, such as CLIP and BLIP. Our analysis offers critical insights for MMIR in the scientific domain, including the impact of pre-training and fine-tuning settings and the influence of the visual and textual encoders. All our data and checkpoints are publicly available at https://github.com/Wusiwei0410/SciMMIR.

Submitted 11 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: camera-ready version for ACL 2024 Findings

arXiv:2401.13278 [pdf, other]

doi 10.1007/s44214-024-00059-z

Dynamical Chiral Nernst Effect in Twisted Van der Waals Few Layers

Authors: Juncheng Li, Dawei Zhai, Cong Xiao, Wang Yao

Abstract: The Nernst effect is a fundamental thermoelectric conversion phenomenon that was deemed to be possible only in systems with magnetic field or magnetization. In this work, we propose a novel dynamical chiral Nernst effect that can appear in two-dimensional van der Waals materials with chiral structural symmetry in the absence of any magnetic degree of freedom. This unconventional effect is triggered by time variation of an out-of-plane electric field, and has an intrinsic quantum geometric origin linked to not only the intralayer center-of-mass motion but also the interlayer coherence of electronic states. We demonstrate the effect in twisted homobilayer and homotrilayer transition metal dichalcogenides, where the strong twisted interlayer coupling leads to sizable intrinsic Nernst conductivities well within the experimental capacity. This work suggests a new route for electric control of thermoelectric conversion.

Submitted 24 January, 2024; originally announced January 2024.

Journal ref: Quantum Front 3, 11 (2024)

arXiv:2401.12393 [pdf, other]

A Learning-based Declarative Privacy-Preserving Framework for Federated Data Management

Authors: Hong Guan, Summer Gautier, Deepti Gupta, Rajan Hari Ambrish, Yancheng Wang, Harsha Lakamsani, Dhanush Giriyan, Saajan Maslanka, Chaowei Xiao, Yingzhen Yang, Jia Zou

Abstract: It is challenging to balance the privacy and accuracy for federated query processing over multiple private data silos. In this work, we will demonstrate an end-to-end workflow for automating an emerging privacy-preserving technique that uses a deep learning model trained using the Differentially-Private Stochastic Gradient Descent (DP-SGD) algorithm to replace portions of actual data to answer a query. Our proposed novel declarative privacy-preserving workflow allows users to specify "what private information to protect" rather than "how to protect". Under the hood, the system automatically chooses query-model transformation plans as well as hyper-parameters. At the same time, the proposed workflow also allows human experts to review and tune the selected privacy-preserving mechanism for audit/compliance, and optimization purposes.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.12255 [pdf, other]

Instructional Fingerprinting of Large Language Models

Authors: Jiashu Xu, Fei Wang, Mingyu Derek Ma, Pang Wei Koh, Chaowei Xiao, Muhao Chen

Abstract: The exorbitant cost of training Large language models (LLMs) from scratch makes it essential to fingerprint the models to protect intellectual property via ownership authentication and to ensure downstream users and developers comply with their license terms (e.g. restricting commercial use). In this study, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. Model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License. Code is available in https://cnut1648.github.io/Model-Fingerprint/.

Submitted 3 April, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: Accepted at NAACL 2024; 30 pages

arXiv:2401.09819 [pdf, other]

PPNet: A Two-Stage Neural Network for End-to-end Path Planning

Authors: Qinglong Meng, Chongkun Xia, Xueqian Wang, Songping Mai, Bin Liang

Abstract: The classical path planners, such as sampling-based path planners, can provide probabilistic completeness guarantees in the sense that the probability that the planner fails to return a solution, if one exists, decays to zero as the number of samples approaches infinity. However, finding a near-optimal feasible solution in a given period is challenging in many applications such as the autonomous vehicle. To achieve an end-to-end near-optimal path planner, we first divide the path planning problem into two subproblems, which are path space segmentation and waypoints generation in the given path's space. We further propose a two-stage neural network named Path Planning Network (PPNet), in which each stage solves one of the above-mentioned subproblems. Moreover, we propose a novel efficient data generation method for path planning named EDaGe-PP. EDaGe-PP can generate data with continuous-curvature paths with analytical expression while satisfying the clearance requirement. The results show the total computation time of generating random 2D path planning data is less than 1/33 and the success rate of PPNet trained by the dataset that is generated by EDaGe-PP is about 2 times compared to other methods. We validate PPNet against state-of-the-art path planning methods. The results show that PPNet can find a near-optimal solution in 15.3ms, which is much shorter than the state-of-the-art path planners.

Submitted 23 April, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.09189 [pdf, other]

doi 10.1103/PhysRevD.109.106022

Interface Dynamics of Strongly interacting Binary Superfluids

Authors: Yu-Ping An, Li Li, Chuan-Yin Xia, Hua-Bi Zeng

Abstract: Understanding the interface dynamics in non-equilibrium quantum systems remains a challenge. We study the interface dynamics of strongly coupled immiscible binary superfluids by using holographic duality. The full nonlinear evolution of the binary superfluids with a relative velocity shows rich nonlinear patterns toward quantum turbulence, which is reminiscent of the quantum Kelvin-Helmholtz instability. The wave number of the fast growing modes $k_0$ extracted from the interface pattern yields a non-monotonic dependence on the relative velocity, independent of the temperature and interaction. The value of $k_0$ first increases with the velocity difference and then decreases, which stands in sharp contrast to the results of mean-field theory described by the Gross-Pitaevskii equation and is confirmed by using the linear analyses on top of the stationary configuration. We uncover that the critical velocity associated with the maximum corresponds to the case when the mean separation of vortices generated by interface instabilities becomes comparable to the vortex size, which could be a universal physical mechanism in strongly interacting superfluids and is directly testable in laboratory experiments.

Submitted 29 May, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted version, 10 pages, 5 figures

Showing 51–100 of 1,014 results for author: Xia, C