Search | arXiv e-print repository

On the near soliton dynamics for the 2D cubic Zakharov-Kuznetsov equations

Abstract: In this article, we consider the Cauchy problem for the cubic (mass-critical) Zakharov-Kuznetsov equations in dimension two: $$\partial_t u+\partial_{x_1}(\Delta u+u^3)=0,\quad (t,x)\in [0,\infty)\times \mathbb{R}^{2}.$$ For initial data in $H^1$ close to the soliton with a suitable space-decay property, we fully describe the asymptotic behavior of the corresponding solution. More precisely, for such initial data, we show that only three possible behaviors can occur: 1) the solution leaves a tube near the soliton in finite time; 2) the solution blows up in finite time; 3) the solution is global and locally converges to a soliton. In addition, we show that for initial data near a soliton with non-positive energy and above the threshold mass, the corresponding solution will blow up as described in Case 2. Our proof is inspired by the techniques developed for the mass-critical generalized Korteweg-de Vries (gKdV) equation in a similar context by Martel-Merle-Raphaël. More precisely, our proof relies on refined modulation estimates and a modified energy-virial Lyapunov functional. The primary challenge in our problem is the lack of coercivity of the Schrödinger operator which appears in the virial-type estimate. To overcome the difficulty, we apply a transform, which was first introduced in Kenig-Martel [13], to perform the virial computations after converting the original problem to the adjoint one. The coercivity of the Schrödinger operator in the adjoint problem has been numerically verified by Farah-Holmer-Roudenko-Yang [9].

Submitted 28 June, 2024; originally announced July 2024.

Comments: 65 pages

arXiv:2406.17797 [pdf, other]

MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

Authors: Shikun Feng, Jiaxin Zheng, Yinjun Jia, Yanwen Huang, Fengfeng Zhou, Wei-Ying Ma, Yanyan Lan

Abstract: Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address these issues, we construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules, meticulously designed to capture an extensive array of chemical, physical, and biological properties, derived through a robust computational ligand-target binding analysis pipeline. We conduct extensive experiments on various deep learning models, demonstrating that our dataset offers significant physicochemical interpretability to guide model development and design. Notably, the dataset's properties are linked to binding affinity metrics, providing additional insights into model performance in drug-target interaction tasks. We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning, thereby expediting progress in the field of artificial intelligence-driven drug discovery.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.10584 [pdf, other]

Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

Authors: Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Yu Lan, Chao Shen

Abstract: Recent advances in prompt optimization have notably enhanced the performance of pre-trained language models (PLMs) on downstream tasks. However, the potential of optimized prompts on domain generalization has been under-explored. To explore the nature of prompt generalization on unknown domains, we conduct pilot experiments and find that (i) prompts gaining more attention weight from PLMs' deep layers are more generalizable and (ii) prompts with more stable attention distributions in PLMs' deep layers are more generalizable. Thus, we offer a fresh objective towards domain-generalizable prompt optimization named "Concentration", which represents the "lookback" attention from the current decoding token to the prompt tokens, to increase the attention strength on prompts and reduce the fluctuation of attention distribution. We adapt this new objective to popular soft prompt and hard prompt optimization methods, respectively. Extensive experiments demonstrate that our idea improves the compared prompt optimization methods by 1.42% for soft prompt generalization and 2.16% for hard prompt generalization in accuracy on the multi-source domain generalization setting, while maintaining satisfying in-domain performance. The promising results validate the effectiveness of our proposed prompt optimization objective and provide key insights into domain-generalizable prompts.

Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: Submitted to NeurIPS 2024, Preprint, Under review

arXiv:2406.08980 [pdf, other]

From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability of the Vina docking score, the current standard for assessing binding abilities, is increasingly questioned due to its susceptibility to overfitting. To address these limitations, we propose a comprehensive evaluation framework that includes assessing the similarity of generated molecules to known active compounds, introducing a virtual screening-based metric for practical deployment capabilities, and re-evaluating binding affinity more rigorously. Our experiments reveal that while current SBDD models achieve high Vina scores, they fall short in practical usability metrics, highlighting a significant gap between theoretical predictions and real-world applicability. Our proposed metrics and dataset aim to bridge this gap, enhancing the practical applicability of future SBDD models and aligning them more closely with the needs of pharmaceutical research and development.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08961 [pdf, other]

SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

Authors: Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or toxic pharmacological outcomes of small molecules, rendering accurate bioactivity prediction crucial for the development of safe and effective drugs. However, existing structural datasets of small molecule-protein interactions are often limited in scale and lack systematically organized bioactivity labels, thereby impeding our understanding of these interactions and precise bioactivity prediction. In this study, we introduce a comprehensive dataset of small molecule-protein interactions, consisting of over a million binding structures, each annotated with real biological activity labels. This dataset is designed to facilitate unbiased bioactivity prediction. We evaluated several classical models on this dataset, and the results demonstrate that the task of unbiased bioactivity prediction is challenging yet essential.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.03238 [pdf, ps, other]

The parity of Lusztig's restriction functor and Green's formula for a quiver with automorphism

Authors: Jiepeng Fang, Yixin Lan, Yumeng Wu

Abstract: In [8], Fang-Lan-Xiao proved a formula about Lusztig's induction and restriction functors which can induce Green's formula for the path algebra of a quiver over a finite field via the trace map. In this paper, we generalize their formula to that for the mixed semisimple perverse sheaves for a quiver with an automorphism. By applying the trace map, we obtain Green's formula for any finite-dimensional hereditary algebra over a finite field.

Submitted 5 June, 2024; originally announced June 2024.

MSC Class: 16G20; 17B37

arXiv:2405.19909 [pdf, other]

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

Authors: Tenglong Liu, Yang Li, Yixing Lan, Hao Gao, Wei Pan, Xin Xu

Abstract: In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced. To address this, existing methods often constrain the learned policy through policy regularization. However, these methods often suffer from the issue of unnecessary conservativeness, hampering policy improvement. This occurs due to the indiscriminate use of all actions from the behavior policy that generates the offline dataset as constraints. The problem becomes particularly noticeable when the quality of the dataset is suboptimal. Thus, we propose Adaptive Advantage-guided Policy Regularization (A2PR), obtaining high-advantage actions from an augmented behavior policy combined with VAE to guide the learned policy. A2PR can select high-advantage actions that differ from those present in the dataset, while still effectively maintaining conservatism from OOD actions. This is achieved by harnessing the VAE capacity to generate samples matching the distribution of the data points. We theoretically prove that the improvement of the behavior policy is guaranteed. Besides, it effectively mitigates value overestimation with a bounded performance gap. Empirically, we conduct a series of experiments on the D4RL benchmark, where A2PR demonstrates state-of-the-art performance. Furthermore, experimental results on additional suboptimal mixed datasets reveal that A2PR exhibits superior performance. Code is available at https://github.com/ltlhuuu/A2PR.

Submitted 1 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: ICML 2024, 19 pages

arXiv:2405.17802 [pdf, other]

Multi-level Interaction Modeling for Protein Mutational Effect Prediction

Authors: Yuanle Mo, Xin Hong, Bowen Gao, Yinjun Jia, Yanyan Lan

Abstract: Protein-protein interactions are central mediators in many biological processes. Accurately predicting the effects of mutations on interactions is crucial for guiding the modulation of these interactions, thereby playing a significant role in therapeutic development and drug discovery. Mutations generally affect interactions hierarchically across three levels: mutated residues exhibit different sidechain conformations, which lead to changes in the backbone conformation, eventually affecting the binding affinity between proteins. However, existing methods typically focus only on sidechain-level interaction modeling, resulting in suboptimal predictions. In this work, we propose a self-supervised multi-level pre-training framework, ProMIM, to fully capture all three levels of interactions with well-designed pretraining objectives. Experiments show ProMIM outperforms all the baselines on the standard benchmark, especially on mutations where significant changes in backbone conformations may occur. In addition, leading results from zero-shot evaluations for SARS-CoV-2 mutational effect prediction and antibody optimization underscore the potential of ProMIM as a powerful next-generation tool for developing novel therapeutic approaches and new drugs.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16457 [pdf, other]

Entanglement island and Page curve for one-sided charged black hole

Authors: Yun-Feng Qu, Yi-Ling Lan, Hongwei Yu, Wen-Cong Gan, Fu-Wen Shu

Abstract: In this paper, we extend the method of calculating the entanglement entropy of Hawking radiation of black holes using the "in" vacuum state, which describes a one-sided asymptotically flat neutral black hole formed by gravitational collapse, to dynamic charged black holes. We explore the influence of charge on the position of the boundary of the island $\partial I$ and the Page time. Due to their distinct geometric structures, we discuss non-extremal and extremal charged black holes separately. In non-extremal cases, the emergence of an island saves the bound of entropy at late times, and the entanglement entropy of Hawking radiation satisfies the Page curve. Moreover, we also find that the position of the boundary of the island $\partial I$ depends on the position of the cutoff surface (observers), differing from the behavior in eternal charged black holes. In extremal black holes, when the island exists, the entanglement entropy is approximately equal to the Bekenstein-Hawking entropy, while the entanglement entropy becomes ill-defined when the island is absent. Our analysis underscores how different geometric configurations significantly influence the behavior of entropy.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16150 [pdf, other]

5W1H Extraction With Large Language Models

Authors: Yang Cao, Yangsong Lan, Feiyan Zhai, Piji Li

Abstract: The extraction of essential news elements through the 5W1H framework (What, When, Where, Why, Who, and How) is critical for event extraction and text summarization. The advent of large language models (LLMs) such as ChatGPT presents an opportunity to address language-related tasks through simple prompts without time-consuming model fine-tuning. However, ChatGPT has encountered challenges in processing longer news texts and analyzing specific attributes in context, especially when answering questions about What, Why, and How. The effectiveness of extraction tasks is notably dependent on high-quality human-annotated datasets. However, the absence of such datasets for 5W1H extraction increases the difficulty of fine-tuning strategies based on open-source LLMs. To address these limitations, first, we annotate a high-quality 5W1H dataset based on four typical news corpora (CNN/DailyMail, XSum, NYT, RA-MDS); second, we design several strategies from zero-shot/few-shot prompting to efficient fine-tuning to conduct 5W1H aspects extraction from the original news documents. The experimental results demonstrate that the performance of the fine-tuned models on our labelled dataset is superior to the performance of ChatGPT. Furthermore, we also explore the domain adaptation capability by testing the source-domain (e.g. NYT) models on the target domain corpus (e.g. CNN/DailyMail) for the task of 5W1H extraction.

Submitted 25 May, 2024; originally announced May 2024.

Comments: IJCNN 2024

arXiv:2405.10481 [pdf, other]

Multi-Evidence based Fact Verification via A Confidential Graph Neural Network

Authors: Yuqing Lan, Zhenghao Liu, Yu Gu, Xiaoyuan Yi, Xiaohua Li, Liner Yang, Ge Yu

Abstract: Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the edges of the reasoning graph, which misleads the semantic representations of other nodes and amplifies the noise signals. To mitigate the propagation of noisy semantic information, we introduce a Confidential Graph Attention Network (CO-GAT), which proposes a node masking mechanism for modeling the nodes. Specifically, CO-GAT calculates the node confidence score by estimating the relevance between the claim and evidence pieces. Then, the node masking mechanism uses the node confidence scores to control the noise information flow from the vanilla node to the other graph nodes. CO-GAT achieves a 73.59% FEVER score on the FEVER dataset and shows the generalization ability by broadening the effectiveness to the science-specific domain.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 12 pages

arXiv:2405.10343 [pdf, other]

UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

Authors: Shikun Feng, Yuyan Ni, Minghao Li, Yanwen Huang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

Abstract: Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, a universal model that applies effectively to various categories of molecular tasks is still lacking, since existing prevalent pre-training methods are effective only for specific types of downstream tasks. Furthermore, the lack of profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified comprehension of existing pre-training methods through the lens of contrastive learning. Thus their distinctions lie in clustering different views of molecules, which is shown to be beneficial to specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views at three different levels. SOTA performance across quantum, physicochemical, and biological tasks, along with a comprehensive ablation study, validates the universality and effectiveness of UniCorn.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.19335 [pdf, other]

StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

Authors: Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

Abstract: Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. Such property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which limits its extension to more real-world applications. To tackle this issue, we propose to treat the hard prompt and soft prompt as separate inputs to mitigate noise brought by the prompt initialization. Furthermore, we optimize soft prompts with contrastive learning for utilizing class-aware information in the training process to maintain model performance. Experimental results demonstrate that StablePT outperforms state-of-the-art methods by 7.20% in accuracy and reduces the standard deviation by 2.02 on average. Furthermore, extensive experiments underscore its robustness and stability across 7 datasets covering various tasks.

Submitted 30 April, 2024; originally announced April 2024.

Comments: Submitted to ACL 2024

arXiv:2404.11467 [pdf, other]

A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems

Authors: Xiaoyan Zhou, Feiran Liang, Zhaojie Xie, Yang Lan, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

Abstract: Package managers such as NPM, Maven, and PyPI play a pivotal role in open-source software (OSS) ecosystems, streamlining the distribution and management of various freely available packages. The fine-grained details within software packages can unveil potential risks within existing OSS ecosystems, offering valuable insights for detecting malicious packages. In this study, we undertake a large-scale empirical analysis focusing on fine-grained information (FGI): the metadata, static, and dynamic functions. Specifically, we investigate the FGI usage across a diverse set of 50,000+ legitimate and 1,000+ malicious packages. Based on this diverse data collection, we conducted a comparative analysis between legitimate and malicious packages. Our findings reveal that (1) malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones; (2) malicious packages demonstrate a higher tendency to invoke HTTP/URL functions as opposed to other application services, such as FTP or SMTP; (3) FGI serves as a distinguishable indicator between legitimate and malicious packages; and (4) one dimension in FGI has sufficient distinguishable capability to detect malicious packages, and combining all dimensions in FGI cannot significantly improve overall performance.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.03205 [pdf, other]

Optimal Dynamical Gauge in the Quantum Rabi Model

Authors: Yuqi Qing, Wen-Long You, Yueheng Lan, Maoxin Liu

Abstract: In this paper, we investigate the gauge dependence of various physical observables in the quantum Rabi model (QRM) under different potential fields, arising from the Hilbert-space truncation of the atomic degree of freedom. We discover that in both the square-well potential and oscillator potential, the optimal gauges for the ground-state energy of the QRM vary with respect to the cavity frequency, with the dipole gauge being optimal in the low-frequency limit and the Coulomb gauge in the high-frequency limit of the cavity frequency. Additionally, for higher energy levels, the optimal gauge asymptotically approaches the dipole gauge. However, for the dynamical quantity out-of-time-order correlator (OTOC), we find the necessity to introduce an optimal dynamical gauge. We determine the optimal dynamical gauge by minimizing the mean error between the two-level OTOC and the full Hamiltonian one. We expect that this study will contribute to a more profound understanding of the subtle relation between gauge choice and the dynamics of QED systems.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.17411 [pdf, other]

PCToolkit: A Unified Plug-and-Play Prompt Compression Toolkit of Large Language Models

Authors: Jinyi Li, Yihuai Lan, Lei Wang, Hao Wang

Abstract: Prompt compression is an innovative method for efficiently condensing input prompts while preserving essential information. To facilitate quick-start services, user-friendly interfaces, and compatibility with common datasets and metrics, we present the Prompt Compression Toolkit (PCToolkit). This toolkit is a unified plug-and-play solution for compressing prompts in Large Language Models (LLMs), featuring cutting-edge prompt compressors, diverse datasets, and metrics for comprehensive performance evaluation. PCToolkit boasts a modular design, allowing for easy integration of new datasets and metrics through portable and user-friendly interfaces. In this paper, we outline the key components and functionalities of PCToolkit. We conducted evaluations of the compressors within PCToolkit across various natural language tasks, including reconstruction, summarization, mathematical problem-solving, question answering, few-shot learning, synthetic tasks, code completion, boolean expressions, multiple choice questions, and lies recognition.

Submitted 26 March, 2024; originally announced March 2024.

Comments: For open-source repository, see https://github.com/3DAgentWorld/Toolkit-for-Prompt-Compression

arXiv:2403.14736 [pdf, other]

NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks

Authors: Yi-Shan Lan, Pin-Yu Chen, Tsung-Yi Ho

Abstract: Protein classification tasks are essential in drug discovery. Real-world protein structures are dynamic, which will determine the properties of proteins. However, the existing machine learning methods, like ProNet (Wang et al., 2022a), only access limited conformational characteristics and protein side-chain features, leading to impractical protein structure and inaccuracy of protein classes in their predictions. In this paper, we propose novel semantic data augmentation methods, Novel Augmentation of New Node Attributes (NaNa), and Molecular Interactions and Geometric Upgrading (MiGu) to incorporate backbone chemical and side-chain biophysical information into protein classification tasks and a co-embedding residual learning framework. Specifically, we leverage molecular biophysical, secondary structure, chemical bonds, and ionic features of proteins to facilitate protein classification tasks. Furthermore, our semantic augmentation methods and the co-embedding residual learning framework can improve the performance of GIN (Xu et al., 2019) on EC and Fold datasets (Bairoch, 2000; Andreeva et al., 2007) by 16.41% and 11.33% respectively. Our code is available at https://github.com/r08b46009/Code_for_MIGU_NANA/tree/main.

Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.12987 [pdf, other]

Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion

Authors: Bowen Gao, Minsi Ren, Yuyan Ni, Yanwen Huang, Bo Qiang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan

Abstract: In the field of Structure-based Drug Design (SBDD), deep learning-based generative models have achieved outstanding performance in terms of docking score. However, further study shows that the existing molecular generative methods and docking scores both have lacked consideration in terms of specificity, which means that generated molecules bind to almost every protein pocket with high affinity. To address this, we introduce the Delta Score, a new metric for evaluating the specificity of molecular binding. To further incorporate this insight for generation, we develop an innovative energy-guided approach using contrastive learning, with active compounds as decoys, to direct generative models toward creating molecules with high specificity. Our empirical results show that this method not only enhances the delta score but also maintains or improves traditional docking scores, successfully bridging the gap between SBDD and real-world needs.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.12019 [pdf, other]

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

Authors: Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy

Abstract: The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.

Submitted 18 March, 2024; originally announced March 2024.

Comments: project webpage: https://nirvanalan.github.io/projects/ln3diff/

arXiv:2402.17971 [pdf, other]

All in an Aggregated Image for In-Image Learning

Authors: Lei Wang, Wanyu Xu, Zhiqiang Hu, Yihuai Lan, Shan Dong, Hao Wang, Roy Ka-Wei Lee, Ee-Peng Lim

Abstract: This paper introduces a new in-context learning (ICL) mechanism called In-Image Learning (I$^2$L) that combines demonstration examples, visual cues, and chain-of-thought reasoning into an aggregated image to enhance the capabilities of Large Multimodal Models (e.g., GPT-4V) in multimodal reasoning tasks. Unlike previous approaches that rely on converting images to text or incorporating visual input into language models, I$^2$L consolidates all information into an aggregated image and leverages image processing, understanding, and reasoning abilities. This has several advantages: it reduces inaccurate textual descriptions of complex images, provides flexibility in positioning demonstration examples, and avoids multiple input images and lengthy prompts. We also introduce I$^2$L-Hybrid, a method that combines the strengths of I$^2$L with other ICL methods. Specifically, it uses an automatic strategy to select the most suitable method (I$^2$L or another certain ICL method) for a specific task instance. We conduct extensive experiments to assess the effectiveness of I$^2$L and I$^2$L-Hybrid on MathVista, which covers a variety of complex multimodal reasoning tasks. Additionally, we investigate the influence of image resolution, the number of demonstration examples in a single image, and the positions of these demonstrations in the aggregated image on the effectiveness of I$^2$L. Our code is publicly available at https://github.com/AGI-Edgerunners/IIL.

Submitted 2 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Preprint

arXiv:2402.16567 [pdf, other]

Aligning Large Language Models to a Domain-specific Graph Database

Authors: Yuanyuan Liang, Keren Tan, Tingyu Xie, Wenbiao Tao, Siyuan Wang, Yunshi Lan, Weining Qian

Abstract: Graph Databases (Graph DB) are widely applied in various fields, including finance, social networks, and medicine. However, translating Natural Language (NL) into the Graph Query Language (GQL), commonly known as NL2GQL, proves to be challenging due to its inherent complexity and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like text2SQL. Nevertheless, when it comes to NL2GQL tasks on a particular domain, the absence of domain-specific NL-GQL data pairs makes it difficult to establish alignment between LLMs and the graph DB. To address this challenge, we propose a well-defined pipeline. Specifically, we utilize ChatGPT to create NL-GQL data pairs based on the given graph DB with self-instruct. Then, we use the created data to fine-tune LLMs, thereby achieving alignment between LLMs and the graph DB. Additionally, during inference, we propose a method that extracts the schema relevant to the queried NL as the input context to guide LLMs in generating accurate GQLs. We evaluate our method on two constructed datasets derived from graph DBs in the finance and medicine domains, namely FinGQL and MediGQL. Experimental results demonstrate that our method significantly outperforms a set of baseline methods, with improvements of 5.90 and 6.36 absolute points on EM, and 6.00 and 7.09 absolute points on EX, respectively.

Submitted 28 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 13 pages, 2 figures

arXiv:2402.14704 [pdf, other]

An LLM-Enhanced Adversarial Editing System for Lexical Simplification

Authors: Keren Tan, Kangyang Luo, Yunshi Lan, Zheng Yuan, Jinlong Shu

Abstract: Lexical Simplification (LS) aims to simplify text at the lexical level. Existing methods rely heavily on annotated data, making it challenging to apply in low-resource scenarios. In this paper, we propose a novel LS method without parallel corpora. This method employs an Adversarial Editing System with guidance from a confusion loss and an invariance loss to predict lexical edits in the original sentences. Meanwhile, we introduce an innovative LLM-enhanced loss to enable the distillation of knowledge from Large Language Models (LLMs) into a small-size LS system. From that, complex words within sentences are masked and a Difficulty-aware Filling module is crafted to replace masked positions with simpler words. At last, extensive experimental results and analyses on three benchmark LS datasets demonstrate the effectiveness of our proposed method.

Submitted 22 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted by COLING 2024 main conference

arXiv:2402.13779 [pdf, other]

Contextual Molecule Representation Learning from Chemical Reaction Knowledge

Authors: Han Tang, Shikun Feng, Bicheng Lin, Yuyan Ni, JIngjing Liu, Wei-Ying Ma, Yanyan Lan

Abstract: In recent years, self-supervised learning has emerged as a powerful tool to harness abundant unlabelled data for representation learning and has been broadly adopted in diverse areas. However, when applied to molecular representation learning (MRL), prevailing techniques such as masked sub-unit reconstruction often fall short, due to the high degree of freedom in the possible combinations of atoms within molecules, which brings insurmountable complexity to the masking-reconstruction paradigm. To tackle this challenge, we introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry. Specifically, REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature. We propose two pre-training objectives: Masked Reaction Centre Reconstruction (MRCR) and Reaction Centre Identification (RCI). REMO offers a novel solution to MRL by exploiting the underlying shared patterns in chemical reactions as \textit{context} for pre-training, which effectively infers meaningful representations of common chemistry knowledge. Such contextual representations can then be utilized to support diverse downstream molecular tasks with minimum finetuning, such as affinity prediction and drug-drug interaction prediction. Extensive experimental results on MoleculeACE, ACNet, drug-drug interaction (DDI), and reaction type classification show that across all tested downstream tasks, REMO outperforms the standard baseline of single-molecule masked modeling used in current MRL.
Remarkably, REMO is the pioneering deep learning model surpassing fingerprint-based methods in activity cliff benchmarks.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Preprint. Under Review

arXiv:2402.13647 [pdf, other]

Unsupervised Text Style Transfer via LLMs and Attention Masking with Multi-way Interactions

Authors: Lei Pan, Yunshi Lan, Yang Li, Weining Qian

Abstract: Unsupervised Text Style Transfer (UTST) has emerged as a critical task within the domain of Natural Language Processing (NLP), aiming to transfer one stylistic aspect of a sentence into another style without changing its semantics, syntax, or other attributes. This task is especially challenging given the intrinsic lack of parallel text pairings. Among existing methods for UTST tasks, the attention masking approach and Large Language Models (LLMs) are deemed two pioneering methods. However, they have shortcomings in generating unsmooth sentences and changing the original contents, respectively. In this paper, we investigate whether we can combine these two methods effectively. We propose four ways of interaction: pipeline framework with tuned orders; knowledge distillation from LLMs to attention masking model; in-context learning with constructed parallel examples. We empirically show that these multi-way interactions can improve the baselines in certain aspects of style strength, content preservation and text fluency. Experiments also demonstrate that simply conducting prompting followed by attention masking-based revision can consistently surpass the other systems, including supervised text style transfer systems. On Yelp-clean and Amazon-clean datasets, it improves the previously best mean metric by 0.5 and 3.0 absolute percentages respectively, and achieves new SOTA results.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13125 [pdf, other]

TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning

Authors: Xiang Li, Yunshi Lan, Chao Yang

Abstract: Recently, numerous new benchmarks have been established to evaluate the performance of large language models (LLMs) via either computing a holistic score or employing another LLM as a judge. However, these approaches suffer from data leakage due to the open access of the benchmark and an inflexible evaluation process. To address this issue, we introduce $\textbf{TreeEval}$, a benchmark-free evaluation method for LLMs that lets a high-performance LLM host an irreproducible evaluation session and essentially avoids data leakage. Moreover, this LLM acts as an examiner, raising a series of questions under a topic with a tree planning strategy, which considers the current evaluation status to decide the next question generation and ensures the completeness and efficiency of the evaluation process. We evaluate $6$ models of different parameter sizes, including $7$B, $13$B, and $33$B, and ultimately achieve the highest correlation coefficient with AlpacaEval2.0 using only around $45$ questions. We also conduct more analysis to show the robustness and reliability of TreeEval. Our code can be accessed at https://github.com/Ashura5/TreeEval.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.00357 [pdf, other]

Safety of Multimodal Large Language Models on Images and Texts

Authors: Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao

Abstract: Attracted by the impressive power of Multimodal Large Language Models (MLLMs), the public is increasingly utilizing them to improve the efficiency of daily work. Nonetheless, the vulnerabilities of MLLMs to unsafe instructions bring huge safety risks when these models are deployed in real-world scenarios. In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLMs' safety on images and text. We begin with introducing the overview of MLLMs on images and text and understanding of safety, which helps researchers know the detailed scope of our survey. Then, we review the evaluation datasets and metrics for measuring the safety of MLLMs. Next, we comprehensively present attack and defense techniques related to MLLMs' safety. Finally, we analyze several unsolved issues and discuss promising research directions. The latest papers are continually collected at https://github.com/isXinLiu/MLLM-Safety-Collection.

Submitted 20 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted at IJCAI2024

arXiv:2402.00263 [pdf, other]

Does DetectGPT Fully Utilize Perturbation? Bridge Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better

Authors: Shengchao Liu, Xiaoming Liu, Yichen Wang, Zehua Cheng, Chengzhengxu Li, Zhaohan Zhang, Yu Lan, Chao Shen

Abstract: The burgeoning generative capabilities of large language models (LLMs) have raised growing concerns about abuse, demanding automatic machine-generated text detectors. DetectGPT, a zero-shot metric-based detector, first introduces perturbation and shows great performance improvement. However, in DetectGPT, the random perturbation strategy could introduce noise, and logit regression depends on a threshold, harming the generalizability and applicability to individual or small-batch inputs. Hence, we propose a novel fine-tuned detector, Pecola, bridging metric-based and fine-tuned detectors by contrastive learning on selective perturbation. The selective strategy retains important tokens during perturbation and weights for multi-pair contrastive learning. The experiments show that Pecola outperforms the state-of-the-art by 1.20% in accuracy on average on four public datasets. We further analyze the effectiveness, robustness, and generalization of the method.

Submitted 24 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.07518 [pdf, other]

Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends

Authors: Yunshi Lan, Xinyuan Li, Hanyue Du, Xuesong Lu, Ming Gao, Weining Qian, Aoying Zhou

Abstract: Natural Language Processing (NLP) aims to analyze text or speech via techniques in the computer science field. It serves applications in domains such as healthcare, commerce, and education. In particular, NLP has been widely applied to the education domain and its applications have enormous potential to help teaching and learning. In this survey, we review recent advances in NLP with a focus on solving problems relevant to the education domain. In detail, we begin by introducing the related background and the real-world scenarios in education where NLP techniques could contribute. Then, we present a taxonomy of NLP in the education domain and highlight typical NLP applications including question answering, question construction, automated assessment, and error correction. Next, we illustrate the task definition, challenges, and corresponding cutting-edge techniques based on the above taxonomy. In particular, LLM-involved methods are included for discussion due to the wide usage of LLMs in diverse NLP applications. After that, we showcase some off-the-shelf demonstrations in this domain. Finally, we conclude with six promising directions for future research, including more datasets in the education domain, controllable usage of LLMs, intervention of difficulty-level control, interpretable educational NLP, methods with adaptive learning, and integrated systems for education. We organize all relevant datasets and papers in an openly available GitHub repository for better review: https://github.com/LiXinyuan1015/NLP-for-Education.

Submitted 15 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.11057 [pdf, other]

DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models

Authors: Jiachen Zhou, Peizhuo Lv, Yibing Lan, Guozhu Meng, Kai Chen, Hualong Ma

Abstract: Dataset sanitization is a widely adopted proactive defense against poisoning-based backdoor attacks, aimed at filtering out and removing poisoned samples from training datasets. However, existing methods have shown limited efficacy in countering the ever-evolving trigger functions, often leading to considerable degradation of benign accuracy. In this paper, we propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets. We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones. Specifically, with multiple iterations of the forward and reverse process, we extract intermediary images and their predicted labels for each sample in the original dataset. Then, we identify anomalous samples in terms of the presence of label transition of the intermediary images, detect the target label by quantifying distribution discrepancy, select their purified images considering pixel and feature distance, and determine their ground-truth labels by training a benign model. Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy, surpassing the performance of baseline defense methods.

Submitted 19 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2312.10422 [pdf, other]

Learning Dense Correspondence for NeRF-Based Face Reenactment

Authors: Songlin Yang, Wei Wang, Yushi Lan, Xiangyu Fan, Bo Peng, Lei Yang, Jing Dong

Abstract: Face reenactment is challenging due to the need to establish dense correspondence between various face representations for motion transfer. Recent studies have utilized Neural Radiance Field (NeRF) as fundamental representation, which further enhanced the performance of multi-view face reenactment in photo-realism and 3D consistency. However, establishing dense correspondence between different face NeRFs is non-trivial, because implicit representations lack ground-truth correspondence annotations like mesh-based 3D parametric models (e.g., 3DMM) with index-aligned vertexes. Although aligning 3DMM space with NeRF-based face representations can realize motion control, it is sub-optimal for their limited face-only modeling and low identity fidelity. Therefore, we are inspired to ask: Can we learn the dense correspondence between different NeRF-based face representations without a 3D parametric model prior? To address this challenge, we propose a novel framework, which adopts tri-planes as fundamental NeRF representation and decomposes face tri-planes into three components: canonical tri-planes, identity deformations, and motion. In terms of motion control, our key contribution is proposing a Plane Dictionary (PlaneDict) module, which efficiently maps the motion conditions to a linear weighted addition of learnable orthogonal plane bases. To the best of our knowledge, our framework is the first method that achieves one-shot multi-view face reenactment without a 3D parametric model prior.
Extensive experiments demonstrate that we produce better results in fine-grained motion control and identity preservation than previous methods.

Submitted 18 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

Comments: Accepted by Proceedings of the AAAI Conference on Artificial Intelligence, 2024

arXiv:2312.10389 [pdf, other]

ElasticLaneNet: An Efficient Geometry-Flexible Approach for Lane Detection

Authors: Yaxin Feng, Yuan Lan, Luchan Zhang, Yang Xiang

Abstract: The task of lane detection involves identifying the boundaries of driving areas in real-time. Recognizing lanes with variable and complex geometric structures remains a challenge. In this paper, we explore a novel and flexible implicit lane representation named \textit{Elastic Lane map (ELM)}, and introduce an efficient physics-informed end-to-end lane detection framework, namely, ElasticLaneNet (Elastic interaction energy-informed Lane detection Network). The approach considers predicted lanes as moving zero-contours on the flexibly shaped \textit{ELM} that are attracted to the ground truth guided by an elastic interaction energy-loss function (EIE loss). Our framework integrates the global information and low-level features well. The method performs well in complex lane scenarios, including those with large curvature, weak geometry features at intersections, complicated cross lanes, Y-shaped lanes, dense lanes, etc. We apply our approach on three datasets: SDLane, CULane, and TuSimple. The results demonstrate exceptional performance of our method, with state-of-the-art results on the structurally diverse SDLane, achieving an F1-score of 89.51, a Recall rate of 87.50, and a Precision of 91.61 with fast inference speed.

Submitted 3 April, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.09888 [pdf, other]

doi 10.1145/3624062.3624159

Scaling Computational Fluid Dynamics: In Situ Visualization of NekRS using SENSEI

Authors: Victor A. Mateevitsi, Mathis Bode, Nicola Ferrier, Paul Fischer, Jens Henrik Göbbert, Joseph A. Insley, Yu-Hsiang Lan, Misun Min, Michael E. Papka, Saumil Patel, Silvio Rizzi, Jonathan Windgassen

Abstract: In the realm of Computational Fluid Dynamics (CFD), the demand for memory and computation resources is extreme, necessitating the use of leadership-scale computing platforms for practical domain sizes. This intensive requirement renders traditional checkpointing methods ineffective due to the significant slowdown in simulations while saving state data to disk. As we progress towards exascale and GPU-driven High-Performance Computing (HPC) and confront larger problem sizes, the choice becomes increasingly stark: to compromise data fidelity or to reduce resolution. To navigate this challenge, this study advocates for the use of in situ analysis and visualization techniques. These allow more frequent data "snapshots" to be taken directly from memory, thus avoiding the need for disruptive checkpointing. We detail our approach of instrumenting NekRS, a GPU-focused thermal-fluid simulation code employing the spectral element method (SEM), and describe varied in situ and in transit strategies for data rendering. Additionally, we provide concrete scientific use-cases and report on runs performed on Polaris, Argonne Leadership Computing Facility's (ALCF) 44 Petaflop supercomputer and Jülich Wizard for European Leadership Science (JUWELS) Booster, Jülich Supercomputing Centre's (JSC) 71 Petaflop High Performance Computing (HPC) system, offering practical insight into the implications of our methodology.

Submitted 18 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.07586 [pdf, other]

Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale

Authors: Candi Zheng, Yuan Lan

Abstract: Popular guidance for denoising diffusion probabilistic models (DDPMs) linearly combines distinct conditional models to provide enhanced control over samples. However, this approach overlooks nonlinear effects that become significant when the guidance scale is large. To address this issue, we propose characteristic guidance, a guidance method that provides a first-principle non-linear correction for classifier-free guidance. Such correction forces the guided DDPMs to respect the Fokker-Planck (FP) equation of the diffusion process, in a way that is training-free and compatible with existing sampling methods. Experiments show that characteristic guidance enhances semantic characteristics of prompts and mitigates irregularities in image generation, proving effective in diverse applications ranging from simulating magnet phase transitions to latent space sampling.

Submitted 3 June, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: 8 pages, 7 figures

arXiv:2312.07168 [pdf, other]

Equivariant Flow Matching with Hybrid Probability Transport

Authors: Yuxuan Song, Jingjing Gong, Minkai Xu, Ziyao Cao, Yanyan Lan, Stefano Ermon, Hao Zhou, Wei-Ying Ma

Abstract: The generation of 3D molecules requires simultaneously deciding the categorical features (atom types) and continuous features (atom coordinates). Deep generative models, especially Diffusion Models (DMs), have demonstrated effectiveness in generating feature-rich geometries. However, existing DMs typically suffer from unstable probability dynamics with inefficient sampling speed. In this paper, we introduce geometric flow matching, which enjoys the advantages of both equivariant modeling and stabilized probability dynamics. More specifically, we propose a hybrid probability path where the coordinates probability path is regularized by an equivariant optimal transport, and the information between different modalities is aligned. Experimentally, the proposed method could consistently achieve better performance on multiple molecule generation benchmarks with 4.75$\times$ speed up of sampling on average.

Submitted 12 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023

arXiv:2312.03763 [pdf, other]

Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing

Authors: Yushi Lan, Feitong Tan, Di Qiu, Qiangeng Xu, Kyle Genova, Zeng Huang, Sean Fanello, Rohit Pandey, Thomas Funkhouser, Chen Change Loy, Yinda Zhang

Abstract: We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-plane payload within each Gaussian rather than directly storing color and opacity. Additionally, we parameterize the Gaussians in a 2D UV space via a 3DMM, enabling effective utilization of the diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads with fine-grained editing over facial features and expressions. Extensive experiments demonstrate the effectiveness of our method.

Submitted 19 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: project webpage: https://nirvanalan.github.io/projects/gaussian3diff/

arXiv:2312.01972 [pdf, other]

Doping dependence of superconductivity on a honeycomb lattice within the framework of kinetic-energy-driven superconductivity

Authors: Yu Lan, Xian-Feng Yu, Li-Ting Zhang

Abstract: Unconventional superconductivity on a honeycomb lattice has received increasing interest since the discovery of graphene, primarily due to the similarities between materials with a honeycomb lattice and cuprate superconductors. Many theoretical studies have been conducted on superconductivity on a honeycomb lattice; however, a consistent picture is still lacking. In this article we have extended the theory of kinetic-energy-driven superconductivity, which has been developed to investigate unconventional superconductivity in cuprate superconductors, to explore superconductivity on a honeycomb lattice within the $t$-$J$ model. Our results demonstrate that the charge-carrier pair gap parameter with $d_{x^{2}-y^{2}}+{\rm i}d_{xy}$-wave symmetry exhibits a dome-like shape as a function of doping, with superconductivity emerging at a certain doping concentration and disappearing at high doping levels, similar to what has been observed in cuprate and cobaltate superconductors. Furthermore, the charge-carrier pair gap parameter decreases with increasing value of $J/t$ (the antiferromagnetic exchange coupling constant relative to the nearest-neighbor hopping integral), and approaches zero when $J/t$ reaches a sufficiently large value. This indicates that the antiferromagnetic order will suppress the superconducting state and a sufficiently strong exchange coupling will completely destroy the superconductivity. Taking into account our present results together with the corresponding results of cuprate and cobaltate superconductors, it appears that the dome-like shape of the doping dependence of the charge-carrier pair gap parameter may be a common feature in doped Mott insulators.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 8 pages, 2 figures

arXiv:2311.17600 [pdf, other]

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Authors: Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao

Abstract: The security concerns surrounding Large Language Models (LLMs) have been extensively explored, yet the safety of Multimodal Large Language Models (MLLMs) remains understudied. In this paper, we observe that Multimodal Large Language Models (MLLMs) can be easily compromised by query-relevant images, as if the text query itself were malicious. To address this, we introduce MM-SafetyBench, a comprehensive framework designed for conducting safety-critical evaluations of MLLMs against such image-based manipulations. We have compiled a dataset comprising 13 scenarios, resulting in a total of 5,040 text-image pairs. Our analysis across 12 state-of-the-art models reveals that MLLMs are susceptible to breaches instigated by our approach, even when the equipped LLMs have been safety-aligned. In response, we propose a straightforward yet effective prompting strategy to enhance the resilience of MLLMs against these types of attacks. Our work underscores the need for a concerted effort to strengthen and enhance the safety measures of open-source MLLMs against potential malicious exploits. The resource is available at https://github.com/isXinLiu/MM-SafetyBench

Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.16160 [pdf, other]

Protein-ligand binding representation learning from fine-grained interactions

Authors: Shikun Feng, Minghao Li, Yinjun Jia, Weiying Ma, Yanyan Lan

Abstract: The binding between proteins and ligands plays a crucial role in the realm of drug discovery. Previous deep learning approaches have shown promising results over traditional computationally intensive methods, but suffer from poor generalization due to limited supervised data. In this paper, we propose to learn protein-ligand binding representation in a self-supervised learning manner. Different from existing pre-training approaches which treat proteins and ligands individually, we emphasize discerning the intricate binding patterns from fine-grained interactions. Specifically, this self-supervised learning problem is formulated as a prediction of the conclusive binding complex structure given a pocket and ligand with a Transformer-based interaction module, which naturally emulates the binding process. To ensure the representation of rich binding information, we introduce two pre-training tasks, i.e., atomic pairwise distance map prediction and mask ligand reconstruction, which comprehensively model the fine-grained interactions from both structure and feature space. Extensive experiments have demonstrated the superiority of our method across various binding tasks, including protein-ligand affinity prediction, virtual screening and protein-ligand docking.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.12035 [pdf, other]

Delta Score: Improving the Binding Assessment of Structure-Based Drug Design Methods

Authors: Minsi Ren, Bowen Gao, Bo Qiang, Yanyan Lan

Abstract: Structure-based drug design (SBDD) stands at the forefront of drug discovery, emphasizing the creation of molecules that target specific binding pockets. Recent advances in this area have witnessed the adoption of deep generative models and geometric deep learning techniques, modeling SBDD as a conditional generation task where the target structure serves as context. Historically, evaluation of these models centered on docking scores, which quantitatively depict the predicted binding affinity between a molecule and its target pocket. Though state-of-the-art models purport that a majority of their generated ligands exceed the docking score of ground truth ligands in test sets, it begs the question: Do these scores align with real-world biological needs? In this paper, we introduce the delta score, a novel evaluation metric grounded in tangible pharmaceutical requisites. Our experiments reveal that molecules produced by current deep generative models significantly lag behind ground truth reference ligands when assessed with the delta score. This novel metric not only complements existing benchmarks but also provides a pivotal direction for subsequent research in the domain.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2311.09050 [pdf, other]

Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts

Authors: Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian

Abstract: Zero-shot Visual Question Answering (VQA) is a prominent vision-language task that examines both the visual and textual understanding capability of systems in the absence of training data. Recently, by converting the images into captions, information across multi-modalities is bridged and Large Language Models (LLMs) can apply their strong zero-shot generalization capability to unseen questions. To design ideal prompts for solving VQA via LLMs, several studies have explored different strategies to select or generate question-answer pairs as the exemplar prompts, which guide LLMs to answer the current questions effectively. However, they totally ignore the role of question prompts. The original questions in VQA tasks usually encounter ellipses and ambiguity which require intermediate reasoning. To this end, we present Reasoning Question Prompts for VQA tasks, which can further activate the potential of LLMs in zero-shot scenarios. Specifically, for each question, we first generate self-contained questions as reasoning question prompts via an unsupervised question edition module considering sentence fluency, semantic integrity and syntactic invariance. Each reasoning question prompt clearly indicates the intent of the original question. This results in a set of candidate answers. Then, the candidate answers associated with their confidence scores acting as answer heuristics are fed into LLMs and produce the final answer. We evaluate reasoning question prompts on three VQA challenges. Experimental results demonstrate that they can significantly improve the results of LLMs in the zero-shot setting and outperform existing state-of-the-art zero-shot methods on three out of four data sets. Our source code is publicly released at https://github.com/ECNU-DASE-NLP/RQP.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.04906 [pdf, other]

doi 10.1145/3583780.3615119

FlaCGEC: A Chinese Grammatical Error Correction Dataset with Fine-grained Linguistic Annotation

Authors: Hanyue Du, Yike Zhao, Qingyuan Tian, Jiani Wang, Lei Wang, Yunshi Lan, Xuesong Lu

Abstract: Chinese Grammatical Error Correction (CGEC) has been attracting growing attention from researchers recently. Although multiple CGEC datasets have been developed to support the research, these datasets lack the ability to provide a deep linguistic topology of grammar errors, which is critical for interpreting and diagnosing CGEC approaches. To address this limitation, we introduce FlaCGEC, a new CGEC dataset featuring fine-grained linguistic annotation. Specifically, we collect raw corpus from the linguistic schema defined by Chinese language experts, conduct edits on sentences via rules, and refine generated samples manually, which results in 10k sentences with 78 instantiated grammar points and 3 types of edits. We evaluate various cutting-edge CGEC methods on the proposed FlaCGEC dataset and their unremarkable results indicate that this dataset is challenging in covering a large range of grammatical errors. In addition, we also treat FlaCGEC as a diagnostic dataset for testing generalization skills and conduct a thorough evaluation of existing CGEC models.

Submitted 26 September, 2023; originally announced November 2023.

arXiv:2311.03955 [pdf]

Elastic Information Bottleneck

Authors: Yuyan Ni, Yanyan Lan, Ao Liu, Zhiming Ma

Abstract: Information bottleneck is an information-theoretic principle of representation learning that aims to learn a maximally compressed representation that preserves as much information about labels as possible. Under this principle, two different methods have been proposed, i.e., information bottleneck (IB) and deterministic information bottleneck (DIB), and have made significant progress in explaining the representation mechanisms of deep learning algorithms. However, these theoretical and empirical successes are only valid with the assumption that training and test data are drawn from the same distribution, which is clearly not satisfied in many real-world applications. In this paper, we study their generalization abilities within a transfer learning scenario, where the target error could be decomposed into three components, i.e., source empirical error, source generalization gap (SG), and representation discrepancy (RD). Comparing IB and DIB on these terms, we prove that DIB's SG bound is tighter than IB's while DIB's RD is larger than IB's. Therefore, it is difficult to tell which one is better. To balance the trade-off between SG and the RD, we propose an elastic information bottleneck (EIB) to interpolate between the IB and DIB regularizers, which guarantees a Pareto frontier within the IB framework. Additionally, simulations and real data experiments show that EIB has the ability to achieve better domain adaptation results than IB and DIB, which validates the correctness of our theories.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.02124 [pdf, other]

Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Authors: Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan

Abstract: While molecular pre-training has shown great potential in enhancing drug discovery, the lack of a solid physical interpretation in current methods raises concerns about whether the learned representation truly captures the underlying explanatory factors in observed data, ultimately resulting in limited generalization and robustness. Although denoising methods offer a physical interpretation, their accuracy is often compromised by ad-hoc noise design, leading to inaccurate learned force fields. To address this limitation, this paper proposes a new method for molecular pre-training, called sliced denoising (SliDe), which is based on the classical mechanical intramolecular potential theory. SliDe utilizes a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations. Additionally, it introduces a random slicing approach that circumvents the computationally expensive calculation of the Jacobian matrix, which is otherwise essential for estimating the force field. By aligning with physical principles, SliDe shows a 42% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.18682 [pdf, ps, other]

Lusztig sheaves and tensor products of integrable highest weight modules

Authors: Jiepeng Fang, Yixin Lan

Abstract: By introducing $N$-framed quivers, we define the localization of Lusztig's sheaves for $N$-framed quivers and functors $E^{(n)}_{i}, F^{(n)}_{i}, K^{\pm}_i$ for localizations. This gives a categorical realization of tensor products of integrable highest weight modules of the quantized enveloping algebra. The simple perverse sheaves in the localization provide the canonical basis of tensor products. It is bar-invariant and has positivity. Moreover, we give a categorical interpretation of the Yang-Baxter equation.

Submitted 30 October, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: 35 pages, arXiv admin note: text overlap with arXiv:2307.16131

MSC Class: 16G20; 17B37

arXiv:2310.16535 [pdf, other]

R$^3$ Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context

Authors: Qingyuan Tian, Hanlun Zhu, Lei Wang, Yang Li, Yunshi Lan

Abstract: With the help of Chain-of-Thought (CoT) prompting, Large Language Models (LLMs) have achieved remarkable performance on various reasoning tasks. However, most of them have been evaluated under noise-free context, and the dilemma for LLMs to produce inaccurate results under noisy context has not been fully investigated. Existing studies utilize trigger sentences to encourage LLMs to concentrate on the relevant information, but the trigger has limited effect on final answer prediction. Inspired by the interactive CoT method, where intermediate reasoning steps are promoted by multiple rounds of interaction between users and LLMs, we propose a novel prompting method, namely R$^3$ prompting, for CoT reasoning under noisy context. Specifically, R$^3$ prompting interacts with LLMs to perform key sentence extraction, variable declaration and answer prediction, which corresponds to a thought process of reviewing, rephrasing and resolving. The responses generated at the last interaction serve as hints to guide the responses of the next interaction. Our experiments show that R$^3$ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context. With GPT-3.5-turbo, we observe 3.7% accuracy improvement on average on the reasoning tasks under noisy context compared to the most competitive prompting baseline. More analyses and ablation studies show the robustness and generalization of the R$^3$ prompting method in solving reasoning tasks in LLMs under noisy context.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.14985 [pdf, other]

LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay

Authors: Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang

Abstract: This paper aims to investigate the open research problem of uncovering the social behaviors of LLM-based agents. To achieve this goal, we adopt Avalon, a representative communication game, as the environment and use system prompts to guide LLM agents to play the game. While previous studies have conducted preliminary investigations into gameplay with LLM agents, research on their social behaviors is lacking. In this paper, we present a novel framework designed to seamlessly adapt to Avalon gameplay. The core of our proposed framework is a multi-agent system that enables efficient communication and interaction among agents. We evaluate the performance of our framework based on metrics from two perspectives: winning the game and analyzing the social behaviors of LLM agents. Our results demonstrate the effectiveness of our framework in generating adaptive and intelligent agents and highlight the potential of LLM-based agents in addressing the challenges associated with dynamic social environment interaction. By analyzing the social behaviors of LLM agents from the aspects of both collaboration and confrontation, we provide insights into the research and applications of this domain. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent

Submitted 7 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.14530 [pdf, other]

doi 10.3847/1538-4357/ad05ca

Mock Observations: Formation and Evolution of diffuse light in Galaxy Groups and Clusters in the IllustrisTNG Simulations

Authors: Lin Tang, Weipeng Lin, Yang Wang, Jing Li, Yanyao Lan

Abstract: In this paper, by analyzing mock images from the IllustrisTNG100-1 simulation, we examine the properties of the diffuse light and compare them to those of central and satellite galaxies. Our findings suggest that the majority of the diffuse light originates from satellites. This claim is supported by the similarity between the age and metallicity distributions of the diffuse light and those of the satellites. Notably, the color distribution of the diffuse light gradually evolves to resemble that of the centrals at lower redshifts, suggesting a coevolution or passive process. The radial profiles of the diffuse light reveal distinct trends, with the inner regions displaying a relatively flat distribution and the outer regions showing a descending pattern. This finding suggests that the formation of the diffuse light is influenced by both major mergers and stellar tidal stripping. Moreover, strong correlations are found between the stellar mass of the diffuse light and the overall stellar mass of the satellites, as well as between the stellar mass of the diffuse light and the number of satellites within groups or clusters. These relationships can be described by power-law and logarithmic functions. Overall, the diffuse light components predominantly originate from satellites with intermediate ages and metallicities. These satellites typically fall within the stellar mass range of $\rm 8<\log_{10}M_{star}/M_{\odot}< 10$ and the color range of $\rm -1<[g-r]^{0.1}< 0$. As the redshift decreases, the growth of the diffuse light is primarily influenced by the redder satellites, while the most massive and reddest satellites have minimal roles in its growth.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 8 figures, 2 tables, Accepted for publication in ApJ

Journal ref: The Astrophysical Journal, 2023, Volume 959, Number 2

arXiv:2310.14216 [pdf, other]

UniMAP: Universal SMILES-Graph Representation Learning

Authors: Shikun Feng, Lixin Yang, Weiying Ma, Yanyan Lan

Abstract: Molecular representation learning is fundamental for many drug-related applications. Most existing molecular pre-training models are limited to a single molecular modality, either SMILES or graph representation. To effectively leverage both modalities, we argue that it is critical to capture the fine-grained 'semantics' between SMILES and graph, because subtle sequence/graph differences may lead to contrary molecular properties. In this paper, we propose a universal SMILES-graph representation learning model, namely UniMAP. Firstly, an embedding layer is employed to obtain the token and node/edge representation in SMILES and graph, respectively. A multi-layer Transformer is then utilized to conduct deep cross-modality fusion. Specifically, four kinds of pre-training tasks are designed for UniMAP, including Multi-Level Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global (i.e. SGM and DKL) and local (i.e. CMM and FLA) alignments are integrated to achieve comprehensive cross-modality fusion. We evaluate UniMAP on various downstream tasks, i.e. molecular property prediction, drug-target affinity prediction and drug-drug interaction. Experimental results show that UniMAP outperforms current state-of-the-art pre-training methods. We also visualize the learned representations to demonstrate the effect of multi-modality integration.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.11295 [pdf, other]

CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation

Authors: Zhaojie Chu, Kailing Guo, Xiaofen Xing, Yilin Lan, Bolun Cai, Xiangmin Xu

Abstract: Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speaking activities, the mouth displays strong motions, while the other facial regions typically demonstrate comparatively weak activity levels. Existing approaches often simplify the process by directly mapping single-level speech features to the entire facial animation, which overlooks the differences in facial activity intensity, leading to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which effectively establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. A novel facial activity intensity metric is defined to distinguish between strong and weak facial activity, obtained by computing the short-time Fourier transform of facial vertex displacements. Based on the variances in facial activity, we propose a dual-branch decoding framework to synchronously synthesize strong and weak facial activity, which guarantees wider intensity facial animation synthesis. Furthermore, a weighted hierarchical feature encoder is proposed to establish temporal correlation between hierarchical speech features and facial activity at different intensities, which ensures lip-sync and plausible facial expressions. Extensive qualitative and quantitative experiments as well as a user study indicate that our CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.08395 [pdf, other]

Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation

Authors: Yuanyuan Liang, Jianing Wang, Hanlun Zhu, Lei Wang, Weining Qian, Yunshi Lan

Abstract: The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. Because large-scale question annotation is expensive, KBQG methods for low-resource scenarios urgently need to be developed. However, current methods rely heavily on annotated data for fine-tuning, which is not well-suited for few-shot question generation. Large Language Models (LLMs) have shown impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, which is an in-context learning strategy for reasoning, we formulate the KBQG task as a reasoning problem, where the generation of a complete question is split into a series of sub-question generation steps. Our proposed prompting method KQG-CoT first retrieves supportive logical forms from the unlabeled data pool, taking into account the characteristics of the logical form. Then, we write a prompt that makes explicit the reasoning chain for generating complicated questions based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ by sorting the logical forms by their complexity. We conduct extensive experiments over three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method surpasses existing few-shot SoTA results on the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.

Submitted 23 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP 2023 main conference

Showing 1–50 of 314 results for author: Lan, Y