Search | arXiv e-print repository

Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation

Authors: Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Hao Yang, Tong Xiao

Abstract: With contributions from the open-source community, a vast amount of instruction tuning (IT) data has emerged. Given the significant resource allocation required by training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data. However, existing methods for instruction data selection have limitations such as relying on fragile external APIs, being affected by biases in GPT models, or reducing the diversity of the selected instruction dataset. In this paper, we propose an industrial-friendly, expert-aligned and diversity-preserved instruction data selection method: Clustering and Ranking (CaR). CaR consists of two steps. The first step involves ranking instruction pairs using a scoring model that is well aligned with expert preferences (achieving an accuracy of 84.25%). The second step involves preserving dataset diversity through a clustering process. In our experiment, CaR selected a subset containing only 1.96% of Alpaca's IT data, yet the underlying AlpaCaR model trained on this subset outperforms Alpaca by an average of 32.1% in GPT-4 evaluations. Furthermore, our method utilizes small models (355M parameters) and requires only 11.2% of the monetary cost compared to existing methods, making it easily deployable in industrial scenarios.

Submitted 28 February, 2024; originally announced February 2024.
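
The two steps named in the CaR abstract above (rank by a quality score, then preserve diversity through clustering) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the function name `select_instructions`, the parameters `top_n` and `top_k_per_cluster`, and the rule of keeping the globally best items plus the best items of each cluster are hypothetical, and the scores and cluster labels are toy values rather than outputs of the scoring model or clustering the paper uses.

```python
from collections import defaultdict


def select_instructions(scores, cluster_ids, top_n, top_k_per_cluster):
    """Pick items by a rank-then-cluster rule (illustrative, not the paper's code)."""
    if len(scores) != len(cluster_ids):
        raise ValueError("scores and cluster_ids must have the same length")

    # Step 1: rank all items by quality score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:top_n])

    # Step 2: keep the best few items of every cluster to preserve diversity.
    by_cluster = defaultdict(list)
    for i in ranked:  # iterating in ranked order keeps each list sorted by score
        by_cluster[cluster_ids[i]].append(i)
    for members in by_cluster.values():
        selected.update(members[:top_k_per_cluster])

    return sorted(selected)


# Toy data: six instruction pairs with hypothetical scores and cluster labels.
scores = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
cluster_ids = [0, 0, 1, 0, 2, 2]
chosen = select_instructions(scores, cluster_ids, top_n=2, top_k_per_cluster=1)
print(chosen)
```

In this toy run the low-scoring clusters 1 and 2 each still contribute one item, which is the diversity-preserving effect the abstract describes.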

arXiv:2402.18166 [pdf, other]

Sequence-level Semantic Representation Fusion for Recommender Systems

Authors: Lanling Xu, Zhen Tian, Bingqian Li, Junjie Zhang, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao

Abstract: With the rapid development of recommender systems, there is increasing side information that can be employed to improve the recommendation performance. Specifically, we focus on the utilization of the associated textual data of items (e.g., product title) and study how text features can be effectively fused with ID features in sequential recommendation. However, the two kinds of item features have distinct data characteristics, making a direct fusion method (e.g., adding text and ID embeddings as item representation) less effective. To address this issue, we propose a novel Text-ID semantic fusion approach for sequential Recommendation, namely TedRec. The core idea of our approach is to conduct a sequence-level semantic fusion approach by better integrating global contexts. The key strategy lies in that we transform the text embeddings and ID embeddings by Fourier Transform from time domain to frequency domain. In the frequency domain, the global sequential characteristics of the original sequences are inherently aggregated into the transformed representations, so that we can employ simple multiplicative operations to effectively fuse the two kinds of item features. Our fusion approach can be proved to have the same effects of contextual convolution, so as to achieve sequence-level semantic fusion.
In order to further improve the fusion performance, we propose to enhance the discriminability of the text embeddings from the text encoder, by adaptively injecting positional information via a mixture-of-experts (MoE) modulation method. Our implementation is available at this repository: https://github.com/RUCAIBox/TedRec.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 8 pages, 5 figures
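
The claim in the abstract above that multiplication in the frequency domain has the same effect as a convolution over the sequence is an instance of the convolution theorem, and a short numerical check makes it concrete. This is a minimal sketch under simplifying assumptions: each "embedding sequence" is a toy list of scalars rather than a sequence of vectors, the helper names (`dft`, `idft`, `circular_convolution`, `fuse_in_frequency_domain`) are hypothetical, and none of the paper's model components are reproduced.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolution(a, b):
    """Circular convolution computed directly in the time domain."""
    n = len(a)
    return [sum(a[s] * b[(t - s) % n] for s in range(n)) for t in range(n)]


def fuse_in_frequency_domain(a, b):
    """Transform both sequences, multiply elementwise, and transform back."""
    product = [p * q for p, q in zip(dft(a), dft(b))]
    return [value.real for value in idft(product)]


# Toy stand-ins for a text-feature sequence and an ID-feature sequence.
text_seq = [1.0, 2.0, 0.0, -1.0]
id_seq = [0.5, 0.0, 1.0, 0.0]

fused = fuse_in_frequency_domain(text_seq, id_seq)
direct = circular_convolution(text_seq, id_seq)
print([round(v, 6) for v in fused])
print(direct)
```

Both printed lists agree up to floating-point error: every output position mixes contributions from all positions of the two inputs, which is the sense in which a pointwise product of spectra acts at the sequence level.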

arXiv:2402.18144 [pdf]

Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic Information

Authors: Seungjong Sun, Eungu Lee, Dongyan Nan, Xiangying Zhao, Wonbyung Lee, Bernard J. Jansen, Jang Hyun Kim

Abstract: Large language models exhibit societal biases associated with demographic information, including race, gender, and others. Endowing such language models with personalities based on demographic data can enable generating opinions that align with those of humans. Building on this idea, we propose "random silicon sampling," a method to emulate the opinions of the human population sub-group. Our study analyzed 1) a language model that generates the survey responses that correspond with a human group based solely on its demographic distribution and 2) the applicability of our methodology across various demographic subgroups and thematic questions. Through random silicon sampling and using only group-level demographic information, we discovered that language models can generate response distributions that are remarkably similar to the actual U.S. public opinion polls. Moreover, we found that the replicability of language models varies depending on the demographic group and topic of the question, and this can be attributed to inherent societal biases in the models. Our findings demonstrate the feasibility of mirroring a group's opinion using only demographic distribution and elucidate the effect of social biases in language models on such simulations.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 25 pages, 4 figures, 19 Tables

ACM Class: I.2.7

arXiv:2402.18099 [pdf, other]

Editing Factual Knowledge and Explanatory Ability of Medical Large Language Models

Authors: Derong Xu, Ziheng Zhang, Zhihong Zhu, Zhenxi Lin, Qidong Liu, Xian Wu, Tong Xu, Wanyu Wang, Yuyang Ye, Xiangyu Zhao, Yefeng Zheng, Enhong Chen

Abstract: Model editing aims to precisely alter the behaviors of large language models (LLMs) in relation to specific knowledge, while leaving unrelated knowledge intact. This approach has proven effective in addressing issues of hallucination and outdated information in LLMs. However, the potential of using model editing to modify knowledge in the medical field remains largely unexplored, even though resolving hallucination is a pressing need in this area. Our observations indicate that current methods face significant challenges in dealing with specialized and complex knowledge in the medical domain. Therefore, we propose MedLaSA, a novel Layer-wise Scalable Adapter strategy for medical model editing. MedLaSA harnesses the strengths of both adding extra parameters and locate-then-edit methods for medical model editing. We utilize causal tracing to identify the association of knowledge in neurons across different layers, and generate a corresponding scale set from the association value for each piece of knowledge. Subsequently, we incorporate scalable adapters into the dense layers of LLMs. These adapters are assigned scaling values based on the corresponding specific knowledge, which allows for the adjustment of the adapter's weight and rank. The more similar the content, the more consistent the scale between them. This ensures precise editing of semantically identical knowledge while avoiding impact on unrelated knowledge.
To evaluate the editing impact on the behaviours of LLMs, we propose two model editing studies for the medical domain: (1) editing factual knowledge for medical specialization and (2) editing the explanatory ability for complex knowledge. We build two novel medical benchmarking datasets and introduce a series of challenging and comprehensive metrics. Extensive experiments on medical LLMs demonstrate the editing efficiency of MedLaSA, without affecting unrelated knowledge.

Submitted 4 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.17564 [pdf, other]

Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers

Authors: Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, Ji-Rong Wen

Abstract: Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via iterative refinement. In this paper, we propose a novel perspective to investigate the design of LLM-based prompt optimizers, by drawing an analogy with gradient-based model optimizers. To connect these two approaches, we identify two pivotal factors in model parameter learning: update direction and update method. Focused on the two aspects, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies for LLM-based prompt optimizers. By systematically analyzing a rich set of improvement strategies, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO. At each step, it first retrieves relevant prompts from the optimization trajectory as the update direction. Then, it utilizes the generation-based refinement strategy to perform the update, while controlling the edit distance through a cosine-based decay strategy. Extensive experiments demonstrate the effectiveness and efficiency of GPO. In particular, GPO brings an additional improvement of up to 56.8% on Big-Bench Hard and 55.3% on MMLU compared to baseline methods.

Submitted 16 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.
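
The GPO abstract above mentions controlling the edit distance through a cosine-based decay strategy. One way such a schedule could look is sketched below, by analogy with cosine learning-rate decay. The function name `max_edit_distance`, its parameters, and the choice to decay an upper bound on the number of edited words from `initial_limit` to `final_limit` are all assumptions for illustration; the abstract does not give the exact formula.

```python
import math


def max_edit_distance(step, total_steps, initial_limit, final_limit=0.0):
    """Cosine decay of an edit budget from initial_limit to final_limit (hypothetical)."""
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    # Clamp progress to [0, 1] so out-of-range steps stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_limit + (initial_limit - final_limit) * cosine


# Edit budget (for example, in words) over a five-point schedule.
limits = [max_edit_distance(step, total_steps=4, initial_limit=20) for step in range(5)]
print([round(v, 2) for v in limits])
```

Under this reading, early optimization steps may rewrite a prompt substantially while later steps are restricted to small edits, mirroring how a decaying learning rate shrinks parameter updates.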

arXiv:2402.17505 [pdf, other]

BASES: Large-scale Web Search User Simulation with Large Language Model based Agents

Authors: Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang

Abstract: Due to the excellent capacities of large language models (LLMs), it becomes feasible to develop LLM-based agents for reliable user simulation. Considering the scarcity and limitations (e.g., privacy issues) of real user data, in this paper, we conduct large-scale user simulation for web search, to improve the analysis and modeling of user search behavior. Specifically, we propose BASES, a novel user simulation framework with LLM-based agents, designed to facilitate comprehensive simulations of web search user behaviors. Our simulation framework can generate unique user profiles at scale, which subsequently leads to diverse search behaviors. To demonstrate the effectiveness of BASES, we conduct evaluation experiments based on two human benchmarks in both Chinese and English, demonstrating that BASES can effectively simulate large-scale human-like search behaviors. To further accommodate the research on web search, we develop WARRIORS, a new large-scale dataset encompassing web search user behaviors, including both Chinese and English versions, which can greatly bolster research in the field of information retrieval. Our code and data will be publicly released soon.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17497 [pdf, other]

REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering

Authors: Yuhao Wang, Ruiyang Ren, Junyi Li, Wayne Xin Zhao, Jing Liu, Ji-Rong Wen

Abstract: Considering the limited internal parametric knowledge, retrieval-augmented generation (RAG) has been widely used to extend the knowledge scope of large language models (LLMs). Despite the extensive efforts on RAG research, in existing methods, LLMs cannot precisely assess the relevance of retrieved documents, thus likely leading to misleading or even incorrect utilization of external knowledge (i.e., retrieved documents). To address this issue, in this paper, we propose REAR, a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). As the key motivation, we aim to enhance the self-awareness of source relevance for LLMs, so as to adaptively utilize external knowledge in RAG systems. Specifically, we develop a new architecture for LLM-based RAG systems, by incorporating a specially designed rank head that precisely assesses the relevance of retrieved documents. Furthermore, we propose an improved training method based on bi-granularity relevance fusion and noise-resistant training. By combining the improvements in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents. Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches. Our code and data can be accessed at https://github.com/RUCAIBox/REAR.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17334 [pdf, other]

BiVRec: Bidirectional View-based Multimodal Sequential Recommendation

Authors: Jiaxi Hu, Jingtong Gao, Xiangyu Zhao, Yuehong Hu, Yuxuan Liang, Yiqi Wang, Ming He, Zitao Liu, Hongzhi Yin

Abstract: The integration of multimodal information into sequential recommender systems has attracted significant attention in recent research. In the initial stages of multimodal sequential recommendation models, the mainstream paradigm was ID-dominant recommendations, wherein multimodal information was fused as side information. However, due to their limitations in terms of transferability and information intrusion, another paradigm emerged, wherein multimodal features were employed directly for recommendation, enabling recommendation across datasets. Nonetheless, it overlooked user ID information, resulting in low information utilization and high training costs. To this end, we propose an innovative framework, BiVRec, that jointly trains the recommendation tasks in both ID and multimodal views, leveraging their synergistic relationship to enhance recommendation performance bidirectionally. To tackle the information heterogeneity issue, we first construct structured user interest representations and then learn the synergistic relationship between them.
Specifically, BiVRec comprises three modules: Multi-scale Interest Embedding, comprehensively modeling user interests by expanding user interaction sequences with multi-scale patching; Intra-View Interest Decomposition, constructing highly structured interest representations using carefully designed Gaussian attention and Cluster attention; and Cross-View Interest Learning, learning the synergistic relationship between the two recommendation views through coarse-grained overall semantic similarity and fine-grained interest allocation similarity. BiVRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.

Submitted 4 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17124 [pdf, other]

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

Authors: Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen

Abstract: For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance. While it is now common sense that LLM performances are greatly impacted by prompts, the confidence calibration in prompting LLMs has yet to be thoroughly explored. In this paper, we explore how different prompting strategies influence LLM confidence calibration and how it could be improved. We conduct extensive experiments on six prompting methods in the question-answering context and we observe that, while these methods help improve the expected LLM calibration, they also trigger LLMs to be over-confident when responding to some instances. Inspired by human cognition, we propose Fact-and-Reflection (FaR) prompting, which improves the LLM calibration in two steps. First, FaR elicits the known "facts" that are relevant to the input prompt from the LLM. Then it asks the model to "reflect" over them to generate the final answer. Experiments show that FaR prompting achieves significantly better calibration; it lowers the Expected Calibration Error by 23.5% on our multi-purpose QA tasks. Notably, FaR prompting even elicits the capability of verbally expressing concerns in less confident scenarios, which helps trigger retrieval augmentation for solving these harder instances.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 17 pages, 10 figures
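
The FaR abstract above reports its result in terms of Expected Calibration Error (ECE). As a reminder of what that metric measures, here is a minimal sketch of the common equal-width-bin formulation: predictions are grouped by confidence, and the gaps between average confidence and accuracy are averaged with weights proportional to bin size. The function name, the default of 10 bins, and the toy data are assumptions; the exact binning used in the paper is not stated in the abstract.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1] (common formulation)."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_confidence = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += (len(members) / total) * abs(avg_confidence - accuracy)
    return ece


# Toy example: two answers at 0.9 confidence (one right), two at 0.6 (both right).
confidences = [0.9, 0.9, 0.6, 0.6]
correct = [True, False, True, True]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 3))
```

In the toy example the model is over-confident in one bin and under-confident in the other; both gaps count toward the error because ECE uses absolute differences.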

arXiv:2402.16500 [pdf, ps, other]

The Map between Symmetries and Orbital Rules to Realize Tunable Band Gap in Quantum Anomalous Hall Effect Material

Authors: Jiaohong Shu, Xinxin Zhao, Weiqin Fan, Lili Wang, Guanglong Chen, Jianbao Wu, Yiming Mi

Abstract: We establish the map between symmetries and orbital rules to realize tunable band gap in quantum anomalous Hall effect material. This band gap is determined by the SOC between local orbitals associated with band crossing, which is constrained by at least one of the lattice symmetries. The band gap could be turned on/off by breaking or keeping the corresponding lattice symmetry through rotation of the magnetization direction. The components of the local orbitals related to band crossing are required to match the symmetry, and to produce non-zero SOC when symmetry is broken. Following this map, the TiSb monolayer is predicted to be a quantum anomalous Hall effect material with a band gap adjusted in the range of 0 to 209 meV through magnetization direction rotation.

Submitted 24 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.16438 [pdf, other]

Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Authors: Tianyi Tang, Wenyang Luo, Haoyang Huang, Dongdong Zhang, Xiaolei Wang, Xin Zhao, Furu Wei, Ji-Rong Wen

Abstract: Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on several representative LLMs, such as LLaMA-2, BLOOM, and Mistral. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence to the understanding and exploration of the multilingual capabilities of LLMs.

Submitted 6 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted by ACL 2024

arXiv:2402.16402 [pdf, other]

Graph Learning with Distributional Edge Layouts

Authors: Xinjian Zhao, Chaolong Ying, Tianshu Yu

Abstract: Graph Neural Networks (GNNs) learn from graph-structured data by passing local messages between neighboring nodes along edges on certain topological layouts. Typically, these topological layouts in modern GNNs are deterministically computed (e.g., attention-based GNNs) or locally sampled (e.g., GraphSage) under heuristic assumptions. In this paper, we posit for the first time that these layouts can be globally sampled via Langevin dynamics following Boltzmann distribution equipped with explicit physical energy, leading to higher feasibility in the physical world. We argue that such a collection of sampled/optimized layouts can capture the wide energy distribution and bring extra expressivity on top of WL-test, therefore easing downstream tasks. As such, we propose Distributional Edge Layouts (DELs) to serve as a complement to a variety of GNNs. DEL is a pre-processing strategy independent of subsequent GNN variants, thus being highly flexible. Experimental results demonstrate that DELs consistently and substantially improve a series of GNN baselines, achieving state-of-the-art performance on multiple datasets.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 20 pages, 10 figures

arXiv:2402.16371 [pdf, other]

Adaptive Online Learning of Separable Path Graph Transforms for Intra-prediction

Authors: Wen-Yang Lu, Eduardo Pavez, Antonio Ortega, Xin Zhao, Shan Liu

Abstract: Current video coding standards, including H.264/AVC, HEVC, and VVC, employ the discrete cosine transform (DCT), discrete sine transform (DST), and secondary Karhunen-Loeve transforms (KLTs) to decorrelate the intra-prediction residuals. However, the efficiency of these transforms in decorrelation can be limited when the signal has a non-smooth and non-periodic structure, such as those occurring in textures with intricate patterns. This paper introduces a novel adaptive separable path graph-based transform (GBT) that can provide better decorrelation than the DCT for intra-predicted texture data. The proposed GBT is learned in an online scenario with sequential K-means clustering, which groups similar blocks during encoding and decoding to adaptively learn the GBT for the current block from previously reconstructed areas with similar characteristics. A signaling overhead is added to the bitstream of each coding block to indicate the usage of the proposed graph-based transform. We assess the performance of this method combined with H.264/AVC intra-coding tools and demonstrate that it can significantly outperform H.264/AVC DCT for intra-predicted texture data.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 figures
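
The abstract above relies on sequential K-means clustering to group similar blocks as they arrive during encoding and decoding. The textbook online update (assign each new sample to its nearest centroid, then move that centroid toward the sample by a step of one over the cluster's running count) can be sketched as follows. The function name, the two-dimensional toy features, and the initial centroids are assumptions for illustration; how the paper derives block features and turns clusters into graph transforms is not shown here.

```python
def sequential_kmeans_update(centroids, counts, sample):
    """Assign sample to its nearest centroid and update that centroid in place."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(centroids[i], sample)),
    )
    counts[nearest] += 1
    rate = 1.0 / counts[nearest]  # running-mean step size
    centroids[nearest] = [c + rate * (v - c) for c, v in zip(centroids[nearest], sample)]
    return nearest


# Two toy centroids, each initialised from one earlier sample.
centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]

assignments = [
    sequential_kmeans_update(centroids, counts, sample)
    for sample in ([2.0, 0.0], [8.0, 10.0])
]
print(assignments, centroids)
```

Because each update needs only the current centroids and counts, an encoder and a decoder that see the same reconstructed samples in the same order would stay in sync without transmitting the cluster state.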

arXiv:2402.16358 [pdf, other]

An Integrated Data Processing Framework for Pretraining Foundation Models

Authors: Yiding Sun, Feng Wang, Yutao Zhu, Wayne Xin Zhao, Jiaxin Mao

Abstract: The ability of foundation models heavily relies on large-scale, diverse, and high-quality pretraining data. In order to improve data quality, researchers and practitioners often have to manually curate datasets from different sources and develop a dedicated data cleansing pipeline for each data repository. Lacking a unified data processing framework, this process is repetitive and cumbersome. To mitigate this issue, we propose a data processing framework that integrates a Processing Module which consists of a series of operators at different granularity levels, and an Analyzing Module which supports probing and evaluation of the refined data. The proposed framework is easy to use and highly flexible. In this demo paper, we first introduce how to use this framework with some example use cases and then demonstrate its effectiveness in improving the data quality with an automated evaluation with ChatGPT and an end-to-end evaluation in pretraining the GPT-2 model. The code and demonstration videos are accessible on GitHub.

Submitted 23 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures; accepted by SIGIR'24 demo track

arXiv:2402.16346 [pdf, other]

Boosting Graph Pooling with Persistent Homology

Authors: Chaolong Ying, Xinjian Zhao, Tianshu Yu

Abstract: Recently, there has been an emerging trend to integrate persistent homology (PH) into graph neural networks (GNNs) to enrich expressive power. However, naively plugging PH features into GNN layers always results in marginal improvement with low interpretability. In this paper, we investigate a novel mechanism for injecting global topological invariance into pooling layers using PH, motivated by the observation that the filtration operation in PH naturally aligns with graph pooling in a cut-off manner. In this fashion, message passing in the coarsened graph acts along persistent pooled topology, leading to improved performance. Experimentally, we apply our mechanism to a collection of graph pooling methods and observe consistent and substantial performance gain over several popular datasets, demonstrating its wide applicability and flexibility.

Submitted 1 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15784 [pdf, other]

IRConStyle: Image Restoration Framework Using Contrastive Learning and Style Transfer

Authors: Dongqi Fan, Xin Zhao, Liang Chang

Abstract: Recently, the contrastive learning paradigm has achieved remarkable success in high-level tasks such as classification, detection, and segmentation. However, contrastive learning applied in low-level tasks, like image restoration, is limited, and its effectiveness is uncertain. This raises a question: Why does the contrastive learning paradigm not yield satisfactory results in image restoration? In this paper, we conduct in-depth analyses and propose three guidelines to address the above question. In addition, inspired by style transfer and based on contrastive learning, we propose a novel module for image restoration called ConStyle, which can be efficiently integrated into any U-Net structure network. By leveraging the flexibility of ConStyle, we develop a general restoration network for image restoration. ConStyle and the general restoration network together form an image restoration framework, namely IRConStyle. To demonstrate the capability and compatibility of ConStyle, we replace the general restoration network with transformer-based, CNN-based, and MLP-based networks, respectively. We perform extensive experiments on various image restoration tasks, including denoising, deblurring, deraining, and dehazing. The results on 19 benchmarks demonstrate that ConStyle can be integrated with any U-Net-based network and significantly enhance performance.
For instance, ConStyle NAFNet significantly outperforms the original NAFNet on SOTS outdoor (dehazing) and Rain100H (deraining) datasets, with PSNR improvements of 4.16 dB and 3.58 dB with 85% fewer parameters.

Submitted 7 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.15429 [pdf, other]

ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation

Authors: Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul Jennings, Xingyu Zhao

Abstract: Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions. However, as is common with many Deep Learning (DL) models, DMs are subject to a lack of robustness. While there are attempts to evaluate the robustness of T2I DMs as a binary or worst-case problem, they cannot answer how robust in general the model is whenever an adversarial example (AE) can be found. In this study, we first introduce a probabilistic notion of T2I DMs' robustness; and then establish an efficient framework, ProTIP, to evaluate it with statistical guarantees. The main challenges stem from: i) the high computational cost of the generation process; and ii) determining if a perturbed input is an AE involves comparing two output distributions, which is fundamentally harder compared to other DL tasks like classification where an AE is identified upon misprediction of labels. To tackle the challenges, we employ sequential analysis with efficacy and futility early stopping rules in the statistical testing for identifying AEs, and adaptive concentration inequalities to dynamically determine the "just-right" number of stochastic perturbations whenever the verification target is met. Empirical experiments validate the effectiveness and efficiency of ProTIP over common T2I DMs. Finally, we demonstrate an application of ProTIP to rank commonly used defence methods.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.15370 [pdf, other]

Dual Encoder: Exploiting the Potential of Syntactic and Semantic for Aspect Sentiment Triplet Extraction

Authors: Xiaowei Zhao, Yong Zhou, Xiujuan Xu

Abstract: Aspect Sentiment Triple Extraction (ASTE) is an emerging task in fine-grained sentiment analysis. Recent studies have employed Graph Neural Networks (GNN) to model the syntax-semantic relationships inherent in triplet elements. However, they have yet to fully tap into the vast potential of syntactic and semantic information within the ASTE task. In this work, we propose a Dual Encoder: Exploiting the potential of Syntactic and Semantic model (D2E2S), which maximizes the syntactic and semantic relationships among words. Specifically, our model utilizes a dual-channel encoder with a BERT channel to capture semantic information, and an enhanced LSTM channel for comprehensive syntactic information capture. Subsequently, we introduce the heterogeneous feature interaction module to capture intricate interactions between dependency syntax and attention semantics, and to dynamically select vital nodes. We leverage the synergy of these modules to harness the significant potential of syntactic and semantic information in ASTE tasks. Testing on public benchmarks, our D2E2S model surpasses the current state-of-the-art (SOTA), demonstrating its effectiveness.

Submitted 23 February, 2024; originally announced February 2024.

Comments: Accepted by COLING 2024

arXiv:2402.13667 [pdf, other]

GCOF: Self-iterative Text Generation for Copywriting Using Large Language Model

Authors: Jianghui Zhou, Ya Gao, Jie Liu, Xuemin Zhao, Zhaohua Yang, Yue Wu, Lirong Shi

Abstract: Large language models (LLMs) such as ChatGPT have substantially simplified the generation of marketing copy, yet producing content satisfying domain-specific requirements, such as effectively engaging customers, remains a significant challenge. In this work, we introduce the Genetic Copy Optimization Framework (GCOF) designed to enhance both efficiency and engagement of marketing copy creation. We conduct explicit feature engineering within the prompts of LLM. Additionally, we modify the crossover operator in Genetic Algorithm (GA), integrating it into the GCOF to enable automatic feature engineering. This integration facilitates a self-iterative refinement of the marketing copy. Compared to human-curated copy, online results indicate that copy produced by our framework achieves an average increase in click-through rate (CTR) of over $50\%$.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 8 pages, 5 figures, 1 table

arXiv:2402.13590 [pdf, other]

Tunable topological phases in nanographene-based spin-1/2 alternating-exchange Heisenberg chains

Authors: Chenxiao Zhao, Gonçalo Catarina, Jin-Jiang Zhang, João C. G. Henriques, Lin Yang, Ji Ma, Xinliang Feng, Oliver Gröning, Pascal Ruffieux, Joaquín Fernández-Rossier, Roman Fasel

Abstract: Unlocking the potential of topological order within many-body spin systems has long been a central pursuit in the realm of quantum materials. Despite extensive efforts, the quest for a versatile platform enabling site-selective spin manipulation, essential for tuning and probing diverse topological phases, has persisted. Here, we utilize on-surface synthesis to construct spin-1/2 alternating-exchange Heisenberg (AH) chains[1] with antiferromagnetic couplings $J_1$ and $J_2$ by covalently linking Clar's goblets -- nanographenes each hosting two antiferromagnetically-coupled unpaired electrons[2]. Utilizing scanning tunneling microscopy, we exert atomic-scale control over the spin chain lengths, parities and exchange-coupling terminations, and probe their magnetic response by means of inelastic tunneling spectroscopy. Our investigation confirms the gapped nature of bulk excitations in the chains, known as triplons[3]. Besides, the triplon dispersion relation is successfully extracted from the spatial variation of tunneling spectral amplitudes. Furthermore, depending on the parity and termination of chains, we observe varying numbers of in-gap $S=1/2$ edge spins, enabling the determination of the degeneracy of distinct topological ground states in the thermodynamic limit: either 1, 2, or 4. By monitoring interactions between these edge spins, we identify the exponential decay of spin correlations. Our experimental findings, corroborated by theoretical calculations, present a phase-controlled many-body platform, opening promising avenues toward the development of spin-based quantum devices.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13577 [pdf, other]

BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Authors: Xueliang Zhao, Xinting Huang, Tingchen Fu, Qintong Li, Shansan Gong, Lemao Liu, Wei Bi, Lingpeng Kong

Abstract: Multimodal reasoning stands as a pivotal capability for large vision-language models (LVLMs). The integration with Domain-Specific Languages (DSL), offering precise visual representations, equips these models with the opportunity to execute more accurate reasoning in complex and professional domains. However, the vanilla Chain-of-Thought (CoT) prompting method faces challenges in effectively leveraging the unique strengths of visual and DSL representations, primarily due to their differing reasoning mechanisms. Additionally, it often falls short in addressing critical steps in multi-step reasoning tasks. To mitigate these challenges, we introduce the Bi-Modal Behavioral Alignment (BBA) prompting method, designed to maximize the potential of DSL in augmenting complex multi-modal reasoning tasks. This method initiates by guiding LVLMs to create separate reasoning chains for visual and DSL representations. Subsequently, it aligns these chains by addressing any inconsistencies, thus achieving a cohesive integration of behaviors from different modalities. Our experiments demonstrate that BBA substantially improves the performance of GPT-4V(ision) on geometry problem solving ($28.34\% \to 34.22\%$), chess positional advantage prediction ($42.08\% \to 46.99\%$) and molecular property prediction ($77.47\% \to 83.52\%$).

Submitted 21 February, 2024; originally announced February 2024.

Comments: Preprint

arXiv:2402.13508 [pdf, other]

doi 10.3847/1538-4357/ad2b61

PEARLS: NuSTAR and XMM-Newton Extragalactic Survey of the JWST North Ecliptic Pole Time-Domain Field II

Authors: Xiurui Zhao, Francesca Civano, Christopher N. A. Willmer, Silvia Bonoli, Chien-Ting Chen, Samantha Creech, Renato Dupke, Francesca M. Fornasini, Rolf A. Jansen, Satoshi Kikuta, Anton M. Koekemoer, Sibasish Laha, Stefano Marchesi, Rosalia O'Brien, Ross Silver, S. P. Willner, Rogier A. Windhorst, Haojing Yan, Jailson Alcaniz, Narciso Benitez, Saulo Carneiro, Javier Cenarro, David Cristóbal-Hornillos, Alessandro Ederoclite, Antonio Hernán-Caballero, et al. (8 additional authors not shown)

Abstract: We present the second NuSTAR and XMM-Newton extragalactic survey of the JWST North Ecliptic Pole (NEP) Time-Domain Field (TDF). The first NuSTAR NEP-TDF survey (Zhao et al. 2021) had 681 ks total exposure time executed in NuSTAR cycle 5, in 2019 and 2020. This second survey, acquired from 2020 to 2022 in cycle 6, adds 880 ks of NuSTAR exposure time. The overall NuSTAR NEP-TDF survey is the most sensitive NuSTAR extragalactic survey to date, and a total of 60 sources were detected above the 95% reliability threshold. We constrain the hard X-ray number counts, logN-logS, down to 1.7 x 10$^{-14}$ erg cm$^{-2}$ s$^{-1}$ at 8-24 keV and detect an excess of hard X-ray sources at the faint end. About 47% of the NuSTAR-detected sources are heavily obscured (NH > 10$^{23}$ cm$^{-2}$), and 18+20% of the NuSTAR-detected sources are Compton-thick (N>10$^{24}$ cm$^{-2}$). These fractions are consistent with those measured in other NuSTAR surveys. Four sources presented >2$σ$ variability in the 3-year survey. In addition to NuSTAR, a total of 62 ks of XMM-Newton observations were taken during NuSTAR cycle 6. The XMM-Newton observations provide soft X-ray (0.5-10 keV) coverage in the same field and enable more robust identification of the visible and infrared counterparts of the NuSTAR-detected sources. A total of 286 soft X-ray sources were detected, out of which 214 XMM-Newton sources have secure counterparts from multiwavelength catalogs.

Submitted 21 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 37 pages, 27 figures, Accepted by ApJ

Journal ref: ApJ 965, 188 (2024)

arXiv:2402.13455 [pdf]

Light Bridges and Solar Active Region Evolution Processes

Authors: Fuyu Li, Changhui Rao, Xinhua Zhao, Yang Guo, Xiaoying Gong, Yuhao Chen, Nanbin Xiang, Huaning Wang

Abstract: The formation mechanism of light bridges (LBs) is strongly related to the dynamic evolution of solar active regions (ARs). To study the relationship between LB formation and AR evolution phases, we employ 109 LB samples from 69 ARs in 2014 using observational data from the Helioseismic and Magnetic Imager on board the Solar Dynamics Observatory (HMI/SDO). LBs are well matched with the weak field lanes (WFLs), except those aligned on the polarity inversion line of δ sunspots. For penumbral intrusion (type-A) and umbral-dot emergence (type-C) LBs, the WFLs represent the splitting of magnetic flux systems. The sunspots tend to decay and split into several parts after type-A and type-C LBs form. For sunspot/umbra merging (type-B) LBs, the declining WFLs are caused by collisions of flux systems. The sunspots merge and remain stable after type-B LBs form. We conclude that type-B LBs are formed by collisions of flux systems, while type-A and type-C LBs are generated by splits. The time differences (δT) between LBs appearing and ARs peaking have average values of 1.06, -1.60, 1.82 for type-A, B, C LBs, with standard deviations of 3.27, 2.17, 1.89, respectively. A positive value of δT means that the LB appears after the AR peaks, whereas a negative δT means that it appears before the peak. Type-A LBs tend to form in the decaying phase or around the peak time. Type-B LBs are more likely to be formed in the developing phase. Type-C LBs mostly take shape in the decaying phase of ARs.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.13033 [pdf, other]

Enhancing Real-World Complex Network Representations with Hyperedge Augmentation

Authors: Xiangyu Zhao, Zehui Li, Mingzhu Shen, Guy-Bart Stan, Pietro Liò, Yiren Zhao

Abstract: Graph augmentation methods play a crucial role in improving the performance and enhancing generalisation capabilities in Graph Neural Networks (GNNs). Existing graph augmentation methods mainly perturb the graph structures and are usually limited to pairwise node relations. These methods cannot fully address the complexities of real-world large-scale networks that often involve higher-order node relations beyond only being pairwise. Meanwhile, real-world graph datasets are predominantly modelled as simple graphs, due to the scarcity of data that can be used to form higher-order edges. Therefore, reconfiguring the higher-order edges as an integration into graph augmentation strategies lights up a promising research path to address the aforementioned issues. In this paper, we present Hyperedge Augmentation (HyperAug), a novel graph augmentation method that constructs virtual hyperedges directly from the raw data, and produces auxiliary node features by extracting from the virtual hyperedge information, which are used for enhancing GNN performances on downstream tasks. We design three diverse virtual hyperedge construction strategies to accompany the augmentation scheme: (1) via graph statistics, (2) from multiple data perspectives, and (3) utilising multi-modality. Furthermore, to facilitate HyperAug evaluation, we provide 23 novel real-world graph datasets across various domains including social media, biology, and e-commerce. Our empirical study shows that HyperAug consistently and significantly outperforms GNN baselines and other graph augmentation methods, across a variety of application contexts, which clearly indicates that it can effectively incorporate higher-order node relations into graph augmentation methods for real-world complex networks.

Submitted 20 February, 2024; originally announced February 2024.

Comments: Preprint. Under review. 17 pages, 4 figures, 14 tables. arXiv admin note: text overlap with arXiv:2306.05108

arXiv:2402.12948 [pdf, other]

GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick

Authors: Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao

Abstract: Large language models (LLMs) excellently generate human-like text, but also raise concerns about misuse in fake news and academic dishonesty. Decoding-based watermark, particularly the GumbelMax-trick-based watermark (GM watermark), is a standout solution for safeguarding machine-generated texts due to its notable detectability. However, GM watermark encounters a major challenge with generation diversity, always yielding identical outputs for the same prompt, negatively impacting generation diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high diversity settings, with its AUROC score outperforming those of the two alternative variants by 0.1 to 0.3 and surpassing other decoding-based watermarking methods by a minimum of 0.1.

Submitted 28 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12573 [pdf, ps, other]

Fully faithful functors, skyscraper sheaves, and birational equivalence

Authors: Chunyi Li, Xun Lin, Xiaolei Zhao

Abstract: Let $X$ and $Y$ be two smooth projective varieties such that there is a fully faithful exact functor from $D^b(\mathrm{Coh}(X))$ to $D^b(\mathrm{Coh}(Y))$. We show that $X$ and $Y$ are birationally equivalent if the functor maps one skyscraper sheaf to a skyscraper sheaf. Further assuming that $X$ and $Y$ are of the same dimension, we show that if $X$ has ample canonical bundle and $H^0(X, K_X)\neq 0$, or if $X$ is a K3 surface with Picard number one, then $Y$ is birational to a Fourier--Mukai partner of $X$.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 20 pages. Comments are very welcome!

arXiv:2402.11436 [pdf, other]

Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement

Authors: Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei Li, William Yang Wang

Abstract: Recent studies show that large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We discovered that this contrast is due to an LLM's bias in evaluating its own output. In this paper, we formally define LLM's self-bias - the tendency to favor its own generation - using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/llm_self_bias.

Submitted 18 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.11207 [pdf, ps, other]

Search for the production of deuterons and antideuterons in e^+e^- annihilation at center-of-mass energies between 4.13 and 4.70 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (593 additional authors not shown)

Abstract: Using a data sample of $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to ppπ^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed and the upper limit of the $e^+e^-\to ppπ^-\bar{d}+c.c.$ cross section is determined to be from 9.0 to 145 fb depending on the center-of-mass energy at the $90\%$ confidence level.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.11163 [pdf, other]

KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph

Authors: Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, Yang Song, Chen Zhu, Hengshu Zhu, Ji-Rong Wen

Abstract: In this paper, we aim to improve the reasoning ability of large language models (LLMs) over knowledge graphs (KGs) to answer complex questions. Inspired by existing methods that design the interaction strategy between LLMs and KG, we propose an autonomous LLM-based agent framework, called KG-Agent, which enables a small LLM to actively make decisions until finishing the reasoning process over KGs. In KG-Agent, we integrate the LLM, multifunctional toolbox, KG-based executor, and knowledge memory, and develop an iteration mechanism that autonomously selects the tool then updates the memory for reasoning over KG. To guarantee the effectiveness, we leverage program language to formulate the multi-hop reasoning process over the KG, and synthesize a code-based instruction dataset to fine-tune the base LLM. Extensive experiments demonstrate that only using 10K samples for tuning LLaMA-7B can outperform state-of-the-art methods using larger LLMs or more data, on both in-domain and out-domain datasets. Our code and data will be publicly released.

Submitted 16 February, 2024; originally announced February 2024.

Comments: work in progress; efficient 7B LLM-based agent

arXiv:2402.10468 [pdf, other]

Adversarial Curriculum Graph Contrastive Learning with Pair-wise Augmentation

Authors: Xinjian Zhao, Liang Zhang, Yang Liu, Ruocheng Guo, Xiangyu Zhao

Abstract: Graph contrastive learning (GCL) has emerged as a pivotal technique in the domain of graph representation learning. A crucial aspect of effective GCL is the caliber of generated positive and negative samples, which is intrinsically dictated by their resemblance to the original data. Nevertheless, precise control over similarity during sample generation presents a formidable challenge, often impeding the effective discovery of representative graph patterns. To address this challenge, we propose an innovative framework: Adversarial Curriculum Graph Contrastive Learning (ACGCL), which capitalizes on the merits of pair-wise augmentation to engender graph-level positive and negative samples with controllable similarity, alongside subgraph contrastive learning to discern effective graph patterns therein. Within the ACGCL framework, we have devised a novel adversarial curriculum training methodology that facilitates progressive learning by sequentially increasing the difficulty of distinguishing the generated samples. Notably, this approach transcends the prevalent sparsity issue inherent in conventional curriculum learning strategies by adaptively concentrating on more challenging training data. Finally, a comprehensive assessment of ACGCL is conducted through extensive experiments on six well-known benchmark datasets, wherein ACGCL conspicuously surpasses a set of state-of-the-art baselines.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10405 [pdf, other]

Theory of Wetting Dynamics with Surface Binding

Authors: Xueping Zhao, Susanne Liese, Alf Honigmann, Frank Jülicher, Christoph A. Weber

Abstract: Biomolecules, such as proteins and RNAs, can phase separate in the cytoplasm of cells to form biological condensates. Such condensates are liquid-like droplets that can wet biological surfaces such as membranes. Many molecules that can participate in phase separation can also reversibly bind to membrane surfaces. When a droplet wets such a surface, these molecules can diffuse both inside the droplet and in the bound state on the surface. How the interplay between surface binding and surface diffusion affects the wetting kinetics is not well understood. Here, we derive the governing equations using non-equilibrium thermodynamics by relating the diffusive fluxes and forces at the surface coupled to the bulk. We use our theory to study the spreading kinetics in the presence of surface binding and find that binding speeds up wetting by nucleating a droplet inside the surface. Our results are relevant both to artificial systems and to condensates in cells. They suggest that the wetting of droplets in living cells could be regulated by two-dimensional droplets in the surface-bound layer changing the binding affinity to biological surfaces.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.10244 [pdf, other]

doi 10.1002/andp.202300535

Entanglement generation in capacitively coupled Transmon-cavity system

Authors: Jian-Zhuang Wu, Lian-E Lu, Xin-Yu Zhao, Yong-Hong Ma

Abstract: In this paper, the higher energy levels of the transmon qubit are taken into consideration to investigate the continuous variable entanglement generation between the transmon qubit and the single-mode cavity. Based on the framework of cavity quantum electrodynamics, we show that the entanglement generation depends on the driving field intensity, coupling strength, cavity field frequency, and qubit frequency. The numerical results show that strong entanglement can be generated by properly tuning these parameters. It is our hope that the results presented in this paper may lead to a better understanding of quantum entanglement generation in cavity QED systems and provide new perspectives for further research in quantum information processing.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.10189 [pdf, other]

Uncertainty Quantification for In-Context Learning of Large Language Models

Authors: Chen Ling, Xujiang Zhao, Xuchao Zhang, Wei Cheng, Yanchi Liu, Yiyou Sun, Mika Oishi, Takao Osaki, Katsushi Matsuda, Jie Ji, Guangji Bai, Liang Zhao, Haifeng Chen

Abstract: In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs) and revolutionized various fields by providing a few task-relevant demonstrations in the prompt. However, trustworthiness issues with LLMs' responses, such as hallucination, have also been actively discussed. Existing works have been devoted to quantifying the uncertainty in LLMs' responses, but they often overlook the complex nature of LLMs and the uniqueness of in-context learning. In this work, we delve into the predictive uncertainty of LLMs associated with in-context learning, highlighting that such uncertainties may stem from both the provided demonstrations (aleatoric uncertainty) and ambiguities tied to the model's configurations (epistemic uncertainty). We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties. The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion. Extensive experiments are conducted to demonstrate the effectiveness of the decomposition. The code and data are available at: https://github.com/lingchen0331/UQ_ICL.

Submitted 28 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: Accepted to the main conference of NAACL 2024

arXiv:2402.09910 [pdf, other]

DE-COP: Detecting Copyrighted Content in Language Models Training Data

Authors: André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei Li

Abstract: How can we detect if copyrighted content was used in the training process of a language model, considering that the training data is typically undisclosed? We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP's core approach is to probe an LLM with multiple-choice questions, whose options include both verbatim text and their paraphrases. We construct BookTection, a benchmark with excerpts from 165 books published prior and subsequent to a model's training cutoff, along with their paraphrases. Our experiments show that DE-COP surpasses the prior best method by 9.6% in detection performance (AUC) on models with logits available. Moreover, DE-COP also achieves an average accuracy of 72% for detecting suspect books on fully black-box models where prior methods give approximately 4% accuracy. The code and datasets are available at https://github.com/LeiLiLab/DE-COP.

Submitted 25 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

ACM Class: I.2

arXiv:2402.09651 [pdf, other]

Practitioners' Challenges and Perceptions of CI Build Failure Predictions at Atlassian

Authors: Yang Hong, Chakkrit Tantithamthavorn, Jirat Pasuksmit, Patanamon Thongtanunam, Arik Friedman, Xing Zhao, Anton Krasikov

Abstract: Continuous Integration (CI) build failures could significantly impact the software development process and teams, such as delaying the release of new features and reducing developers' productivity. In this work, we report on an empirical study that investigates CI build failures throughout product development at Atlassian. Our quantitative analysis found that the repository dimension is the key factor influencing CI build failures. In addition, our qualitative survey revealed that Atlassian developers perceive CI build failures as challenging issues in practice. Furthermore, we found that the CI build prediction can not only provide proactive insight into CI build failures but also facilitate the team's decision-making. Our study sheds light on the challenges and expectations involved in integrating CI build prediction tools into the Bitbucket environment, providing valuable insights for enhancing CI processes.

Submitted 14 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.09543 [pdf, other]

Rethinking Large Language Model Architectures for Sequential Recommendations

Authors: Hanbing Wang, Xiaorui Liu, Wenqi Fan, Xiangyu Zhao, Venkataramana Kini, Devendra Yadav, Fei Wang, Zhen Wen, Jiliang Tang, Hui Liu

Abstract: Recently, sequential recommendation has been adapted to the LLM paradigm to enjoy the power of LLMs. LLM-based methods usually formulate recommendation information into natural language and the model is trained to predict the next item in an auto-regressive manner. Despite their notable success, the substantial computational overhead of inference poses a significant obstacle to their real-world applicability. In this work, we endeavor to streamline existing LLM-based recommendation models and propose a simple yet highly effective model Lite-LLM4Rec. The primary goal of Lite-LLM4Rec is to achieve efficient inference for the sequential recommendation task. Lite-LLM4Rec circumvents the beam search decoding by using a straight item projection head for ranking scores generation. This design stems from our empirical observation that beam search decoding is ultimately unnecessary for sequential recommendations. Additionally, Lite-LLM4Rec introduces a hierarchical LLM structure tailored to efficiently handle the extensive contextual information associated with items, thereby reducing computational overhead while enjoying the capabilities of LLMs. Experiments on three publicly available datasets corroborate the effectiveness of Lite-LLM4Rec in both performance and inference efficiency (notably 46.8% performance improvement and 97.28% efficiency improvement on ML-1m) over existing LLM-based methods. Our implementations will be open sourced.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 8 pages, 5 figures, conference

arXiv:2402.07787 [pdf, other]

Extensible Multi-Granularity Fusion Network for Aspect-based Sentiment Analysis

Authors: Xiaowei Zhao, Yong Zhou, Xiujuan Xu, Yu Liu

Abstract: Aspect-based Sentiment Analysis (ABSA) evaluates sentiment expressions within a text to comprehend sentiment information. Previous studies integrated external knowledge, such as knowledge graphs, to enhance the semantic features in ABSA models. Recent research has examined the use of Graph Neural Networks (GNNs) on dependency and constituent trees for syntactic analysis. With the ongoing development of ABSA, more innovative linguistic and structural features are being incorporated (e.g. latent graph), but this also introduces complexity and confusion. As of now, a scalable framework for integrating diverse linguistic and structural features into ABSA does not exist. This paper presents the Extensible Multi-Granularity Fusion (EMGF) network, which integrates information from dependency and constituent syntactic, attention semantic, and external knowledge graphs. EMGF, equipped with multi-anchor triplet learning and orthogonal projection, efficiently harnesses the combined potential of each granularity feature and their synergistic interactions, resulting in a cumulative effect without additional computational expenses. Experimental findings on SemEval 2014 and Twitter datasets confirm EMGF's superiority over existing ABSA methods.

Submitted 4 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 8 pages, 4 figures

arXiv:2402.07031 [pdf, other]

Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration

Authors: Chih-Hong Cheng, Paul Stöckel, Xingyu Zhao

Abstract: Modeling and calibrating the fidelity of synthetic data is paramount in shaping the future of safe and reliable self-driving technology by offering a cost-effective and scalable alternative to real-world data collection. We focus on its role in safety-critical applications, introducing four types of instance-level fidelity that go beyond mere visual input characteristics. The aim is to ensure that applying testing on synthetic data can reveal real-world safety issues, and the absence of safety-critical issues when testing under synthetic data can provide a strong safety guarantee in real-world behavior. We suggest an optimization method to refine the synthetic data generator, reducing fidelity gaps identified by deep learning components. Experiments show this tuning enhances the correlation between safety-critical errors in synthetic and real data.

Submitted 2 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2402.06639 [pdf, other]

Social Vulnerabilities and Wildfire Evacuations: A Case Study of the 2019 Kincade Fire

Authors: Yuran Sun, Ana Forrister, Erica D. Kuligowski, Ruggiero Lovreglio, Thomas J. Cova, Xilei Zhao

Abstract: Vulnerable populations are disproportionately impacted by natural hazards like wildfires. It is crucial to develop equitable and effective evacuation strategies to meet their unique needs. While existing studies offer valuable insights, we need to improve our understanding of how vulnerabilities affect wildfire evacuation decision-making, as well as how this varies spatially. The goal of this study is to conduct an in-depth analysis of the impacts of social vulnerabilities on aggregated evacuation decisions, including evacuation rates, delay in departure time, and evacuation destination distances by leveraging large-scale GPS data. Specifically, we inferred evacuation decisions at the census block group level, utilizing GPS data. We then employed ordinary least squares and geographically weighted regression models to investigate the impacts of social vulnerabilities on evacuation decisions. We also used Moran's I to test if these impacts were consistent across different block groups. The 2019 Kincade Fire in Sonoma County, California, was used as the case study. The impacts of social vulnerabilities on evacuation rates show significant spatial variations across block groups, whereas their effects on the other two decision types do not. Additionally, unemployment, a factor under-explored in previous studies, was found to negatively impact both the delay in departure time and destination distances of evacuees at the aggregate level. Furthermore, upon comparing the significant factors across different models, we observed that some of the vulnerabilities influencing evacuation rates for all residents differed from those affecting the delay in departure time and destination distances, which only applied to evacuees. These new insights can guide emergency managers and transportation planners to enhance equitable wildfire evacuation planning and operations.

Submitted 23 January, 2024; originally announced February 2024.

arXiv:2402.06579 [pdf, ps, other]

Some remarks about deformation theory and formality conjecture

Authors: Huachen Chen, Laura Pertusi, Xiaolei Zhao

Abstract: Using the algebraic criterion proved by Bandiera, Manetti and Meazzini, we show the formality conjecture for universally gluable objects with linearly reductive automorphism groups in the bounded derived category of a K3 surface. As an application, we prove the formality conjecture for polystable objects in the Kuznetsov components of Gushel--Mukai threefolds and quartic double solids.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 15 pages, to appear in Annali dell'Università di Ferrara, special volume Edge Days: 2018-2022

arXiv:2402.05864 [pdf, other]

Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs

Authors: Xuandong Zhao, Lei Li, Yu-Xiang Wang

Abstract: In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder. It enjoys robustness properties similar to the standard sampling decoder, but is provably up to 2x better in its quality-robustness tradeoff than sampling and never worse than any other decoder. We also design a cryptographic watermarking scheme analogous to Aaronson's Gumbel watermark, but naturally tailored for PF decoder. The watermarking scheme does not change the distribution to sample, while allowing arbitrarily low false positive rate and high recall whenever the generated text has high entropy. Our experiments show that the PF decoder (and its watermarked counterpart) significantly outperform(s) naive sampling (and its Gumbel watermarked counterpart) in terms of perplexity, while retaining the same robustness (and detectability), hence making it a promising new approach for LLM decoding. The code is available at https://github.com/XuandongZhao/pf-decoding

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05395 [pdf, other]

Efficient Estimation for Functional Accelerated Failure Time Model

Authors: Changyu Liu, Wen Su, Kin-Yat Liu, Guosheng Yin, Xingqiu Zhao

Abstract: We propose a functional accelerated failure time model to characterize effects of both functional and scalar covariates on the time to event of interest, and provide regularity conditions to guarantee model identifiability. For efficient estimation of model parameters, we develop a sieve maximum likelihood approach where parametric and nonparametric coefficients are bundled with an unknown baseline hazard function in the likelihood function. Not only do the bundled parameters cause immense numerical difficulties, but they also result in new challenges in theoretical development. By developing a general theoretical framework, we overcome the challenges arising from the bundled parameters and derive the convergence rate of the proposed estimator. Furthermore, we prove that the finite-dimensional estimator is $\sqrt{n}$-consistent, asymptotically normal and achieves the semiparametric information bound. The proposed inference procedures are evaluated by extensive simulation studies and illustrated with an application to the sequential organ failure assessment data from the Improving Care of Acute Lung Injury Patients study.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.04533 [pdf, ps, other]

Minimizing Block Incentive Volatility Through Verkle Tree-Based Dynamic Transaction Storage

Authors: Xiongfei Zhao, Gerui Zhang, Hou-Wan Long, Yain-Whar Si

Abstract: Transaction fees are a crucial revenue source for miners in public and consortium blockchains. However, while public blockchains have additional revenue streams, transaction fees serve as the primary income for miners in consortium blockchains formed by various financial institutions. These miners allocate different levels of computing resources to process transactions and earn corresponding fees. Nonetheless, relying solely on transaction fees can lead to significant volatility and encourage non-standard mining behaviors, thereby posing threats to the blockchain's security and integrity. Despite previous attempts to mitigate the impact of transaction fees on illicit mining behaviors, a comprehensive solution to this vulnerability is yet to be established. To address this gap, we introduce a novel approach that leverages Dynamic Transaction Storage (DTS) strategies to effectively minimize block incentive volatility. Our solution implements a Verkle tree-based storage mechanism to reduce bandwidth consumption. Moreover, to configure the DTS strategies, we evaluate several optimization algorithms and formulate the challenge as a Vehicle Routing Problem. Our experiments conducted using historical transactions from Bitcoin and remittance data from the Industrial and Commercial Bank of China reveal that the strategy focusing on time-based transaction incorporation priority, while excluding a designated space for small-fee transactions, as discovered by the gradient-based optimizer algorithm, proves most effective in reducing volatility. Hence, the DTS strategy can sustain stable block incentives irrespective of transaction types or user bidding behavior. Furthermore, the inclusion of higher-fee transactions, often smaller in size, can alleviate propagation delays and the occurrence of forks.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.04527 [pdf, other]

RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation

Authors: Xiaohan Yu, Li Zhang, Xin Zhao, Yue Wang, Zhongrui Ma

Abstract: Large language models (LLM) have recently emerged as a powerful tool for a variety of natural language processing tasks, bringing a new surge of combining LLM with recommendation systems, termed as LLM-based RS. Current approaches generally fall into two main paradigms, the ID direct usage paradigm and the ID translation paradigm, noting their core weakness stems from lacking recommendation knowledge and uniqueness. To address this limitation, we propose a new paradigm, ID representation, which incorporates pre-trained ID embeddings into LLMs in a complementary manner. In this work, we present RA-Rec, an efficient ID representation alignment framework for LLM-based recommendation, which is compatible with multiple ID-based methods and LLM architectures. Specifically, we treat ID embeddings as soft prompts and design an innovative alignment module and an efficient tuning method with tailored data construction for alignment. Extensive experiments demonstrate RA-Rec substantially outperforms current state-of-the-art methods, achieving up to 3.0% absolute HitRate@100 improvements while utilizing less than 10x training data.

Submitted 19 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 10 pages

arXiv:2402.04519 [pdf, other]

doi 10.1007/s11263-023-01937-0

BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

Authors: Xin Zhao, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li

Abstract: Single object tracking (SOT) is a fundamental problem in computer vision, with a wide range of applications, including autonomous driving, augmented reality, and robot navigation. The robustness of SOT faces two main challenges: tiny target and fast motion. These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAV), where the target is usually far away from the camera and often with significant motion relative to the camera. To evaluate the robustness of SOT methods, we propose BioDrone -- the first bionic drone-based visual benchmark for SOT. Unlike existing UAV datasets, BioDrone features videos captured from a flapping-wing UAV system with a major camera shake due to its aerodynamics. BioDrone hence highlights the tracking of tiny targets with drastic changes between consecutive frames, providing a new robust vision benchmark for SOT. To date, BioDrone offers the largest UAV-based SOT benchmark with high-quality fine-grained manual annotations and automatically generates frame-level labels, designed for robust vision analyses. Leveraging our proposed BioDrone, we conduct a systematic evaluation of existing SOT methods, comparing the performance of 20 representative models and studying novel means of optimizing a SOTA method (KeepTrack) for robust SOT. Our evaluation leads to new baselines and insights for robust SOT. Moving forward, we hope that BioDrone will not only serve as a high-quality benchmark for robust SOT, but also invite future research into robust computer vision. The database, toolkits, evaluation server, and baseline results are available at http://biodrone.aitestunion.com.

Submitted 6 February, 2024; originally announced February 2024.

Comments: This paper is published in IJCV (refer to DOI). Please cite the published IJCV

Journal ref: Int J Comput Vis (2023)

arXiv:2402.03829 [pdf, ps, other]

Precise Measurement of Born Cross Sections for $e^+e^-\to D\bar{D}$ and Observation of One Structure between $\sqrt{s} = 3.80-4.95$ GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (604 additional authors not shown)

Abstract: Using data samples collected with the BESIII detector at the BEPCII collider at center-of-mass energies ranging from 3.80 to 4.95 GeV, corresponding to an integrated luminosity of 20 fb$^{-1}$, a measurement of Born cross sections for the $e^+e^-\to D^{0}\bar{D}^{0}$ and $D^{+}D^{-}$ processes is presented with unprecedented precision. By performing a simultaneous fit to the dressed cross sections for both processes, one possible new structure around 3.9 GeV/$c^2$ is observed for the first time, in addition to seven known resonances $ψ(3770)$, $ψ(4040)$, $ψ(4160)$, $Y(4230)$, $Y(4360)$, $ψ(4415)$, and $Y(4660)$. These results offer crucial experimental insights into the nature of hadron production in the open charm region.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 9 pages, 4 figures, 1 tables, 1 Supplemental_Material

arXiv:2402.03708 [pdf, other]

SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images

Authors: Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan

Abstract: Fine-grained ship instance segmentation in satellite images holds considerable significance for monitoring maritime activities at sea. However, existing datasets often suffer from the scarcity of fine-grained information or pixel-wise localization annotations, as well as the insufficient image diversity and variations, thus limiting the research of this task. To this end, we propose a benchmark dataset for fine-grained Ship Instance Segmentation in Panchromatic satellite images, namely SISP, which contains 56,693 well-annotated ship instances with four fine-grained categories across 10,000 sliced images, and all the images are collected from SuperView-1 satellite with the resolution of 0.5m. Targets in the proposed SISP dataset have characteristics that are consistent with real satellite scenes, such as high class imbalance, various scenes, large variations in target densities and scales, and high inter-class similarity and intra-class diversity, all of which make the SISP dataset more suitable for real-world applications. In addition, we introduce a Dynamic Feature Refinement-assist Instance segmentation network, namely DFRInst, as the benchmark method for ship instance segmentation in satellite images, which can fortify the explicit representation of crucial features, thus improving the performance of ship instance segmentation. Experiments and analysis are performed on the proposed SISP dataset to evaluate the benchmark method and several state-of-the-art methods to establish baselines for facilitating future research. The proposed dataset and source codes will be available at: https://github.com/Justlovesmile/SISP.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 14 pages, 9 figures

arXiv:2402.03697 [pdf, other]

SHMC-Net: A Mask-guided Feature Fusion Network for Sperm Head Morphology Classification

Authors: Nishchal Sapkota, Yejia Zhang, Sirui Li, Peixian Liang, Zhuo Zhao, Jingjing Zhang, Xiaomin Zha, Yiru Zhou, Yunxia Cao, Danny Z Chen

Abstract: Male infertility accounts for about one-third of global infertility cases. Manual assessment of sperm abnormalities through head morphology analysis encounters issues of observer variability and diagnostic discrepancies among experts. Its alternative, Computer-Assisted Semen Analysis (CASA), suffers from low-quality sperm images, small datasets, and noisy class labels. We propose a new approach for sperm head morphology classification, called SHMC-Net, which uses segmentation masks of sperm heads to guide the morphology classification of sperm images. SHMC-Net generates reliable segmentation masks using image priors, refines object boundaries with an efficient graph-based method, and trains an image network with sperm head crops and a mask network with the corresponding masks. In the intermediate stages of the networks, image and mask features are fused with a fusion scheme to better learn morphological features. To handle noisy class labels and regularize training on small datasets, SHMC-Net applies Soft Mixup to combine mixup augmentation and a loss function. We achieve state-of-the-art results on SCIAN and HuSHeM datasets, outperforming methods that use additional pre-training or costly ensembling techniques.

Submitted 5 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Published on ISBI 2024

arXiv:2402.02803 [pdf, other]

Large Language Model Distilling Medication Recommendation Model

Authors: Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Zijian Zhang, Feng Tian, Yefeng Zheng

Abstract: The recommendation of medication is a vital aspect of intelligent healthcare systems, as it involves prescribing the most suitable drugs based on a patient's specific health needs. Unfortunately, many sophisticated models currently in use tend to overlook the nuanced semantics of medical data, while only relying heavily on identities. Furthermore, these models face significant challenges in handling cases involving patients who are visiting the hospital for the first time, as they lack prior prescription histories to draw upon. To tackle these issues, we harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs). Our research aims to transform existing medication recommendation methodologies using LLMs. In this paper, we introduce a novel approach called Large Language Model Distilling Medication Recommendation (LEADER). We begin by creating appropriate prompt templates that enable LLMs to suggest medications effectively. However, the straightforward integration of LLMs into recommender systems leads to an out-of-corpus issue specific to drugs. We handle it by adapting the LLMs with a novel output layer and a refined tuning loss function. Although LLM-based models exhibit remarkable capabilities, they are plagued by high computational costs during inference, which is impractical for the healthcare sector. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model. Extensive experiments conducted on two real-world datasets, MIMIC-III and MIMIC-IV, demonstrate that our proposed model not only delivers effective results but also is efficient. To ease the reproducibility of our experiments, we release the implementation code online.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02643 [pdf, other]

LLM-Enhanced Data Management

Authors: Xuanhe Zhou, Xinyang Zhao, Guoliang Li

Abstract: Machine learning (ML) techniques for optimizing data management problems have been extensively studied and widely deployed in the past five years. However, traditional ML methods have limitations on generalizability (adapting to different scenarios) and inference ability (understanding the context). Fortunately, large language models (LLMs) have shown high generalizability and human-competitive abilities in understanding context, which are promising for data management tasks (e.g., database diagnosis, database tuning). However, existing LLMs have several limitations: hallucination, high cost, and low accuracy for complicated tasks. To address these challenges, we design LLMDB, an LLM-enhanced data management paradigm which has generalizability and high inference ability while avoiding hallucination, reducing LLM cost, and achieving high accuracy. LLMDB embeds domain-specific knowledge to avoid hallucination through LLM fine-tuning and prompt engineering. LLMDB reduces the high cost of LLMs by using vector databases, which provide semantic search and caching abilities. LLMDB improves task accuracy through an LLM agent, which provides multiple-round inference and pipeline executions. We showcase three real-world scenarios that LLMDB can well support, including query rewrite, database diagnosis and data analytics. We also summarize the open research challenges of LLMDB.

Submitted 4 February, 2024; originally announced February 2024.

Showing 251–300 of 3,162 results for author: Zhao, X