Search | arXiv e-print repository

A Platform for All-optical Thomson/ Compton Scattering with Versatile Parameters

Authors: Siyu Chen, Wenchao Yan, Mingyang Zhu, Yaojun Li, Xichen Hu, Hao Xu, Jie Feng, Xulei Ge, Wenzhao Wang, Guangwei Lu, Mingxuan Wei, Lin Lu, Xiaojun Huang, Boyuan Li, Xiaohui Yuan, Feng Liu, Min Chen, Liming Chen, Jie Zhang

Abstract: A dual-beam platform for all-optical electron-photon scattering, or Thomson/Compton scattering, with adjustable collision-angle and parameter tuning ability has been developed, which, in principle, can be used for the verification of strong-field quantum electrodynamics effects. Combining this platform with a 200 TW Ti:Sapphire laser system, we demonstrated the generation of inverse Compton scatte… ▽ More A dual-beam platform for all-optical electron-photon scattering, or Thomson/Compton scattering, with adjustable collision-angle and parameter tuning ability has been developed, which, in principle, can be used for the verification of strong-field quantum electrodynamics effects. Combining this platform with a 200 TW Ti:Sapphire laser system, we demonstrated the generation of inverse Compton scattering X/gamma-rays with tunable energies from tens of keV to MeV. The polarization of X/gamma radiation was manipulated by controlling the polarization of scattering laser. In the near future, by combining this experimental platform with multi-PW laser facilities, it is proposed to experimentally generate X/gamma radiation with orbital angular momentum for the nuclear isomer excitation, and more importantly, to explore the regime transition from nonlinear Thomson scattering to nonlinear Compton scattering, eventually to demonstrate the verification of theories on extremely strong field quantum electrodynamics effects. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.12242 [pdf, other]

CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News

Authors: Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang

Abstract: Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE,… ▽ More Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain including 8 event types and 11 argument role types. We designed a two-stage, multi-turns annotation strategy to ensure the quality of CMNEE and reproduced several state-of-the-art event extraction models with a systematic evaluation. The experimental results on CMNEE fall shorter than those on other domain datasets obviously, which demonstrates that event extraction for military domain poses unique challenges and requires further research efforts. Our code and data can be obtained from https://github.com/Mzzzhu/CMNEE. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 13 pages, 7 figures, accepted to LREC-COLING 2024

arXiv:2404.11943 [pdf, other]

AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration

Authors: Bo Pan, Jiaying Lu, Ke Wang, Li Zheng, Zhen Wen, Yingchaojie Feng, Minfeng Zhu, Wei Chen

Abstract: The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with… ▽ More The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with existing coordination frameworks. This difficulty stems from the inherent ambiguity of natural language for specifying the collaboration process and the significant cognitive effort required to extract crucial information (e.g. agent relationship, task dependency, result correspondence) from a vast amount of text-form content during exploration. In this work, we present a visual exploration framework to facilitate the design of coordination strategies in multi-agent collaboration. We first establish a structured representation for LLM-based multi-agent coordination strategy to regularize the ambiguity of natural language. Based on this structure, we devise a three-stage generation method that leverages LLMs to convert a user's general goal into an executable initial coordination strategy. Users can further intervene at any stage of the generation process, utilizing LLMs and a set of interactions to explore alternative strategies. Whenever a satisfactory strategy is identified, users can commence the collaboration and examine the visually enhanced execution result. We develop AgentCoord, a prototype interactive system, and conduct a formal user study to demonstrate the feasibility and effectiveness of our approach. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10985 [pdf, ps, other]

Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images

Authors: Junbiao Pang, Zailin Dong, Jiaxin Deng, Mengyuan Zhu, Yunwei Zhang

Abstract: Parsing Computer-Aided Design (CAD) drawings is a fundamental step for CAD revision, semantic-based management, and the generation of 3D prototypes in both the architecture and engineering industries. Labeling symbols from a CAD drawing is a challenging yet notorious task from a practical point of view. In this work, we propose to label and spot symbols from CAD images that are converted from CAD… ▽ More Parsing Computer-Aided Design (CAD) drawings is a fundamental step for CAD revision, semantic-based management, and the generation of 3D prototypes in both the architecture and engineering industries. Labeling symbols from a CAD drawing is a challenging yet notorious task from a practical point of view. In this work, we propose to label and spot symbols from CAD images that are converted from CAD drawings. The advantage of spotting symbols from CAD images lies in the low requirement of labelers and the low-cost annotation. However, pixel-wise spotting symbols from CAD images is challenging work. We propose a pixel-wise point location via Progressive Gaussian Kernels (PGK) to balance between training efficiency and location accuracy. Besides, we introduce a local offset to the heatmap-based point location method. Based on the keypoints detection, we propose a symbol grou** method to redraw the rectangle symbols in CAD images. We have released a dataset containing CAD images of equipment rooms from telecommunication industrial CAD drawings. Extensive experiments on this real-world dataset show that the proposed method has good generalization ability. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: 10 pages, 10 figures,6 tables

arXiv:2404.09322 [pdf]

The intelligent prediction and assessment of financial information risk in the cloud computing model

Authors: Yufu Wang, Mingwei Zhu, Jiaqiang Yuan, Guanghui Wang, Hong Zhou

Abstract: Cloud computing (cloud computing) is a kind of distributed computing, referring to the network "cloud" will be a huge data calculation and processing program into countless small programs, and then, through the system composed of multiple servers to process and analyze these small programs to get the results and return to the user. This report explores the intersection of cloud computing and finan… ▽ More Cloud computing (cloud computing) is a kind of distributed computing, referring to the network "cloud" will be a huge data calculation and processing program into countless small programs, and then, through the system composed of multiple servers to process and analyze these small programs to get the results and return to the user. This report explores the intersection of cloud computing and financial information processing, identifying risks and challenges faced by financial institutions in adopting cloud technology. It discusses the need for intelligent solutions to enhance data processing efficiency and accuracy while addressing security and privacy concerns. Drawing on regulatory frameworks, the report proposes policy recommendations to mitigate concentration risks associated with cloud computing in the financial industry. By combining intelligent forecasting and evaluation technologies with cloud computing models, the study aims to provide effective solutions for financial data processing and management, facilitating the industry's transition towards digital transformation. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08793 [pdf, other]

JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models

Authors: Yingchaojie Feng, Zhizhang Chen, Zhining Kang, Sijia Wang, Minfeng Zhu, Wei Zhang, Wei Chen

Abstract: The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential we… ▽ More The proliferation of large language models (LLMs) has underscored concerns regarding their security vulnerabilities, notably against jailbreak attacks, where adversaries design jailbreak prompts to circumvent safety mechanisms for potential misuse. Addressing these concerns necessitates a comprehensive analysis of jailbreak prompts to evaluate LLMs' defensive capabilities and identify potential weaknesses. However, the complexity of evaluating jailbreak performance and understanding prompt characteristics makes this analysis laborious. We collaborate with domain experts to characterize problems and propose an LLM-assisted framework to streamline the analysis process. It provides automatic jailbreak assessment to facilitate performance evaluation and support analysis of components and keywords in prompts. Based on the framework, we design JailbreakLens, a visual analysis system that enables users to explore the jailbreak performance against the target model, conduct multi-level analysis of prompt characteristics, and refine prompt instances to verify findings. Through a case study, technical evaluations, and expert interviews, we demonstrate our system's effectiveness in hel** users evaluate model security and identify model weaknesses. △ Less

Submitted 12 April, 2024; originally announced April 2024.

Comments: Submitted to VIS 2024

arXiv:2404.08004 [pdf, other]

GRANP: A Graph Recurrent Attentive Neural Process Model for Vehicle Trajectory Prediction

Authors: Yuhao Luo, Kehua Chen, Meixin Zhu

Abstract: As a vital component in autonomous driving, accurate trajectory prediction effectively prevents traffic accidents and improves driving efficiency. To capture complex spatial-temporal dynamics and social interactions, recent studies developed models based on advanced deep-learning methods. On the other hand, recent studies have explored the use of deep generative models to further account for traje… ▽ More As a vital component in autonomous driving, accurate trajectory prediction effectively prevents traffic accidents and improves driving efficiency. To capture complex spatial-temporal dynamics and social interactions, recent studies developed models based on advanced deep-learning methods. On the other hand, recent studies have explored the use of deep generative models to further account for trajectory uncertainties. However, the current approaches demonstrating indeterminacy involve inefficient and time-consuming practices such as sampling from trained models. To fill this gap, we proposed a novel model named Graph Recurrent Attentive Neural Process (GRANP) for vehicle trajectory prediction while efficiently quantifying prediction uncertainty. In particular, GRANP contains an encoder with deterministic and latent paths, and a decoder for prediction. The encoder, including stacked Graph Attention Networks, LSTM and 1D convolutional layers, is employed to extract spatial-temporal relationships. The decoder is used to learn a latent distribution and thus quantify prediction uncertainty. To reveal the effectiveness of our model, we evaluate the performance of GRANP on the highD dataset. Extensive experiments show that GRANP achieves state-of-the-art results and can efficiently quantify uncertainties. Additionally, we undertake an intuitive case study that showcases the interpretability of the proposed approach. The code is available at https://github.com/joy-driven/GRANP. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06591 [pdf, other]

Milgram's experiment in the knowledge space: Individual navigation strategies

Authors: Manran Zhu, János Kertész

Abstract: Data deluge characteristic for our times has led to information overload, posing a significant challenge to effectively finding our way through the digital landscape. Addressing this issue requires an in-depth understanding of how we navigate through the abundance of information. Previous research has discovered multiple patterns in how individuals navigate in the geographic, social, and informati… ▽ More Data deluge characteristic for our times has led to information overload, posing a significant challenge to effectively finding our way through the digital landscape. Addressing this issue requires an in-depth understanding of how we navigate through the abundance of information. Previous research has discovered multiple patterns in how individuals navigate in the geographic, social, and information spaces, yet individual differences in strategies for navigation in the knowledge space has remained largely unexplored. To bridge the gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed questionnaires about their personal information. Utilizing a graph embedding trained on the English Wikipedia, our study identified distinctive strategies that participants adopt: when the target is a famous person, participants typically use the geographical and occupational information of the target to navigate, reminiscent of hub-driven and proximity-driven approaches, respectively. We discovered that many participants playing the same game exhibit a "wisdom of the crowd" effect: The set of strategies provide a good estimate for the information landscape around the target indicating that the individual differences complement each other. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 25 pages, 8 figures

arXiv:2404.05464 [pdf, ps, other]

Parity violation in primordial tensor non-Gaussianities from matter bounce cosmology

Authors: Shingo Akama, Mian Zhu

Abstract: It has been shown that primordial tensor non-Gaussianities from a cubic Weyl action with a non-dynamical coupling are suppressed by the so-called slow-roll parameter in a conventional framework of slow-roll inflation. In this paper, we consider matter bounce cosmology in which the background spacetime is no longer quasi-de Sitter, and hence one might expect that the matter bounce models could pred… ▽ More It has been shown that primordial tensor non-Gaussianities from a cubic Weyl action with a non-dynamical coupling are suppressed by the so-called slow-roll parameter in a conventional framework of slow-roll inflation. In this paper, we consider matter bounce cosmology in which the background spacetime is no longer quasi-de Sitter, and hence one might expect that the matter bounce models could predict non-suppressed non-Gaussianities. Nevertheless, we first show that the corresponding non-Gaussian amplitudes from the cubic Weyl term with a non-dynamical coupling are much smaller than those from the conventional slow-roll inflation, in spite of the fact that there is no slow-roll suppression. We then introduce a dynamical coupling that can boost the magnitude of graviton cubic interactions and clarify that there is a parameter region where the tensor non-Gaussianities can be enhanced and can potentially be tested by cosmic microwave background experiments. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.02937 [pdf, other]

Towards Responsible and Reliable Traffic Flow Prediction with Large Language Models

Authors: Xusen Guo, Qiming Zhang, Junyue Jiang, Mingxing Peng, Hao, Yang, Meixin Zhu

Abstract: Traffic forecasting is crucial for intelligent transportation systems. It has experienced significant advancements thanks to the power of deep learning in capturing latent patterns of traffic data. However, recent deep-learning architectures require intricate model designs and lack an intuitive understanding of the map** from input data to predicted results. Achieving both accuracy and responsib… ▽ More Traffic forecasting is crucial for intelligent transportation systems. It has experienced significant advancements thanks to the power of deep learning in capturing latent patterns of traffic data. However, recent deep-learning architectures require intricate model designs and lack an intuitive understanding of the map** from input data to predicted results. Achieving both accuracy and responsibility in traffic prediction models remains a challenge due to the complexity of traffic data and the inherent opacity of deep learning models. To tackle these challenges, we propose a Responsible and Reliable Traffic flow forecasting model with Large Language Models (R2T-LLM), which leverages large language models (LLMs) to generate responsible traffic predictions. By transferring multi-modal traffic data into natural language descriptions, R2T-LLM captures complex spatial-temporal patterns and external factors from comprehensive traffic data. The LLM framework is fine-tuned using language-based instructions to align with spatial-temporal traffic flow data. Empirically, R2T-LLM shows competitive accuracy compared with deep learning baselines, while providing an intuitive and reliable explanation for predictions. We discuss the spatial-temporal and input dependencies for conditional future flow forecasting, showcasing R2T-LLM's potential for diverse city prediction tasks. This paper contributes to advancing accountable traffic prediction models and lays a foundation for future exploration of LLM applications in transportation. To the best of our knowledge, this is the first study to use LLM for accountable and reliable prediction of traffic flows. △ Less

Submitted 21 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 27pages, 8 figures

arXiv:2404.01151 [pdf, other]

Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs

Authors: Jialou Wang, Manli Zhu, Yulei Li, Honglei Li, Longzhi Yang, Wai Lok Woo

Abstract: Localization plays a crucial role in enhancing the practicality and precision of VQA systems. By enabling fine-grained identification and interaction with specific parts of an object, it significantly improves the system's ability to provide contextually relevant and spatially accurate responses, crucial for applications in dynamic environments like robotics and augmented reality. However, traditi… ▽ More Localization plays a crucial role in enhancing the practicality and precision of VQA systems. By enabling fine-grained identification and interaction with specific parts of an object, it significantly improves the system's ability to provide contextually relevant and spatially accurate responses, crucial for applications in dynamic environments like robotics and augmented reality. However, traditional systems face challenges in accurately map** objects within images to generate nuanced and spatially aware responses. In this work, we introduce "Detect2Interact", which addresses these challenges by introducing an advanced approach for fine-grained object visual key field detection. First, we use the segment anything model (SAM) to generate detailed spatial maps of objects in images. Next, we use Vision Studio to extract semantic object descriptions. Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map. As a result, Detect2Interact achieves consistent qualitative results on object key field detection across extensive test cases and outperforms the existing VQA system with object detection by providing a more reasonable and finer visual representation. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted to IEEE Intelligent Systems

arXiv:2404.01150 [pdf, other]

Visual-inertial state estimation based on Chebyshev polynomial optimization

Authors: Hongyu Zhang, Maoran Zhu, Qi Cai, Yuanxin Wu

Abstract: This paper proposes an innovative state estimation method for visual-inertial fusion based on Chebyshev polynomial optimization. Specifically, the pose is modeled as a Chebyshev polynomial of a certain order, and its time derivatives are used to calculate linear acceleration and angular velocity, which, along with inertial measurements, constitute dynamic constraints. This is coupled with a visual… ▽ More This paper proposes an innovative state estimation method for visual-inertial fusion based on Chebyshev polynomial optimization. Specifically, the pose is modeled as a Chebyshev polynomial of a certain order, and its time derivatives are used to calculate linear acceleration and angular velocity, which, along with inertial measurements, constitute dynamic constraints. This is coupled with a visual measurement model to construct a visual-inertial bundle adjustment formulation. Simulation and public dataset experiments show that the proposed method has better accuracy than the discrete-form preintegration method. △ Less

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00712 [pdf, other]

Survey of Computerized Adaptive Testing: A Machine Learning Perspective

Authors: Qi Liu, Yan Zhuang, Haoyang Bi, Zhenya Huang, Weizhe Huang, Jiatong Li, Junhao Yu, Zirui Liu, Zirui Hu, Yuting Hong, Zachary A. Pardos, Hai** Ma, Mengxiao Zhu, Shi** Wang, Enhong Chen

Abstract: Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees, by dynamically adjusting test questions based on their performance. Widely adopted across diverse fields like education, healthcare, sports, and sociology, CAT has revolutionized testing practices. While traditional methods rely on psychometrics and statistics, the increasing c… ▽ More Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees, by dynamically adjusting test questions based on their performance. Widely adopted across diverse fields like education, healthcare, sports, and sociology, CAT has revolutionized testing practices. While traditional methods rely on psychometrics and statistics, the increasing complexity of large-scale testing has spurred the integration of machine learning techniques. This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing method. By examining the test question selection algorithm at the heart of CAT's adaptivity, we shed light on its functionality. Furthermore, we delve into cognitive diagnosis models, question bank construction, and test control within CAT, exploring how machine learning can optimize these components. Through an analysis of current methods, strengths, limitations, and challenges, we strive to develop robust, fair, and efficient CAT systems. By bridging psychometric-driven CAT research with machine learning, this survey advocates for a more inclusive and interdisciplinary approach to the future of adaptive testing. △ Less

Submitted 4 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.19345 [pdf]

Intelligent Classification and Personalized Recommendation of E-commerce Products Based on Machine Learning

Authors: Kangming Xu, Huiming Zhou, Haotian Zheng, Mingwei Zhu, Qi Xin

Abstract: With the rapid evolution of the Internet and the exponential proliferation of information, users encounter information overload and the conundrum of choice. Personalized recommendation systems play a pivotal role in alleviating this burden by aiding users in filtering and selecting information tailored to their preferences and requirements. Such systems not only enhance user experience and satisfa… ▽ More With the rapid evolution of the Internet and the exponential proliferation of information, users encounter information overload and the conundrum of choice. Personalized recommendation systems play a pivotal role in alleviating this burden by aiding users in filtering and selecting information tailored to their preferences and requirements. Such systems not only enhance user experience and satisfaction but also furnish opportunities for businesses and platforms to augment user engagement, sales, and advertising efficacy.This paper undertakes a comparative analysis between the operational mechanisms of traditional e-commerce commodity classification systems and personalized recommendation systems. It delineates the significance and application of personalized recommendation systems across e-commerce, content information, and media domains. Furthermore, it delves into the challenges confronting personalized recommendation systems in e-commerce, including data privacy, algorithmic bias, scalability, and the cold start problem. Strategies to address these challenges are elucidated.Subsequently, the paper outlines a personalized recommendation system leveraging the BERT model and nearest neighbor algorithm, specifically tailored to address the exigencies of the eBay e-commerce platform. The efficacy of this recommendation system is substantiated through manual evaluation, and a practical application operational guide and structured output recommendation results are furnished to ensure the system's operability and scalability. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18660 [pdf, other]

InstructBrush: Learning Attention-based Instruction Optimization for Image Editing

Authors: Ruoyu Zhao, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Wei Wu, Pengcheng Xu, Mingrui Zhu, Nannan Wang, Xinbo Gao

Abstract: In recent years, instruction-based image editing methods have garnered significant attention in image editing. However, despite encompassing a wide range of editing priors, these methods are helpless when handling editing tasks that are challenging to accurately describe through language. We propose InstructBrush, an inversion method for instruction-based image editing methods to bridge this gap.… ▽ More In recent years, instruction-based image editing methods have garnered significant attention in image editing. However, despite encompassing a wide range of editing priors, these methods are helpless when handling editing tasks that are challenging to accurately describe through language. We propose InstructBrush, an inversion method for instruction-based image editing methods to bridge this gap. It extracts editing effects from exemplar image pairs as editing instructions, which are further applied for image editing. Two key techniques are introduced into InstructBrush, Attention-based Instruction Optimization and Transformation-oriented Instruction Initialization, to address the limitations of the previous method in terms of inversion effects and instruction generalization. To explore the ability of instruction inversion methods to guide image editing in open scenarios, we establish a TransformationOriented Paired Benchmark (TOP-Bench), which contains a rich set of scenes and editing types. The creation of this benchmark paves the way for further exploration of instruction inversion. Quantitatively and qualitatively, our approach achieves superior performance in editing and is more semantically consistent with the target editing effects. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Project Page: https://royzhao926.github.io/InstructBrush/

arXiv:2403.18344 [pdf, other]

LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models

Authors: Mingxing Peng, Xusen Guo, Xianda Chen, Meixin Zhu, Kehua Chen, Hao, Yang, Xuesong Wang, Yinhai Wang

Abstract: To ensure safe driving in dynamic environments, autonomous vehicles should possess the capability to accurately predict the lane change intentions of surrounding vehicles in advance and forecast their future trajectories. Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. In this paper, we address thes… ▽ More To ensure safe driving in dynamic environments, autonomous vehicles should possess the capability to accurately predict the lane change intentions of surrounding vehicles in advance and forecast their future trajectories. Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. In this paper, we address these challenges by proposing LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models (LLMs). Essentially, we reformulate the lane change prediction task as a language modeling problem, processing heterogeneous driving scenario information in natural language as prompts for input into the LLM and employing a supervised fine-tuning technique to tailor the LLM specifically for our lane change prediction task. This allows us to utilize the LLM's powerful common sense reasoning abilities to understand complex interactive information, thereby improving the accuracy of long-term predictions. Furthermore, we incorporate explanatory requirements into the prompts in the inference stage. Therefore, our LC-LLM model not only can predict lane change intentions and trajectories but also provides explanations for its predictions, enhancing the interpretability. Extensive experiments on the large-scale highD dataset demonstrate the superior performance and interpretability of our LC-LLM in lane change prediction task. To the best of our knowledge, this is the first attempt to utilize LLMs for predicting lane change behavior. Our study shows that LLMs can encode comprehensive interaction information for driving behavior understanding. △ Less

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.17552 [pdf, other]

Naive Bayes-based Context Extension for Large Language Models

Authors: Jianlin Su, Murtadha Ahmed, Wenbo, Luo Ao, Mingren Zhu, Yunfeng Liu

Abstract: Large Language Models (LLMs) have shown promising in-context learning abilities. However, conventional In-Context Learning (ICL) approaches are often impeded by length limitations of transformer architecture, which pose challenges when attempting to effectively integrate supervision from a substantial number of demonstration examples. In this paper, we introduce a novel framework, called Naive Bay… ▽ More Large Language Models (LLMs) have shown promising in-context learning abilities. However, conventional In-Context Learning (ICL) approaches are often impeded by length limitations of transformer architecture, which pose challenges when attempting to effectively integrate supervision from a substantial number of demonstration examples. In this paper, we introduce a novel framework, called Naive Bayes-based Context Extension (NBCE), to enable existing LLMs to perform ICL with an increased number of demonstrations by significantly expanding their context size. Importantly, this expansion does not require fine-tuning or dependence on particular model architectures, all the while preserving linear efficiency. NBCE initially splits the context into equal-sized windows fitting the target LLM's maximum length. Then, it introduces a voting mechanism to select the most relevant window, regarded as the posterior context. Finally, it employs Bayes' theorem to generate the test task. Our experimental results demonstrate that NBCE substantially enhances performance, particularly as the number of demonstration examples increases, consistently outperforming alternative methods. The NBCE code will be made publicly accessible. The code NBCE is available at: https://github.com/amurtadha/NBCE-master △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted to main NAACL 2024

arXiv:2403.16499 [pdf, other]

doi 10.1016/j.media.2024.103151

Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes

Authors: Tianwei Zhang, Dong Wei, Mengmeng Zhu, Shi Gu, Yefeng Zheng

Abstract: Self-supervised learning has emerged as a powerful tool for pretraining deep networks on unlabeled data, prior to transfer learning of target tasks with limited annotation. The relevance between the pretraining pretext and target tasks is crucial to the success of transfer learning. Various pretext tasks have been proposed to utilize properties of medical image data (e.g., three dimensionality), w… ▽ More Self-supervised learning has emerged as a powerful tool for pretraining deep networks on unlabeled data, prior to transfer learning of target tasks with limited annotation. The relevance between the pretraining pretext and target tasks is crucial to the success of transfer learning. Various pretext tasks have been proposed to utilize properties of medical image data (e.g., three dimensionality), which are more relevant to medical image analysis than generic ones for natural images. However, previous work rarely paid attention to data with anatomy-oriented imaging planes, e.g., standard cardiac magnetic resonance imaging views. As these imaging planes are defined according to the anatomy of the imaged organ, pretext tasks effectively exploiting this information can pretrain the networks to gain knowledge on the organ of interest. In this work, we propose two complementary pretext tasks for this group of medical image data based on the spatial relationship of the imaging planes. The first is to learn the relative orientation between the imaging planes and implemented as regressing their intersecting lines. The second exploits parallel imaging planes to regress their relative slice locations within a stack. Both pretext tasks are conceptually straightforward and easy to implement, and can be combined in multitask learning for better representation learning. Thorough experiments on two anatomical structures (heart and knee) and representative target tasks (semantic segmentation and classification) demonstrate that the proposed pretext tasks are effective in pretraining deep networks for remarkably boosted performance on the target tasks, and superior to other recent approaches. △ Less

Submitted 7 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Medical Image Analysis

arXiv:2403.15943 [pdf, ps, other]

Advanced Feature Manipulation for Enhanced Change Detection Leveraging Natural Language Models

Authors: Zhenglin Li, Yangchen Huang, Mengran Zhu, **gyu Zhang, **gHao Chang, Houze Liu

Abstract: Change detection is a fundamental task in computer vision that processes a bi-temporal image pair to differentiate between semantically altered and unaltered regions. Large language models (LLMs) have been utilized in various domains for their exceptional feature extraction capabilities and have shown promise in numerous downstream applications. In this study, we harness the power of a pre-trained… ▽ More Change detection is a fundamental task in computer vision that processes a bi-temporal image pair to differentiate between semantically altered and unaltered regions. Large language models (LLMs) have been utilized in various domains for their exceptional feature extraction capabilities and have shown promise in numerous downstream applications. In this study, we harness the power of a pre-trained LLM, extracting feature maps from extensive datasets, and employ an auxiliary network to detect changes. Unlike existing LLM-based change detection methods that solely focus on deriving high-quality feature maps, our approach emphasizes the manipulation of these feature maps to enhance semantic relevance. △ Less

Submitted 13 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: This version is not our full version based on our new progress, related data, and methodology we are dealing with, and based on the rules and the laws, we are adjusting our current version

arXiv:2403.14232 [pdf, other]

Contrastive Balancing Representation Learning for Heterogeneous Dose-Response Curves Estimation

Authors: Minqin Zhu, Anpeng Wu, Haoxuan Li, Ruoxuan Xiong, Bo Li, Xiaoqing Yang, Xuan Qin, Peng Zhen, Jiecheng Guo, Fei Wu, Kun Kuang

Abstract: Estimating the individuals' potential response to varying treatment doses is crucial for decision-making in areas such as precision medicine and management science. Most recent studies predict counterfactual outcomes by learning a covariate representation that is independent of the treatment variable. However, such independence constraints neglect much of the covariate information that is useful f… ▽ More Estimating the individuals' potential response to varying treatment doses is crucial for decision-making in areas such as precision medicine and management science. Most recent studies predict counterfactual outcomes by learning a covariate representation that is independent of the treatment variable. However, such independence constraints neglect much of the covariate information that is useful for counterfactual prediction, especially when the treatment variables are continuous. To tackle the above issue, in this paper, we first theoretically demonstrate the importance of the balancing and prognostic representations for unbiased estimation of the heterogeneous dose-response curves, that is, the learned representations are constrained to satisfy the conditional independence between the covariates and both of the treatment variables and the potential responses. Based on this, we propose a novel Contrastive balancing Representation learning Network using a partial distance measure, called CRNet, for estimating the heterogeneous dose-response curves without losing the continuity of treatments. Extensive experiments are conducted on synthetic and real-world datasets demonstrating that our proposal significantly outperforms previous methods. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13245 [pdf, other]

Federated reinforcement learning for robot motion planning with zero-shot generalization

Authors: Zhenyuan Yuan, Siyuan Xu, Minghui Zhu

Abstract: This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection and policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning of multiple learners and a central server, i.e., the Cloud, without sharing… ▽ More This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection and policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning of multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and that from the Cloud for next iteration. The proposed framework leverages on the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement and optimality gap are also provided. Monte Carlo simulation is conducted to evaluate the proposed framework. △ Less

Submitted 7 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12130 [pdf, other]

Almost Optically Dark Galaxies in DECaLS (I): Detection, Optical Properties and Possible Origins

Authors: Lin Du, Wei Du, Cheng Cheng, Ming Zhu, Haiyang Yu, Hong Wu

Abstract: We report the discovery of eight optical counterparts of ALFALFA extragalactic objects from DECaLS, five of which are discovered for the first time. These objects were flagged as HI emission sources with no optical counterparts in SDSS before. Multi-band data reveal their unusual physical properties. They are faint and blue ($g-r=-0.35\sim0.55$), with quite low surface brightness (… ▽ More We report the discovery of eight optical counterparts of ALFALFA extragalactic objects from DECaLS, five of which are discovered for the first time. These objects were flagged as HI emission sources with no optical counterparts in SDSS before. Multi-band data reveal their unusual physical properties. They are faint and blue ($g-r=-0.35\sim0.55$), with quite low surface brightness ($μ_{\rm g,peak}=24.88\sim26.41\,{\rm mag}/{\rm arcsec}^2$), irregular morphologies, low stellar masses ($log_{10}(M_{*}/M_\odot)=5.27\sim7.15$), low star formation rates ($SFR=0.21\sim9.24\times10^{-3}\,{M_\odot}\,{\rm yr}^{-1}$), and remarkably high HI-to-stellar mass ratios ($log_{10}(M_{\rm HI}/M_{*}) = 1.72\sim3.22$, except AGC\,215415). They deviate from the scaling relations between HI and optical properties defined by the ALFALFA sample and the baryonic Tully-Fisher relation. They agree well with the main sequence of star-forming galaxies but exhibit low star-forming efficiency. Based on their physical properties and environments, we speculate that six of these objects may have originated from tidal processes, while the remaining two appear to have isolated origins. They may have had a relatively calm evolutionary history and only begun to form stars recently. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: 32 pages, 11 figures, accepted by the Astrophysical Journal

arXiv:2403.10831 [pdf, other]

DUE: Dynamic Uncertainty-Aware Explanation Supervision via 3D Imputation

Authors: Qilong Zhao, Yifei Zhang, Mengdan Zhu, Siyi Gu, Yuyang Gao, Xiaofeng Yang, Liang Zhao

Abstract: Explanation supervision aims to enhance deep learning models by integrating additional signals to guide the generation of model explanations, showcasing notable improvements in both the predictability and explainability of the model. However, the application of explanation supervision to higher-dimensional data, such as 3D medical images, remains an under-explored domain. Challenges associated wit… ▽ More Explanation supervision aims to enhance deep learning models by integrating additional signals to guide the generation of model explanations, showcasing notable improvements in both the predictability and explainability of the model. However, the application of explanation supervision to higher-dimensional data, such as 3D medical images, remains an under-explored domain. Challenges associated with supervising visual explanations in the presence of an additional dimension include: 1) spatial correlation changed, 2) lack of direct 3D annotations, and 3) uncertainty varies across different parts of the explanation. To address these challenges, we propose a Dynamic Uncertainty-aware Explanation supervision (DUE) framework for 3D explanation supervision that ensures uncertainty-aware explanation guidance when dealing with sparsely annotated 3D data with diffusion-based 3D interpolation. Our proposed framework is validated through comprehensive experiments on diverse real-world medical imaging datasets. The results demonstrate the effectiveness of our framework in enhancing the predictability and explainability of deep learning models in the context of medical imaging diagnosis applications. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: 9 pages,6 figures

arXiv:2403.10720 [pdf, other]

Development and Application of a Monte Carlo Tree Search Algorithm for Simulating Da Vinci Code Game Strategies

Authors: Ye Zhang, Mengran Zhu, Kailin Gui, Jiayue Yu, Yong Hao, Haozhan Sun

Abstract: In this study, we explore the efficiency of the Monte Carlo Tree Search (MCTS), a prominent decision-making algorithm renowned for its effectiveness in complex decision environments, contingent upon the volume of simulations conducted. Notwithstanding its broad applicability, the algorithm's performance can be adversely impacted in certain scenarios, particularly within the domain of game strategy… ▽ More In this study, we explore the efficiency of the Monte Carlo Tree Search (MCTS), a prominent decision-making algorithm renowned for its effectiveness in complex decision environments, contingent upon the volume of simulations conducted. Notwithstanding its broad applicability, the algorithm's performance can be adversely impacted in certain scenarios, particularly within the domain of game strategy development. This research posits that the inherent branch divergence within the Da Vinci Code board game significantly impedes parallelism when executed on Graphics Processing Units (GPUs). To investigate this hypothesis, we implemented and meticulously evaluated two variants of the MCTS algorithm, specifically designed to assess the impact of branch divergence on computational performance. Our comparative analysis reveals a linear improvement in performance with the CPU-based implementation, in stark contrast to the GPU implementation, which exhibits a non-linear enhancement pattern and discernible performance troughs. These findings contribute to a deeper understanding of the MCTS algorithm's behavior in divergent branch scenarios, highlighting critical considerations for optimizing game strategy algorithms on parallel computing architectures. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: This paper has been accepted by CVIDL2024

arXiv:2403.09121 [pdf, other]

OutlineSpark: Igniting AI-powered Presentation Slides Creation from Computational Notebooks through Outlines

Authors: Fengjie Wang, Yanna Lin, Leni Yang, Haotian Li, Mingyang Gu, Min Zhu, Huamin Qu

Abstract: Computational notebooks are widely utilized for exploration and analysis. However, creating slides to communicate analysis results from these notebooks is quite tedious and time-consuming. Researchers have proposed automatic systems for generating slides from notebooks, which, however, often do not consider the process of users conceiving and organizing their messages from massive code cells. Thos… ▽ More Computational notebooks are widely utilized for exploration and analysis. However, creating slides to communicate analysis results from these notebooks is quite tedious and time-consuming. Researchers have proposed automatic systems for generating slides from notebooks, which, however, often do not consider the process of users conceiving and organizing their messages from massive code cells. Those systems ask users to go directly into the slide creation process, which causes potentially ill-structured slides and burdens in further refinement. Inspired by the common and widely recommended slide creation practice: drafting outlines first and then adding concrete content, we introduce OutlineSpark, an AI-powered slide creation tool that generates slides from a slide outline written by the user. The tool automatically retrieves relevant notebook cells based on the outlines and converts them into slide content. We evaluated OutlineSpark with 12 users. Both the quantitative and qualitative feedback from the participants verify its effectiveness and usability. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: To appear in Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2024)

arXiv:2403.06199 [pdf, other]

Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

Authors: Minjie Zhu, Yichen Zhu, Xin Liu, Ning Liu, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Zhicai Ou, Feifei Feng, Jian Tang

Abstract: Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning. Yet, their widespread application faces obstacles due to the high computational demands during both the training and inference phases, restricting their use to a limited audience within the research and user communities. In this paper, we investigate the design aspects… ▽ More Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning. Yet, their widespread application faces obstacles due to the high computational demands during both the training and inference phases, restricting their use to a limited audience within the research and user communities. In this paper, we investigate the design aspects of Multimodal Small Language Models (MSLMs) and propose an efficient multimodal assistant named Mipha, which is designed to create synergy among various aspects: visual representation, language models, and optimization strategies. We show that without increasing the volume of training data, our Mipha-3B outperforms the state-of-the-art large MLLMs, especially LLaVA-1.5-13B, on multiple benchmarks. Through detailed discussion, we provide insights and guidelines for develo** strong MSLMs that rival the capabilities of MLLMs. Our code is available at https://github.com/zhuyiche/llava-phi. △ Less

Submitted 25 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.04904 [pdf, other]

Antiferroelectric Nanodomains Stabilized by Chemical Disorder at Anti-phase Boundaries

Authors: Menglin Zhu, Michael Xu, Yu Yun, Liyan Wu, Or Shafir, Colin Gilgenbach, Lane W. Martin, Ilya Grinberg, Jonathan E. Spanier, James M. LeBeau

Abstract: Antiferroelectric perovskite oxides exhibit exceptional dielectric properties and high structural/chemical tunability, making them promising for a wide range of applications from high energy-density capacitors to solid-state cooling. However, tailoring the antiferroelectric phase stability through alloying is hampered by the complex interplay between chemistry and the alignment of dipole moments.… ▽ More Antiferroelectric perovskite oxides exhibit exceptional dielectric properties and high structural/chemical tunability, making them promising for a wide range of applications from high energy-density capacitors to solid-state cooling. However, tailoring the antiferroelectric phase stability through alloying is hampered by the complex interplay between chemistry and the alignment of dipole moments. In this study, correlations between chemical order and the stability of the antiferroelectric phase are established at anti-phase boundaries in \ce{Pb2MgWO6}. Using multislice ptychography, we reveal the three-dimensional nature of chemical order at the boundaries and show that they exhibit a finite width of chemical intermixing. Furthermore, regions at and adjacent to the anti-phase boundary exhibit antiferroelectric displacements in contrast to the overall paraelectric film. Combining spatial statistics and density functional theory simulations, local antiferroelectric distortions are shown to be confined to and stabilized by chemical disorder. Enabled by the three-dimensional information of multislice ptychography, these results provide insights into the interplay between chemical order and electronic properties to engineer antiferroelectric material response. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.04753 [pdf, other]

Mechanism for Decision-aware Collaborative Federated Learning: A Pitfall of Shapley Values

Authors: Meng Qi, Mingxi Zhu

Abstract: This paper investigates mechanism design for decision-aware collaboration via federated learning (FL) platforms. Our framework consists of a digital platform and multiple decision-aware agents, each endowed with proprietary data sets. The platform offers an infrastructure that enables access to the data, creates incentives for collaborative learning aimed at operational decision-making, and conduc… ▽ More This paper investigates mechanism design for decision-aware collaboration via federated learning (FL) platforms. Our framework consists of a digital platform and multiple decision-aware agents, each endowed with proprietary data sets. The platform offers an infrastructure that enables access to the data, creates incentives for collaborative learning aimed at operational decision-making, and conducts FL to avoid direct raw data sharing. The computation and communication efficiency of the FL process is inherently influenced by the agent participation equilibrium induced by the mechanism. Therefore, assessing the system's efficiency involves two critical factors: the surplus created by coalition formation and the communication costs incurred across the coalition during FL. To evaluate the system efficiency under the intricate interplay between mechanism design, agent participation, operational decision-making, and the performance of FL algorithms, we introduce a multi-action collaborative federated learning (MCFL) framework for decision-aware agents. Under this framework, we further analyze the equilibrium for the renowned Shapley value based mechanisms. Specifically, we examine the issue of false-name manipulation, a form of dishonest behavior where participating agents create duplicate fake identities to split their original data among these identities. By solving the agent participation equilibrium, we demonstrate that while Shapley value effectively maximizes coalition-generated surplus by encouraging full participation, it inadvertently promotes false-name manipulation. This further significantly increases the communication costs when the platform conducts FL. Thus, we highlight a significant pitfall of Shapley value based mechanisms, which implicitly incentivizes data splitting and identity duplication, ultimately impairing the overall efficiency in FL systems. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03088 [pdf]

Shear-enhanced Liquid Crystal Spinning of Conjugated Polymer Fibers

Authors: Hao Jiang, Chi-yuan Yang, Deyu Tu, Zhu Chen, Wei Huang, Liang-wen Feng, Hengda Sun, Hongzhi Wang, Simone Fabiano, Meifang Zhu, Gang Wang

Abstract: Conjugated polymer fibers can be used to manufacture various soft fibrous optoelectronic devices, significantly advancing wearable devices and smart textiles. Recently, conjugated polymer-based fibrous electronic devices have been widely used in energy conversion, electrochemical sensing, and human-machine interaction. However, the insufficient mechanical properties of conjugated polymer fibers, t… ▽ More Conjugated polymer fibers can be used to manufacture various soft fibrous optoelectronic devices, significantly advancing wearable devices and smart textiles. Recently, conjugated polymer-based fibrous electronic devices have been widely used in energy conversion, electrochemical sensing, and human-machine interaction. However, the insufficient mechanical properties of conjugated polymer fibers, the difficulty in solution processing semiconductors with rigid main chains, and the challenges in large-scale continuous production have limited their further development in the wearable field. We regulated the pi - pi stacking interactions in conjugated polymer molecules below their critical liquid crystal concentration by applying fluid shear stress. We implemented secondary orientation, leading to the continuous fabrication of anisotropic semiconductor fibers. This strategy enables conjugated polymers with rigid backbones to synergistically enhance the mechanical and semiconductor properties of fibers through liquid crystal spinning. Furthermore, conjugated polymer fibers, exhibiting excellent electrochemical performance and high mechanical strength (600 MPa) that essentially meet the requirements for industrialized preparation, maintain stability under extreme temperatures, radiation, and chemical reagents. Lastly, we have demonstrated logic circuits using semiconductor fiber organic electrochemical transistors, showcasing its application potential in the field of wearable fabric-style logic processing. These findings confirm the importance of the liquid crystalline state and solution control in optimizing the performance of conjugated polymer fibers, thus paving the way for develo** a new generation of soft fiber semiconductor devices. △ Less

Submitted 6 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.17979 [pdf, other]

Ensemble Methodology:Innovations in Credit Default Prediction Using LightGBM, XGBoost, and LocalEnsemble

Authors: Mengran Zhu, Ye Zhang, Yulu Gong, Kaijuan Xing, Xu Yan, **tong Song

Abstract: In the realm of consumer lending, accurate credit default prediction stands as a critical element in risk mitigation and lending decision optimization. Extensive research has sought continuous improvement in existing models to enhance customer experiences and ensure the sound economic functioning of lending institutions. This study responds to the evolving landscape of credit default prediction, c… ▽ More In the realm of consumer lending, accurate credit default prediction stands as a critical element in risk mitigation and lending decision optimization. Extensive research has sought continuous improvement in existing models to enhance customer experiences and ensure the sound economic functioning of lending institutions. This study responds to the evolving landscape of credit default prediction, challenging conventional models and introducing innovative approaches. By building upon foundational research and recent innovations, our work aims to redefine the standards of accuracy in credit default prediction, setting a new benchmark for the industry. To overcome these challenges, we present an Ensemble Methods framework comprising LightGBM, XGBoost, and LocalEnsemble modules, each making unique contributions to amplify diversity and improve generalization. By utilizing distinct feature sets, our methodology directly tackles limitations identified in previous studies, with the overarching goal of establishing a novel standard for credit default prediction accuracy. Our experimental findings validate the effectiveness of the ensemble model on the dataset, signifying substantial contributions to the field. This innovative approach not only addresses existing obstacles but also sets a precedent for advancing the accuracy and robustness of credit default prediction models. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.16038 [pdf]

Deep Learning Approaches for Improving Question Answering Systems in Hepatocellular Carcinoma Research

Authors: Shuning Huo, Yafei Xiang, Hanyi Yu, Mengran Zhu, Yulu Gong

Abstract: In recent years, advancements in natural language processing (NLP) have been fueled by deep learning techniques, particularly through the utilization of powerful computing resources like GPUs and TPUs. Models such as BERT and GPT-3, trained on vast amounts of data, have revolutionized language understanding and generation. These pre-trained models serve as robust bases for various tasks including… ▽ More In recent years, advancements in natural language processing (NLP) have been fueled by deep learning techniques, particularly through the utilization of powerful computing resources like GPUs and TPUs. Models such as BERT and GPT-3, trained on vast amounts of data, have revolutionized language understanding and generation. These pre-trained models serve as robust bases for various tasks including semantic understanding, intelligent writing, and reasoning, paving the way for a more generalized form of artificial intelligence. NLP, as a vital application of AI, aims to bridge the gap between humans and computers through natural language interaction. This paper delves into the current landscape and future prospects of large-scale model-based NLP, focusing on the question-answering systems within this domain. Practical cases and developments in artificial intelligence-driven question-answering systems are analyzed to foster further exploration and research in the realm of large-scale NLP. △ Less

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.16036 [pdf]

Machine Learning-Based Vehicle Intention Trajectory Recognition and Prediction for Autonomous Driving

Authors: Hanyi Yu, Shuning Huo, Mengran Zhu, Yulu Gong, Yafei Xiang

Abstract: In recent years, the expansion of internet technology and advancements in automation have brought significant attention to autonomous driving technology. Major automobile manufacturers, including Volvo, Mercedes-Benz, and Tesla, have progressively introduced products ranging from assisted-driving vehicles to semi-autonomous vehicles. However, this period has also witnessed several traffic safety i… ▽ More In recent years, the expansion of internet technology and advancements in automation have brought significant attention to autonomous driving technology. Major automobile manufacturers, including Volvo, Mercedes-Benz, and Tesla, have progressively introduced products ranging from assisted-driving vehicles to semi-autonomous vehicles. However, this period has also witnessed several traffic safety incidents involving self-driving vehicles. For instance, in March 2016, a Google self-driving car was involved in a minor collision with a bus. At the time of the accident, the autonomous vehicle was attempting to merge into the right lane but failed to dynamically respond to the real-time environmental information during the lane change. It incorrectly assumed that the approaching bus would slow down to avoid it, leading to a low-speed collision with the bus. This incident highlights the current technological shortcomings and safety concerns associated with autonomous lane-changing behavior, despite the rapid advancements in autonomous driving technology. Lane-changing is among the most common and hazardous behaviors in highway driving, significantly impacting traffic safety and flow. Therefore, lane-changing is crucial for traffic safety, and accurately predicting drivers' lane change intentions can markedly enhance driving safety. This paper introduces a deep learning-based prediction method for autonomous driving lane change behavior, aiming to facilitate safe lane changes and thereby improve road safety. △ Less

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.16035 [pdf]

Text Understanding and Generation Using Transformer Models for Intelligent E-commerce Recommendations

Authors: Yafei Xiang, Hanyi Yu, Yulu Gong, Shuning Huo, Mengran Zhu

Abstract: With the rapid development of artificial intelligence technology, Transformer structural pre-training model has become an important tool for large language model (LLM) tasks. In the field of e-commerce, these models are especially widely used, from text understanding to generating recommendation systems, which provide powerful technical support for improving user experience and optimizing service… ▽ More With the rapid development of artificial intelligence technology, Transformer structural pre-training model has become an important tool for large language model (LLM) tasks. In the field of e-commerce, these models are especially widely used, from text understanding to generating recommendation systems, which provide powerful technical support for improving user experience and optimizing service processes. This paper reviews the core application scenarios of Transformer pre-training model in e-commerce text understanding and recommendation generation, including but not limited to automatic generation of product descriptions, sentiment analysis of user comments, construction of personalized recommendation system and automated processing of customer service conversations. Through a detailed analysis of the model's working principle, implementation process, and application effects in specific cases, this paper emphasizes the unique advantages of pre-trained models in understanding complex user intentions and improving the quality of recommendations. In addition, the challenges and improvement directions for the future are also discussed, such as how to further improve the generalization ability of the model, the ability to handle large-scale data sets, and technical strategies to protect user privacy. Ultimately, the paper points out that the application of Transformer structural pre-training models in e-commerce has not only driven technological innovation, but also brought substantial benefits to merchants and consumers, and looking forward, these models will continue to play a key role in e-commerce and beyond. △ Less

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15726 [pdf, other]

doi 10.1109/TCSVT.2024.3397997

CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge

Authors: Xiao Lin, Minghao Zhu, Ronghao Dang, Guangliang Zhou, Shaolong Shu, Feng Lin, Chengju Liu, Qijun Chen

Abstract: Most of existing category-level object pose estimation methods devote to learning the object category information from point cloud modality. However, the scale of 3D datasets is limited due to the high cost of 3D data collection and annotation. Consequently, the category features extracted from these limited point cloud samples may not be comprehensive. This motivates us to investigate whether we… ▽ More Most of existing category-level object pose estimation methods devote to learning the object category information from point cloud modality. However, the scale of 3D datasets is limited due to the high cost of 3D data collection and annotation. Consequently, the category features extracted from these limited point cloud samples may not be comprehensive. This motivates us to investigate whether we can draw on knowledge of other modalities to obtain category information. Inspired by this motivation, we propose CLIPose, a novel 6D pose framework that employs the pre-trained vision-language model to develop better learning of object category information, which can fully leverage abundant semantic knowledge in image and text modalities. To make the 3D encoder learn category-specific features more efficiently, we align representations of three modalities in feature space via multi-modal contrastive learning. In addition to exploiting the pre-trained knowledge of the CLIP's model, we also expect it to be more sensitive with pose parameters. Therefore, we introduce a prompt tuning approach to fine-tune image encoder while we incorporate rotations and translations information in the text descriptions. CLIPose achieves state-of-the-art performance on two mainstream benchmark datasets, REAL275 and CAMERA25, and runs in real-time during inference (40FPS). △ Less

Submitted 24 February, 2024; originally announced February 2024.

Comments: 14 pages, 4 figures, 9 tables

arXiv:2402.13669 [pdf, other]

Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning

Authors: Zhaorui Yang, Tianyu Pang, Haozhe Feng, Han Wang, Wei Chen, Minfeng Zhu, Qian Liu

Abstract: The surge in Large Language Models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs serves as the primary underlying cause. To address the problem, we introduce… ▽ More The surge in Large Language Models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs serves as the primary underlying cause. To address the problem, we introduce Self-Distillation Fine-Tuning (SDFT), a novel approach that bridges the distribution gap by guiding fine-tuning with a distilled dataset generated by the model itself to match its original distribution. Experimental results on the Llama-2-chat model across various benchmarks demonstrate that SDFT effectively mitigates catastrophic forgetting while achieving comparable or superior performance on downstream tasks compared to the vanilla fine-tuning. Moreover, SDFT demonstrates the potential to maintain the helpfulness and safety alignment of LLMs. Our code is available at https://github.com/sail-sg/sdft. △ Less

Submitted 28 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: ACL 2024

arXiv:2402.13469 [pdf, other]

The Quantum Abstract Machine

Authors: Liyi Li, Le Chang, Rance Cleaveland, Mingwei Zhu, Xiaodi Wu

Abstract: This paper develops a model of quantum behavior that is intended to support the abstract yet accurate design and functional verification of quantum communication protocols. The work is motivated by the need for conceptual tools for the development of quantum-communication systems that are usable by non-specialists in quantum physics while also correctly capturing at a useful abstraction the underl… ▽ More This paper develops a model of quantum behavior that is intended to support the abstract yet accurate design and functional verification of quantum communication protocols. The work is motivated by the need for conceptual tools for the development of quantum-communication systems that are usable by non-specialists in quantum physics while also correctly capturing at a useful abstraction the underlying quantum phenomena. Our approach involves defining a quantum abstract machine (QAM) whose operations correspond to well-known quantum circuits; these operations, however, are given direct abstract semantics in a style similar to that of Berry's and Boudol's Chemical Abstract Machine. This paper defines the QAM's semantics and shows via examples how it may be used to model and reason about existing quantum communication protocols. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10686 [pdf, other]

Uncertainty, Calibration, and Membership Inference Attacks: An Information-Theoretic Perspective

Authors: Meiyi Zhu, Caili Guo, Chunyan Feng, Osvaldo Simeone

Abstract: In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the state-of-the-art likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleato… ▽ More In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the state-of-the-art likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleatoric uncertainty in the true data generation process, of the epistemic uncertainty caused by a limited training data set, and of the calibration level of the target model. We compare three different settings, in which the attacker receives decreasingly informative feedback from the target model: confidence vector (CV) disclosure, in which the output probability vector is released; true label confidence (TLC) disclosure, in which only the probability assigned to the true label is made available by the model; and decision set (DS) disclosure, in which an adaptive prediction set is produced as in conformal prediction. We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs. Simulation results demonstrate that the derived analytical bounds predict well the effectiveness of MIAs. △ Less

Submitted 16 February, 2024; originally announced February 2024.

Comments: 27 pages, 13 figures

arXiv:2402.09830 [pdf]

Utilizing GANs for Fraud Detection: Model Training with Synthetic Transaction Data

Authors: Mengran Zhu, Yulu Gong, Yafei Xiang, Hanyi Yu, Shuning Huo

Abstract: Anomaly detection is a critical challenge across various research domains, aiming to identify instances that deviate from normal data distributions. This paper explores the application of Generative Adversarial Networks (GANs) in fraud detection, comparing their advantages with traditional methods. GANs, a type of Artificial Neural Network (ANN), have shown promise in modeling complex data distrib… ▽ More Anomaly detection is a critical challenge across various research domains, aiming to identify instances that deviate from normal data distributions. This paper explores the application of Generative Adversarial Networks (GANs) in fraud detection, comparing their advantages with traditional methods. GANs, a type of Artificial Neural Network (ANN), have shown promise in modeling complex data distributions, making them effective tools for anomaly detection. The paper systematically describes the principles of GANs and their derivative models, emphasizing their application in fraud detection across different datasets. And by building a collection of adversarial verification graphs, we will effectively prevent fraud caused by bots or automated systems and ensure that the users in the transaction are real. The objective of the experiment is to design and implement a fake face verification code and fraud detection system based on Generative Adversarial network (GANs) algorithm to enhance the security of the transaction process.The study demonstrates the potential of GANs in enhancing transaction security through deep learning techniques. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.09820 [pdf]

Utilizing Deep Learning for Enhancing Network Resilience in Finance

Authors: Yulu Gong, Mengran Zhu, Shuning Huo, Yafei Xiang, Hanyi Yu

Abstract: In the age of the Internet, people's lives are increasingly dependent on today's network technology. Maintaining network integrity and protecting the legitimate interests of users is at the heart of network construction. Threat detection is an important part of a complete and effective defense system. How to effectively detect unknown threats is one of the concerns of network protection. Currently… ▽ More In the age of the Internet, people's lives are increasingly dependent on today's network technology. Maintaining network integrity and protecting the legitimate interests of users is at the heart of network construction. Threat detection is an important part of a complete and effective defense system. How to effectively detect unknown threats is one of the concerns of network protection. Currently, network threat detection is usually based on rules and traditional machine learning methods, which create artificial rules or extract common spatiotemporal features, which cannot be applied to large-scale data applications, and the emergence of unknown risks causes the detection accuracy of the original model to decline. With this in mind, this paper uses deep learning for advanced threat detection to improve protective measures in the financial industry. Many network researchers have shifted their focus to exception-based intrusion detection techniques. The detection technology mainly uses statistical machine learning methods - collecting normal program and network behavior data, extracting multidimensional features, and training decision machine learning models on this basis (commonly used include naive Bayes, decision trees, support vector machines, random forests, etc.). △ Less

Submitted 18 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.06023 [pdf, other]

Decision Theory-Guided Deep Reinforcement Learning for Fast Learning

Authors: Zelin Wan, **-Hee Cho, Mu Zhu, Ahmed H. Anwar, Charles Kamhoua, Munindar P. Singh

Abstract: This paper introduces a novel approach, Decision Theory-guided Deep Reinforcement Learning (DT-guided DRL), to address the inherent cold start problem in DRL. By integrating decision theory principles, DT-guided DRL enhances agents' initial performance and robustness in complex environments, enabling more efficient and reliable convergence during learning. Our investigation encompasses two primary… ▽ More This paper introduces a novel approach, Decision Theory-guided Deep Reinforcement Learning (DT-guided DRL), to address the inherent cold start problem in DRL. By integrating decision theory principles, DT-guided DRL enhances agents' initial performance and robustness in complex environments, enabling more efficient and reliable convergence during learning. Our investigation encompasses two primary problem contexts: the cart pole and maze navigation challenges. Experimental results demonstrate that the integration of decision theory not only facilitates effective initial guidance for DRL agents but also promotes a more structured and informed exploration strategy, particularly in environments characterized by large and intricate state spaces. The results of experiment demonstrate that DT-guided DRL can provide significantly higher rewards compared to regular DRL. Specifically, during the initial phase of training, the DT-guided DRL yields up to an 184% increase in accumulated reward. Moreover, even after reaching convergence, it maintains a superior performance, ending with up to 53% more reward than standard DRL in large maze problems. DT-guided DRL represents an advancement in mitigating a fundamental challenge of DRL by leveraging functions informed by human (designer) knowledge, setting a foundation for further research in this promising interdisciplinary domain. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03397 [pdf]

A Comprehensive Approach to Diagnosing Temporomandibular Joint Diseases: AI-driven TMD Diagnostic System

Authors: Y. Gua, C. T. Kong, D. D Zhangc, Y. J Baid, J. K. H. Tsoia, Hua Huangc, Y. Q. Dengc, Y. M Zhue

Abstract: AI-driven TMD diagnostic system uses AI segmentation method to diagnose Temporomandibular Joint Disorders (TMD). By using segmentation, three important parts: temporal bone, temporomandibular joint (TMJ) disc and the condyle can be identified. The location and the size of each segment are used as the basic information to determine if the patient has a high chance of having Temporomandibular Joint… ▽ More AI-driven TMD diagnostic system uses AI segmentation method to diagnose Temporomandibular Joint Disorders (TMD). By using segmentation, three important parts: temporal bone, temporomandibular joint (TMJ) disc and the condyle can be identified. The location and the size of each segment are used as the basic information to determine if the patient has a high chance of having Temporomandibular Joint Disorders (TMD). △ Less

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.01858 [pdf, other]

Explaining latent representations of generative models with large multimodal models

Authors: Mengdan Zhu, Zhenke Liu, Bo Pan, Abhinav Angirekula, Liang Zhao

Abstract: Learning interpretable representations of data generative latent factors is an important topic for the development of artificial intelligence. With the rise of the large multimodal model, it can align images with text to generate answers. In this work, we propose a framework to comprehensively explain each latent variable in the generative models using a large multimodal model. We further measure… ▽ More Learning interpretable representations of data generative latent factors is an important topic for the development of artificial intelligence. With the rise of the large multimodal model, it can align images with text to generate answers. In this work, we propose a framework to comprehensively explain each latent variable in the generative models using a large multimodal model. We further measure the uncertainty of our generated explanations, quantitatively evaluate the performance of explanation generation among multiple large multimodal models, and qualitatively visualize the variations of each latent variable to learn the disentanglement effects of different generative models on explanations. Finally, we discuss the explanatory capabilities and limitations of state-of-the-art large multimodal models. △ Less

Submitted 17 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models

arXiv:2402.01576 [pdf, other]

Training Adversarial yet Safe Agent to Characterize Safety Performance of Highly Automated Vehicles

Authors: Minghao Zhu, Anmol Sidhu, Keith A. Redmill

Abstract: This paper focuses on safety performance testing and characterization of black-box highly automated vehicles (HAV). Existing testing approaches typically obtain the testing outcomes by deploying the HAV into a specific testing environment. Such a testing environment can involve various passively given testing strategies presented by other traffic participants such as (i) the naturalistic driving p… ▽ More This paper focuses on safety performance testing and characterization of black-box highly automated vehicles (HAV). Existing testing approaches typically obtain the testing outcomes by deploying the HAV into a specific testing environment. Such a testing environment can involve various passively given testing strategies presented by other traffic participants such as (i) the naturalistic driving policy learned from human drivers, (ii) extracted concrete scenarios from real-world driving data, and (iii) model-based or data-driven adversarial testing methodologies focusing on forcing safety-critical events. The safety performance of HAV is further characterized by analyzing the obtained testing outcomes with a particular selected measure, such as the observed collision risk. The aforementioned testing practices suffer from the scarcity of safety-critical events, have limited operational design domain (ODD) coverage, or are biased toward long-tail unsafe cases. This paper presents a novel and informative testing strategy that differs from these existing practices. The proposal is inspired by the intuition that a relatively safer HAV driving policy would allow the traffic vehicles to exhibit a higher level of aggressiveness to achieve a certain fixed level of an overall safe outcome. One can specifically characterize such a HAV and traffic interactive strategy and use it as a safety performance indicator for the HAV. Under the proposed testing scheme, the HAV is evaluated under its full ODD with a reward function that represents a trade-off between safety and adversity in generating safety-critical events. The proposed methodology is demonstrated in simulation with various HAV designs under different operational design domains. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17364 [pdf, other]

doi 10.1007/s11433-023-2333-8

HiFAST: an HI data calibration and imaging pipeline for FAST

Authors: Yingjie **g, Jie Wang, Chen Xu, Ziming Liu, Qingze Chen, Tiantian Liang, **long Xu, Yixian Cao, **g Wang, Huijie Hu, Chuan-Peng Zhang, Qi Guo, Liang Gao, Mei Ai, Hengqian Gan, Xuyang Gao, **lin Han, Ligang Hou, Zhipeng Hou, Peng Jiang, Xu Kong, Fujia Li, Zerui Liu, Li Shao, Hengxing Pan , et al. (8 additional authors not shown)

Abstract: The Five-hundred-meter Aperture Spherical radio Telescope (FAST) has the largest aperture and a 19-beam L-band receiver, making it powerful for investigating the neutral hydrogen atomic gas (HI) in the universe. We present HiFAST (https://hifast.readthedocs.io), a dedicated, modular, and self-contained calibration and imaging pipeline for processing the HI data of FAST. The pipeline consists of fr… ▽ More The Five-hundred-meter Aperture Spherical radio Telescope (FAST) has the largest aperture and a 19-beam L-band receiver, making it powerful for investigating the neutral hydrogen atomic gas (HI) in the universe. We present HiFAST (https://hifast.readthedocs.io), a dedicated, modular, and self-contained calibration and imaging pipeline for processing the HI data of FAST. The pipeline consists of frequency-dependent noise diode calibration, baseline fitting, standing wave removal using an FFT-based method, flux density calibration, stray radiation correction, and gridding to produce data cubes. These modules can be combined as needed to process the data from most FAST observation modes: tracking, drift scanning, On-The-Fly map**, and most of their variants. With HiFAST, the RMS noises of the calibrated spectra from all 19 beams were only slightly (~ 5%) higher than the theoretical expectation. The results for the extended source M33 and the point sources are consistent with the results from Arecibo. The moment maps (0,1 and 2) of M33 agree well with the results from the Arecibo Galaxy Environment Survey (AGES) with a fractional difference of less than 10%. For a common sample of 221 sources with signal-to-noise ratio S/N >10 from the Arecibo Legacy Fast ALFA (ALFALFA) survey, the mean value of fractional difference in the integrated flux density, $S_{\mathrm{int}}$, between the two datasets is approximately 0.005 %, with a dispersion of 15.4%. Further checks on the integrated flux density of 23 sources with seven observations indicate that the variance in the flux density of the source with luminous objects ($S_\mathrm{int}$ $ > 2.5$ Jy km s$^{-1}$) is less than 5%. Our tests suggest that the FAST telescope, with the efficient, precise, and user-friendly pipeline HiFAST, will yield numerous significant scientific findings in the investigation of the HI in the universe. △ Less

Submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted by SCPMA. 21 pages, 14 figures. The pipeline is accessible at https://hifast.readthedocs.io

arXiv:2401.16581 [pdf, other]

Continuum excitations in a spin-supersolid on a triangular lattice

Authors: M. Zhu, V. Romerio, N. Steiger, S. D. Nabi, N. Murai, S. Ohira-Kawamura, K. Yu. Povarov, Y. Skourski, R. Sibille, L. Keller, Z. Yan, S. Gvasaliya, A. Zheludev

Abstract: Magnetic, thermodynamic, neutron diffraction and inelastic neutron scattering are used to study spin correlations in the easy-axis XXZ triangular lattice magnet K2Co(SeO3)2. Despite the presence of quasi-2D "supersolid" magnetic order, the low-energy excitation spectrum contains no sharp modes and is instead a broad and structured multi-particle continuum. Applying a weak magnetic field drives the… ▽ More Magnetic, thermodynamic, neutron diffraction and inelastic neutron scattering are used to study spin correlations in the easy-axis XXZ triangular lattice magnet K2Co(SeO3)2. Despite the presence of quasi-2D "supersolid" magnetic order, the low-energy excitation spectrum contains no sharp modes and is instead a broad and structured multi-particle continuum. Applying a weak magnetic field drives the system into an m = 1/3 fractional magnetization plateau phase and restores sharp spin wave modes. To some extent, the behavior at zero field can be understood in terms of spin wave decay. However, the presence of clear excitation minima at the M-points of the Brillouin zone suggest that the spinon language may provide a more adequate description, and signals a possible proximity to a Dirac spin liquid state. △ Less

Submitted 26 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.16459 [pdf, other]

Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors

Authors: Shiyin Dong, Mingrui Zhu, Kun Cheng, Nannan Wang, Xinbo Gao

Abstract: The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge exists in lacking a unified approach to apply diffusion models to visual perception tasks with diverse semantic granularity requirements. Our purpose is to establish a unified visual perception framework, capitalizing on the potenti… ▽ More The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge exists in lacking a unified approach to apply diffusion models to visual perception tasks with diverse semantic granularity requirements. Our purpose is to establish a unified visual perception framework, capitalizing on the potential synergies between generative and discriminative models. In this paper, we propose Vermouth, a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors. Comprehensive investigations unveil potential characteristics of Vermouth, such as varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages. We emphasize that there is no necessity for incorporating a heavyweight or intricate decoder to transform diffusion models into potent representation learners. Extensive comparative evaluations against tailored discriminative models showcase the efficacy of our approach on zero-shot sketch-based image retrieval (ZS-SBIR), few-shot classification, and open-vocabulary semantic segmentation tasks. The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: 18 pages,11 figures

arXiv:2401.15843 [pdf, other]

APIGen: Generative API Method Recommendation

Authors: Yujia Chen, Cuiyun Gao, Muyijie Zhu, Qing Liao, Yong Wang, Guoai Xu

Abstract: Automatic API method recommendation is an essential task of code intelligence, which aims to suggest suitable APIs for programming queries. Existing approaches can be categorized into two primary groups: retrieval-based and learning-based approaches. Although these approaches have achieved remarkable success, they still come with notable limitations. The retrieval-based approaches rely on the text… ▽ More Automatic API method recommendation is an essential task of code intelligence, which aims to suggest suitable APIs for programming queries. Existing approaches can be categorized into two primary groups: retrieval-based and learning-based approaches. Although these approaches have achieved remarkable success, they still come with notable limitations. The retrieval-based approaches rely on the text representation capabilities of embedding models, while the learning-based approaches require extensive task-specific labeled data for training. To mitigate the limitations, we propose APIGen, a generative API recommendation approach through enhanced in-context learning (ICL). APIGen involves two main components: (1) Diverse Examples Selection. APIGen searches for similar posts to the programming queries from the lexical, syntactical, and semantic perspectives, providing more informative examples for ICL. (2) Guided API Recommendation. APIGen enables large language models (LLMs) to perform reasoning before generating API recommendations, where the reasoning involves fine-grained matching between the task intent behind the queries and the factual knowledge of the APIs. With the reasoning process, APIGen makes recommended APIs better meet the programming requirement of queries and also enhances the interpretability of results. We compare APIGen with four existing approaches on two publicly available benchmarks. Experiments show that APIGen outperforms the best baseline CLEAR by 105.8% in method-level API recommendation and 54.3% in class-level API recommendation in terms of SuccessRate@1. Besides, APIGen achieves an average 49.87% increase compared to the zero-shot performance of popular LLMs such as GPT-4 in method-level API recommendation regarding the SuccessRate@3 metric. △ Less

Submitted 28 January, 2024; originally announced January 2024.

Comments: To appear in the proceedings of the 31st IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2024)

arXiv:2401.15397 [pdf, other]

FASHI: A search for extragalactic OH megamasers with FAST

Authors: Chuan-Peng Zhang, Cheng Cheng, Ming Zhu, **-Long Xu, Peng Jiang

Abstract: The FAST All Sky HI survey (FASHI) is broader in frequency band and sky volume, and deeper in detection sensitivity than the Arecibo Legacy Fast ALFA survey (ALFALFA). To efficiently expand the sample of OH megamasers (OHMs), whose strongest line has a rest frequency of 1667.35903 MHz, we directly matched the IRAS Point Source Catalog Redshift (PSCz) catalog with the corresponding FASHI data cube.… ▽ More The FAST All Sky HI survey (FASHI) is broader in frequency band and sky volume, and deeper in detection sensitivity than the Arecibo Legacy Fast ALFA survey (ALFALFA). To efficiently expand the sample of OH megamasers (OHMs), whose strongest line has a rest frequency of 1667.35903 MHz, we directly matched the IRAS Point Source Catalog Redshift (PSCz) catalog with the corresponding FASHI data cube. From 145 PSCz sources already covered by FASHI, we obtained 27 OHMs with a detection rate of 18.6%, including 9 previously known and 18 new ones, within a redshift range of $0.14314\lesssim z_{\rm OH} \lesssim0.27656$. We also measured the hyperfine ratio of nine OHMs between the 1667 and 1665 MHz lines. The ratio ranges from 1.32 to 15.22, with an average of $R_{1667:1665}=4.74$. In a fit to the $L_{\rm OH}$ vs. $L_{\rm FIR}$ relation, we have ${\rm log}L_{\rm OH}= (1.57\pm0.10){\rm log}L_{\rm FIR}-(15.80\pm1.19)$, which is almost the same as derived from previous observations. As expected, since the OHM sample was selected by cross-correlation with the IRAS-selected PSCz, our detected OHMs are [ultra]luminous infrared galaxies ([U]LIRGs). However, not all [U]LIRGs have detectable OH emission, suggesting that the OH emission may be triggered within a specific stage of the merger or can only be seen in specific orientations. In general, FAST, with its 19-beam array and UWB receiver, will be a powerful tool for observing more OHMs and unraveling their mystery in the future. △ Less

Submitted 16 June, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: 21 pages, 6 figures. Comments are welcome

arXiv:2401.15002 [pdf, other]

BackdoorBench: A Comprehensive Benchmark and Analysis of Backdoor Learning

Authors: Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, Mingli Zhu, Ruotong Wang, Li Liu, Chao Shen

Abstract: As an emerging and vital topic for studying deep neural networks' vulnerability (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in the status of a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibili… ▽ More As an emerging and vital topic for studying deep neural networks' vulnerability (DNNs), backdoor learning has attracted increasing interest in recent years, and many seminal backdoor attack and defense algorithms are being developed successively or concurrently, in the status of a rapid arms race. However, mainly due to the diverse settings, and the difficulties of implementation and reproducibility of existing works, there is a lack of a unified and standardized benchmark of backdoor learning, causing unfair comparisons, and unreliable conclusions (e.g., misleading, biased or even false conclusions). Consequently, it is difficult to evaluate the current progress and design the future development roadmap of this literature. To alleviate this dilemma, we build a comprehensive benchmark of backdoor learning called BackdoorBench. Our benchmark makes three valuable contributions to the research community. 1) We provide an integrated implementation of state-of-the-art (SOTA) backdoor learning algorithms (currently including 16 attack and 27 defense algorithms), based on an extensible modular-based codebase. 2) We conduct comprehensive evaluations of 12 attacks against 16 defenses, with 5 poisoning ratios, based on 4 models and 4 datasets, thus 11,492 pairs of evaluations in total. 3) Based on above evaluations, we present abundant analysis from 8 perspectives via 18 useful analysis tools, and provide several inspiring insights about backdoor learning. We hope that our efforts could build a solid foundation of backdoor learning to facilitate researchers to investigate existing algorithms, develop more innovative algorithms, and explore the intrinsic mechanism of backdoor learning. Finally, we have created a user-friendly website at http://backdoorbench.com, which collects all important information of BackdoorBench, including codebase, docs, leaderboard, and model Zoo. △ Less

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.14148 [pdf, other]

LanDA: Language-Guided Multi-Source Domain Adaptation

Authors: Zhenbin Wang, Lei Zhang, Lituan Wang, Minjuan Zhu

Abstract: Multi-Source Domain Adaptation (MSDA) aims to mitigate changes in data distribution when transferring knowledge from multiple labeled source domains to an unlabeled target domain. However, existing MSDA techniques assume target domain images are available, yet overlook image-rich semantic information. Consequently, an open question is whether MSDA can be guided solely by textual cues in the absenc… ▽ More Multi-Source Domain Adaptation (MSDA) aims to mitigate changes in data distribution when transferring knowledge from multiple labeled source domains to an unlabeled target domain. However, existing MSDA techniques assume target domain images are available, yet overlook image-rich semantic information. Consequently, an open question is whether MSDA can be guided solely by textual cues in the absence of target domain images. By employing a multimodal model with a joint image and language embedding space, we propose a novel language-guided MSDA approach, termed LanDA, based on optimal transfer theory, which facilitates the transfer of multiple source domains to a new target domain, requiring only a textual description of the target domain without needing even a single target domain image, while retaining task-relevant information. We present extensive experiments across different transfer scenarios using a suite of relevant benchmarks, demonstrating that LanDA outperforms standard fine-tuning and ensemble approaches in both target and source domains. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 20 pages, 8 figures

Showing 51–100 of 860 results for author: Zhu, M