CausalPrism: A Visual Analytics Approach for Subgroup-based Causal Heterogeneity Exploration

Authors: Jiehui Zhou, Xumeng Wang, Wong Kam-Kwai, Wei Zhang, Xingyu Liu, Juntian Zhang, Minfeng Zhu, Wei Chen

Abstract: In causal inference, estimating Heterogeneous Treatment Effects (HTEs) from observational data is critical for understanding how different subgroups respond to treatments, with broad applications such as precision medicine and targeted advertising. However, existing work on HTE, subgroup discovery, and causal visualization is insufficient to address two challenges. First, the sheer number of potential subgroups and the necessity to balance multiple objectives (e.g., high effects and low variances) pose a considerable analytical challenge. Second, effective subgroup analysis has to follow the analysis goal specified by users and provide causal results with verification. To this end, we propose a visual analytics approach for subgroup-based causal heterogeneity exploration. Specifically, we first formulate causal subgroup discovery as a constrained multi-objective optimization problem and adopt a heuristic genetic algorithm to learn the Pareto front of optimal subgroups described by interpretable rules. Building on this model, we develop a prototype system, CausalPrism, that incorporates tabular visualization, multi-attribute rankings, and uncertainty plots to support users in interactively exploring and sorting subgroups and explaining treatment effects. Quantitative experiments validate that the proposed model can efficiently mine causal subgroups that outperform state-of-the-art HTE and subgroup discovery methods, and case studies and expert interviews demonstrate the effectiveness and usability of the system. Code is available at https://osf.io/jaqmf/?view_only=ac9575209945476b955bf829c85196e9.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 7 figures
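
To illustrate the Pareto-front idea mentioned in the abstract above, here is a minimal sketch that keeps only the non-dominated candidates under two objectives (higher effect, lower variance). The `Subgroup` record, its fields, and the sample values are hypothetical and are not taken from the paper; the paper's genetic algorithm and constraints are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subgroup:
    rule: str        # hypothetical interpretable rule describing the subgroup
    effect: float    # estimated treatment effect (objective: maximize)
    variance: float  # variance of that estimate (objective: minimize)


def dominates(a: Subgroup, b: Subgroup) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.effect >= b.effect and a.variance <= b.variance
    strictly_better = a.effect > b.effect or a.variance < b.variance
    return no_worse and strictly_better


def pareto_front(candidates: list[Subgroup]) -> list[Subgroup]:
    """Keep the candidates that no other candidate dominates (O(n^2) scan)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative values only (assumed, not from the paper).
candidates = [
    Subgroup("age > 60", effect=0.30, variance=0.10),
    Subgroup("age > 60 and smoker", effect=0.45, variance=0.25),
    Subgroup("income < 20k", effect=0.20, variance=0.15),  # dominated by the first
]
front = pareto_front(candidates)
```

A quadratic scan like this is only practical for small candidate sets, which may be one reason to search the front heuristically rather than enumerate every subgroup.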

arXiv:2407.00579 [pdf, ps, other]

Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems

Authors: Miaomiao Zhu, Pengxu Chen, Liang Yang, Alexandros-Apostolos A. Boulogeorgos, Theodoros A. Tsiftsis, Hongwu Liu

Abstract: Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, while facing privacy and security challenges due to open wireless propagation. In this paper, an active reconfigurable intelligent surface (RIS) is employed to aid covert communications in a NOMA-inspired ISAC wireless system with the aim of maximizing the covert rate. Specifically, a dual-function base-station (BS) transmits the superposition signal to sense multiple targets, while achieving covert and reliable communications for a pair of NOMA covert and public users, respectively, in the presence of a warden. Two superposition transmission schemes, namely, the transmissions with dedicated sensing signal (w-DSS) and without dedicated sensing signal (w/o-DSS), are respectively considered in the formulations of the joint transmission and reflection beamforming optimization problems. Numerical results demonstrate that the active-RIS-aided NOMA-ISAC system outperforms the passive-RIS-aided and without-RIS counterparts in terms of covert rate and trade-off between covert communication and sensing performance metrics. Finally, the w/o-DSS scheme, which omits the dedicated sensing signal, achieves a higher covert rate than the w-DSS scheme by allocating more transmit power for the covert transmissions, while preserving a comparable multi-target sensing performance.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.19966 [pdf, other]

Simulating Financial Market via Large Language Model based Agents

Authors: Shen Gao, Yuntao Wen, Minghang Zhu, Jianing Wei, Yuhan Cheng, Qunzi Zhang, Shuo Shang

Abstract: Most economic theories typically assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is challenging to predict accurately with mathematical models. In this paper, we propose Agent-based Simulated Financial Market (ASFM), which first constructs a simulated stock market with a real order matching system. Then, we propose a large language model based agent as the stock trader, which contains the profile, observation, and tool-learning based action module. The trading agent can comprehensively understand current market dynamics and financial policy information, and make decisions that align with its trading strategy. In the experiments, we first verify that the reactions of our ASFM are consistent with the real stock market in two controllable scenarios. In addition, we also conduct experiments in two popular economics research directions, and we find that conclusions drawn in our ASFM align with the preliminary findings in economics research. Based on these observations, we believe our proposed ASFM provides a new paradigm for economic research.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19693 [pdf, other]

MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?

Authors: Jinming Li, Yichen Zhu, Zhiyuan Xu, Jindong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang

Abstract: It is fundamentally challenging for robots to serve as useful assistants in human environments because this requires addressing a spectrum of sub-problems across robotics, including perception, language understanding, reasoning, and planning. The recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated their exceptional abilities in solving complex mathematical problems, mastering commonsense and abstract reasoning. This has led to the recent utilization of MLLMs as the brain in robotic systems, enabling these models to conduct high-level planning prior to triggering low-level control actions for task execution. However, it remains uncertain whether existing MLLMs are reliable in serving the brain role of robots. In this study, we introduce the Multimodal LLM for Robotic (MMRo) benchmark, the first benchmark that tests the capability of MLLMs for robot applications. Specifically, we identify four essential capabilities (perception, task planning, visual reasoning, and safety measurement) that MLLMs must possess to qualify as the robot's central processing unit. We have developed several scenarios for each capability, resulting in a total of 14 metrics for evaluation. We present experimental results for various MLLMs, including both commercial and open-source models, to assess the performance of existing systems. Our findings indicate that no single model excels in all areas, suggesting that current MLLMs are not yet trustworthy enough to serve as the cognitive core for robots. Our data can be found at https://mm-robobench.github.io/.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19672 [pdf, other]

Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics

Authors: Chengrui Gao, Ziyuan Yang, Andrew Beng Jin Teoh, Min Zhu

Abstract: Recently, finger knuckle prints (FKPs) have gained attention due to their rich textural patterns, positioning them as a promising biometric for identity recognition. Prior FKP recognition methods predominantly leverage first-order feature descriptors, which capture intricate texture details but fail to account for structural information. Emerging research, however, indicates that second-order textures, which describe the curves and arcs of the textures, encompass this overlooked structural information. This paper introduces a novel FKP recognition approach, the Dual-Order Texture Competition Network (DOTCNet), designed to capture texture information in FKP images comprehensively. DOTCNet incorporates three dual-order texture competitive modules (DTCMs), each targeting textures at different scales. Each DTCM employs a learnable texture descriptor, specifically a learnable Gabor filter (LGF), to extract texture features. By leveraging LGFs, the network extracts first and second order textures to describe fine textures and structural features thoroughly. Furthermore, an attention mechanism enhances relevant features in the first-order features, thereby highlighting significant texture details. For second-order features, a competitive mechanism emphasizes structural information while reducing noise from higher-order features. Extensive experimental results reveal that DOTCNet significantly outperforms several standard algorithms on the publicly available PolyU-FKP dataset.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18784 [pdf, other]

Self-consistent expansion and field-theoretic renormalization group for a singular nonlinear diffusion equation with anomalous scaling

Authors: Minhui Zhu, Nigel Goldenfeld

Abstract: The method of self-consistent expansions is a powerful tool for handling strong coupling problems that might otherwise be beyond the reach of perturbation theory, providing surprisingly accurate approximations even at low order. First applied in its embryonic form to fully-developed turbulence, it has subsequently been successfully applied to a variety of problems that include polymer statistics, interface dynamics and high order perturbation theory for the anharmonic oscillator. Here we show that the self-consistent expansion can be applied to singular perturbation problems arising in the theory of partial differential equations. We demonstrate its application to Barenblatt's nonlinear diffusion equation for porous media filtration, where the long-time asymptotics exhibits anomalous dimensions which can be systematically calculated using the perturbative renormalization group. We find that even the first order self-consistent expansion improves the approximation of the anomalous dimension obtained by the first order perturbative renormalization group, especially in the strong coupling regime. We also develop a field-theoretic framework for deterministic partial differential equations to facilitate the application of self-consistent expansions to other dynamic systems, and illustrate its application using the example of Barenblatt's equation. The scope of our results on the combination of renormalization group and self-consistent expansions is limited to partial differential equations whose long-time asymptotics is controlled by incomplete similarity. However, our work suggests that these methods could be applied to a broader suite of singular perturbation problems such as boundary layer theory, multiple scales analysis and matched asymptotic expansions, for which excellent approximations using renormalization group methods alone are already available.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 13 pages, 1 figure

arXiv:2406.18691 [pdf, other]

Geometric Features Enhanced Human-Object Interaction Detection

Authors: Manli Zhu, Edmond S. L. Ho, Shuang Chen, Longzhi Yang, Hubert P. H. Shum

Abstract: Cameras are essential vision instruments to capture images for pattern detection and measurement. Human-object interaction (HOI) detection is one of the most popular pattern detection approaches for captured human-centric visual scenes. Recently, Transformer-based models have become the dominant approach for HOI detection due to their advanced network architectures and thus promising results. However, most of them follow the one-stage design of the vanilla Transformer, leaving rich geometric priors under-exploited and leading to compromised performance, especially when occlusion occurs. Given that geometric features tend to outperform visual ones in occluded scenarios and offer information that complements visual cues, we propose a novel end-to-end Transformer-style HOI detection model, i.e., geometric features enhanced HOI detector (GeoHOI). One key part of the model is a new unified self-supervised keypoint learning method named UniPointNet that bridges the gap of consistent keypoint representation across diverse object categories, including humans. GeoHOI effectively upgrades a Transformer-based HOI detector by using keypoint similarities, which measure the likelihood of human-object interactions, as well as local keypoint patches to enhance interaction query representation, so as to boost HOI predictions. Extensive experiments show that the proposed method outperforms the state-of-the-art models on V-COCO and achieves competitive performance on HICO-DET. Case study results on post-disaster rescue with vision-based instruments showcase the applicability of the proposed GeoHOI in real-world applications.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted to IEEE TIM

arXiv:2406.18518 [pdf, other]

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data point in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/

Submitted 26 June, 2024; originally announced June 2024.
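
The three hierarchical verification stages named in the abstract above (format checking, function execution, semantic verification) can be sketched roughly as follows. This is an assumed illustration only: the `REGISTRY` of callable functions, the JSON call layout with `name` and `arguments` keys, and the caller-supplied `semantic_check` predicate are all hypothetical and do not reflect APIGen's actual data format or implementation.

```python
import json
from typing import Any, Callable

# Hypothetical registry of executable functions (stand-in for real APIs).
REGISTRY: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
}


def verify(raw: str, semantic_check: Callable[[dict, Any], bool]) -> bool:
    """Return True only if a generated call passes all three stages in order."""
    # Stage 1: format check. The text must parse as JSON with the assumed layout.
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return False
    func = REGISTRY.get(name)
    if func is None:
        return False

    # Stage 2: execution check. The call must run without raising.
    try:
        result = func(**args)
    except Exception:
        return False

    # Stage 3: semantic check. A caller-supplied predicate stands in for
    # whatever semantic verification a real pipeline would perform.
    return semantic_check(call, result)
```

Ordering the stages from cheapest to most expensive means malformed samples are rejected before any function is executed.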

arXiv:2406.18017 [pdf, other]

Dependence Analysis and Structured Construction for Batched Sparse Code

Authors: Jiaxin Qing, Xiaohong Cai, Yijun Fan, Mingyang Zhu, Raymond W. Yeung

Abstract: In coding theory, codes are usually designed with a certain level of randomness to facilitate analysis and accommodate different channel conditions. However, the resulting random code constructed can be suboptimal in practical implementations. Represented by a bipartite graph, the Batched Sparse Code (BATS Code) is a randomly constructed erasure code that utilizes network coding to achieve near-optimal performance in wireless multi-hop networks. In the performance analysis in the previous research, it is implicitly assumed that the coded batches in the BATS code are independent. This assumption holds only asymptotically when the number of input symbols is infinite, but it does not generally hold in a practical setting where the number of input symbols is finite, especially when the code is constructed randomly. We show that dependence among the batches significantly degrades the code's performance. In order to control the batch dependence through graphical design, we propose constructing the BATS code in a structured manner. A hardware-friendly structured BATS code called the Cyclic-Shift BATS (CS-BATS) code is proposed, which constructs the code from a small base graph using light-weight cyclic-shift operations. We demonstrate that when the base graph is properly designed, a higher decoding rate and a smaller complexity can be achieved compared with the random BATS code.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16978 [pdf, other]

MetaFollower: Adaptable Personalized Autonomous Car Following

Authors: Xianda Chen, Kehua Chen, Meixin Zhu, Hao Yang, Shaojie Shen, Xuesong Wang, Yinhai Wang

Abstract: Car-following (CF) modeling, a fundamental component in microscopic traffic simulation, has attracted increasing interest from researchers in the past decades. In this study, we propose an adaptable personalized car-following framework, MetaFollower, by leveraging the power of meta-learning. Specifically, we first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events. Afterward, the pre-trained model can be fine-tuned on new drivers with only a few CF trajectories to achieve personalized CF adaptation. We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability. Unlike conventional adaptive cruise control (ACC) systems that rely on predefined settings and constant parameters without considering heterogeneous driving characteristics, MetaFollower can accurately capture and simulate the intricate dynamics of car-following behavior while considering the unique driving styles of individual drivers. We demonstrate the versatility and adaptability of MetaFollower by showcasing its ability to adapt quickly to new drivers with limited training data. To evaluate the performance of MetaFollower, we conduct rigorous experiments comparing it with both data-driven and physics-based models. The results reveal that our proposed framework outperforms baseline models in predicting car-following behavior with higher accuracy and safety. To the best of our knowledge, this is the first car-following model aiming to achieve fast adaptation by considering both driver and temporal heterogeneity based on meta-learning.

Submitted 23 June, 2024; originally announced June 2024.
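
The Intelligent Driver Model (IDM) mentioned in the abstract above is a standard physics-based car-following model. The sketch below uses the commonly cited form of its acceleration equation, with the desired gap clipped at the standstill distance. The parameter defaults are illustrative assumptions, not the values used in MetaFollower, and the LSTM and MAML components described in the abstract are not shown.

```python
import math


def idm_acceleration(v: float, gap: float, dv: float,
                     v0: float = 30.0,    # desired speed (m/s), assumed value
                     T: float = 1.5,      # desired time headway (s), assumed value
                     a_max: float = 1.0,  # maximum acceleration (m/s^2), assumed value
                     b: float = 2.0,      # comfortable deceleration (m/s^2), assumed value
                     s0: float = 2.0,     # standstill gap (m), assumed value
                     delta: float = 4.0   # acceleration exponent
                     ) -> float:
    """IDM acceleration of the following vehicle.

    v   : follower speed (m/s)
    gap : bumper-to-bumper distance to the leader (m), must be > 0
    dv  : approach rate, follower speed minus leader speed (m/s)
    """
    # Desired dynamic gap: standstill gap plus headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

With a very large gap the interaction term vanishes, so the follower accelerates toward `v0`; when closing quickly on a nearby leader the interaction term dominates and the result is negative (braking).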

arXiv:2406.16531 [pdf, other]

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

Authors: Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

Abstract: The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation makes the IMDL task unattainable. In this paper, a local manipulation pipeline is designed, incorporating the powerful SAM, ChatGPT and generative models. On this basis, we propose the GIM dataset, which has the following advantages: 1) Large scale, including over one million pairs of AI-manipulated images and real images. 2) Rich Image Content, encompassing a broad range of image classes. 3) Diverse Generative Manipulation, manipulated images with state-of-the-art generators and various manipulation tasks. The aforementioned advantages allow for a more comprehensive evaluation of IMDL methods, extending their applicability to diverse images. We introduce two benchmark settings to evaluate the generalization capability and comprehensive performance of baseline methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, Frequency-Spatial Block (FSB), and a Multi-window Anomalous Modelling (MWAM) Module. Extensive experiments on the GIM demonstrate that GIMFormer surpasses previous state-of-the-art works significantly on two different benchmarks.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Code page: https://github.com/chenyirui/GIM

arXiv:2406.14862 [pdf, other]

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models

Authors: Mengdan Zhu, Raasikh Kanjiani, Jiahui Lu, Andrew Choi, Qirui Ye, Liang Zhao

Abstract: Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. By perturbing latent variables and interpreting changes in generated data, the framework provides a systematic approach to understanding and controlling the data generation process, enhancing the transparency and interpretability of deep generative models. We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations of latent variables.

Submitted 28 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.12577 [pdf, other]

Cephalometric Landmark Detection across Ages with Prototypical Network

Authors: Han Wu, Chong Wang, Lanzhuju Mei, Tong Yang, Min Zhu, Dinggang Shen, Zhiming Cui

Abstract: Automated cephalometric landmark detection is crucial in real-world orthodontic diagnosis. Current studies mainly focus on only adult subjects, neglecting the clinically crucial scenario presented by adolescents whose landmarks often exhibit significantly different appearances compared to adults. Hence, an open question arises about how to develop a unified and effective detection algorithm across various age groups, including adolescents and adults. In this paper, we propose CeLDA, the first work for Cephalometric Landmark Detection across Ages. Our method leverages a prototypical network for landmark detection by comparing image features with landmark prototypes. To tackle the appearance discrepancy of landmarks between age groups, we design new strategies for CeLDA to improve prototype alignment and obtain a holistic estimation of landmark prototypes from a large set of training images. Moreover, a novel prototype relation mining paradigm is introduced to exploit the anatomical relations between the landmark prototypes. Extensive experiments validate the superiority of CeLDA in detecting cephalometric landmarks on both adult and adolescent subjects. To our knowledge, this is the first effort toward developing a unified solution and dataset for cephalometric landmark detection across age groups. Our code and dataset will be made public at https://github.com/ShanghaiTech-IMPACT/Cephalometric-Landmark-Detection-across-Ages-with-Prototypical-Network

Submitted 18 June, 2024; originally announced June 2024.

Comments: MICCAI 2024

arXiv:2406.11689 [pdf, other]

Lightweight Model Pre-training via Language Guided Knowledge Distillation

Authors: Mingsheng Li, Lin Zhang, Mingzhen Zhu, Zilong Huang, Gang Yu, Jiayuan Fan, Tao Chen

Abstract: This paper studies the problem of pre-training for small models, which is essential for many mobile devices. Current state-of-the-art methods on this problem transfer the representational knowledge of a large network (as a Teacher) into a smaller model (as a Student) using self-supervised distillation, improving the performance of the small model on downstream tasks. However, existing approaches are insufficient in extracting the crucial knowledge that is useful for discerning categories in downstream tasks during the distillation process. In this paper, for the first time, we introduce language guidance to the distillation process and propose a new method named Language-Guided Distillation (LGD) system, which uses category names of the target downstream task to help refine the knowledge transferred between the teacher and student. To this end, we utilize a pre-trained text encoder to extract semantic embeddings from language and construct a textual semantic space called Textual Semantics Bank (TSB). Furthermore, we design a Language-Guided Knowledge Aggregation (LGKA) module to construct the visual semantic space, also named Visual Semantics Bank (VSB). The task-related knowledge is transferred by driving a student encoder to mimic the similarity score distribution inferred by a teacher over TSB and VSB. Compared with other small models obtained by either ImageNet pre-training or self-supervised distillation, experimental results show that the lightweight model distilled with the proposed LGD method achieves state-of-the-art performance on various downstream tasks, including classification, detection, and segmentation. We have made the code available at https://github.com/mZhenz/LGD.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11150 [pdf, ps, other]

The stability of sheath to the nonisentropic Euler-Poisson system with fluid-boundary interaction

Authors: Haiyan Yin, Rong Zeng, Mengmeng Zhu

Abstract: In the present paper, we define the sheath by a monotone stationary solution to the nonisentropic Euler-Poisson system under a condition known as the Bohm criterion and consider a situation in which charged particles accumulate on the boundary due to the flux from the inner region. Under this fluid-boundary interactive setting, we prove the large time asymptotic stability of the sheath provided that the initial perturbation is sufficiently small in some weighted Sobolev spaces. Moreover, the convergence rate of the solution toward the sheath is obtained. The proof is based on the weighted energy method.

Submitted 16 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2011.02111; text overlap with arXiv:2403.09730 by other authors

arXiv:2406.10540 [pdf, other]

Generating and Evolving Reward Functions for Highway Driving with Large Language Models

Authors: Xu Han, Qiannan Yang, Xianda Chen, Xiaowen Chu, Meixin Zhu

Abstract: Reinforcement Learning (RL) plays a crucial role in advancing autonomous driving technologies by maximizing reward functions to achieve the optimal policy. However, crafting these reward functions has been a complex, manual process in many practices. To reduce this complexity, we introduce a novel framework that integrates Large Language Models (LLMs) with RL to improve reward function design in autonomous driving. This framework utilizes the coding capabilities of LLMs, proven in other areas, to generate and evolve reward functions for highway scenarios. The framework starts with instructing LLMs to create an initial reward function code based on the driving environment and task descriptions. This code is then refined through iterative cycles involving RL training and LLMs' reflection, which benefits from their ability to review and improve the output. We have also developed a specific prompt template to improve LLMs' understanding of complex driving simulations, ensuring the generation of effective and error-free code. Our experiments in a highway driving simulator across three traffic configurations show that our method surpasses expert handcrafted reward functions, achieving a 22% higher average success rate. This not only indicates safer driving but also suggests significant gains in development productivity.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures

arXiv:2406.10290 [pdf, other]

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08841 [pdf, ps, other]

Engineering bound state in continuum via giant atom in photonic waveguide

Authors: Xiaojun Zhang, Mingjie Zhu, Zhihai Wang

Abstract: The bound state in the continuum (BIC) in photonic system has been widely used in the field of lasing and sensing. We here find the controllable BIC in an artificial giant atom-dressed one-dimensional photonic waveguide. The giant atom couples to the waveguide via two distant sites. We find that the energy and the photonic distribution in the BIC can be controlled on demand by tuning the frequency and the size of the giant atom as well as its coupling phase with the waveguide. More interestingly, we predict the quantum beats in the atomic and photonic dynamical evolution, which is induced by the oscillation between the BIC and bound state outside the continuum (BOC). These findings provide an approach to manipulate the waveguide system via the bound states, and can be applied in the quantum information processing.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 6 pages, 3 figures, comments are welcomed

arXiv:2406.06558 [pdf, other]

Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

Authors: Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, **tong Song, Yulu Gong

Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF techniques with advanced machine learning models, including Bayesian classifiers, Stochastic Gradient Descent (SGD), Categorical Gradient Boosting (CatBoost), and 12 instances of Deberta-v3-large models. Our approach aims to address the challenges associated with detecting AI-generated text by leveraging the strengths of both traditional feature extraction methods and state-of-the-art deep learning models. Through extensive experiments on a comprehensive dataset, we demonstrate the effectiveness of our proposed method in accurately distinguishing between human and AI-generated text. Our approach achieves superior performance compared to existing methods. This research contributes to the advancement of AI-generated text detection techniques and lays the foundation for developing robust solutions to mitigate the challenges posed by AI-generated content.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2406.05380 [pdf]

Observation of floating surface state in obstructed atomic insulator candidate NiP$_2$

Authors: Xiang-Rui Liu, Ming-Yuan Zhu, Yuanwen Feng, Meng Zeng, Xiao-Ming Ma, Yu-Jie Hao, Yue Dai, Rong-Hao Luo, Kohei Yamagami, Yi Liu, Shengtao Cui, Zhe Sun, Jia-Yu Liu, Zhengtai Liu, Mao Ye, Dawei Shen, Bing Li, Chang Liu

Abstract: Obstructed atomic insulator is recently proposed as an unconventional material, in which electric charge centers localized at sites away from the atoms. A half-filling surface state would emerge at specific interfaces cutting through these charge centers and avoid intersecting any atoms. In this article, we utilized angle-resolved photoemission spectroscopy and density functional theory calculations to study one of the obstructed atomic insulator candidates, NiP$_2$. A floating surface state with large effective mass that is isolated from all bulk states is resolved on the (100) cleavage plane, distinct from previously reported surface states in obstructed atomic insulators that are merged into bulk bands. Density functional theory calculation results elucidate that this floating surface state is originated from the obstructed Wannier charge centers, albeit underwent surface reconstruction that splits the half-filled obstructed surface state. Our findings not only shed lights on the spectroscopy study of obstructed atomic insulators and obstructed surface states, but also provide possible route for development of new catalysts.

Submitted 16 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: 21 pages, 5 figures

arXiv:2406.05042 [pdf, other]

Digital Twins of the EM Environment: Benchmark for Ray Launching Models

Authors: Michele Zhu, Lorenzo Cazzella, Francesco Linsalata, Maurizio Magarini, Matteo Matteucci, Umberto Spagnolini

Abstract: Digital Twin has emerged as a promising paradigm for accurately representing the electromagnetic (EM) wireless environments. The resulting virtual representation of the reality facilitates comprehensive insights into the propagation environment, empowering multi-layer decision-making processes at the physical communication level. This paper investigates the digitization of wireless communication propagation, with particular emphasis on the indispensable aspect of ray-based propagation simulation for real-time Digital Twins. A benchmark for ray-based propagation simulations is presented to evaluate computational time, with two urban scenarios characterized by different mesh complexity, single and multiple wireless link configurations, and simulations with/without diffuse scattering. Exhaustive empirical analyses are performed showing and comparing the behavior of different ray-based solutions. By offering standardized simulations and scenarios, this work provides a technical benchmark for practitioners involved in the implementation of real-time Digital Twins and optimization of ray-based propagation models.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04501 [pdf, other]

FLUID-LLM: Learning Computational Fluid Dynamics with Spatiotemporal-aware Large Language Models

Authors: Max Zhu, Adrián Bazaga, Pietro Liò

Abstract: Learning computational fluid dynamics (CFD) traditionally relies on computationally intensive simulations of the Navier-Stokes equations. Recently, large language models (LLMs) have shown remarkable pattern recognition and reasoning abilities in natural language processing (NLP) and computer vision (CV). However, these models struggle with the complex geometries inherent in fluid dynamics. We introduce FLUID-LLM, a novel framework combining pre-trained LLMs with spatiotemporal-aware encoding to predict unsteady fluid dynamics. Our approach leverages the temporal autoregressive abilities of LLMs alongside spatial-aware layers, bridging the gap between previous CFD prediction methods. Evaluations on standard benchmarks reveal significant performance improvements across various fluid datasets. Our results demonstrate that FLUID-LLM effectively integrates spatiotemporal information into pre-trained LLMs, enhancing CFD task performance.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03733 [pdf, other]

Credit Card Fraud Detection Using Advanced Transformer Model

Authors: Chang Yu, Yongshun Xu, ** Cao, Ye Zhang, Yinxin **, Mengran Zhu

Abstract: With the proliferation of various online and mobile payment systems, credit card fraud has emerged as a significant threat to financial security. This study focuses on innovative applications of the latest Transformer models for more robust and precise fraud detection. To ensure the reliability of the data, we meticulously processed the data sources, balancing the dataset to address the issue of data sparsity significantly. We also selected highly correlated vectors to strengthen the training process. To guarantee the reliability and practicality of the new Transformer model, we conducted performance comparisons with several widely adopted models, including Support Vector Machine (SVM), Random Forest, Neural Network, and Logistic Regression. We rigorously compared these models using metrics such as Precision, Recall, and F1 Score. Through these detailed analyses and comparisons, we present to the readers a highly efficient and powerful anti-fraud mechanism with promising prospects. The results demonstrate that the Transformer model not only excels in traditional applications but also shows great potential in niche areas like fraud detection, offering a substantial advancement in the field.

Submitted 21 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: This paper have been received by https://ieee-metacom.org/

arXiv:2406.02712 [pdf, other]

Efficiency in Pure-Exchange Economies with Risk-Averse Monetary Utilities

Authors: Mario Ghossoub, Michael Boyuan Zhu

Abstract: We study Pareto efficiency in a pure-exchange economy where agents' preferences are represented by risk-averse monetary utilities. These coincide with law-invariant monetary utilities, and they can be shown to correspond to the class of monotone, (quasi-)concave, Schur concave, and translation-invariant utility functionals. This covers a large class of utility functionals, including a variety of law-invariant robust utilities. We show that Pareto optima exist and are comonotone, and we provide a crisp characterization thereof in the case of law-invariant positively homogeneous monetary utilities. This characterization provides an easily implementable algorithm that fully determines the shape of Pareto-optimal allocations. In the special case of law-invariant comonotone-additive monetary utility functionals (concave Yaari-Dual utilities), we provide a closed-form characterization of Pareto optima. As an application, we examine risk-sharing markets where all agents evaluate risk through law-invariant coherent risk measures, a widely popular class of risk measures. In a numerical illustration, we characterize Pareto-optimal risk-sharing for some special types of coherent risk measures.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02435 [pdf, other]

Generative Active Learning for Long-tailed Instance Segmentation

Authors: Muzhi Zhu, Chengxiang Fan, Hao Chen, Yang Liu, Weian Mao, Xiaogang Xu, Chunhua Shen

Abstract: Recently, large-scale language-image generative models have gained widespread attention and many works have utilized generated data from these models to further enhance the performance of perception tasks. However, not all generated data can positively impact downstream models, and these methods do not thoroughly explore how to better select and utilize generated data. On the other hand, there is still a lack of research oriented towards active learning on generated data. In this paper, we explore how to perform active learning specifically for generated data in the long-tailed instance segmentation task. Subsequently, we propose BSGAL, a new algorithm that online estimates the contribution of the generated data based on gradient cache. BSGAL can handle unlimited generated data and complex downstream segmentation tasks effectively. Experiments show that BSGAL outperforms the baseline approach and effectually improves the performance of long-tailed segmentation. Our code can be found at https://github.com/aim-uofa/DiverGen.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024

arXiv:2406.00416 [pdf, other]

Representation and De-interleaving of Mixtures of Hidden Markov Processes

Authors: Jiadi Bao, Mengtao Zhu, Yunjie Li, Shafei Wang

Abstract: De-interleaving of the mixtures of Hidden Markov Processes (HMPs) generally depends on its representation model. Existing representation models consider Markov chain mixtures rather than hidden Markov, resulting in the lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, de-interleaving methods utilize a search-based strategy, which is time-consuming. To address these issues, this paper proposes a novel representation model and corresponding de-interleaving methods for the mixtures of HMPs. At first, a generative model for representing the mixtures of HMPs is designed. Subsequently, the de-interleaving process is formulated as a posterior inference for the generative model. Secondly, an exact inference method is developed to maximize the likelihood of the complete data, and two approximate inference methods are developed to maximize the evidence lower bound by creating tractable structures. Then, a theoretical error probability lower bound is derived using the likelihood ratio test, and the algorithms are shown to get reasonably close to the bound. Finally, simulation results demonstrate that the proposed methods are highly effective and robust for non-ideal situations, outperforming baseline methods on simulated and real-life data.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 13 pages, 9 figures, submitted to IEEE transactions on Signal Processing

arXiv:2406.00012 [pdf, other]

Extracting Essential and Disentangled Knowledge for Recommendation Enhancement

Authors: Kounianhua Du, Jizheng Chen, Jianghao Lin, Menghui Zhu, Bo Chen, Shuai Li, Ruiming Tang

Abstract: Recommender models play a vital role in various industrial scenarios, while often faced with the catastrophic forgetting problem caused by the fast shifting data distribution, e.g., the evolving user interests, click signals fluctuation during sales promotions, etc. To alleviate this problem, a common approach is to reuse knowledge from the historical data. However, preserving the vast and fast-accumulating data is hard, which causes dramatic storage overhead. Memorizing old data through a parametric knowledge base is then proposed, which compresses the vast amount of raw data into model parameters. Despite the flexibility, how to improve the memorization and generalization capabilities of the parametric knowledge base is challenging. In this paper, we propose two constraints to extract Essential and Disentangled Knowledge from past data for rational and generalized recommendation enhancement, which improves the capabilities of the parametric knowledge base without increasing the size of it. The essential principle helps to compress the input into representative vectors that capture the task-relevant information and filter out the noisy information. The disentanglement principle reduces the redundancy of stored information and pushes the knowledge base to focus on capturing the disentangled invariant patterns. These two rules together promote rational compression of information for robust and generalized knowledge representations. Extensive experiments on two datasets justify the effectiveness of the proposed method.

Submitted 20 May, 2024; originally announced June 2024.

arXiv:2405.19740 [pdf, other]

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

Authors: Jiatong Li, Renjun Hu, Kunzhe Huang, Yan Zhuang, Qi Liu, Mengxiao Zhu, Xing Shi, Wei Lin

Abstract: Expert-designed close-ended benchmarks serve as vital tools in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations. These perturbations employ human-like restatement techniques to generate on-the-fly test samples from static benchmarks, meticulously retaining knowledge-critical content while altering irrelevant details. Our toolkit further includes a suite of transition analyses that compare performance on raw vs. perturbed test sets to precisely assess LLMs' genuine knowledge capacity. Six state-of-the-art LLMs are re-evaluated using PertEval. Results reveal significantly inflated performance of the LLMs on raw benchmarks, including an absolute 21% overestimation for GPT-4. Additionally, through a nuanced response pattern analysis, we discover that PertEval retains LLMs' uncertainty to specious knowledge, potentially being resolved through rote memorization and leading to inflated performance. We also find that the detailed transition analyses by PertEval could illuminate weaknesses in existing LLMs' knowledge mastery and guide the development of refinement.
Given these insights, we posit that PertEval can act as an essential tool that, when applied alongside any close-ended benchmark, unveils the true knowledge capacity of LLMs, marking a significant step toward more trustworthy LLM evaluation.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 23 pages, 12 figures, 10 tables

arXiv:2405.18610 [pdf, other]

DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime

Authors: Zhiyao Luo, Mingcheng Zhu, Fenglin Liu, Jiali Li, Yangchen Pan, Jiandong Zhou, Tingting Zhu

Abstract: Reinforcement learning (RL) has garnered increasing recognition for its potential to optimise dynamic treatment regimes (DTRs) in personalised medicine, particularly for drug dosage prescriptions and medication recommendations. However, a significant challenge persists: the absence of a unified framework for simulating diverse healthcare scenarios and a comprehensive analysis to benchmark the effectiveness of RL algorithms within these contexts. To address this gap, we introduce DTR-Bench, a benchmarking platform comprising four distinct simulation environments tailored to common DTR applications, including cancer chemotherapy, radiotherapy, glucose management in diabetes, and sepsis treatment. We evaluate various state-of-the-art RL algorithms across these settings, particularly highlighting their performance amidst real-world challenges such as pharmacokinetic/pharmacodynamic (PK/PD) variability, noise, and missing data. Our experiments reveal varying degrees of performance degradation among RL algorithms in the presence of noise and patient variability, with some algorithms failing to converge. Additionally, we observe that using temporal observation representations does not consistently lead to improved performance in DTR settings. Our findings underscore the necessity of developing robust, adaptive RL algorithms capable of effectively managing these complexities to enhance patient-specific healthcare. We have open-sourced our benchmark and code at https://github.com/GilesLuo/DTR-Bench.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 13 pages for main content

arXiv:2405.17278 [pdf, ps, other]

EF-Calib: Spatiotemporal Calibration of Event- and Frame-Based Cameras Using Continuous-Time Trajectories

Authors: Shaoan Wang, Zhanhua Xin, Yaoqing Hu, Dongyue Li, Mingzhu Zhu, Junzhi Yu

Abstract: Event camera, a bio-inspired asynchronous triggered camera, offers promising prospects for fusion with frame-based cameras owing to its low latency and high dynamic range. However, calibrating stereo vision systems that incorporate both event and frame-based cameras remains a significant challenge. In this letter, we present EF-Calib, a spatiotemporal calibration framework for event- and frame-based cameras using continuous-time trajectories. A novel calibration pattern applicable to both camera types and the corresponding event recognition algorithm is proposed. Leveraging the asynchronous nature of events, a derivable piece-wise B-spline to represent camera pose continuously is introduced, enabling calibration for intrinsic parameters, extrinsic parameters, and time offset, with analytical Jacobians provided. Various experiments are carried out to evaluate the calibration performance of EF-Calib, including calibration experiments for intrinsic parameters, extrinsic parameters, and time offset. Experimental results show that EF-Calib achieves the most accurate intrinsic parameters compared to current SOTA, the close accuracy of the extrinsic parameters compared to the frame-based results, and accurate time offset estimation. EF-Calib provides a convenient and accurate toolbox for calibrating the system that fuses events and frames. The code of this paper will also be open-sourced at: https://github.com/wsakobe/EF-Calib.

Submitted 27 May, 2024; originally announced May 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2405.16498 [pdf, other]

On Sequential Loss Approximation for Continual Learning

Authors: Menghao Waiyan William Zhu, Ercan Engin Kuruoğlu

Abstract: We introduce for continual learning Autodiff Quadratic Consolidation (AQC), which approximates the previous loss function with a quadratic function, and Neural Consolidation (NC), which approximates the previous loss function with a neural network. Although they are not scalable to large neural networks, they can be used with a fixed pre-trained feature extractor. We empirically study these methods in class-incremental learning, for which regularization-based methods produce unsatisfactory results, unless combined with replay. We find that for small datasets, quadratic approximation of the previous loss function leads to poor results, even with full Hessian computation, and NC could significantly improve the predictive performance, while for large datasets, when used with a fixed pre-trained feature extractor, AQC provides superior predictive performance. We also find that using tanh-output features can improve the predictive performance of AQC. In particular, in class-incremental Split MNIST, when a Convolutional Neural Network (CNN) with tanh-output features is pre-trained on EMNIST Letters and used as a fixed pre-trained feature extractor, AQC can achieve predictive performance comparable to joint training.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16428 [pdf]

Crystal facet orientation and temperature dependence of charge and spin Hall effects in noncollinear antiferromagnet: A first-principles investigation

Authors: Meng Zhu, Xinlu Li, Fanxing Zheng, Jianting Dong, Ye Zhou, Kun Wu, Jia Zhang

Abstract: Noncollinear antiferromagnets (nc-AFMs) have attracted increasing research attention in spintronics due to their unique spin structures and fascinating charge and spin transport properties. By using first-principles calculations, we comprehensively investigate the charge and spin Hall effects in representative noncollinear antiferromagnet Mn3Pt. Our study reveals that the Hall effects in nc-AFMs are critically dependent on the crystal facet orientation and temperature. For (001) orientated Mn3Pt, each charge and spin Hall conductivity element is comprised of both time reversal odd (T-odd) and even (T-even) contribution, associated with longitudinal conductivity, which leads to sizable and highly anisotropic Hall conductivity. The temperature dependence of charge and spin Hall conductivity has been elucidated by considering both phonon and spin disorder scattering. The scaling relations between Hall conductivity and longitudinal conductivity have also been investigated. The existence of prominent spin Hall effect in nc-AFMs may generate spin current with Sz spin polarization, which is advantageous for field free switching of perpendicular magnetization. Our work may provide unambiguous understanding on the charge and spin transport in noncollinear antiferromagnets and pave their way for applications in antiferromagnetic spintronics.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16134 [pdf, other]

Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack

Authors: Mingli Zhu, Siyuan Liang, Baoyuan Wu

Abstract: Deep neural networks face persistent challenges in defending against backdoor attacks, leading to an ongoing battle between attacks and defenses. While existing backdoor defense strategies have shown promising performance on reducing attack success rates, can we confidently claim that the backdoor threat has truly been eliminated from the model? To address this question, we re-investigate the characteristics of the backdoored models after defense (denoted as defense models). Surprisingly, we find that the original backdoors still exist in defense models derived from existing post-training defense strategies, and the backdoor existence is measured by a novel metric called backdoor existence coefficient. It implies that the backdoors just lie dormant rather than being eliminated. To further verify this finding, we empirically show that these dormant backdoors can be easily re-activated during inference, by manipulating the original trigger with well-designed tiny perturbation using universal adversarial attack. More practically, we extend our backdoor reactivation to black-box scenario, where the defense model can only be queried by the adversary during inference, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks. The effectiveness of the proposed methods is verified on both image classification and multimodal contrastive learning (i.e., CLIP) tasks.
In conclusion, this work uncovers a critical vulnerability that has never been explored in existing defense strategies, emphasizing the urgency of designing more robust and advanced backdoor defense mechanisms in the future.

Submitted 30 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15842 [pdf, other]

Model Cascading for Code: Reducing Inference Costs with Model Cascading for LLM Based Code Generation

Authors: Boyuan Chen, Mingzhi Zhu, Brendan Dolan-Gavitt, Muhammad Shafique, Siddharth Garg

Abstract: The rapid development of large language models (LLMs) has led to significant advancements in code completion tasks. While larger models have higher accuracy, they also cost much more to run. Meanwhile, model cascading has been proven effective to conserve computational resources while enhancing accuracy in LLMs on natural language generation tasks. It generates output with the smallest model in a set, and only queries the larger models when it fails to meet predefined quality criteria. However, this strategy has not been used in code completion tasks, primarily because assessing the quality of code completions differs substantially from assessing natural language, where the former relies heavily on the functional correctness. To address this, we propose letting each model generate and execute a set of test cases for their solutions, and use the test results as the cascading threshold. We show that our model cascading strategy reduces computational costs while increasing accuracy compared to generating the output with a single model. We also introduce a heuristic to determine the optimal combination of the number of solutions, test cases, and test lines each model should generate, based on the budget. Compared to speculative decoding, our method works on black-box models, having the same level of cost-accuracy trade-off, yet providing many more choices based on the server's budget. Ours is the first work to optimize cost-accuracy trade-off for LLM code generation with model cascading.

Submitted 24 May, 2024; originally announced May 2024.
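
The cascading idea described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `cascade` function, the `Model` callable type, the pass-rate threshold, and the toy models are all hypothetical names introduced here, and real models would be LLM calls rather than lambdas.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a "model" is any callable that returns a candidate
# solution together with the test cases it generated for that solution.
Solution = Callable[[int], int]
Test = Callable[[Solution], bool]
Model = Callable[[], Tuple[Solution, List[Test]]]


def cascade(models: List[Model], threshold: float) -> Tuple[int, Solution]:
    """Query models from cheapest to most expensive.

    Returns (index of the model whose answer was accepted, its solution).
    A solution is accepted when the fraction of self-generated tests it
    passes reaches `threshold`; otherwise the next model is queried.
    """
    if not models:
        raise ValueError("at least one model is required")
    solution = None
    for idx, model in enumerate(models):
        solution, tests = model()
        passed = sum(1 for test in tests if test(solution))
        pass_rate = passed / len(tests) if tests else 0.0
        if pass_rate >= threshold:
            return idx, solution
    # No model met the threshold: fall back to the last (largest) model.
    return len(models) - 1, solution


# Toy stand-ins for a small and a large model solving "double the input".
def small_model() -> Tuple[Solution, List[Test]]:
    return (lambda x: x + 1), [lambda f: f(2) == 4, lambda f: f(0) == 0]


def large_model() -> Tuple[Solution, List[Test]]:
    return (lambda x: x * 2), [lambda f: f(2) == 4, lambda f: f(0) == 0]


idx, sol = cascade([small_model, large_model], threshold=1.0)
```

In this toy run the small model's solution fails both of its own tests, so the cascade escalates to the large model. Lowering `threshold` trades accuracy for fewer calls to the expensive model.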

arXiv:2405.13923 [pdf, other]

Why Not Transform Chat Large Language Models to Non-English?

Authors: Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, Shujian Huang

Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized for advanced abilities, e.g. multi-turn conversation and human preference alignment, and thus more powerful in both helpfulness and safety. However, transforming a chat LLM involves two critical issues: (1) How can we effectively transfer advanced abilities without their supervised data? (2) How can we prevent the original knowledge from catastrophic forgetting during transformation? We target these issues by introducing a simple framework called TransLLM. For the first issue, TransLLM divides the transfer problem into some common sub-tasks with the translation chain-of-thought, which uses the translation as the bridge between English and non-English step-by-step. We further enhance the performance of sub-tasks with publicly available data. For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training to maintain the original LLM parameters, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters. In the experiments, we transform the LLaMA-2-chat-7B to the Thai language.
Our method, using only single-turn data, outperforms strong baselines and ChatGPT on multi-turn benchmark MT-bench. Furthermore, our method, without safety data, rejects more harmful queries of safety benchmark AdvBench than both ChatGPT and GPT-4.

Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13516 [pdf, other]

LIRE: listwise reward enhancement for preference alignment

Authors: Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao

Abstract: Recently, tremendous strides have been made to align the generation of Large Language Models (LLMs) with human values to mitigate toxic or unhelpful content. Leveraging Reinforcement Learning from Human Feedback (RLHF) proves effective and is widely adopted by researchers. However, implementing RLHF is complex, and its sensitivity to hyperparameters renders achieving stable performance and scalability challenging. Furthermore, prevailing approaches to preference alignment primarily concentrate on pairwise comparisons, with limited exploration into multi-response scenarios, thereby overlooking the potential richness within the candidate pool. For the above reasons, we propose a new approach: Listwise Reward Enhancement for Preference Alignment (LIRE), a gradient-based reward optimization approach that incorporates the offline rewards of multiple responses into a streamlined listwise framework, thus eliminating the need for online sampling during training. LIRE is straightforward to implement, requiring minimal parameter tuning, and seamlessly aligns with the pairwise paradigm while naturally extending to multi-response scenarios. Moreover, we introduce a self-enhancement algorithm aimed at iteratively refining the reward during training. Our experiments demonstrate that LIRE consistently outperforms existing methods across several benchmarks on dialogue and summarization tasks, with good transferability to out-of-distribution data, assessed using proxy reward models and human annotators.

Submitted 4 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Accepted by ACL 2024 Findings

arXiv:2405.12950 [pdf]

Emergent Ferromagnetism at LaFeO3/SrTiO3 Interface Arising from Strain-induced Spin-State Transition

Authors: Menglin Zhu, Joseph Lanier, Sevim Polat Genlik, Jose G. Flores, Victor da Cruz Pinha Barbosa, Mohit Randeria, Patrick M. Woodward, Maryam Ghazisaeidi, Fengyuan Yang, Jinwoo Hwang

Abstract: Creating new interfacial magnetic states with desired functionalities is attractive for fundamental studies and spintronics applications. The emergence of interfacial magnetic phases demands the fabrication of pristine interfaces and the characterization and understanding of atomic structure as well as electronic, magnetic, and orbital degrees of freedom at the interface. Here, we report a novel interfacial insulating ferromagnetic order in antiferromagnetic LaFeO3 grown on SrTiO3, characterized by a combination of electron microscopy and spectroscopy, magnetometry, and density functional theory. The epitaxial strain drives a spin-state disproportionation in the interfacial layer of LaFeO3, which leads to a checkerboard arrangement of low- and high-spin Fe3+ ions inside smaller and larger FeO6 octahedra, respectively. Ferromagnetism at the interface arises from superexchange interactions between the low- and high-spin Fe3+. The detailed understanding of creation of emergent magnetism illustrates the potential of designing and controlling orbital degrees of freedom at the interface to realize novel phases and functionalities for future spin-electronic applications.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12679 [pdf]

Observation of Spin Splitting in Room-Temperature Metallic Antiferromagnet CrSb

Authors: Meng Zeng, Ming-Yuan Zhu, Yu-Peng Zhu, Xiang-Rui Liu, Xiao-Ming Ma, Yu-Jie Hao, Pengfei Liu, Gexing Qu, Yichen Yang, Zhicheng Jiang, Kohei Yamagami, Masashi Arita, Xiaoqian Zhang, Tian-Hao Shao, Yue Dai, Kenya Shimada, Zhengtai Liu, Mao Ye, Yaobo Huang, Qihang Liu, Chang Liu

Abstract: Recently, unconventional antiferromagnets that enable the splitting of electronic spins have been theoretically proposed and experimentally realized, where the magnetic sublattices containing moments pointing at different directions are connected by a novel set of symmetries. Such spin splitting (SS) is substantial, $k$-dependent, and independent of the spin-orbit coupling strength, making these magnets promising materials for antiferromagnetic spintronics. Here, combined with angle-resolved photoemission spectroscopy (ARPES) and density functional theory (DFT) calculations, we perform a systematic study on CrSb, a metallic spin-split antiferromagnet candidate with $T_N$ = 703 K. Our data reveals the electronic structure of CrSb along both out-of-plane and in-plane momentum directions, which renders anisotropic $k$-dependent SS and agrees well with the calculational results. The magnitude of such SS reaches up to at least 0.8 eV at non-high-symmetry momentum points, which is significantly higher than the largest known SOC-induced SS. This compound expands the choice of materials in the field of antiferromagnetic spintronics and is likely to stimulate subsequent investigations of high-efficiency spintronic devices that are functional at room temperature.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 14 pages, 4 figures

arXiv:2405.10889 [pdf]

Unconventional Unidirectional Magnetoresistance in vdW Heterostructures

Authors: I-Hsuan Kao, Junyu Tang, Gabriel Calderon Ortiz, Menglin Zhu, Sean Yuan, Rahul Rao, Jiahan Li, James H. Edgar, Jiaqiang Yan, David G. Mandrus, Kenji Watanabe, Takashi Taniguchi, Jinwoo Hwang, Ran Cheng, Jyoti Katoch, Simranjeet Singh

Abstract: Electrical readout of magnetic states is a key to realize novel spintronics devices for efficient computing and data storage. Unidirectional magnetoresistance (UMR) in bilayer systems, consisting of a spin source material and a magnetic layer, refers to a change in the longitudinal resistance upon the reversal of magnetization, which typically originates from the interaction of spin-current and magnetization at the interface. Because of UMR's linear dependence on applied charge current and magnetization, it can be used to electrically read the magnetization state. However, in conventional spin source materials, the spin polarization of an electric field induced spin current is restricted to be in the film plane and hence the ensuing UMR can only respond to the in plane component of the magnetization. On the other hand, magnets with perpendicular magnetic anisotropy (PMA) are highly desired for magnetic memory and spin-logic devices, while the electrical read out of PMA magnets through UMR is critically missing. Here, we report the discovery of an unconventional UMR in bilayer heterostructures of a topological semimetal (WTe2) and a PMA ferromagnetic insulator (Cr2Ge2Te6, CGT), which allows to electrically read the up and down magnetic states of the CGT layer by measuring the longitudinal resistance. Our theoretical calculations based on a tight binding model show that the unconventional UMR originates from the interplay of crystal symmetry breaking in WTe2 and magnetic exchange interaction across the WTe2 and CGT interface.
Combining with the ability of WTe2 to obtain magnetic field free switching of the PMA magnets, our discoveries open an exciting pathway to achieve two terminal magnetic memory devices that operate solely on the spin orbit torque and UMR, which is critical for developing next-generation non volatile and low power consumption data storage technologies.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.10368 [pdf, other]

Trapped-Ion Quantum Simulation of Electron Transfer Models with Tunable Dissipation

Authors: Visal So, Midhuna Duraisamy Suganthi, Abhishek Menon, Mingjian Zhu, Roman Zhuravel, Han Pu, Peter G. Wolynes, José N. Onuchic, Guido Pagano

Abstract: Electron transfer is at the heart of many fundamental physical, chemical, and biochemical processes essential for life. Exact simulation of reactions in these systems is often hindered by the large number of degrees of freedom and by the essential role of quantum effects. In this work, we experimentally simulate a paradigmatic model of molecular electron transfer using a multi-species trapped-ion crystal, where the donor-acceptor gap, the electronic and vibronic couplings, and the bath relaxation dynamics can all be controlled independently. We employ the ground-state qubit of one ion to simulate the electronic degree of freedom and the optical qubit of another ion to perform reservoir engineering on a collective mode encoding a reaction coordinate. We observe the real-time dynamics of the spin excitation, measuring the transfer rate in several regimes of adiabaticity and relaxation dynamics. The setup allows access to the electron transfer dynamics in the non-perturbative regime, where there is no clear hierarchy among the energy scales in the model, as has been suggested to be optimal for many rate phenomena, including photosynthesis. Our results provide a testing ground for increasingly rich models of molecular excitation transfer processes that are relevant for molecular electronics and light-harvesting systems.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.10185 [pdf, other]

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

Authors: Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, Chunhua Shen

Abstract: Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving the accuracy. Most instance segmentation datasets today require costly manual annotation, limiting their data scale. Models trained on such data are prone to overfitting on the training set, especially for those rare categories. While recent works have delved into exploiting generative models to create synthetic datasets for data augmentation, these approaches do not efficiently harness the full potential of generative models. To address these issues, we introduce a more efficient strategy to construct generative datasets for data augmentation, termed DiverGen. Firstly, we provide an explanation of the role of generative data from the perspective of distribution discrepancy. We investigate the impact of different data on the distribution learned by the model. We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting. Additionally, we find that the diversity of generative data is crucial for improving model performance and enhance it through various strategies, including category diversity, prompt diversity, and generative model diversity. With these strategies, we can scale the data to millions while maintaining the trend of model performance improvement. On the LVIS dataset, DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.

Submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR 2024, codes are available at https://github.com/aim-uofa/DiverGen

arXiv:2405.07827 [pdf, other]

Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

Authors: Yuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie Zhu

Abstract: Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called "UA Free Living Study", which uses an egocentric wearable camera, AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches in the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted at CVPRw 2024

arXiv:2405.05481 [pdf, other]

Achieving millisecond coherence fluxonium through overlap Josephson junctions

Authors: Fei Wang, Kannan Lu, Huijuan Zhan, Lu Ma, Feng Wu, Hantao Sun, Hao Deng, Yang Bai, Feng Bao, Xu Chang, Ran Gao, Xun Gao, Guicheng Gong, Lijuan Hu, Ruizi Hu, Honghong Ji, Xizheng Ma, Liyong Mao, Zhijun Song, Chengchun Tang, Hongcheng Wang, Tenghui Wang, Ziang Wang, Tian Xia, Hongxin Xu , et al. (10 additional authors not shown)

Abstract: Fluxonium qubits are recognized for their high coherence times and high operation fidelities, attributed to their unique design incorporating over 100 Josephson junctions per superconducting loop. However, this complexity poses significant fabrication challenges, particularly in achieving high yield and junction uniformity with traditional methods. Here, we introduce an overlap process for Josephson junction fabrication that achieves nearly 100% yield and maintains uniformity across a 2-inch wafer with less than 5% variation for the phase slip junction and less than 2% for the junction array. Our compact junction array design facilitates fluxonium qubits with energy relaxation times exceeding 1 millisecond at the flux frustration point, demonstrating consistency with state-of-the-art dielectric loss tangents and flux noise across multiple devices. This work suggests the scalability of high coherence fluxonium processors using CMOS-compatible processes, marking a significant step towards practical quantum computing.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.03199 [pdf, other]

Boosting MLPs with a Coarsening Strategy for Long-Term Time Series Forecasting

Authors: Nannan Bian, Minhong Zhu, Li Chen, Weiran Cai

Abstract: Deep learning methods have been exerting their strengths in long-term time series forecasting. However, they often struggle to strike a balance between expressive power and computational efficiency. Resorting to multi-layer perceptrons (MLPs) provides a compromising solution, yet they suffer from two critical problems caused by the intrinsic point-wise mapping mode, in terms of deficient contextual dependencies and inadequate information bottleneck. Here, we propose the Coarsened Perceptron Network (CP-Net), featured by a coarsening strategy that alleviates the above problems associated with the prototype MLPs by forming information granules in place of solitary temporal points. The CP-Net utilizes primarily a two-stage framework for extracting semantic and contextual patterns, which preserves correlations over larger timespans and filters out volatile noises. This is further enhanced by a multi-scale setting, where patterns of diverse granularities are fused towards a comprehensive prediction. Based purely on convolutions of structural simplicity, CP-Net is able to maintain a linear computational complexity and low runtime, while demonstrating an improvement of 4.1% compared with the SOTA method on seven forecasting benchmarks.

Submitted 20 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.02791 [pdf, other]

Efficient Text-driven Motion Generation via Latent Consistency Training

Authors: Mengxian Hu, Minghao Zhu, Xun Zhou, Qingqing Yan, Shu Li, Chengju Liu, Qijun Chen

Abstract: Motion diffusion models excel at text-driven motion generation but struggle with real-time inference since motion sequences are time-axis redundant and solving reverse diffusion trajectory involves tens or hundreds of sequential iterations. In this paper, we propose a Motion Latent Consistency Training (MLCT) framework, which allows for large-scale skip sampling of compact motion latent representation by constraining the consistency of the outputs of adjacent perturbed states on the precomputed trajectory. In particular, we design a flexible motion autoencoder with quantization constraints to guarantee the low-dimensionality, succinctness, and boundedness of the motion embedding space. We further present a conditionally guided consistency training framework based on conditional trajectory simulation without additional pre-training diffusion model, which significantly improves the conditional generation performance with minimal training cost. Experiments on two benchmarks demonstrate our model's state-of-the-art performance with an 80% inference cost saving and around 14 ms on a single RTX 4090 GPU.

Submitted 25 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

arXiv:2405.00435 [pdf, other]

CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model

Authors: Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Minfeng Zhu, Yingchaojie Feng, Wei Chen

Abstract: The integration of new technology with cultural studies enhances our understanding of cultural heritage but often struggles to connect with diverse audiences. It is challenging to align personal interpretations with the intended meanings across different cultures. Our study investigates the important factors in appreciating art from a cross-cultural perspective. We explore the application of Large Language Models (LLMs) to bridge the cultural and language barriers in understanding Traditional Chinese Paintings (TCPs). We present CultiVerse, a visual analytics system that utilizes LLMs within a mixed-initiative framework, enhancing interpretative appreciation of TCP in a cross-cultural dialogue. CultiVerse addresses the challenge of translating the nuanced symbolism in art, which involves interpreting complex cultural contexts, aligning cross-cultural symbols, and validating cultural acceptance. CultiVerse integrates an interactive interface with the analytical capability of LLMs to explore a curated TCP dataset, facilitating the analysis of multifaceted symbolic meanings and the exploration of cross-cultural serendipitous discoveries. Empirical evaluations affirm that CultiVerse significantly improves cross-cultural understanding, offering deeper insights and engaging art appreciation.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00026 [pdf]

Enhancing Credit Card Fraud Detection A Neural Network and SMOTE Integrated Approach

Authors: Mengran Zhu, Ye Zhang, Yulu Gong, Changxin Xu, Yafei Xiang

Abstract: Credit card fraud detection is a critical challenge in the financial sector, demanding sophisticated approaches to accurately identify fraudulent transactions. This research proposes an innovative methodology combining Neural Networks (NN) and Synthetic Minority Over-sampling Technique (SMOTE) to enhance the detection performance. The study addresses the inherent imbalance in credit card transaction data, focusing on technical advancements for robust and precise fraud detection. Results demonstrate that the integration of NN and SMOTE exhibits superior precision, recall, and F1-score compared to traditional models, highlighting its potential as an advanced solution for handling imbalanced datasets in credit card fraud detection scenarios. This research contributes to the ongoing efforts to develop effective and efficient mechanisms for safeguarding financial transactions from fraudulent activities.

Submitted 26 February, 2024; originally announced May 2024.

arXiv:2404.18304 [pdf, other]

Retrieval-Oriented Knowledge for Click-Through Rate Prediction

Authors: Huanshuo Liu, Bo Chen, Menghui Zhu, Jianghao Lin, Jiarui Qin, Yang Yang, Hao Zhang, Ruiming Tang

Abstract: Click-through rate (CTR) prediction plays an important role in personalized recommendations. Recently, sample-level retrieval-based models (e.g., RIM) have achieved remarkable performance by retrieving and aggregating relevant samples. However, their inefficiency at the inference stage makes them impractical for industrial applications. To overcome this issue, this paper proposes a universal plug-and-play Retrieval-Oriented Knowledge (ROK) framework. Specifically, a knowledge base, consisting of a retrieval-oriented embedding layer and a knowledge encoder, is designed to preserve and imitate the retrieved & aggregated representations in a decomposition-reconstruction paradigm. Knowledge distillation and contrastive learning methods are utilized to optimize the knowledge base, and the learned retrieval-enhanced representations can be integrated with arbitrary CTR models in both instance-wise and feature-wise manners. Extensive experiments on three large-scale datasets show that ROK achieves competitive performance with the retrieval-based CTR models while reserving superior inference efficiency and model compatibility.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.16152 [pdf, ps, other]

Rethinking Grant-Free Protocol in mMTC

Authors: Minhao Zhu, Yifei Sun, Lizhao You, Zhaorui Wang, Ya-Feng Liu, Shuguang Cui

Abstract: This paper revisits the identity detection problem under the current grant-free protocol in massive machine-type communications (mMTC) by asking the following question: for stable identity detection performance, is it enough to permit active devices to transmit preambles without any handshaking with the base station (BS)? Specifically, in the current grant-free protocol, the BS blindly allocates a fixed length of preamble to devices for identity detection as it lacks the prior information on the number of active devices $K$. However, in practice, $K$ varies dynamically over time, resulting in degraded identity detection performance especially when $K$ is large. Consequently, the current grant-free protocol fails to ensure stable identity detection performance. To address this issue, we propose a two-stage communication protocol which consists of estimation of $K$ in Phase I and detection of identities of active devices in Phase II. The preamble length for identity detection in Phase II is dynamically allocated based on the estimated $K$ in Phase I through a table lookup manner such that the identity detection performance could always be better than a predefined threshold. In addition, we design an algorithm for estimating $K$ in Phase I, and exploit the estimated $K$ to reduce the computational complexity of the identity detector in Phase II. Numerical results demonstrate the effectiveness of the proposed two-stage communication protocol and algorithms.

Submitted 24 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE for possible publication

arXiv:2404.16023 [pdf, other]

Learning Car-Following Behaviors Using Bayesian Matrix Normal Mixture Regression

Authors: Chengyuan Zhang, Kehua Chen, Meixin Zhu, Hai Yang, Lijun Sun

Abstract: Learning and understanding car-following (CF) behaviors are crucial for microscopic traffic simulation. Traditional CF models, though simple, often lack generalization capabilities, while many data-driven methods, despite their robustness, operate as "black boxes" with limited interpretability. To bridge this gap, this work introduces a Bayesian Matrix Normal Mixture Regression (MNMR) model that simultaneously captures feature correlations and temporal dynamics inherent in CF behaviors. This approach is distinguished by its separate learning of row and column covariance matrices within the model framework, offering an insightful perspective into the human driver decision-making processes. Through extensive experiments, we assess the model's performance across various historical steps of inputs, predictive steps of outputs, and model complexities. The results consistently demonstrate our model's adeptness in effectively capturing the intricate correlations and temporal dynamics present during CF. A focused case study further illustrates the model's outperforming interpretability of identifying distinct operational conditions through the learned mean and covariance matrices. This not only underlines our model's effectiveness in understanding complex human driving behaviors in CF scenarios but also highlights its potential as a tool for enhancing the interpretability of CF behaviors in traffic simulations and autonomous driving systems.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 6 pages, Accepted by the 35th IEEE Intelligent Vehicles Symposium

Showing 1–50 of 860 results for author: Zhu, M