Search | arXiv e-print repository

HEOM-QUICK2: a general-purpose simulator for fermionic many-body open quantum systems -- An Update

Authors: Daochi Zhang, Lyuzhou Ye, Jiaan Cao, Yao Wang, Rui-Xue Xu, Xiao Zheng, YiJing Yan

Abstract: Many-body open quantum systems (OQS) have a profound impact on various subdisciplines of physics, chemistry, and biology. Thus, the development of a computer program capable of accurately, efficiently, and versatilely simulating many-body OQS is highly desirable. In recent years, we have focused on the advancement of numerical algorithms based on the fermionic hierarchical equations of motion (HEOM) theory. Being in-principle exact, this approach allows for the precise characterization of many-body correlations, non-Markovian memory, and non-equilibrium thermodynamic conditions. These efforts now lead to the establishment of a new computer program, HEOM for QUantum Impurity with a Correlated Kernel, version 2 (HEOM-QUICK2), which, to the best of our knowledge, is currently the only general-purpose simulator for fermionic many-body OQS. Compared with version 1, the HEOM-QUICK2 program features more efficient solvers for stationary states, more accurate treatment of non-Markovian memory, and improved numerical stability for long-time dissipative dynamics. Integrated with quantum chemistry software, HEOM-QUICK2 has become a valuable theoretical tool for the precise simulation of realistic many-body OQS, particularly the single atomic or molecular junctions. Furthermore, the unprecedented precision achieved by HEOM-QUICK2 enables accurate simulation of low-energy spin excitations and coherent spin relaxation. The unique usefulness of HEOM-QUICK2 is demonstrated through several examples of strongly correlated quantum impurity systems under non-equilibrium conditions. Thus, the new HEOM-QUICK2 program offers a powerful and comprehensive tool for studying many-body OQS with exotic quantum phenomena and exploring applications in various disciplines.

Submitted 3 January, 2024; originally announced January 2024.

Comments: 22 pages; 9 figures

arXiv:2401.01507 [pdf]

Real-space hole-doping titration and manipulation of correlated charge density wave state in 1T-TaS2

Authors: Haoyu Dong, Yanyan Geng, Jianfeng Guo, Le Lei, Yan Li, Li Huang, Fei Pang, Rui Xu, Weiqiang Yu, Wei Ji, Hong-Jun Gao, Weichang Zhou, Zhihai Cheng

Abstract: The complex correlated charge density wave (CDW) phases of 1T-TaS2 have attracted great attention due to their emergent quantum states, such as intricate CDW phase, Mott-Hubbard state, superconductivity and quantum spin liquid. The delicate interplay among the complex intra-/inter-layer electron-electron and electron-lattice interactions is the fundamental prerequisite of these exotic quantum states. Here, we report a real-space titration-like investigation of correlated CDW state in 1T-TaS2 upon hole-doping via low-temperature scanning tunneling microscopy (LT-STM). The gradual increased hole-doping results in the sequential emergence of electron voids, phase domains, stacking disordering and mixed phase/chiral domains attributed to the reduced electron correlations. The achiral intermediate ring-like clusters and nematic CDW states emerge at the intralayer chiral domain wall and interlayer heterochiral stacking regions via the chiral-overlapping configurations. The local reversible CDW manipulation is further realized by the non-equilibrium transient charge-injections of STM field-emission spectra. Our results provide an in-depth insight of this intricate correlated CDW state, and pave a way to realize exotic quantum states via the accurate tuning of interior interactions in correlated materials.

Submitted 21 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.00921 [pdf, other]

Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence

Authors: Ruizhuo Xu, Linzhi Huang, Mei Wang, Jiani Hu, Weihong Deng

Abstract: Self-supervised pre-training paradigms have been extensively explored in the field of skeleton-based action recognition. In particular, methods based on masked prediction have pushed the performance of pre-training to a new height. However, these methods take low-level features, such as raw joint coordinates or temporal motion, as prediction targets for the masked regions, which is suboptimal. In this paper, we show that using high-level contextualized features as prediction targets can achieve superior performance. Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework, which utilizes a transformer-based teacher encoder taking unmasked training samples as input to create latent contextualized representations as prediction targets. Benefiting from the self-attention mechanism, the latent representations generated by the teacher encoder can incorporate the global context of the entire training samples, leading to a richer training task. Additionally, considering the high temporal correlations in skeleton sequences, we propose a motion-aware tube masking strategy which divides the skeleton sequence into several tubes and performs persistent masking within each tube based on motion priors, thus forcing the model to build long-range spatio-temporal connections and focus on action-semantic richer regions. Extensive experiments on NTU-60, NTU-120, and PKU-MMD datasets demonstrate that our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.

Submitted 1 January, 2024; originally announced January 2024.

Comments: Submitted to CVPR 2024

arXiv:2401.00719 [pdf, other]

Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face Recognition

Authors: Ruizhuo Xu, Ke Wang, Chao Deng, Mei Wang, Xi Chen, Wenhui Huang, Junlan Feng, Weihong Deng

Abstract: With the increasing availability of consumer depth sensors, 3D face recognition (FR) has attracted more and more attention. However, the data acquired by these sensors are often coarse and noisy, making them impractical to use directly. In this paper, we introduce an innovative Depth map denoising network (DMDNet) based on the Denoising Implicit Image Function (DIIF) to reduce noise and enhance the quality of facial depth images for low-quality 3D FR. After generating clean depth faces using DMDNet, we further design a powerful recognition network called Lightweight Depth and Normal Fusion network (LDNFNet), which incorporates a multi-branch fusion block to learn unique and complementary features between different modalities such as depth and normal images. Comprehensive experiments conducted on four distinct low-quality databases demonstrate the effectiveness and robustness of our proposed methods. Furthermore, when combining DMDNet and LDNFNet, we achieve state-of-the-art results on the Lock3DFace database.

Submitted 1 January, 2024; originally announced January 2024.

Comments: Accepted by Pattern Recognition

arXiv:2401.00569 [pdf, other]

Decision Making under Costly Sequential Information Acquisition: the Paradigm of Reversible and Irreversible Decisions

Authors: Renyuan Xu, Thaleia Zariphopoulou, Luhao Zhang

Abstract: Decision making in modern stochastic systems, including e-commerce platforms, financial markets, and healthcare systems, has evolved into a multifaceted process that involves information acquisition and adaptive information sources. This paper initiates a study on this integrated process, where these elements are not only fundamental but also interact in a complex and dynamically intertwined manner. We introduce a relatively simple model, which, however, captures the novel elements we consider. Specifically, a decision maker (DM) can choose between an established product $A$ with a known value and a new product $B$ with an unknown value. The DM can observe signals about the unknown value of product $B$ and can also opt to exchange it for product $A$ if $B$ is initially chosen. Mathematically, the model gives rise to a sequential optimal stopping problem with two different informational regimes (before and after buying product $B$), differentiated by the initial, coarser signal and the subsequent, finer one. We analyze the underlying problems using predominantly viscosity solution techniques, differing from the existing literature on information acquisition which is based on traditional optimal stopping techniques. Additionally, our modeling approach offers a novel framework for developing more complex interactions among decisions, information sources, and information costs through a sequence of nested obstacles.

Submitted 10 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

arXiv:2401.00424 [pdf, other]

SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection

Authors: Shijue Huang, Libo Qin, Bingbing Wang, Geng Tu, Ruifeng Xu

Abstract: Multi-modal intent detection aims to utilize various modalities to understand the user's intentions, which is essential for the deployment of dialogue systems in real-world scenarios. The two core challenges for multi-modal intent detection are (1) how to effectively align and fuse different features of modalities and (2) the limited labeled multi-modal intent training data. In this work, we introduce a shallow-to-deep interaction framework with data augmentation (SDIF-DA) to address the above challenges. Firstly, SDIF-DA leverages a shallow-to-deep interaction module to progressively and effectively align and fuse features across text, video, and audio modalities. Secondly, we propose a ChatGPT-based data augmentation approach to automatically augment sufficient training data. Experimental results demonstrate that SDIF-DA can effectively align and fuse multi-modal features by achieving state-of-the-art performance. In addition, extensive analyses show that the introduced data augmentation approach can successfully distill knowledge from the large language model.

Submitted 31 December, 2023; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2312.17694 [pdf, other]

doi 10.1038/s41534-024-00852-7

Mapping of valley-splitting by conveyor-mode spin-coherent electron shuttling

Authors: Mats Volmer, Tom Struck, Arnau Sala, Bingjie Chen, Max Oberländer, Tobias Offermann, Ran Xue, Lino Visser, Jhih-Sian Tu, Stefan Trellenkamp, Łukasz Cywiński, Hendrik Bluhm, Lars R. Schreiber

Abstract: In Si/SiGe heterostructures, the low-lying excited valley state seriously limit operability and scalability of electron spin qubits. For characterizing and understanding the local variations in valley splitting, fast probing methods with high spatial and energy resolution are lacking. Leveraging the spatial control granted by conveyor-mode spin-coherent electron shuttling, we introduce a method for two-dimensional mapping of the local valley splitting by detecting magnetic field dependent anticrossings of ground and excited valley states using entangled electron spin-pairs as a probe. The method has sub-μeV energy accuracy and a nanometer lateral resolution. The histogram of valley splittings spanning a large area of 210 nm by 18 nm matches well with statistics obtained by the established but time-consuming magnetospectroscopy method. For the specific heterostructure, we find a nearly Gaussian distribution of valley splittings and a correlation length similar to the quantum dot size. Our mapping method may become a valuable tool for engineering Si/SiGe heterostructures for scalable quantum computing.

Submitted 29 December, 2023; originally announced December 2023.

Comments: 17 pages, 11 Figures

arXiv:2312.16170 [pdf, other]

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Authors: Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang

Abstract: In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions. This necessitates the ability to fully understand 3D scenes given their first-person observations and contextualize them into language for interaction. However, traditional research focuses more on scene-level input and output setups from a global view. To address the gap, we introduce EmbodiedScan, a multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. It encompasses over 5k scans encapsulating 1M ego-centric RGB-D views, 1M language prompts, 160k 3D-oriented boxes spanning over 760 categories, some of which partially align with LVIS, and dense semantic occupancy with 80 common categories. Building upon this database, we introduce a baseline framework named Embodied Perceptron. It is capable of processing an arbitrary number of multi-modal inputs and demonstrates remarkable 3D perception capabilities, both within the two series of benchmarks we set up, i.e., fundamental 3D perception tasks and language-grounded tasks, and in the wild. Codes, datasets, and benchmarks will be available at https://github.com/OpenRobotLab/EmbodiedScan.

Submitted 26 December, 2023; originally announced December 2023.

Comments: A multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. Project page: http://tai-wang.github.io/embodiedscan

arXiv:2312.15918 [pdf, other]

Supervised Knowledge Makes Large Language Models Better In-context Learners

Authors: Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang

Abstract: Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The recent progress in large-scale generative models has further expanded their use in real-world language applications. However, the critical challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. While previous in-context learning research has focused on enhancing models to adhere to users' specific instructions and quality expectations, and to avoid undesired outputs, little to no work has explored the use of task-Specific fine-tuned Language Models (SLMs) to improve LLMs' in-context learning during the inference stage. Our primary contribution is the establishment of a simple yet effective framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks. Using our proposed plug-in method, enhanced versions of Llama 2 and ChatGPT surpass their original versions regarding generalizability and factuality. We offer a comprehensive suite of resources, including 16 curated datasets, prompts, model checkpoints, and LLM outputs across 9 distinct tasks. The code and data are released at: https://github.com/YangLinyi/Supervised-Knowledge-Makes-Large-Language-Models-Better-In-context-Learners. Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs.

Submitted 11 April, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: Accepted to ICLR 2024

arXiv:2312.13618 [pdf, other]

Generalized system-bath entanglement theorem for Gaussian environments

Authors: Yu Su, Yao Wang, Rui-Xue Xu, YiJing Yan

Abstract: A system-bath entanglement theorem (SBET) with Gaussian environments was established previously in J. Chem. Phys. 152, 034102 (2020) in terms of linear response functions. This theorem connects the system-bath entanglement responses to the local system and bare bath ones. In this work, we generalize it to correlation functions. Key steps in derivation are the generalized Langevin dynamics for the hybridizing bath modes as in the previous work, together with the Bogoliubov transformation mapping the original finite-temperature canonical reservoir to an effective zero-temperature vacuum via an auxiliary bath. With the theorem, the system-bath entangled correlations and the bath modes correlations in the full composite space can be evaluated as long as the bare-bath statistical properties are known and the reduced system correlations are obtained. Numerical demonstrations are carried out for the evaluation of the solvation free energy of an electron transfer system with a certain intramolecular vibrational modes.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 9 pages, 3 figures

arXiv:2312.12754 [pdf, other]

Spectral Prompt Tuning: Unveiling Unseen Classes for Zero-Shot Semantic Segmentation

Authors: Wenhao Xu, Rongtao Xu, Changwei Wang, Shibiao Xu, Li Guo, Man Zhang, Xiaopeng Zhang

Abstract: Recently, CLIP has found practical utility in the domain of pixel-level zero-shot segmentation tasks. The present landscape features two-stage methodologies beset by issues such as intricate pipelines and elevated computational costs. While current one-stage approaches alleviate these concerns and incorporate Visual Prompt Training (VPT) to uphold CLIP's generalization capacity, they still fall short in fully harnessing CLIP's potential for pixel-level unseen class demarcation and precise pixel predictions. To further stimulate CLIP's zero-shot dense prediction capability, we propose SPT-SEG, a one-stage approach that improves CLIP's adaptability from image to pixel. Specifically, we initially introduce Spectral Prompt Tuning (SPT), incorporating spectral prompts into the CLIP visual encoder's shallow layers to capture structural intricacies of images, thereby enhancing comprehension of unseen classes. Subsequently, we introduce the Spectral Guided Decoder (SGD), utilizing both high and low-frequency information to steer the network's spatial focus towards more prominent classification features, enabling precise pixel-level prediction outcomes. Through extensive experiments on two public datasets, we demonstrate the superiority of our method over state-of-the-art approaches, performing well across all classes and particularly excelling in handling unseen classes. Code is available at: https://github.com/clearxu/SPT.

Submitted 2 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: AAAI2024 Accepted

arXiv:2312.12480 [pdf, other]

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

Authors: Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang

Abstract: Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions, addressing real-world dynamism. Existing CTTA methods mainly rely on entropy minimization or teacher-student pseudo-labeling schemes for knowledge extraction in unlabeled target domains. However, dynamic data distributions cause miscalibrated predictions and noisy pseudo-labels in existing self-supervised learning methods, hindering the effective mitigation of error accumulation and catastrophic forgetting problems during the continual adaptation process. To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts. Specifically, we propose a Distribution-aware Masking (DaM) mechanism to adaptively sample masked positions, followed by establishing consistency constraints between the masked target samples and the original target samples. Additionally, for masked tokens, we utilize an efficient decoder to reconstruct a hand-crafted feature descriptor (e.g., Histograms of Oriented Gradients), leveraging its invariant properties to boost task-relevant representations. Through conducting extensive experiments on four widely recognized benchmarks, our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks. Our project page: https://sites.google.com/view/continual-mae/home.

Submitted 27 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR2024

arXiv:2312.11984 [pdf, other]

PSR B0943+10: Mode Switch, Polar Cap Geometry, and Orthogonally Polarized Radiation

Authors: Shunshun Cao, Jinchen Jiang, Jaroslaw Dyks, Longfei Hao, Kejia Lee, Zhixuan Li, Jiguang Lu, Zhichen Pan, Weiyang Wang, Zhengli Wang, Jiangwei Xu, Heng Xu, Renxin Xu

Abstract: As one of the paradigm examples to probe into pulsar magnetospheric dynamics, PSR B0943+10 (J0946+0951) manifests representatively, showing mode switch, orthogonal polarization and subpulse drifting. Both integrated and single pulses are studied with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The mode switch phenomenon of this pulsar is studied using an eigen-mode searching method, based on parameter estimation. A phase space evolution for the pulsar's mode switch shows a strange-attractor-like pattern. The radiative geometry is proposed by fitting polarization position angles with the rotating vector model. The pulsar pulse profile is then mapped to the sparking location on pulsar surface, and the differences between the main pulse's and the precursor component's radiative process may explain the X-ray's synchronization with radio mode switch. Detailed single pulse studies on B0943+10's orthogonally polarized radiation are presented, which may support for certain models of radiative transfer of polarized emission. B0943+10's B and Q modes evolve differently with frequency and with proportions of orthogonal modes, which indicates possible magnetospheric changes during mode switch. An extra component is found in B mode, and it shows distinct polarization and modulation properties compared with main part of B mode pulse component. For Q mode pulse profile, the precursor and the main pulse components are orthogonally polarized, showing that the precursor component radiated farther from the pulsar could be radiated in O-mode (X-mode) if the main pulse originates from low altitude in X-mode (O-mode). The findings could impact significantly on pulsar electrodynamics and the radiative mechanism related.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 27 pages, 28 figures, 2 tables, submitted to ApJ

arXiv:2312.09085 [pdf, other]

The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation

Authors: Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu

Abstract: Large language models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research mainly studied this susceptibility behavior in a single-turn setting. However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs' belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs' correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.

Submitted 31 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted to ACL'24 (Main). Camera-ready version

arXiv:2312.08935 [pdf, other]

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Authors: Peiyi Wang, Lei Li, Zhihong Shao, R. X. Xu, Damai Dai, Yifei Li, Deli Chen, Y. Wu, Zhifang Sui

Abstract: In this paper, we present an innovative process-oriented math process reward model called \textbf{Math-Shepherd}, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) \textit{Verification}: Math-Shepherd is utilized for reranking multiple outputs generated by Large Language Models (LLMs); 2) \textit{Reinforcement Learning}: Math-Shepherd is employed to reinforce LLMs with step-by-step Proximal Policy Optimization (PPO). With Math-Shepherd, a series of open-source LLMs demonstrates exceptional performance. For instance, the step-by-step PPO with Math-Shepherd significantly improves the accuracy of Mistral-7B (77.9\%$\to$84.1\% on GSM8K and 28.6\%$\to$33.0\% on MATH). The accuracy can be further enhanced to 89.1\% and 43.5\% on GSM8K and MATH with the verification of Math-Shepherd, respectively. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.

Submitted 19 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Add Step-by-Step reinforcement learning results

arXiv:2312.06722 [pdf, other]

EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning

Authors: Yi Chen, Yuying Ge, Yixiao Ge, Mingyu Ding, Bohao Li, Rui Wang, Ruifeng Xu, Ying Shan, Xihui Liu

Abstract: The pursuit of artificial general intelligence (AGI) has been accelerated by Multimodal Large Language Models (MLLMs), which exhibit superior reasoning, generalization capabilities, and proficiency in processing multimodal inputs. A crucial milestone in the evolution of AGI is the attainment of human-level planning, a fundamental ability for making informed decisions in complex environments, and solving a wide range of real-world problems. Despite the impressive advancements in MLLMs, a question remains: How far are current MLLMs from achieving human-level planning? To shed light on this question, we introduce EgoPlan-Bench, a comprehensive benchmark to evaluate the planning abilities of MLLMs in real-world scenarios from an egocentric perspective, mirroring human perception. EgoPlan-Bench emphasizes the evaluation of planning capabilities of MLLMs, featuring realistic tasks, diverse action plans, and intricate visual observations. Our rigorous evaluation of a wide range of MLLMs reveals that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning. To facilitate this advancement, we further present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench. We have made all codes, data, and a maintained benchmark leaderboard available to advance future research.

Submitted 11 June, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: Project released at: https://github.com/ChenYi99/EgoPlan

arXiv:2312.05510 [pdf, other]

doi 10.1002/asna.20230153

Repeating FRBs reveal the secret of pulsar magnetospheric activity

Authors: Renxin Xu, Weiyang Wang

Abstract: The puzzling mechanism of coherent radio emission remains unknown, but fortunately, repeating fast radio bursts (FRBs) provide a precious opportunity, with extremely bright subpulses created in a clear and vacuum-like pulsar magnetosphere. FRBs are millisecond-duration signals that are highly dispersed at distant galaxies but with uncertain physical origin(s). Coherent curvature radiation by bunches has already been proposed for repeating FRBs. The charged particles are created during central star's quakes, which can form bunches streaming out along curved magnetic field lines, so as to trigger FRBs. The nature of narrow-band radiation with time-frequency drifting can be a natural consequence that bunches could be observed at different times with different curvatures. Additionally, high linear-polarization can be seen if the line of sight is confined to the beam angle, whereas the emission could be highly circular-polarized if off-beam. It is also discussed that pulsar surface may be full of small hills (i.e., zits) which would help producing bulk of energetic bunches for repeating FRBs as well as for rotation-powered pulsars.

Submitted 9 December, 2023; originally announced December 2023.

Comments: 7 pages, 2 figures, published in AN

arXiv:2312.04737 [pdf, other]

Efficient Large Language Models Fine-Tuning On Graphs

Authors: Rui Xue, Xipeng Shen, Ruozhou Yu, Xiaorui Liu

Abstract: Learning from Text-Attributed Graphs (TAGs) has attracted significant attention due to its wide range of real-world applications. The rapid evolution of large language models (LLMs) has revolutionized the way we process textual data, which indicates a strong potential to replace shallow text embedding generally used in Graph Neural Networks (GNNs). However, we find that existing LLM approaches that exploit text information in graphs suffer from inferior computation and data efficiency. In this work, we introduce a novel and efficient approach for the end-to-end fine-tuning of Large Language Models (LLMs) on TAGs, named LEADING. The proposed approach maintains computation cost and memory overhead comparable to the graph-less fine-tuning of LLMs. Moreover, it transfers the rich knowledge in LLMs to downstream graph learning tasks effectively with limited labeled data in semi-supervised learning. Its superior computation and data efficiency are demonstrated through comprehensive experiments, offering a promising solution for a wide range of LLMs and graph learning tasks on TAGs.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04418 [pdf, other]

MIST: An Efficient Approach for Software-Defined Multicast in Wireless Mesh Networks

Authors: Rupei Xu, Yuming Jiang, Jason P. Jue

Abstract: Multicasting is a vital information dissemination technique in Software-Defined Networking (SDN). With SDN, a multicast service can incorporate network functions implemented at different nodes, which is referred to as software-defined multicast. Emerging ubiquitous wireless networks for 5G and Beyond (B5G) inherently support multicast. However, the broadcast nature of wireless channels, especially in dense deployments, leads to neighborhood interference as a primary system degradation factor, which introduces a new challenge for software-defined multicast in wireless mesh networks. To tackle this, this paper introduces a novel approach, based on the idea of minimizing both the total length cost of the multicast tree and the interference at the same time. Accordingly, a bicriteria optimization problem is formulated, which is called \emph{Minimum Interference Steiner Tree (MIST)}. To solve the bicriteria problem, instead of resorting to heuristics, this paper employs an innovative approach that is an approximate algorithm for MIST but with guaranteed performance. Specifically, the approach is a two-stage relaxation algorithm by exploiting the monotone submodularity property of the interference metric and identifying Pareto optimal solutions for MIST. Simulation results demonstrate and validate the performance of the proposed algorithm.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.03814 [pdf, other]

Pearl: A Production-ready Reinforcement Learning Agent

Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

Abstract: Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety constraints are met. Despite considerable progress made by the RL research community in addressing these issues, existing open-source RL libraries tend to focus on a narrow portion of the RL solution pipeline, leaving other aspects largely unattended. This paper introduces Pearl, a Production-ready RL agent software package explicitly designed to embrace these challenges in a modular fashion. In addition to presenting preliminary benchmark results, this paper highlights Pearl's industry adoptions to demonstrate its readiness for production usage. Pearl is open sourced on Github at github.com/facebookresearch/pearl and its official website is located at pearlagent.github.io.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.03200 [pdf, other]

Dynamics of a $2$-dimensional slow-fast Belousov-Zabotinsky model

Authors: Ruihan Xu, Ming Sun, Xiang Zhang

Abstract: For the reduced two-dimensional Belousov-Zhabotinsky slow-fast differential system, the known results are the existence of one limit cycle and its stability for particular values of the parameters. Here, we characterize all dynamics of this system except one degenerate case. The results include global stability of the positive equilibrium, supercritical and subcritical Hopf bifurcations, the existence of a canard explosion and relaxation oscillation, and the coexistence of one nest of two limit cycles with the outer one originating from the supercritical Hopf bifurcation at one canard point and the inner one from the subcritical Hopf bifurcation at another canard point. This last one is a new dynamical phenomenon.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 16 pages, 3 figures, 2 tables

MSC Class: 37N25; 34D23; 37C75; 34C26; 34C60

arXiv:2312.02486 [pdf, other]

doi 10.1088/1475-7516/2024/04/087

Probing the vector charge of Sagittarius A* with pulsar timing

Authors: Zexin Hu, Lijing Shao, Rui Xu, Dicong Liang, Zhan-Feng Mai

Abstract: Timing a pulsar orbiting around Sagittarius A* (Sgr A*) can provide us with a unique opportunity of testing gravity theories. We investigate the detectability of a vector charge carried by the Sgr A* black hole (BH) in the bumblebee gravity model with simulated future pulsar timing observations. The spacetime of a bumblebee BH introduces characteristic changes to the orbital dynamics of the pulsar and the light propagation of radio signals. Assuming a timing precision of 1 ms, our simulation shows that a 5-yr observation of a pulsar with an orbital period $P_b\sim 0.5\,{\rm yr}$ and an orbital eccentricity $e\sim 0.8$ can probe a vector charge-to-mass ratio as small as $Q/M\sim 10^{-3}$, which is much more stringent than the current constraint from the Event Horizon Telescope (EHT) observations, and comparable to the prospective constraint from extreme mass-ratio inspirals with the Laser Interferometer Space Antenna (LISA).

Submitted 4 December, 2023; originally announced December 2023.

Comments: 18 pages, 6 figures

Journal ref: JCAP 04 (2024) 087

arXiv:2312.01406 [pdf, other]

Can a star be smaller than a black hole of the same mass?

Authors: Shoulong Li, H. Lü, Yong Gao, Rui Xu, Lijing Shao, Hongwei Yu

Abstract: It is commonly believed that black holes are the smallest self-gravitating objects of the same mass in the Universe. Here, we demonstrate, in a subclass of higher-order pure gravities known as quasi-topological gravity, that by modifying general relativity (GR) to reduce the strength of gravity in strong-field regimes while keeping GR unchanged in weak-field regimes, it is possible for stars to collapse to radii less than $2M$ while still maintaining equilibrium between gravity and pressure gradients, leading to physically-reasonable neutron stars smaller in size than a black hole of the same mass. We present concrete solutions for such objects and discuss some of their observational consequences. These objects may furnish new avenues for understanding the nature of gravity in strong-field regimes and leave imprints on gravitational wave echoes from compact binary mergers. An observation of these imprints may constitute evidence for new physics beyond GR when effects of gravity in strong-field regimes are concerned.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 15 pages, 3 figures, submitted in August

arXiv:2311.18799 [pdf, other]

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Authors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Abstract: Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs). In this paper, we introduce a simple, yet effective, cross-modality framework built atop frozen LLMs that allows the integration of various modalities without extensive modality-specific customization. To facilitate instruction-modality fine-tuning, we collect high-quality instruction tuning data in an automatic and scalable manner, composed of 24K QA samples for audio and 250K QA samples for 3D. Leveraging instruction-aware representations, our model performs comparably with leading-edge counterparts without the need of extensive modality-specific pre-training or customization. Furthermore, our approach demonstrates cross-modal reasoning abilities across two or more input modalities, despite each modality projection being trained individually. To study the model's cross-modal abilities, we contribute a novel Discriminative Cross-modal Reasoning (DisCRn) evaluation task, comprising 9K audio-video QA samples and 28K image-3D QA samples that require the model to reason discriminatively across disparate input modalities.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.18148 [pdf]

A universal optical modulator for synthetic topologically tuneable structured matter

Authors: Chao He, Binguo Chen, Zipei Song, Zimo Zhao, Yifei Ma, Honghui He, Lin Luo, Tade Marozsak, An Wang, Rui Xu, Peixiang Huang, Xuke Qiu, Bangshan Sun, Jiahe Cui, Yuxi Cai, Yun Zhang, Patrick Salter, Julian AJ Fells, Ben Dai, Shaoxiong Liu, Limei Guo, Hui Ma, Steve J Elston, Qiwen Zhan, Chengwei Qiu, et al. (3 additional authors not shown)

Abstract: Topologically structured matter, such as metasurfaces and metamaterials, has given rise to impressive photonic functionality, fuelling diverse applications from microscopy and holography to encryption and communication. Presently these solutions are limited by their largely static nature and preset functionality, hindering applications that demand dynamic photonic systems with reconfigurable topologies. Here we demonstrate a universal optical modulator that implements topologically tuneable structured matter as virtual pixels derived from cascading low functionality tuneable devices, altering the paradigm of phase and amplitude control to encompass arbitrary spatially varying retarders in a synthetic structured matter device. Our approach opens unprecedented functionality that is user-defined with high flexibility, allowing our synthetic structured matter to act as an information carrier, beam generator, analyser, and corrector, opening an exciting path to tuneable topologies of light and matter.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.16916 [pdf, other]

Stein Variational Belief Propagation for Multi-Robot Coordination

Authors: Jana Pavlasek, Joshua Jing Zhi Mah, Ruihan Xu, Odest Chadwicke Jenkins, Fabio Ramos

Abstract: Decentralized coordination for multi-robot systems involves planning in challenging, high-dimensional spaces. The planning problem is particularly challenging in the presence of obstacles and different sources of uncertainty such as inaccurate dynamic models and sensor noise. In this paper, we introduce Stein Variational Belief Propagation (SVBP), a novel algorithm for performing inference over nonparametric marginal distributions of nodes in a graph. We apply SVBP to multi-robot coordination by modelling a robot swarm as a graphical model and performing inference for each robot. We demonstrate our algorithm on a simulated multi-robot perception task, and on a multi-robot planning task within a Model-Predictive Control (MPC) framework, on both simulated and real-world mobile robots. Our experiments show that SVBP represents multi-modal distributions better than sampling-based or Gaussian baselines, resulting in improved performance on perception and planning tasks. Furthermore, we show that SVBP's ability to represent diverse trajectories for decentralized multi-robot planning makes it less prone to deadlock scenarios than leading baselines.

Submitted 12 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 8 pages, accepted for publication in Robotics and Automation Letters (RA-L); experiment updated, background methodology added

arXiv:2311.14265 [pdf, other]

Adaptive Calibration: A Unified Conversion Framework of Spiking Neural Networks

Authors: Ziqing Wang, Yuetong Fang, Jiahang Cao, Renjing Xu

Abstract: Spiking Neural Networks (SNNs) have emerged as a promising energy-efficient alternative to traditional Artificial Neural Networks (ANNs). Despite this, bridging the performance gap with ANNs in practical scenarios remains a significant challenge. This paper focuses on addressing the dual objectives of enhancing the performance and efficiency of SNNs through the established SNN Calibration conversion framework. Inspired by the biological nervous system, we propose a novel Adaptive-Firing Neuron Model (AdaFire) that dynamically adjusts firing patterns across different layers, substantially reducing conversion errors within limited timesteps. Moreover, to meet our efficiency objectives, we propose two novel strategies: a Sensitivity Spike Compression (SSC) technique and an Input-aware Adaptive Timesteps (IAT) technique. These techniques synergistically reduce both energy consumption and latency during the conversion process, thereby enhancing the overall efficiency of SNNs. Extensive experiments demonstrate our approach outperforms state-of-the-art SNN methods, showcasing superior performance and efficiency in 2D, 3D, and event-driven classification, as well as object detection and segmentation tasks.

Submitted 16 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: Under review

arXiv:2311.14168 [pdf, other]

Fast Policy Learning for Linear Quadratic Control with Entropy Regularization

Authors: Xin Guo, Xinyu Li, Renyuan Xu

Abstract: This paper proposes and analyzes two new policy learning methods: regularized policy gradient (RPG) and iterative policy optimization (IPO), for a class of discounted linear-quadratic control (LQC) problems over an infinite time horizon with entropy regularization. Assuming access to the exact policy evaluation, both proposed approaches are proven to converge linearly in finding optimal policies of the regularized LQC. Moreover, the IPO method can achieve a super-linear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy for an RL problem with a known environment is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to enable a super-linear convergence rate if the two environments are sufficiently close. Performances of these proposed algorithms are supported by numerical examples.

Submitted 11 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: 33 pages, 3 figures

arXiv:2311.14097 [pdf, other]

ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

Authors: Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

Abstract: Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation speeds. Consistency training addresses this issue with single-step sampling but often produces lower-quality generations and requires high training costs. In this paper, we show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions. As timestep increases, the upper bound accumulates previous consistency training losses. Therefore, larger batch sizes are needed to reduce both current and accumulated losses. We propose Adversarial Consistency Training (ACT), which directly minimizes the Jensen-Shannon (JS) divergence between distributions at each timestep using a discriminator. Theoretically, ACT enhances generation quality and convergence. By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on CIFAR10 and ImageNet 64$\times$64 and LSUN Cat 256$\times$256 datasets, retains zero-shot image inpainting capabilities, and uses less than $1/6$ of the original batch size and fewer than $1/2$ of the model parameters and training steps compared to the baseline method. This leads to a substantial reduction in resource consumption. Our code is available at https://github.com/kong13661/ACT

Submitted 28 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: To appear in CVPR 2024

arXiv:2311.13589 [pdf, ps, other]

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Authors: Zhengqi Wu, Renyuan Xu

Abstract: Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to maximize the expected cumulative reward. However, in practical scenarios such as portfolio management and e-commerce recommendations, decision-makers often persist in heterogeneous risk preferences subject to outcome uncertainties, which cannot be well captured by the risk-neutral framework. Incorporating these preferences can be approached through utility theory, yet the development of risk-sensitive RL under general utility functions remains an open question for theoretical exploration. In this paper, we consider a scenario where the decision-maker seeks to optimize a general utility function of the cumulative reward in the framework of a Markov decision process (MDP). To facilitate the Dynamic Programming Principle and Bellman equation, we enlarge the state space with an additional dimension that accounts for the cumulative reward. We propose a discretized approximation scheme to the MDP under enlarged state space, which is tractable and key for algorithmic design. We then propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative reward. When a simulator is accessible, our algorithm efficiently learns a near-optimal policy with guaranteed sample complexity. In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy while ensuring a guaranteed regret bound. For both algorithms, we match the theoretical lower bounds for the risk-neutral setting.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 36 pages

arXiv:2311.13114 [pdf, other]

doi 10.1051/0004-6361/202348670

Narrow spectra of repeating fast radio bursts: A magnetospheric origin

Authors: Wei-Yang Wang, Yuan-Pei Yang, Hong-Bo Li, Jifeng Liu, Renxin Xu

Abstract: Fast radio bursts (FRBs) can present a variety of polarization properties, and some of them have narrow spectra. We study spectral properties from perspectives of intrinsic radiation mechanisms and absorption during the waves propagating in the magnetosphere. The intrinsic radiation mechanisms are considered by invoking quasi-periodic bunch distribution and perturbations on charged bunches moving on curved trajectories. The narrow-band emission likely reflects some quasi-periodic structure on the bulk of bunches, which may be due to quasi-periodically sparking in a ``gap'' or quasi-monochromatic Langmuir waves. A sharp spike would appear at the spectrum if the perturbations can induce a monochromatic oscillation of bunches; however, it is hard to create a narrow spectrum because the Lorentz factor has large fluctuations so that the spike disappears. Both the bunching mechanism and perturbations scenarios share the same polarization properties with a uniformly distributed bulk of bunches. We investigate absorption effects including Landau damping and curvature self-absorption in the magnetosphere, which are significant at low frequencies. Subluminous O-mode photons cannot escape from the magnetosphere due to the Landau damping, leading to a height-dependent lower frequency cut-off. Spectra can be narrow when the frequency cut-off is close to the characteristic frequency of curvature radiation, while such conditions can only be met sometimes. The spectral index is 5/3 at low-frequency bands due to the curvature self-absorption but not as steep as the observations. The intrinsic radiation mechanisms are more likely to generate the observed narrow spectra of FRBs rather than the absorption effects.

Submitted 22 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 19 pages, 11 figures, A&A accepted

Journal ref: A&A 685, A87 (2024)

arXiv:2311.12060 [pdf, other]

Pursing the Sparse Limitation of Spiking Deep Learning Structures

Authors: Hao Cheng, Jiahang Cao, Erjia Xiao, Mengshu Sun, Le Yang, Jize Zhang, Xue Lin, Bhavya Kailkhura, Kaidi Xu, Renjing Xu

Abstract: Spiking Neural Networks (SNNs), a novel brain-inspired algorithm, are garnering increased attention for their superior computation and energy efficiency over traditional artificial neural networks (ANNs). To facilitate deployment on memory-constrained devices, numerous studies have explored SNN pruning. However, these efforts are hindered by challenges such as scalability in more complex architectures and accuracy degradation. Amidst these challenges, the Lottery Ticket Hypothesis (LTH) emerges as a promising pruning strategy. It posits that within dense neural networks, there exist winning tickets or subnetworks that are sparser but do not compromise performance. To explore a more structure-sparse and energy-saving model, we investigate the unique synergy of SNNs with LTH and design two novel spiking winning tickets to push the boundaries of sparsity within SNNs. Furthermore, we introduce an innovative algorithm capable of simultaneously identifying both weight and patch-level winning tickets, enabling the achievement of sparser structures without compromising on the final model's performance. Through comprehensive experiments on both RGB-based and event-based datasets, we demonstrate that our spiking lottery ticket achieves comparable or superior performance even when the model structure is extremely sparse.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.10700 [pdf, ps, other]

Deriving Algorithms for Triangular Tridiagonalization a (Skew-)Symmetric Matrix

Authors: Robert van de Geijn, Maggie Myers, RuQing G. Xu, Devin Matthews

Abstract: We apply the FLAME methodology to derive algorithms hand in hand with their proofs of correctness for the computation of the $ L T L^T $ decomposition (with and without pivoting) of a skew-symmetric matrix. The approach yields known as well as new algorithms, presented using the FLAME notation. A number of BLAS-like primitives are exposed at the core of blocked algorithms that can attain high performance. The insights can be easily extended to yield algorithms for computing the $ L T L^T $ decomposition of a symmetric matrix.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 28 pages

arXiv:2311.08608 [pdf, other]

Multi-Radar Inertial Odometry for 3D State Estimation using mmWave Imaging Radar

Authors: Jui-Te Huang, Ruoyang Xu, Akshay Hinduja, Michael Kaess

Abstract: State estimation is a crucial component for the successful implementation of robotic systems, relying on sensors such as cameras, LiDAR, and IMUs. However, in real-world scenarios, the performance of these sensors is degraded by challenging environments, e.g. adverse weather conditions and low-light scenarios. The emerging 4D imaging radar technology is capable of providing robust perception in adverse conditions. Despite its potential, challenges remain for indoor settings where noisy radar data does not present clear geometric features. Moreover, disparities in radar data resolution and field of view (FOV) can lead to inaccurate measurements. While prior research has explored radar-inertial odometry based on Doppler velocity information, challenges remain for the estimation of 3D motion because of the discrepancy in the FOV and resolution of the radar sensor. In this paper, we address Doppler velocity measurement uncertainties. We present a method to optimize body frame velocity while managing Doppler velocity uncertainty. Based on our observations, we propose a dual imaging radar configuration to mitigate the challenge of discrepancy in radar data. To attain high-precision 3D state estimation, we introduce a strategy that seamlessly integrates radar data with a consumer-grade IMU sensor using fixed-lag smoothing optimization. Finally, we evaluate our approach using real-world 3D motion data.

Submitted 14 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Accepted to ICRA 2024

arXiv:2311.08425 [pdf]

Research and experimental verification on low-frequency long-range underwater sound propagation dispersion characteristics under dual-channel sound speed profiles in the Chukchi Plateau

Authors: **bao Weng, Yubo Qi, Yanming Yang, Hongtao Wen, Hongtao Zhou, Ruichao Xue

Abstract: The dual-channel sound speed profiles of the Chukchi Plateau and the Canadian Basin have become current research hotspots due to their excellent low-frequency sound signal propagation ability. Previous research has mainly focused on using sound propagation theory to explain the changes in sound signal energy. This article is mainly based on the theory of normal modes to study the fine structure of low-frequency wide-band sound propagation dispersion under dual-channel sound speed profiles. In this paper, the problem of the intersection of normal mode dispersion curves caused by the dual-channel sound speed profile (SSP) has been explained, the blocking effect of seabed terrain changes on dispersion structures has been analyzed, and the normal modes have been separated by using a modified warping operator. The above research results have been verified through a long-range seismic exploration experiment at the Chukchi Plateau. At the same time, based on the acoustic signal characteristics in this environment, two methods for estimating the distance of sound sources have been proposed, and the at-sea experimental data have also verified these two methods.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 30 pages, 18 figures

arXiv:2311.08011 [pdf, other]

Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

Authors: Shiwen Ni, Dingwei Chen, Chengming Li, Xi** Hu, Ruifeng Xu, Min Yang

Abstract: Recent advancements in Large Language Models (LLMs) have showcased their remarkable capabilities in text understanding and generation. However, even stronger LLMs are susceptible to acquiring erroneous or obsolete information from the training corpus. Direct secondary fine-tuning with data containing new knowledge may be ineffective in updating knowledge due to the conflict between old and new knowledge. In this paper, we propose a new paradigm for fine-tuning called F-Learning (Forgetting before Learning), which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge. Experimental results on two publicly available datasets demonstrate that our proposed F-Learning can obviously improve the knowledge updating performance of both full fine-tuning and LoRA fine-tuning, simultaneously outperforming the existing baselines in most cases. Moreover, we have also discovered that forgetting old knowledge by subtracting the parameters of LoRA can yield a similar effect to subtracting the parameters of full fine-tuning, and occasionally even surpass it significantly.

Submitted 16 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.07752 [pdf, other]

Doubly Robust Estimation under Possibly Misspecified Marginal Structural Cox Model

Authors: Jiyu Luo, Denise Rava, Jelena Bradic, Ronghui Xu

Abstract: In this paper we address the challenges posed by non-proportional hazards and informative censoring, offering a path toward more meaningful causal inference conclusions. We start from the marginal structural Cox model, which has been widely used for analyzing observational studies with survival outcomes, and typically relies on the inverse probability weighting method. The latter hinges upon a propensity score model for the treatment assignment, and a censoring model which incorporates both the treatment and the covariates. In such settings, model misspecification can occur quite effortlessly, and the Cox regression model's non-collapsibility has historically posed challenges when striving to guard against model misspecification through augmentation. We introduce an augmented inverse probability weighted estimator which, enriched with doubly robust properties, paves the way for integrating machine learning and a plethora of nonparametric methods, effectively overcoming the challenges of non-collapsibility. The estimator extends naturally to estimating a time-average treatment effect when the proportional hazards assumption fails. We closely examine its theoretical and practical performance, showing that it satisfies both the assumption-lean and the well-specification criteria discussed in the recent literature. Finally, its application to a dataset reveals insights into the impact of mid-life alcohol consumption on mortality in later life.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.07175 [pdf]

Research and experimental verification on low-frequency long-range sound propagation characteristics under ice-covered and range-dependent marine environment in the Arctic

Authors: **bao Weng, Yubo Qi, Yanming Yang, Hongtao Wen, Hongtao Zhou, Ruichao Xue

Abstract: At present, research on sound propagation under the Arctic ice mainly focuses on modeling and experimental verification of sound propagation under sea ice cover and unique sound velocity profiles. Among them, the main research object of concern is sound transmission loss, and this article will delve into the time-domain waveform and fine dispersion structure of low-frequency broadband acoustic signals. Firstly, based on the theory of normal modes, this article derives the horizontal wavenumber expression and warping transformation operator for refractive normal modes in the Arctic deep-sea environment. Subsequently, based on measured ocean environmental parameters and sound field simulation calculations, this article studies the general laws of low-frequency long-range sound propagation signals in the Arctic deep-sea environment, and elucidates the impact mechanism of environmental factors such as seabed terrain changes, horizontal changes in sound velocity profiles (SSPs), and sea ice cover on low-frequency long-range sound propagation in the Arctic. This article validates the above research viewpoint through a sound propagation experiment conducted in the Arctic with a propagation distance exceeding 1000 km. The marine environment of this experiment has obvious horizontal variation characteristics. At the same time, this article takes the lead in utilizing the warping transformation of refractive normal waves in the Arctic waters to achieve single-hydrophone-based separation of normal waves and extraction of dispersion structures, which is conducive to future research on underwater sound source localization and environmental parameter inversion based on dispersion structures.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 46 pages, 35 figures

arXiv:2311.06761 [pdf, other]

Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding

Authors: Ruyao Xu, Taolin Zhang, Chengyu Wang, Zhongjie Duan, Cen Chen, Minghui Qiu, Dawei Cheng, Xiaofeng He, Weining Qian

Abstract: Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the performance of various downstream NLP tasks by injecting knowledge facts from large-scale Knowledge Graphs (KGs). However, existing methods for pre-training KEPLMs with relational triples are difficult to adapt to closed domains due to the lack of sufficient domain graph semantics. In this paper, we propose a Knowledge-enhanced lANGuAge Representation learning framework for various clOsed dOmains (KANGAROO) via capturing the implicit graph structure among the entities. Specifically, since the entity coverage rates of closed-domain KGs can be relatively low and may exhibit the global sparsity phenomenon for knowledge injection, we consider not only the shallow relational representations of triples but also the hyperbolic embeddings of deep hierarchical entity-class structures for effective knowledge fusion. Moreover, as two closed-domain entities under the same entity-class often have locally dense neighbor subgraphs counted by max point biconnected component, we further propose a data augmentation strategy based on contrastive learning over subgraphs to construct hard negative samples of higher quality. It makes the underlying KEPLMs better distinguish the semantics of these neighboring entities to further complement the global semantic sparsity. In the experiments, we evaluate KANGAROO over various knowledge-aware and general NLP tasks in both full and few-shot learning settings, significantly outperforming various KEPLM training paradigms in closed domains.

Submitted 12 November, 2023; originally announced November 2023.

Comments: emnlp 2023

arXiv:2311.06158 [pdf, other]

Language Models can be Logical Solvers

Authors: Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen

Abstract: Logical reasoning is a fundamental aspect of human intelligence and a key component of tasks like problem-solving and decision-making. Recent advancements have enabled Large Language Models (LLMs) to potentially exhibit reasoning capabilities, but complex logical reasoning remains a challenge. The state-of-the-art, solver-augmented language models, use LLMs to parse natural language logical questions into symbolic representations first and then adopt external logical solvers to take in the symbolic representations and output the answers. Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of the external logical solver and no answer to the logical questions. In this paper, we introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers and bypasses the parsing errors by learning to adhere strictly to solver syntax and grammar. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers. Experimental results on two public deductive reasoning datasets demonstrate that LoGiPT outperforms state-of-the-art solver-augmented LMs and few-shot prompting methods on competitive LLMs like ChatGPT or GPT-4.

Submitted 10 November, 2023; originally announced November 2023.

Comments: Preprint

arXiv:2311.05890 [pdf, ps, other]

Improved bounds on the Product Rank of the Permanent

Authors: Rongyu Xu, Edinah Gnang

Abstract: We unify Ryser's and Glynn's formulas for computing the permanent into a single framework. We then show via an orbital bound argument that the product rank of the permanent is asymptotically upper bounded by $ \frac{\exp\left(\pi\sqrt{\frac{2n}{3}}\right)}{4\sqrt{3}n} $.

Submitted 10 November, 2023; originally announced November 2023.
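
For orientation, the two classical formulas named in the abstract above can be sketched in Python. This is only an illustrative sketch of the textbook Ryser and Glynn formulas, not the authors' unified framework; the function names are hypothetical, and the matrix entries are assumed to be integers so that the final division in Glynn's formula is exact.

```python
from itertools import combinations, permutations, product
from math import prod


def permanent_naive(a):
    """Reference definition: sum over all permutations (O(n!) terms)."""
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n)) for p in permutations(range(n)))


def permanent_ryser(a):
    """Ryser: signed sum over non-empty column subsets of products of row sums."""
    n = len(a)
    total = 0
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            # One product of n linear forms per column subset.
            term = prod(sum(row[j] for j in cols) for row in a)
            total += (-1) ** size * term
    return (-1) ** n * total


def permanent_glynn(a):
    """Glynn: signed sum over sign vectors delta in {+1,-1}^n with delta[0] = +1."""
    n = len(a)
    total = 0
    for tail in product((1, -1), repeat=n - 1):
        delta = (1,) + tail
        sign = prod(delta)
        # One product of n linear forms per sign vector.
        term = prod(sum(delta[i] * a[i][j] for i in range(n)) for j in range(n))
        total += sign * term
    # Assumption: integer entries, so the sum is an exact multiple of 2^(n-1).
    return total // 2 ** (n - 1)
```

Each term in either sum is a product of n linear forms in the matrix entries, so Ryser's formula uses 2^n - 1 such products and Glynn's uses 2^(n-1); the abstract's bound concerns how few such products can suffice.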

arXiv:2311.05298 [pdf, other]

Improving Vision-and-Language Reasoning via Spatial Relations Modeling

Authors: Cheng Yang, Rui Xu, Ye Guo, Peixiang Huang, Yiru Chen, Wenkui Ding, Zhongyuan Wang, Hong Zhou

Abstract: Visual commonsense reasoning (VCR) is a challenging multi-modal task, which requires high-level cognition and commonsense reasoning ability about the real world. In recent years, large-scale pre-training approaches have been developed and have advanced the state-of-the-art performance of VCR. However, almost all existing approaches employ BERT-like objectives to learn multi-modal representations. These objectives, motivated by the text domain, are insufficient for exploiting the complex scenarios of the visual modality. Most importantly, the spatial distribution of the visual objects is basically neglected. To address the above issue, we propose to construct the spatial relation graph based on the given visual scenario. Further, we design two pre-training tasks named object position regression (OPR) and spatial relation classification (SRC) to learn to reconstruct the spatial relation graph respectively. Quantitative analysis suggests that the proposed method can guide the representations to maintain more spatial context and facilitate the attention on the essential visual regions for reasoning. We achieve state-of-the-art results on VCR and two other vision-and-language reasoning tasks, VQA and NLVR.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.05143 [pdf, other]

SCAAT: Improving Neural Network Interpretability via Saliency Constrained Adaptive Adversarial Training

Authors: Rui Xu, Wenkang Qin, Peixiang Huang, Hao Wang, Lin Luo

Abstract: Deep Neural Networks (DNNs) are expected to provide explanation for users to understand their black-box predictions. Saliency map is a common form of explanation illustrating the heatmap of feature attributions, but it suffers from noise in distinguishing important features. In this paper, we propose a model-agnostic learning method called Saliency Constrained Adaptive Adversarial Training (SCAAT) to improve the quality of such DNN interpretability. By constructing adversarial samples under the guidance of saliency map, SCAAT effectively eliminates most noise and makes saliency maps sparser and more faithful without any modification to the model architecture. We apply SCAAT to multiple DNNs and evaluate the quality of the generated saliency maps on various natural and pathological image datasets. Evaluations on different domains and metrics show that SCAAT significantly improves the interpretability of DNNs by providing more faithful saliency maps without sacrificing their predictive power.

Submitted 10 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.03774 [pdf, other]

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model

Authors: Cheng Cheng, Lin Song, Ruoyi Xue, Hang Wang, Hongbin Sun, Yixiao Ge, Ying Shan

Abstract: The contrastive vision-language pre-training, known as CLIP, demonstrates remarkable potential in perceiving open-world visual concepts, enabling effective zero-shot image recognition. Nevertheless, few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples, resulting in longer inference time and the risk of over-fitting in certain domains. To tackle these challenges, we propose the Meta-Adapter, a lightweight residual-style adapter, to refine the CLIP features guided by the few-shot samples in an online manner. With a few training samples, our method can enable effective few-shot learning capabilities and generalize to unseen data or tasks without additional fine-tuning, achieving competitive performance and high efficiency. Without bells and whistles, our approach outperforms the state-of-the-art online few-shot learning method by an average of 3.6% on eight image classification datasets with higher inference speed. Furthermore, our model is simple and flexible, serving as a plug-and-play module directly applicable to downstream tasks. Without further fine-tuning, Meta-Adapter obtains notable performance improvements in open-vocabulary object detection and segmentation tasks.

Submitted 11 January, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2311.02893 [pdf]

Topological electronic structure and spin texture of quasi-one-dimensional higher-order topological insulator Bi4Br4

Authors: W. X. Zhao, M. Yang, R. Z. Xu, X. Du, Y. D. Li, K. Y. Zhai, C. Peng, D. Pei, H. Gao, Y. W. Li, L. X. Xu, J. F. Han, Y. Huang, Z. K. Liu, Y. G. Yao, J. C. Zhuang, Y. Du, J. J. Zhou, Y. L. Chen, L. X. Yang

Abstract: The notion of topological insulators (TIs), characterized by an insulating bulk and conducting topological surface states, can be extended to higher-order topological insulators (HOTIs) hosting gapless modes localized at the boundaries of two or more dimensions lower than the insulating bulk [1-5]. In this work, by performing high-resolution angle-resolved photoemission spectroscopy (ARPES) measurements with submicron spatial and spin resolutions, we systematically investigate the electronic structure and spin texture of quasi-one-dimensional (1D) HOTI candidate Bi4Br4. In contrast to the bulk-state-dominant spectra on the (001) surface, we observe gapped surface states on the (100) surface, whose dispersion and spin-polarization agree well with our ab initio calculations. Moreover, we reveal in-gap states connecting the surface valence and conduction bands, which is an explicit signature of the existence of hinge states inside the (100) surface gap. Our findings provide compelling evidence for the HOTI phase of Bi4Br4. The identification of the higher-order topological phase will lay the promising prospect of applications based on 1D spin-momentum locked current in electronic and spintronic devices.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.00530 [pdf, other]

Advances in Embodied Navigation Using Large Language Models: A Survey

Authors: **zhou Lin, Han Gao, Xuxiang Feng, Rongtao Xu, Changwei Wang, Man Zhang, Li Guo, Shibiao Xu

Abstract: In recent years, the rapid advancement of Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) has attracted increasing attention due to their potential in a variety of practical applications. The application of LLMs with Embodied Intelligence has emerged as a significant area of focus. Among the myriad applications of LLMs, navigation tasks are particularly noteworthy because they demand a deep understanding of the environment and quick, accurate decision-making. LLMs can augment embodied intelligence systems with sophisticated environmental perception and decision-making support, leveraging their robust language and image-processing capabilities. This article offers an exhaustive summary of the symbiosis between LLMs and embodied intelligence with a focus on navigation. It reviews state-of-the-art models, research methodologies, and assesses the advantages and disadvantages of existing embodied navigation models and datasets. Finally, the article elucidates the role of LLMs in embodied intelligence, based on current research, and forecasts future directions in the field. A comprehensive list of studies in this survey is available at https://github.com/Rongtao-Xu/Awesome-LLM-EN.

Submitted 7 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2311.00287 [pdf, other]

Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models

Authors: Ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, Wei **, Joyce Ho, Carl Yang

Abstract: Clinical natural language processing requires methods that can address domain-specific challenges, such as complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet, their direct deployment can lead to privacy issues and is constrained by resources. To address this challenge, we delve into synthetic clinical text generation using LLMs for clinical NLP tasks. We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process. Our model involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks, effectively aligning the distribution of real datasets and significantly enriching the diversity of generated training instances. We will publish our code and all the generated data at https://github.com/ritaranx/ClinGen.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.20607 [pdf, other]

What a Whole Slide Image Can Tell? Subtype-guided Masked Transformer for Pathological Image Captioning

Authors: Wenkang Qin, Rui Xu, Peixiang Huang, Xiaomin Wu, Heyu Zhang, Lin Luo

Abstract: Pathological captioning of Whole Slide Images (WSIs), though essential in computer-aided pathological diagnosis, has rarely been studied due to the limitations in datasets and model training efficacy. In this paper, we propose a new paradigm, Subtype-guided Masked Transformer (SGMT), for pathological captioning based on Transformers, which treats a WSI as a sequence of sparse patches and generates an overall caption sentence from the sequence. An accompanying subtype prediction is introduced into SGMT to guide the training process and enhance the captioning accuracy. We also present an Asymmetric Masked Mechanism approach to tackle the large size constraint of pathological image captioning, where the numbers of sequencing patches in SGMT are sampled differently in the training and inferring phases, respectively. Experiments on the PatchGastricADC22 dataset demonstrate that our approach effectively adapts to the task with a transformer-based model and achieves superior performance to traditional RNN-based methods. Our codes are to be made available for further research and development.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.20427 [pdf, other]

Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology

Authors: Peixiang Huang, Songtao Zhang, Yulu Gan, Rui Xu, Rongqi Zhu, Wenkang Qin, Limei Guo, Shan Jiang, Lin Luo

Abstract: Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess and further enhance the robustness of the models, we analyze the physical causes of the full-stack corruptions throughout the pathological life-cycle and propose an Omni-Corruption Emulation (OmniCE) method to reproduce 21 types of corruptions quantified with 5-level severity. We then construct three OmniCE-corrupted benchmark datasets at both patch level and slide level and assess the robustness of popular DNNs in classification and segmentation tasks. Further, we explore using the OmniCE-corrupted datasets as augmentation data for training, and experiments verify that the generalization ability of the models is significantly enhanced.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.18804 [pdf, other]

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Authors: Hejie Cui, Xinyu Fang, Zihan Zhang, Ran Xu, Xuan Kan, Xin Liu, Yue Yu, Manling Li, Yangqiu Song, Carl Yang

Abstract: Images contain rich relational knowledge that can help machines understand the world. Existing methods on visual knowledge extraction often rely on the pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

Showing 201–250 of 1,564 results for author: Xu, R